diff --git a/docs/source/en/guides/cli.md b/docs/source/en/guides/cli.md index 49c6ab38f7..d75f766138 100644 --- a/docs/source/en/guides/cli.md +++ b/docs/source/en/guides/cli.md @@ -680,39 +680,36 @@ The `hf env` command prints details about your machine setup. This is useful whe Copy-and-paste the text below in your GitHub issue. -- huggingface_hub version: 0.19.0.dev0 -- Platform: Linux-6.2.0-36-generic-x86_64-with-glibc2.35 -- Python version: 3.10.12 +- huggingface_hub version: 1.0.0.rc6 +- Platform: Linux-6.8.0-85-generic-x86_64-with-glibc2.35 +- Python version: 3.11.14 - Running in iPython ?: No - Running in notebook ?: No - Running in Google Colab ?: No +- Running in Google Colab Enterprise ?: No - Token path ?: /home/wauplin/.cache/huggingface/token - Has saved token ?: True - Who am I ?: Wauplin - Configured git credential helpers: store -- FastAI: N/A -- Torch: 1.12.1 -- Jinja2: 3.1.2 -- Graphviz: 0.20.1 -- Pydot: 1.4.2 -- Pillow: 9.2.0 -- hf_transfer: 0.1.3 -- gradio: 4.0.2 -- tensorboard: 2.6 -- numpy: 1.23.2 -- pydantic: 2.4.2 -- aiohttp: 3.8.4 +- Installation method: unknown +- Torch: N/A +- httpx: 0.28.1 +- hf_xet: 1.1.10 +- gradio: 5.41.1 +- tensorboard: N/A +- pydantic: 2.11.7 - ENDPOINT: https://huggingface.co - HF_HUB_CACHE: /home/wauplin/.cache/huggingface/hub - HF_ASSETS_CACHE: /home/wauplin/.cache/huggingface/assets - HF_TOKEN_PATH: /home/wauplin/.cache/huggingface/token +- HF_STORED_TOKENS_PATH: /home/wauplin/.cache/huggingface/stored_tokens - HF_HUB_OFFLINE: False - HF_HUB_DISABLE_TELEMETRY: False - HF_HUB_DISABLE_PROGRESS_BARS: None - HF_HUB_DISABLE_SYMLINKS_WARNING: False - HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False - HF_HUB_DISABLE_IMPLICIT_TOKEN: False -- HF_HUB_ENABLE_HF_TRANSFER: False +- HF_HUB_DISABLE_XET: False - HF_HUB_ETAG_TIMEOUT: 10 - HF_HUB_DOWNLOAD_TIMEOUT: 10 ``` diff --git a/docs/source/en/guides/download.md b/docs/source/en/guides/download.md index 2c5e64157c..d3f9765506 100644 --- a/docs/source/en/guides/download.md +++ 
b/docs/source/en/guides/download.md @@ -243,13 +243,6 @@ Finally, you can also make a dry-run programmatically by passing `dry_run=True` ## Faster downloads -There are two options to speed up downloads. Both involve installing a Python package written in Rust. - -* `hf_xet` is newer and uses the Xet storage backend for upload/download. Xet storage is the [default for all new Hub users and organizations](https://huggingface.co/changelog/xet-default-for-new-users), and is in the process of being rolled out to all users. If you don't have access, join the [waitlist](https://huggingface.co/join/xet) to make Xet the default for all your repositories! -* `hf_transfer` is a power-tool to download and upload to our LFS storage backend (note: this is less future-proof than Xet). It is thoroughly tested and has been in production for a long time, but it has some limitations. - -### hf_xet - Take advantage of faster downloads through `hf_xet`, the Python binding to the [`xet-core`](https://github.com/huggingface/xet-core) library that enables chunk-based deduplication for faster downloads and uploads. `hf_xet` integrates seamlessly with `huggingface_hub`, but uses the Rust `xet-core` library and Xet storage instead of LFS. @@ -263,23 +256,6 @@ pip install -U "huggingface_hub" As of `huggingface_hub` 0.32.0, this will also install `hf_xet`. -Note: `hf_xet` will only be utilized when the files being downloaded are being stored with Xet Storage. - All other `huggingface_hub` APIs will continue to work without any modification. To learn more about the benefits of Xet storage and `hf_xet`, refer to this [section](https://huggingface.co/docs/hub/storage-backends). -### hf_transfer - -If you are running on a machine with high bandwidth, -you can increase your download speed with [`hf_transfer`](https://github.com/huggingface/hf_transfer), -a Rust-based library developed to speed up file transfers with the Hub. -To enable it: - -1. 
Specify the `hf_transfer` extra when installing `huggingface_hub` - (e.g. `pip install huggingface_hub[hf_transfer]`). -2. Set `HF_HUB_ENABLE_HF_TRANSFER=1` as an environment variable. - -> [!WARNING] -> `hf_transfer` is a power user tool! -> It is tested and production-ready, -> but it lacks user-friendly features like advanced error handling or proxies. -> For more details, please take a look at this [section](https://huggingface.co/docs/huggingface_hub/hf_transfer). +Note: `hf_transfer` was formerly used with the LFS storage backend and is now deprecated; use `hf_xet` instead. diff --git a/docs/source/en/guides/upload.md b/docs/source/en/guides/upload.md index 6936fbf9b2..7a4db01961 100644 --- a/docs/source/en/guides/upload.md +++ b/docs/source/en/guides/upload.md @@ -155,16 +155,7 @@ Check out our [Repository limitations and recommendations](https://huggingface.c - **Start small**: We recommend starting with a small amount of data to test your upload script. It's easier to iterate on a script when failing takes only a little time. - **Expect failures**: Streaming large amounts of data is challenging. You don't know what can happen, but it's always best to consider that something will fail at least once -no matter if it's due to your machine, your connection, or our servers. For example, if you plan to upload a large number of files, it's best to keep track locally of which files you already uploaded before uploading the next batch. You are ensured that an LFS file that is already committed will never be re-uploaded twice but checking it client-side can still save some time. This is what [`upload_large_folder`] does for you. -- **Use `hf_xet`**: this leverages the new storage backend for Hub, is written in Rust, and is being rolled out to users right now. In order to upload using `hf_xet` your repo must be enabled to use the Xet storage backend. It is being rolled out now, so join the [waitlist](https://huggingface.co/join/xet) to get onboarded soon! 
-- **Use `hf_transfer`**: this is a Rust-based [library](https://github.com/huggingface/hf_transfer) meant to speed up uploads on machines with very high bandwidth (uploads LFS files). To use `hf_transfer`: - 1. Specify the `hf_transfer` extra when installing `huggingface_hub` - (i.e., `pip install huggingface_hub[hf_transfer]`). - 2. Set `HF_HUB_ENABLE_HF_TRANSFER=1` as an environment variable. - -> [!WARNING] -> `hf_transfer` is a power user tool for uploading LFS files! It is tested and production-ready, but it is less future-proof and lacks user-friendly features like advanced error handling or proxies. For more details, please take a look at this [section](https://huggingface.co/docs/huggingface_hub/hf_transfer). -> -> Note that `hf_xet` and `hf_transfer` tools are mutually exclusive. The former is used to upload files to Xet-enabled repos while the later uploads LFS files to regular repos. +- **Use `hf_xet`**: this leverages the new storage backend for the Hub, is written in Rust, and is now available for everyone to use. In fact, `hf_xet` is already enabled by default when using `huggingface_hub`! For maximum performance, set [`HF_XET_HIGH_PERFORMANCE=1`](../package_reference/environment_variables.md#hf_xet_high_performance) as an environment variable. Be aware that when high performance mode is enabled, the tool will try to use all available bandwidth and CPU cores. ## Advanced features @@ -175,9 +166,6 @@ However, `huggingface_hub` has more advanced features to make things easier. Let Take advantage of faster uploads through `hf_xet`, the Python binding to the [`xet-core`](https://github.com/huggingface/xet-core) library that enables chunk-based deduplication for faster uploads and downloads. `hf_xet` integrates seamlessly with `huggingface_hub`, but uses the Rust `xet-core` library and Xet storage instead of LFS. 
-> [!WARNING] -> As of May 23rd, 2025, Xet-enabled repositories [are the default for all new Hugging Face Hub users and organizations](https://huggingface.co/changelog/xet-default-for-new-users). If your user or organization was created before then, you may need Xet enabled on your repo for `hf_xet` to actually upload to the Xet backend. Join the [waitlist](https://huggingface.co/join/xet) to make Xet the default for all your repositories. Also, note that while `hf_xet` works with in-memory bytes or bytearray data, support for BinaryIO streams is still pending. - `hf_xet` uses the Xet storage system, which breaks files down into immutable chunks, storing collections of these chunks (called blocks or xorbs) remotely and retrieving them to reassemble the file when requested. When uploading, after confirming the user is authorized to write to this repo, `hf_xet` will scan the files, breaking them down into their chunks and collecting those chunks into xorbs (and deduplicating across known chunks), and then will upload these xorbs to the Xet content-addressable service (CAS), which will verify the integrity of the xorbs, register the xorb metadata along with the LFS SHA256 hash (to support lookup/download), and write the xorbs to remote storage. To enable it, simply install the latest version of `huggingface_hub`: diff --git a/docs/source/en/package_reference/environment_variables.md b/docs/source/en/package_reference/environment_variables.md index 249a106454..717d78313f 100644 --- a/docs/source/en/package_reference/environment_variables.md +++ b/docs/source/en/package_reference/environment_variables.md @@ -71,10 +71,6 @@ Defaults to `"warning"`. For more details, see [logging reference](../package_reference/utilities#huggingface_hub.utils.logging.get_verbosity). -### HF_HUB_LOCAL_DIR_AUTO_SYMLINK_THRESHOLD - -This environment variable has been deprecated and is now ignored by `huggingface_hub`. Downloading files to the local dir does not rely on symlinks anymore.
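The chunk/xorb flow described above can be illustrated with a toy sketch. Everything here is an assumption for illustration only: fixed-size chunks and an in-memory dict standing in for the CAS, whereas the real `hf_xet` uses content-defined chunking implemented in Rust and a remote content-addressable service:

```python
import hashlib

# Illustrative chunk size; hf_xet's real chunking is content-defined, not fixed-size.
CHUNK_SIZE = 4

def chunks(data: bytes):
    return [data[i : i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def upload(files):
    cas = {}        # chunk hash -> chunk bytes (stand-in for xorbs in remote storage)
    manifests = {}  # file name -> ordered list of chunk hashes
    for name, data in files.items():
        hashes = []
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            cas.setdefault(digest, chunk)  # only previously unseen chunks are "uploaded"
            hashes.append(digest)
        manifests[name] = hashes
    return cas, manifests

def download(name, cas, manifests) -> bytes:
    # Reassemble the file from its chunks on request.
    return b"".join(cas[h] for h in manifests[name])

files = {"a.bin": b"aaaabbbbcccc", "b.bin": b"aaaabbbbdddd"}
cas, manifests = upload(files)
# The two files share the chunks "aaaa" and "bbbb", which are stored only once:
# the store holds 4 unique chunks instead of 6.
```

Deduplication is what makes re-uploading a slightly modified large file cheap: only the chunks that changed are transferred.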
- ### HF_HUB_ETAG_TIMEOUT Integer value to define the number of seconds to wait for server response when fetching the latest metadata from a repo before downloading a file. If the request times out, `huggingface_hub` will default to the locally cached files. Setting a lower value speeds up the workflow for machines with a slow connection that have already cached files. A higher value guarantees the metadata call to succeed in more cases. Defaults to 10s. @@ -177,31 +173,18 @@ Set to disable using `hf-xet`, even if it is available in your Python environmen ### HF_HUB_ENABLE_HF_TRANSFER -Set to `True` for faster uploads and downloads from the Hub using `hf_transfer`. - -By default, `huggingface_hub` uses the Python-based `httpx.get` and `httpx.post` functions. -Although these are reliable and versatile, -they may not be the most efficient choice for machines with high bandwidth. -[`hf_transfer`](https://github.com/huggingface/hf_transfer) is a Rust-based package developed to -maximize the bandwidth used by dividing large files into smaller parts -and transferring them simultaneously using multiple threads. -This approach can potentially double the transfer speed. -To use `hf_transfer`: +> [!WARNING] +> This is a deprecated environment variable. +> Now that the Hugging Face Hub is fully powered by the Xet storage backend, all file transfers go through the `hf-xet` binary package. It provides efficient transfers using a chunk-based deduplication strategy and integrates seamlessly with `huggingface_hub`. +> This means `hf_transfer` can't be used anymore. If you are interested in higher performance, check out the [`HF_XET_HIGH_PERFORMANCE` section](#hf_xet_high_performance). -1. Specify the `hf_transfer` extra when installing `huggingface_hub` - (e.g. `pip install huggingface_hub[hf_transfer]`). -2. Set `HF_HUB_ENABLE_HF_TRANSFER=1` as an environment variable. - -Please note that using `hf_transfer` comes with certain limitations.
Since it is not purely Python-based, debugging errors may be challenging. Additionally, `hf_transfer` lacks several user-friendly features such as resumable downloads and proxies. These omissions are intentional to maintain the simplicity and speed of the Rust logic. Consequently, `hf_transfer` is not enabled by default in `huggingface_hub`. +### HF_XET_HIGH_PERFORMANCE -> [!TIP] -> `hf_xet` is an alternative to `hf_transfer`. It provides efficient file transfers through a chunk-based deduplication strategy, custom Xet storage (replacing Git LFS), and a seamless integration with `huggingface_hub`. -> -> [Read more about the package](https://huggingface.co/docs/hub/storage-backends) and enable with `pip install "huggingface_hub[hf_xet]"`. +Set `hf-xet` to operate with increased settings to maximize network and disk resources on the machine. Enabling high performance mode will try to saturate the network bandwidth of this machine and utilize all CPU cores for parallel upload/download activity. -### HF_XET_HIGH_PERFORMANCE +Consider this analogous to the legacy `HF_HUB_ENABLE_HF_TRANSFER=1` environment variable but applied to `hf-xet`. -Set `hf-xet` to operate with increased settings to maximize network and disk resources on the machine. Enabling high performance mode will try to saturate the network bandwidth of this machine and utilize all CPU cores for parallel upload/download activity. Consider this analogous to setting `HF_HUB_ENABLE_HF_TRANSFER=True` when uploading / downloading using `hf-xet` to the Xet storage backend. +To learn more about the benefits of Xet storage and `hf_xet`, refer to this [section](https://huggingface.co/docs/hub/storage-backends). 
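As a usage sketch for the `HF_XET_HIGH_PERFORMANCE` variable documented above (the commented-out download call is a placeholder; the key point is that the variable is read when `huggingface_hub` is imported, so it must be set first):

```python
import os

# High performance mode should be configured before importing huggingface_hub,
# since environment variables are read once at import time.
os.environ["HF_XET_HIGH_PERFORMANCE"] = "1"

# from huggingface_hub import snapshot_download
# snapshot_download("gpt2")  # transfers may now use all available bandwidth and CPU cores
```

Alternatively, export the variable in the shell before launching the process, which avoids any import-order concerns.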
### HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY diff --git a/docs/source/ko/package_reference/environment_variables.md b/docs/source/ko/package_reference/environment_variables.md index 124ca04d75..25d2aabc3e 100644 --- a/docs/source/ko/package_reference/environment_variables.md +++ b/docs/source/ko/package_reference/environment_variables.md @@ -57,10 +57,6 @@ Hub에 인증하기 위한 사용자 액세스 토큰을 구성합니다. 이 더 자세한 정보를 알아보고 싶다면, [logging reference](../package_reference/utilities#huggingface_hub.utils.logging.get_verbosity)를 살펴보세요. -### HF_HUB_LOCAL_DIR_AUTO_SYMLINK_THRESHOLD[[hfhublocaldirautosymlinkthreshold]] - -이 환경 변수는 더 이상 사용되지 않으며 이제 `huggingface_hub`에서 무시됩니다. 로컬 디렉터리로 파일을 다운로드할 때 더 이상 심볼릭 링크에 의존하지 않습니다. - ### HF_HUB_ETAG_TIMEOUT[[hfhubetagtimeout]] 파일을 다운로드하기 전에 리포지토리에서 최신 메타데이터를 가져올 때 서버 응답을 기다리는 시간(초)을 정의하는 정수 값입니다. 요청 시간이 초과되면 `huggingface_hub`는 기본적으로 로컬에 캐시된 파일을 사용합니다. 값을 낮게 설정하면 이미 파일을 캐시한 연결 속도가 느린 컴퓨터의 워크플로 속도가 빨라집니다. 값이 클수록 더 많은 경우에서 메타데이터 호출이 성공할 수 있습니다. 기본값은 10초입니다. @@ -128,11 +124,11 @@ Hub에서 `hf_transfer`를 사용하여 더 빠르게 업로드 및 다운로드 Hugging Face 생태계의 모든 환경 변수를 표준화하기 위해 일부 변수는 사용되지 않는 것으로 표시되었습니다. 해당 변수는 여전히 작동하지만 더 이상 대체한 변수보다 우선하지 않습니다. 
다음 표에는 사용되지 않는 변수와 해당 대체 변수가 간략하게 설명되어 있습니다: -| 사용되지 않는 변수 | 대체 변수 | -| --- | --- | -| `HUGGINGFACE_HUB_CACHE` | `HF_HUB_CACHE` | -| `HUGGINGFACE_ASSETS_CACHE` | `HF_ASSETS_CACHE` | -| `HUGGING_FACE_HUB_TOKEN` | `HF_TOKEN` | +| 사용되지 않는 변수 | 대체 변수 | +| --------------------------- | ------------------ | +| `HUGGINGFACE_HUB_CACHE` | `HF_HUB_CACHE` | +| `HUGGINGFACE_ASSETS_CACHE` | `HF_ASSETS_CACHE` | +| `HUGGING_FACE_HUB_TOKEN` | `HF_TOKEN` | | `HUGGINGFACE_HUB_VERBOSITY` | `HF_HUB_VERBOSITY` | ## 외부 도구[[from-external-tools]] diff --git a/setup.py b/setup.py index 9862deb896..65dc9a415d 100644 --- a/setup.py +++ b/setup.py @@ -47,16 +47,13 @@ def get_version() -> str: "torch", "safetensors[torch]", ] -extras["hf_transfer"] = [ - "hf_transfer>=0.1.4", # Pin for progress bars -] extras["fastai"] = [ "toml", "fastai>=2.4", "fastcore>=1.3.27", ] -extras["hf_xet"] = ["hf-xet>=1.1.2,<2.0.0"] +extras["hf_xet"] = ["hf-xet>=1.1.3,<2.0.0"] extras["mcp"] = [ "mcp>=1.8.0", diff --git a/src/huggingface_hub/_commit_api.py b/src/huggingface_hub/_commit_api.py index ecd7e0a2b5..3d32d78960 100644 --- a/src/huggingface_hub/_commit_api.py +++ b/src/huggingface_hub/_commit_api.py @@ -503,11 +503,7 @@ def _wrapped_lfs_upload(batch_action) -> None: except Exception as exc: raise RuntimeError(f"Error while uploading '{operation.path_in_repo}' to the Hub.") from exc - if constants.HF_HUB_ENABLE_HF_TRANSFER: - logger.debug(f"Uploading {len(filtered_actions)} LFS files to the Hub using `hf_transfer`.") - for action in hf_tqdm(filtered_actions, name="huggingface_hub.lfs_upload"): - _wrapped_lfs_upload(action) - elif len(filtered_actions) == 1: + if len(filtered_actions) == 1: logger.debug("Uploading 1 LFS file to the Hub") _wrapped_lfs_upload(filtered_actions[0]) else: diff --git a/src/huggingface_hub/_snapshot_download.py b/src/huggingface_hub/_snapshot_download.py index 9b5d5cfbff..5287c482fa 100644 --- a/src/huggingface_hub/_snapshot_download.py +++ 
b/src/huggingface_hub/_snapshot_download.py @@ -403,20 +403,14 @@ def _inner_hf_hub_download(repo_file: str) -> None: ) ) - if constants.HF_HUB_ENABLE_HF_TRANSFER and not dry_run: - # when using hf_transfer we don't want extra parallelism - # from the one hf_transfer provides - for file in filtered_repo_files: - _inner_hf_hub_download(file) - else: - thread_map( - _inner_hf_hub_download, - filtered_repo_files, - desc=tqdm_desc, - max_workers=max_workers, - # User can use its own tqdm class or the default one from `huggingface_hub.utils` - tqdm_class=tqdm_class or hf_tqdm, - ) + thread_map( + _inner_hf_hub_download, + filtered_repo_files, + desc=tqdm_desc, + max_workers=max_workers, + # User can use its own tqdm class or the default one from `huggingface_hub.utils` + tqdm_class=tqdm_class or hf_tqdm, + ) if dry_run: assert all(isinstance(r, DryRunFileInfo) for r in results) diff --git a/src/huggingface_hub/_upload_large_folder.py b/src/huggingface_hub/_upload_large_folder.py index 083b62f544..91bdedf4d4 100644 --- a/src/huggingface_hub/_upload_large_folder.py +++ b/src/huggingface_hub/_upload_large_folder.py @@ -27,7 +27,6 @@ from typing import TYPE_CHECKING, Any, Optional, Union from urllib.parse import quote -from . 
import constants from ._commit_api import CommitOperationAdd, UploadInfo, _fetch_upload_modes from ._local_folder import LocalUploadFileMetadata, LocalUploadFilePaths, get_local_upload_paths, read_upload_metadata from .constants import DEFAULT_REVISION, REPO_TYPES @@ -199,16 +198,7 @@ def upload_large_folder_internal( logger.info(f"Repo created: {repo_url}") repo_id = repo_url.repo_id # 2.1 Check if xet is enabled to set batch file upload size - is_xet_enabled = ( - is_xet_available() - and api.repo_info( - repo_id=repo_id, - repo_type=repo_type, - revision=revision, - expand="xetEnabled", - ).xet_enabled - ) - upload_batch_size = UPLOAD_BATCH_SIZE_XET if is_xet_enabled else UPLOAD_BATCH_SIZE_LFS + upload_batch_size = UPLOAD_BATCH_SIZE_XET if is_xet_available() else UPLOAD_BATCH_SIZE_LFS # 3. List files to upload filtered_paths_list = filter_repo_objects( @@ -559,10 +549,7 @@ def _determine_next_job(status: LargeUploadStatus) -> Optional[tuple[WorkerJob, return (WorkerJob.GET_UPLOAD_MODE, _get_n(status.queue_get_upload_mode, MAX_NB_FILES_FETCH_UPLOAD_MODE)) # 7. 
Preupload LFS file if at least `status.upload_batch_size` files - # Skip if hf_transfer is enabled and there is already a worker preuploading LFS - elif status.queue_preupload_lfs.qsize() >= status.upload_batch_size and ( - status.nb_workers_preupload_lfs == 0 or not constants.HF_HUB_ENABLE_HF_TRANSFER - ): + elif status.queue_preupload_lfs.qsize() >= status.upload_batch_size: status.nb_workers_preupload_lfs += 1 logger.debug("Job: preupload LFS") return (WorkerJob.PREUPLOAD_LFS, _get_n(status.queue_preupload_lfs, status.upload_batch_size)) diff --git a/src/huggingface_hub/cli/repo.py b/src/huggingface_hub/cli/repo.py index bb67ba9172..87741309ce 100644 --- a/src/huggingface_hub/cli/repo.py +++ b/src/huggingface_hub/cli/repo.py @@ -147,12 +147,6 @@ def repo_settings( help="Whether the repository should be private.", ), ] = None, - xet_enabled: Annotated[ - Optional[bool], - typer.Option( - help=" Whether the repository should be enabled for Xet Storage.", - ), - ] = None, token: TokenOpt = None, repo_type: RepoTypeOpt = RepoType.model, ) -> None: @@ -161,7 +155,6 @@ def repo_settings( repo_id=repo_id, gated=(gated.value if gated else None), # type: ignore [arg-type] private=private, - xet_enabled=xet_enabled, repo_type=repo_type.value, ) print(f"Successfully updated the settings of {ANSI.bold(repo_id)} on the Hub.") diff --git a/src/huggingface_hub/cli/upload.py b/src/huggingface_hub/cli/upload.py index 918e58f690..968b1b86cf 100644 --- a/src/huggingface_hub/cli/upload.py +++ b/src/huggingface_hub/cli/upload.py @@ -55,10 +55,8 @@ from huggingface_hub import logging from huggingface_hub._commit_scheduler import CommitScheduler -from huggingface_hub.constants import HF_HUB_ENABLE_HF_TRANSFER from huggingface_hub.errors import RevisionNotFoundError from huggingface_hub.utils import disable_progress_bars, enable_progress_bars -from huggingface_hub.utils._runtime import is_xet_available from ._cli_utils import PrivateOpt, RepoIdArg, RepoType, RepoTypeOpt, RevisionOpt, 
TokenOpt, get_hf_api @@ -156,12 +154,6 @@ def run_upload() -> str: if delete is not None and len(delete) > 0: warnings.warn("Ignoring --delete since a single file is uploaded.") - if not is_xet_available() and not HF_HUB_ENABLE_HF_TRANSFER: - logger.info( - "Consider using `hf_transfer` for faster uploads. This solution comes with some limitations. See" - " https://huggingface.co/docs/huggingface_hub/hf_transfer for more details." - ) - # Schedule commits if `every` is set if every is not None: if os.path.isfile(resolved_local_path): diff --git a/src/huggingface_hub/constants.py b/src/huggingface_hub/constants.py index 90ba0a5889..b0afa9563c 100644 --- a/src/huggingface_hub/constants.py +++ b/src/huggingface_hub/constants.py @@ -35,7 +35,6 @@ def _as_int(value: Optional[str]) -> Optional[int]: DEFAULT_DOWNLOAD_TIMEOUT = 10 DEFAULT_REQUEST_TIMEOUT = 10 DOWNLOAD_CHUNK_SIZE = 10 * 1024 * 1024 -HF_TRANSFER_CONCURRENCY = 100 MAX_HTTP_DOWNLOAD_SIZE = 50 * 1000 * 1000 * 1000  # 50 GB # Constants for serialization @@ -215,18 +214,18 @@ def _as_int(value: Optional[str]) -> Optional[int]: # Disable sending the cached token by default in all HTTP requests to the Hub HF_HUB_DISABLE_IMPLICIT_TOKEN: bool = _is_true(os.environ.get("HF_HUB_DISABLE_IMPLICIT_TOKEN")) -# Enable fast-download using external dependency "hf_transfer" -# See: -# - https://pypi.org/project/hf-transfer/ -# - https://github.com/huggingface/hf_transfer (private) -HF_HUB_ENABLE_HF_TRANSFER: bool = _is_true(os.environ.get("HF_HUB_ENABLE_HF_TRANSFER")) +HF_XET_HIGH_PERFORMANCE: bool = _is_true(os.environ.get("HF_XET_HIGH_PERFORMANCE")) +# hf_transfer is not used anymore. Let's warn users in case they set the env variable +if _is_true(os.environ.get("HF_HUB_ENABLE_HF_TRANSFER")) and not HF_XET_HIGH_PERFORMANCE: + import warnings -# UNUSED -# We don't use symlinks in local dir anymore.
-HF_HUB_LOCAL_DIR_AUTO_SYMLINK_THRESHOLD: int = ( - _as_int(os.environ.get("HF_HUB_LOCAL_DIR_AUTO_SYMLINK_THRESHOLD")) or 5 * 1024 * 1024 -) + warnings.warn( + "The `HF_HUB_ENABLE_HF_TRANSFER` environment variable is deprecated as 'hf_transfer' is not used anymore. " + "Please use `HF_XET_HIGH_PERFORMANCE` instead to enable high performance transfer with Xet. " + "Visit https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hfxethighperformance for more details.", + DeprecationWarning, + ) # Used to override the etag timeout on a system level HF_HUB_ETAG_TIMEOUT: int = _as_int(os.environ.get("HF_HUB_ETAG_TIMEOUT")) or DEFAULT_ETAG_TIMEOUT diff --git a/src/huggingface_hub/file_download.py b/src/huggingface_hub/file_download.py index 5b835ba0f3..9db9098ee6 100644 --- a/src/huggingface_hub/file_download.py +++ b/src/huggingface_hub/file_download.py @@ -379,38 +379,16 @@ def http_get( # If the file is already fully downloaded, we don't need to download it again. return - has_custom_range_header = headers is not None and any(h.lower() == "range" for h in headers) - hf_transfer = None - if constants.HF_HUB_ENABLE_HF_TRANSFER: - if resume_size != 0: - warnings.warn("'hf_transfer' does not support `resume_size`: falling back to regular download method") - elif has_custom_range_header: - warnings.warn("'hf_transfer' ignores custom 'Range' headers; falling back to regular download method") - else: - try: - import hf_transfer # type: ignore[no-redef] - except ImportError: - raise ValueError( - "Fast download using 'hf_transfer' is enabled" - " (HF_HUB_ENABLE_HF_TRANSFER=1) but 'hf_transfer' package is not" - " available in your environment. Try `pip install hf_transfer`." 
- ) - initial_headers = headers headers = copy.deepcopy(headers) or {} if resume_size > 0: headers["Range"] = _adjust_range_header(headers.get("Range"), resume_size) elif expected_size and expected_size > constants.MAX_HTTP_DOWNLOAD_SIZE: - # Any files over 50GB will not be available through basic http request. - # Setting the range header to 0-0 will force the server to return the file size in the Content-Range header. - # Since hf_transfer splits the download into chunks, the process will succeed afterwards. - if hf_transfer: - headers["Range"] = "bytes=0-0" - else: - raise ValueError( - "The file is too large to be downloaded using the regular download method. Use `hf_transfer` or `hf_xet` instead." - " Try `pip install hf_transfer` or `pip install hf_xet`." - ) + # Any files over 50GB will not be available through basic http requests. + raise ValueError( + "The file is too large to be downloaded using the regular download method. " + "Install `hf_xet` with `pip install hf_xet` for xet-powered downloads." + ) with http_stream_backoff( method="GET", @@ -451,31 +429,6 @@ def http_get( ) with progress_cm as progress: - if hf_transfer and total is not None and total > 5 * constants.DOWNLOAD_CHUNK_SIZE: - try: - hf_transfer.download( - url=url, - filename=temp_file.name, - max_files=constants.HF_TRANSFER_CONCURRENCY, - chunk_size=constants.DOWNLOAD_CHUNK_SIZE, - headers=initial_headers, - parallel_failures=3, - max_retries=5, - callback=progress.update, - ) - except Exception as e: - raise RuntimeError( - "An error occurred while downloading using `hf_transfer`. Consider" - " disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling."
- ) from e - if expected_size is not None and expected_size != os.path.getsize(temp_file.name): - raise EnvironmentError( - consistency_error_message.format( - actual_size=os.path.getsize(temp_file.name), - ) - ) - return - new_resume_size = resume_size try: for chunk in response.iter_bytes(chunk_size=constants.DOWNLOAD_CHUNK_SIZE): @@ -1780,7 +1733,7 @@ def _download_to_tmp_and_move( Internal logic: - return early if file is already downloaded - resume download if possible (from incomplete file) - - do not resume download if `force_download=True` or `HF_HUB_ENABLE_HF_TRANSFER=True` + - do not resume download if `force_download=True` - check disk space before downloading - download content to a temporary file - set correct permissions on temporary file @@ -1792,16 +1745,11 @@ def _download_to_tmp_and_move( # Do nothing if already exists (except if force_download=True) return - if incomplete_path.exists() and (force_download or constants.HF_HUB_ENABLE_HF_TRANSFER): + if incomplete_path.exists() and force_download: # By default, we will try to resume the download if possible. - # However, if the user has set `force_download=True` or if `hf_transfer` is enabled, then we should + # However, if the user has set `force_download=True`, then we should # not resume the download => delete the incomplete file. 
- message = f"Removing incomplete file '{incomplete_path}'" - if force_download: - message += " (force_download=True)" - elif constants.HF_HUB_ENABLE_HF_TRANSFER: - message += " (hf_transfer=True)" - logger.info(message) + logger.info(f"Removing incomplete file '{incomplete_path}' (force_download=True)") incomplete_path.unlink(missing_ok=True) with incomplete_path.open("ab") as f: diff --git a/src/huggingface_hub/hf_api.py b/src/huggingface_hub/hf_api.py index 097d8673a6..721a255792 100644 --- a/src/huggingface_hub/hf_api.py +++ b/src/huggingface_hub/hf_api.py @@ -145,7 +145,6 @@ "trendingScore", "usedStorage", "widgetData", - "xetEnabled", ] ExpandDatasetProperty_T = Literal[ @@ -168,7 +167,6 @@ "tags", "trendingScore", "usedStorage", - "xetEnabled", ] ExpandSpaceProperty_T = Literal[ @@ -190,7 +188,6 @@ "tags", "trendingScore", "usedStorage", - "xetEnabled", ] USERNAME_PLACEHOLDER = "hf_user" @@ -801,7 +798,6 @@ class ModelInfo: spaces: Optional[list[str]] safetensors: Optional[SafeTensorsInfo] security_repo_status: Optional[dict] - xet_enabled: Optional[bool] def __init__(self, **kwargs): self.id = kwargs.pop("id") @@ -889,7 +885,6 @@ def __init__(self, **kwargs): else None ) self.security_repo_status = kwargs.pop("securityRepoStatus", None) - self.xet_enabled = kwargs.pop("xetEnabled", None) # backwards compatibility self.lastModified = self.last_modified self.cardData = self.card_data @@ -960,7 +955,6 @@ class DatasetInfo: trending_score: Optional[int] card_data: Optional[DatasetCardData] siblings: Optional[list[RepoSibling]] - xet_enabled: Optional[bool] def __init__(self, **kwargs): self.id = kwargs.pop("id") @@ -1006,7 +1000,6 @@ def __init__(self, **kwargs): if siblings is not None else None ) - self.xet_enabled = kwargs.pop("xetEnabled", None) # backwards compatibility self.lastModified = self.last_modified self.cardData = self.card_data @@ -1085,7 +1078,6 @@ class SpaceInfo: runtime: Optional[SpaceRuntime] models: Optional[list[str]] datasets: 
Optional[list[str]] - xet_enabled: Optional[bool] def __init__(self, **kwargs): self.id = kwargs.pop("id") @@ -1134,7 +1126,6 @@ def __init__(self, **kwargs): self.runtime = SpaceRuntime(runtime) if runtime else None self.models = kwargs.pop("models", None) self.datasets = kwargs.pop("datasets", None) - self.xet_enabled = kwargs.pop("xetEnabled", None) # backwards compatibility self.lastModified = self.last_modified self.cardData = self.card_data @@ -1838,7 +1829,7 @@ def list_models( expand (`list[ExpandModelProperty_T]`, *optional*): List properties to return in the response. When used, only the properties in the list will be returned. This parameter cannot be used if `full`, `cardData` or `fetch_config` are passed. - Possible values are `"author"`, `"cardData"`, `"config"`, `"createdAt"`, `"disabled"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"gguf"`, `"inference"`, `"inferenceProviderMapping"`, `"lastModified"`, `"library_name"`, `"likes"`, `"mask_token"`, `"model-index"`, `"pipeline_tag"`, `"private"`, `"safetensors"`, `"sha"`, `"siblings"`, `"spaces"`, `"tags"`, `"transformersInfo"`, `"trendingScore"`, `"widgetData"`, `"resourceGroup"` and `"xetEnabled"`. + Possible values are `"author"`, `"cardData"`, `"config"`, `"createdAt"`, `"disabled"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"gguf"`, `"inference"`, `"inferenceProviderMapping"`, `"lastModified"`, `"library_name"`, `"likes"`, `"mask_token"`, `"model-index"`, `"pipeline_tag"`, `"private"`, `"safetensors"`, `"sha"`, `"siblings"`, `"spaces"`, `"tags"`, `"transformersInfo"`, `"trendingScore"`, `"widgetData"`, and `"resourceGroup"`. full (`bool`, *optional*): Whether to fetch all model data, including the `last_modified`, the `sha`, the files and the `tags`. This is set to `True` by @@ -2049,7 +2040,7 @@ def list_datasets( expand (`list[ExpandDatasetProperty_T]`, *optional*): List properties to return in the response. When used, only the properties in the list will be returned. 
This parameter cannot be used if `full` is passed. - Possible values are `"author"`, `"cardData"`, `"citation"`, `"createdAt"`, `"disabled"`, `"description"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"lastModified"`, `"likes"`, `"paperswithcode_id"`, `"private"`, `"siblings"`, `"sha"`, `"tags"`, `"trendingScore"`, `"usedStorage"`, `"resourceGroup"` and `"xetEnabled"`. + Possible values are `"author"`, `"cardData"`, `"citation"`, `"createdAt"`, `"disabled"`, `"description"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"lastModified"`, `"likes"`, `"paperswithcode_id"`, `"private"`, `"siblings"`, `"sha"`, `"tags"`, `"trendingScore"`, `"usedStorage"`, and `"resourceGroup"`. full (`bool`, *optional*): Whether to fetch all dataset data, including the `last_modified`, the `card_data` and the files. Can contain useful information such as the @@ -2227,7 +2218,7 @@ def list_spaces( expand (`list[ExpandSpaceProperty_T]`, *optional*): List properties to return in the response. When used, only the properties in the list will be returned. This parameter cannot be used if `full` is passed. - Possible values are `"author"`, `"cardData"`, `"datasets"`, `"disabled"`, `"lastModified"`, `"createdAt"`, `"likes"`, `"models"`, `"private"`, `"runtime"`, `"sdk"`, `"siblings"`, `"sha"`, `"subdomain"`, `"tags"`, `"trendingScore"`, `"usedStorage"`, `"resourceGroup"` and `"xetEnabled"`. + Possible values are `"author"`, `"cardData"`, `"datasets"`, `"disabled"`, `"lastModified"`, `"createdAt"`, `"likes"`, `"models"`, `"private"`, `"runtime"`, `"sdk"`, `"siblings"`, `"sha"`, `"subdomain"`, `"tags"`, `"trendingScore"`, `"usedStorage"`, and `"resourceGroup"`. full (`bool`, *optional*): Whether to fetch all Spaces data, including the `last_modified`, `siblings` and `card_data` fields. @@ -2488,7 +2479,7 @@ def model_info( expand (`list[ExpandModelProperty_T]`, *optional*): List properties to return in the response. When used, only the properties in the list will be returned. 
This parameter cannot be used if `securityStatus` or `files_metadata` are passed. - Possible values are `"author"`, `"baseModels"`, `"cardData"`, `"childrenModelCount"`, `"config"`, `"createdAt"`, `"disabled"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"gguf"`, `"inference"`, `"inferenceProviderMapping"`, `"lastModified"`, `"library_name"`, `"likes"`, `"mask_token"`, `"model-index"`, `"pipeline_tag"`, `"private"`, `"safetensors"`, `"sha"`, `"siblings"`, `"spaces"`, `"tags"`, `"transformersInfo"`, `"trendingScore"`, `"widgetData"`, `"usedStorage"`, `"resourceGroup"` and `"xetEnabled"`. + Possible values are `"author"`, `"baseModels"`, `"cardData"`, `"childrenModelCount"`, `"config"`, `"createdAt"`, `"disabled"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"gguf"`, `"inference"`, `"inferenceProviderMapping"`, `"lastModified"`, `"library_name"`, `"likes"`, `"mask_token"`, `"model-index"`, `"pipeline_tag"`, `"private"`, `"safetensors"`, `"sha"`, `"siblings"`, `"spaces"`, `"tags"`, `"transformersInfo"`, `"trendingScore"`, `"widgetData"`, `"usedStorage"`, and `"resourceGroup"`. token (Union[bool, str, None], optional): A valid user access token (string). Defaults to the locally saved token, which is the recommended method for authentication (see @@ -2559,7 +2550,7 @@ def dataset_info( expand (`list[ExpandDatasetProperty_T]`, *optional*): List properties to return in the response. When used, only the properties in the list will be returned. This parameter cannot be used if `files_metadata` is passed. - Possible values are `"author"`, `"cardData"`, `"citation"`, `"createdAt"`, `"disabled"`, `"description"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"lastModified"`, `"likes"`, `"paperswithcode_id"`, `"private"`, `"siblings"`, `"sha"`, `"tags"`, `"trendingScore"`,`"usedStorage"`, `"resourceGroup"` and `"xetEnabled"`. 
+ Possible values are `"author"`, `"cardData"`, `"citation"`, `"createdAt"`, `"disabled"`, `"description"`, `"downloads"`, `"downloadsAllTime"`, `"gated"`, `"lastModified"`, `"likes"`, `"paperswithcode_id"`, `"private"`, `"siblings"`, `"sha"`, `"tags"`, `"trendingScore"`, `"usedStorage"`, and `"resourceGroup"`. token (Union[bool, str, None], optional): A valid user access token (string). Defaults to the locally saved token, which is the recommended method for authentication (see @@ -2629,7 +2620,7 @@ def space_info( expand (`list[ExpandSpaceProperty_T]`, *optional*): List properties to return in the response. When used, only the properties in the list will be returned. This parameter cannot be used if `full` is passed. - Possible values are `"author"`, `"cardData"`, `"createdAt"`, `"datasets"`, `"disabled"`, `"lastModified"`, `"likes"`, `"models"`, `"private"`, `"runtime"`, `"sdk"`, `"siblings"`, `"sha"`, `"subdomain"`, `"tags"`, `"trendingScore"`, `"usedStorage"`, `"resourceGroup"` and `"xetEnabled"`. + Possible values are `"author"`, `"cardData"`, `"createdAt"`, `"datasets"`, `"disabled"`, `"lastModified"`, `"likes"`, `"models"`, `"private"`, `"runtime"`, `"sdk"`, `"siblings"`, `"sha"`, `"subdomain"`, `"tags"`, `"trendingScore"`, `"usedStorage"`, and `"resourceGroup"`. token (Union[bool, str, None], optional): A valid user access token (string). Defaults to the locally saved token, which is the recommended method for authentication (see @@ -3702,7 +3693,6 @@ def update_repo_settings( private: Optional[bool] = None, token: Union[str, bool, None] = None, repo_type: Optional[str] = None, - xet_enabled: Optional[bool] = None, ) -> None: """ Update the settings of a repository, including gated access and visibility. @@ -3728,8 +3718,6 @@ def update_repo_settings( repo_type (`str`, *optional*): The type of the repository to update settings from (`"model"`, `"dataset"` or `"space"`). Defaults to `"model"`.
- xet_enabled (`bool`, *optional*): - Whether the repository should be enabled for Xet Storage. Raises: [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) If gated is not one of "auto", "manual", or False. @@ -3758,9 +3746,6 @@ def update_repo_settings( if private is not None: payload["private"] = private - if xet_enabled is not None: - payload["xetEnabled"] = xet_enabled - if len(payload) == 0: raise ValueError("At least one setting must be updated.") @@ -5052,14 +5037,13 @@ def upload_large_folder( 4. Pre-upload LFS file if at least 1 file and no worker is pre-uploading. 5. Hash file if at least 1 file and no worker is hashing. 6. Get upload mode if at least 1 file and no worker is getting upload mode. - 7. Pre-upload LFS file if at least 1 file (exception: if hf_transfer is enabled, only 1 worker can preupload LFS at a time). + 7. Pre-upload LFS file if at least 1 file. 8. Hash file if at least 1 file to hash. 9. Get upload mode if at least 1 file to get upload mode. 10. Commit if at least 1 file to commit and at least 1 min since last commit attempt. 11. Commit if at least 1 file to commit and all other queues are empty. Special rules: - - If `hf_transfer` is enabled, only 1 LFS uploader at a time. Otherwise the CPU would be bloated by `hf_transfer`. - Only one worker can commit at a time. - If no tasks are available, the worker waits for 10 seconds before checking again. """ diff --git a/src/huggingface_hub/hf_file_system.py b/src/huggingface_hub/hf_file_system.py index 6dff8eca73..fd53c8a94e 100644 --- a/src/huggingface_hub/hf_file_system.py +++ b/src/huggingface_hub/hf_file_system.py @@ -991,9 +991,8 @@ def _upload_chunk(self, final: bool = False) -> None: def read(self, length=-1): """Read remote file. - If `length` is not provided or is -1, the entire file is downloaded and read. On POSIX systems and if - `hf_transfer` is not enabled, the file is loaded in memory directly. 
Otherwise, the file is downloaded to a - temporary file and read from there. + If `length` is not provided or is -1, the entire file is downloaded and read. On POSIX systems the file is + loaded in memory directly. Otherwise, the file is downloaded to a temporary file and read from there. """ if self.mode == "rb" and (length is None or length == -1) and self.loc == 0: with self.fs.open(self.path, "rb", block_size=0) as f: # block_size=0 enables fast streaming diff --git a/src/huggingface_hub/lfs.py b/src/huggingface_hub/lfs.py index 70935555ae..b33400f94f 100644 --- a/src/huggingface_hub/lfs.py +++ b/src/huggingface_hub/lfs.py @@ -16,11 +16,9 @@ import io import re -import warnings from dataclasses import dataclass from math import ceil from os.path import getsize -from pathlib import Path from typing import TYPE_CHECKING, BinaryIO, Iterable, Optional, TypedDict from urllib.parse import unquote @@ -33,12 +31,10 @@ hf_raise_for_status, http_backoff, logging, - tqdm, validate_hf_hub_args, ) from .utils._lfs import SliceFileObj from .utils.sha import sha256, sha_fileobj -from .utils.tqdm import is_tqdm_disabled if TYPE_CHECKING: @@ -332,23 +328,9 @@ def _upload_multi_part(operation: "CommitOperationAdd", header: dict, chunk_size # 1. Get upload URLs for each part sorted_parts_urls = _get_sorted_parts_urls(header=header, upload_info=operation.upload_info, chunk_size=chunk_size) - # 2. 
Upload parts (either with hf_transfer or in pure Python) - use_hf_transfer = constants.HF_HUB_ENABLE_HF_TRANSFER - if ( - constants.HF_HUB_ENABLE_HF_TRANSFER - and not isinstance(operation.path_or_fileobj, str) - and not isinstance(operation.path_or_fileobj, Path) - ): - warnings.warn( - "hf_transfer is enabled but does not support uploading from bytes or BinaryIO, falling back to regular" - " upload" - ) - use_hf_transfer = False - - response_headers = ( - _upload_parts_hf_transfer(operation=operation, sorted_parts_urls=sorted_parts_urls, chunk_size=chunk_size) - if use_hf_transfer - else _upload_parts_iteratively(operation=operation, sorted_parts_urls=sorted_parts_urls, chunk_size=chunk_size) + # 2. Upload parts (pure Python) + response_headers = _upload_parts_iteratively( + operation=operation, sorted_parts_urls=sorted_parts_urls, chunk_size=chunk_size ) # 3. Send completion request @@ -409,47 +391,3 @@ def _upload_parts_iteratively( hf_raise_for_status(part_upload_res) headers.append(part_upload_res.headers) return headers # type: ignore - - -def _upload_parts_hf_transfer( - operation: "CommitOperationAdd", sorted_parts_urls: list[str], chunk_size: int -) -> list[dict]: - # Upload file using an external Rust-based package. Upload is faster but support less features (no progress bars). - try: - from hf_transfer import multipart_upload - except ImportError: - raise ValueError( - "Fast uploading using 'hf_transfer' is enabled (HF_HUB_ENABLE_HF_TRANSFER=1) but 'hf_transfer' package is" - " not available in your environment. Try `pip install hf_transfer`." 
- ) - - total = operation.upload_info.size - desc = operation.path_in_repo - if len(desc) > 40: - desc = f"(…){desc[-40:]}" - - with tqdm( - unit="B", - unit_scale=True, - total=total, - initial=0, - desc=desc, - disable=is_tqdm_disabled(logger.getEffectiveLevel()), - name="huggingface_hub.lfs_upload", - ) as progress: - try: - output = multipart_upload( - file_path=operation.path_or_fileobj, - parts_urls=sorted_parts_urls, - chunk_size=chunk_size, - max_files=128, - parallel_failures=127, # could be removed - max_retries=5, - callback=progress.update, - ) - except Exception as e: - raise RuntimeError( - "An error occurred while uploading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for" - " better error handling." - ) from e - return output diff --git a/src/huggingface_hub/utils/__init__.py b/src/huggingface_hub/utils/__init__.py index bccc01174d..ce1b54fa43 100644 --- a/src/huggingface_hub/utils/__init__.py +++ b/src/huggingface_hub/utils/__init__.py @@ -75,7 +75,6 @@ get_gradio_version, get_graphviz_version, get_hf_hub_version, - get_hf_transfer_version, get_jinja_version, get_numpy_version, get_pillow_version, @@ -94,7 +93,6 @@ is_google_colab, is_gradio_available, is_graphviz_available, - is_hf_transfer_available, is_jinja_available, is_notebook, is_numpy_available, diff --git a/src/huggingface_hub/utils/_runtime.py b/src/huggingface_hub/utils/_runtime.py index 445be52baf..32b15dbdc8 100644 --- a/src/huggingface_hub/utils/_runtime.py +++ b/src/huggingface_hub/utils/_runtime.py @@ -36,7 +36,6 @@ "fastcore": {"fastcore"}, "gradio": {"gradio"}, "graphviz": {"graphviz"}, - "hf_transfer": {"hf_transfer"}, "hf_xet": {"hf_xet"}, "jinja": {"Jinja2"}, "httpx": {"httpx"}, @@ -145,15 +144,6 @@ def get_graphviz_version() -> str: return _get_version("graphviz") -# hf_transfer -def is_hf_transfer_available() -> bool: - return is_package_available("hf_transfer") - - -def get_hf_transfer_version() -> str: - return _get_version("hf_transfer") - - # httpx 
def is_httpx_available() -> bool: return is_package_available("httpx") @@ -414,13 +404,10 @@ def dump_environment_info() -> dict[str, Any]: info["Installation method"] = installation_method() # Installed dependencies - info["Torch"] = get_torch_version() info["httpx"] = get_httpx_version() - info["hf_transfer"] = get_hf_transfer_version() info["hf_xet"] = get_xet_version() info["gradio"] = get_gradio_version() info["tensorboard"] = get_tensorboard_version() - info["pydantic"] = get_pydantic_version() # Environment variables info["ENDPOINT"] = constants.ENDPOINT @@ -435,9 +422,9 @@ def dump_environment_info() -> dict[str, Any]: info["HF_HUB_DISABLE_EXPERIMENTAL_WARNING"] = constants.HF_HUB_DISABLE_EXPERIMENTAL_WARNING info["HF_HUB_DISABLE_IMPLICIT_TOKEN"] = constants.HF_HUB_DISABLE_IMPLICIT_TOKEN info["HF_HUB_DISABLE_XET"] = constants.HF_HUB_DISABLE_XET - info["HF_HUB_ENABLE_HF_TRANSFER"] = constants.HF_HUB_ENABLE_HF_TRANSFER info["HF_HUB_ETAG_TIMEOUT"] = constants.HF_HUB_ETAG_TIMEOUT info["HF_HUB_DOWNLOAD_TIMEOUT"] = constants.HF_HUB_DOWNLOAD_TIMEOUT + info["HF_XET_HIGH_PERFORMANCE"] = constants.HF_XET_HIGH_PERFORMANCE print("\nCopy-and-paste the text below in your GitHub issue.\n") print("\n".join([f"- {prop}: {val}" for prop, val in info.items()]) + "\n") diff --git a/tests/test_cli.py b/tests/test_cli.py index 7ea7d084c6..ecd95ee484 100644 --- a/tests/test_cli.py +++ b/tests/test_cli.py @@ -346,8 +346,6 @@ def test_upload_explicit_paths(self) -> None: class TestUploadImpl: - @patch("huggingface_hub.cli.upload.is_xet_available", return_value=True) - @patch("huggingface_hub.cli.upload.HF_HUB_ENABLE_HF_TRANSFER", False) def test_upload_folder_mock(self, *_: object) -> None: api = Mock() api.create_repo.return_value = Mock(repo_id="my-model") @@ -390,8 +388,6 @@ def test_upload_folder_mock(self, *_: object) -> None: ) print_mock.assert_called_once_with("done") - @patch("huggingface_hub.cli.upload.is_xet_available", return_value=True) - 
@patch("huggingface_hub.cli.upload.HF_HUB_ENABLE_HF_TRANSFER", False) def test_upload_file_mock(self, *_: object) -> None: api = Mock() api.create_repo.return_value = Mock(repo_id="my-dataset") @@ -430,8 +426,6 @@ def test_upload_file_mock(self, *_: object) -> None: ) print_mock.assert_called_once_with("uploaded") - @patch("huggingface_hub.cli.upload.is_xet_available", return_value=True) - @patch("huggingface_hub.cli.upload.HF_HUB_ENABLE_HF_TRANSFER", False) def test_upload_file_no_revision_mock(self, *_: object) -> None: api = Mock() api.create_repo.return_value = Mock(repo_id="my-model") @@ -450,8 +444,6 @@ def test_upload_file_no_revision_mock(self, *_: object) -> None: ) api.repo_info.assert_not_called() - @patch("huggingface_hub.cli.upload.is_xet_available", return_value=True) - @patch("huggingface_hub.cli.upload.HF_HUB_ENABLE_HF_TRANSFER", False) def test_upload_file_with_revision_mock(self, *_: object) -> None: api = Mock() api.create_repo.return_value = Mock(repo_id="my-model") @@ -475,8 +467,6 @@ def test_upload_file_with_revision_mock(self, *_: object) -> None: repo_id="my-model", repo_type="model", branch="my-branch", exist_ok=True ) - @patch("huggingface_hub.cli.upload.is_xet_available", return_value=True) - @patch("huggingface_hub.cli.upload.HF_HUB_ENABLE_HF_TRANSFER", False) def test_upload_file_revision_and_create_pr_mock(self, *_: object) -> None: api = Mock() api.create_repo.return_value = Mock(repo_id="my-model") @@ -498,8 +488,6 @@ def test_upload_file_revision_and_create_pr_mock(self, *_: object) -> None: api.repo_info.assert_not_called() api.create_branch.assert_not_called() - @patch("huggingface_hub.cli.upload.is_xet_available", return_value=True) - @patch("huggingface_hub.cli.upload.HF_HUB_ENABLE_HF_TRANSFER", False) def test_upload_missing_path(self, *_: object) -> None: api = Mock() with pytest.raises(FileNotFoundError): @@ -914,7 +902,6 @@ def test_repo_settings_basic(self, runner: CliRunner) -> None: repo_id=DUMMY_MODEL_ID, gated=None, 
private=None, - xet_enabled=None, repo_type="model", ) @@ -942,7 +929,6 @@ def test_repo_settings_with_all_options(self, runner: CliRunner) -> None: assert kwargs["repo_id"] == DUMMY_MODEL_ID assert kwargs["repo_type"] == "dataset" assert kwargs["private"] is True - assert kwargs["xet_enabled"] is None assert kwargs["gated"] == "manual" diff --git a/tests/test_file_download.py b/tests/test_file_download.py index b1bbfc9790..a16e93e94d 100644 --- a/tests/test_file_download.py +++ b/tests/test_file_download.py @@ -42,7 +42,7 @@ http_get, try_to_load_from_cache, ) -from huggingface_hub.utils import SoftTemporaryDirectory, get_session, hf_raise_for_status, is_hf_transfer_available +from huggingface_hub.utils import SoftTemporaryDirectory, get_session, hf_raise_for_status from huggingface_hub.utils._headers import build_hf_headers from huggingface_hub.utils._http import _http_backoff_base @@ -53,7 +53,6 @@ DUMMY_MODEL_ID, DUMMY_MODEL_ID_REVISION_ONE_SPECIFIC_COMMIT, DUMMY_RENAMED_OLD_MODEL_ID, - DUMMY_TINY_FILE_NAME, SAMPLE_DATASET_IDENTIFIER, repo_name, use_tmp_repo, @@ -1309,13 +1308,12 @@ def test_etag_timeout_set_as_env_variable_parameter_ignored(self): @with_production_testing class TestExtraLargeFileDownloadPaths(unittest.TestCase): - @patch("huggingface_hub.file_download.constants.HF_HUB_ENABLE_HF_TRANSFER", False) @patch("huggingface_hub.file_download.constants.HF_HUB_DISABLE_XET", True) def test_large_file_http_path_error(self): with SoftTemporaryDirectory() as cache_dir: with self.assertRaises( ValueError, - msg="The file is too large to be downloaded using the regular download method. Use `hf_transfer` or `xet_get` instead. Try `pip install hf_transfer` or `pip install hf_xet`.", + msg="The file is too large to be downloaded using the regular download method. 
Install `hf_xet` with `pip install hf_xet` for xet-powered downloads.", ): hf_hub_download( DUMMY_EXTRA_LARGE_FILE_MODEL_ID, @@ -1325,29 +1323,6 @@ def test_large_file_http_path_error(self): etag_timeout=10, ) - # Test "large" file download with hf_transfer. Use a tiny file to keep the tests fast and avoid - # internal gateway transfer quotas. - @unittest.skipIf( - not is_hf_transfer_available(), - "hf_transfer not installed, so skipping large file download with hf_transfer check.", - ) - @patch("huggingface_hub.file_download.constants.HF_HUB_ENABLE_HF_TRANSFER", True) - @patch("huggingface_hub.file_download.constants.HF_HUB_DISABLE_XET", True) - @patch("huggingface_hub.file_download.constants.MAX_HTTP_DOWNLOAD_SIZE", 44) - @patch("huggingface_hub.file_download.constants.DOWNLOAD_CHUNK_SIZE", 2) # make sure hf_download is used - def test_large_file_download_with_hf_transfer(self): - with SoftTemporaryDirectory() as cache_dir: - path = hf_hub_download( - DUMMY_EXTRA_LARGE_FILE_MODEL_ID, - filename=DUMMY_TINY_FILE_NAME, - cache_dir=cache_dir, - revision="main", - etag_timeout=10, - ) - with open(path, "rb") as f: - content = f.read() - self.assertEqual(content, b"test\n" * 9) # the file is 9 lines of "test" - def _recursive_chmod(path: str, mode: int) -> None: # Taken from https://stackoverflow.com/a/2853934 diff --git a/tests/test_hf_api.py b/tests/test_hf_api.py index 935bdbb1d4..24c648e366 100644 --- a/tests/test_hf_api.py +++ b/tests/test_hf_api.py @@ -261,13 +261,6 @@ def test_update_dataset_repo_settings(self, repo_url: RepoUrl): assert info.gated == gated_value assert info.private == private_value - @use_tmp_repo(repo_type="model") - def test_update_repo_settings_xet_enabled(self, repo_url: RepoUrl): - repo_id = repo_url.repo_id - self._api.update_repo_settings(repo_id=repo_id, xet_enabled=True) - info = self._api.model_info(repo_id, expand="xetEnabled") - assert info.xet_enabled - class CommitApiTest(HfApiCommonTest): def setUp(self) -> None: @@ -4378,6 
+4371,7 @@ def _check_expand_property_is_up_to_date(self, repo_url: RepoUrl): defined_args = set(get_args(property_type)) expected_args = set(message.replace('"expand" must be one of ', "").strip("[]").split(", ")) expected_args.discard("gitalyUid") # internal one, do not document + expected_args.discard("xetEnabled") # all repos are xetEnabled now, so we don't document it anymore if defined_args != expected_args: should_be_removed = defined_args - expected_args diff --git a/tests/test_hf_file_system.py b/tests/test_hf_file_system.py index 26c1eb8ae3..f00740bd1e 100644 --- a/tests/test_hf_file_system.py +++ b/tests/test_hf_file_system.py @@ -459,7 +459,7 @@ def test_get_file_with_temporary_file(self): assert temp_file.read() == b"dummy text data" def test_get_file_with_temporary_folder(self): - # Test passing a file path works => compatible with hf_transfer + # Test passing a file path works with tempfile.TemporaryDirectory() as temp_dir: temp_file = os.path.join(temp_dir, "temp_file.txt") self.hffs.get_file(self.text_file, temp_file) diff --git a/tests/test_xet_upload.py b/tests/test_xet_upload.py index 471d2150b9..d7163d2f88 100644 --- a/tests/test_xet_upload.py +++ b/tests/test_xet_upload.py @@ -57,7 +57,6 @@ def api(): @pytest.fixture def repo_url(api, repo_type: str = "model"): repo_url = api.create_repo(repo_id=repo_name(prefix=repo_type), repo_type=repo_type) - api.update_repo_settings(repo_id=repo_url.repo_id, xet_enabled=True) yield repo_url
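After this change, `update_repo_settings` accepts only `gated` and `private` (the `xet_enabled` flag is gone, since all repos now use Xet storage). A minimal sketch of the payload construction the method is left with — using a hypothetical standalone helper name for illustration, not the actual method body:

```python
from typing import Optional, Union

def build_repo_settings_payload(
    gated: Union[str, bool, None] = None,
    private: Optional[bool] = None,
) -> dict:
    """Sketch of the settings payload after the removal of `xetEnabled`."""
    payload: dict = {}
    if gated is not None:
        # Same validation as the docstring describes: "auto", "manual", or False.
        if gated not in ("auto", "manual", False):
            raise ValueError('Invalid gated status, must be "auto", "manual", or False.')
        payload["gated"] = gated
    if private is not None:
        payload["private"] = private
    # With `xetEnabled` removed, an empty payload means there is nothing to update.
    if len(payload) == 0:
        raise ValueError("At least one setting must be updated.")
    return payload
```

This is also why the `test_xet_upload.py` fixture no longer calls `update_repo_settings(..., xet_enabled=True)`: newly created repos are Xet-enabled by default.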