
Conversation

@Qubitium
Contributor

@Qubitium Qubitium commented Nov 7, 2025

The num_workers (threads) spawned by hub appears to be excessive on large-core-count systems, e.g. 96 cores. Spawning 94 threads doesn't help upload speed or hashing; it only hurts, and it assumes no other tasks are running on the system.

There is also a strange default minimum of 2 threads on a micro-VM with only 1 core. It's not clear why 2 is used for a system with a single CPU core.

This PR changes the default num_workers to cpu_cores // 2, with a minimum of 1.
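The proposed rule can be sketched as follows. This is an illustrative sketch of the default described above, not the actual huggingface_hub code, and `default_num_workers` is a hypothetical helper name:

```python
def default_num_workers(cores: int) -> int:
    """Proposed default: half the available cores, but never fewer than 1."""
    return max(cores // 2, 1)

# On a 96-core machine this yields 48 workers; on a 1-core
# micro-VM it yields 1 instead of the previous minimum of 2.
```

In practice `cores` would come from something like `os.cpu_count() or 1`, and an explicit `--num-workers` would still override the default.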

@Wauplin
Contributor

Wauplin commented Nov 7, 2025

Hey @Qubitium, thanks for raising the question. So far we've never seen any complaints regarding (almost) maxing out the CPU cores. We consider that someone using hf upload-large-folder wants to upload their data as fast as possible, hence trying to use as many cores as possible when that's possible. I admit it's not perfect, but a rule that works correctly for everyone is close to impossible. That's why the recommended approach is to use the --num-workers parameter, since only the user knows what their data looks like, what machine and bandwidth they have, etc.

I am therefore not convinced we should change the current rule. Happy to reconsider if we get more reports about that.

@Qubitium
Contributor Author

Qubitium commented Nov 7, 2025

Hey @Qubitium, thanks for raising the question. So far we've never seen any complaints regarding (almost) maxing out the CPU cores. We consider that someone using hf upload-large-folder wants to upload their data as fast as possible, hence trying to use as many cores as possible when that's possible. I admit it's not perfect, but a rule that works correctly for everyone is close to impossible. That's why the recommended approach is to use the --num-workers parameter, since only the user knows what their data looks like, what machine and bandwidth they have, etc.

I am therefore not convinced we should change the current rule. Happy to reconsider if we get more reports about that.

A single thread with a large enough file can saturate a 10Gb/s connection on its own, which shows that more threads != more upload speed. Threads only help if you have lots of very small files.

The code I am changing is the default case, not the per-user optimal case, which users can already tune via num_workers.

From my point of view, threads are not free nor are they useful in large numbers. You only want to launch as many as you need and no more. It is actually better to under-thread than over-thread.

So far we've never seen any complaints regarding (almost) maxxing the CPU cores.

I am that person.

@Wauplin
Contributor

Wauplin commented Nov 7, 2025

Threads only help if you have lots of very small files.

This is actually very true, ... and happening :) We do have users uploading thousands of files to a dataset.

A single thread with large enough file can sufficiently saturate a 10Gb/s connection

The threads are doing 2 things: hashing files and uploading them. Hashing requires disk I/O + compute time, while upload requires disk I/O and "internet" I/O. So even if you have one thread saturating a 10Gb/s connection, you would still want to compute the hash of the next file in the background. I know the GIL limits this a lot, but without knowing what the limiting factor is between disk I/O, outbound I/O, and compute, you can't make many assumptions.
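The overlap being described — hashing the next file while another is uploading — can be illustrated with a minimal sketch. This is toy code, not hub's actual implementation; `fake_upload` is a placeholder for the real network step:

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def hash_file(path: str) -> str:
    # Hashing is disk I/O + compute. hashlib releases the GIL while
    # hashing large buffers, so it can overlap with an in-flight upload.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def fake_upload(path: str, digest: str) -> str:
    # Stand-in for network I/O; a real worker would PUT the bytes.
    return f"{os.path.basename(path)}:{digest[:8]}"

def process(paths: list[str], num_workers: int = 2) -> list[str]:
    # With more than one worker, one thread can hash the next file
    # while another is blocked on (simulated) upload I/O.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        digests = list(pool.map(hash_file, paths))
        return list(pool.map(fake_upload, paths, digests))
```

The sketch only shows why a second thread can be useful even when one thread saturates the link; it says nothing about whether 94 threads are.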

From my point of view, threads are not free nor are they useful in large numbers. You only want to launch as many as you need and no more. It is actually better to under-thread than over-thread.

Kinda agree with that, though not convinced we should change to half the cores by default.

I am that person.

A few more upvotes and happy to reconsider

@Qubitium
Contributor Author

Qubitium commented Nov 7, 2025

Threads only help if you have lots of very small files.

This is actually very true, ... and happening :) We do have users uploading thousands of files to a dataset.

So the default is optimized for this case? I don't consider it the default case. This PR is titled "better default for num_workers" for a reason. What about models?

A single thread with large enough file can sufficiently saturate a 10Gb/s connection

The threads are doing 2 things: hashing files and uploading them. Hashing requires disk I/O + compute time while upload requires disk I/O and "internet" I/O. So even if you have a thread saturating a 10GB/s connection, you would still want to compute the hash of the next file in the background. I know the GIL is limiting this a lot but without knowing what is the limiting factor between disk I/O, outbound I/O and compute, you can't make much assumptions.

Ah. Now you are finally talking about merit instead of blasting about upload speed. Having 100 threads reading from disk does not speed up disk I/O either. It does speed up hashing once the data is loaded, but check any NVMe or Linux disk I/O stack: no I/O path is optimized for 100 random I/O threads in a normal/default setup, unless you have tons of NVMe drives or clustered NFS/IO. Again, we are talking about the default. I am not trying to optimize anything. Your default is.

It is likely that the imaginary speedup your users are seeing is, in reality, a slowdown.

Benchmark it; you will see that launching so many threads will actually hurt your disk I/O.

If threads were so good, your database would spawn threads == CPU cores, but databases don't, for a reason — and databases are exactly the thousands-of-small-I/Os workload you are talking about.

So lots of threads:

  1. do not help disk I/O: they can only hurt
  2. do not help network I/O: they can only hurt
  3. may help hashing, if the system is not doing something else. Hashing is CPU-only and scales roughly linearly, so this part is perfect for many threads, except Python has the GIL. You may need to benchmark this too; it may not be that much faster with so many threads.
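Whether extra threads actually speed up hashing is easy to measure. Below is a minimal benchmark sketch; `bench_hash` is a hypothetical helper, it hashes in-memory blobs so it deliberately excludes the disk I/O being argued about, and note that `hashlib` releases the GIL while hashing large buffers, so some scaling is possible despite the GIL:

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def bench_hash(blobs: list[bytes], num_workers: int) -> float:
    """Return the seconds taken to sha256 every blob with num_workers threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # hashlib drops the GIL for large buffers, so the threads can
        # hash concurrently on real cores; the speedup (if any) is what
        # this measures.
        list(pool.map(lambda b: hashlib.sha256(b).hexdigest(), blobs))
    return time.perf_counter() - start
```

Comparing `bench_hash(blobs, 1)` against `bench_hash(blobs, os.cpu_count())` on representative data answers the question empirically for a given machine.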

From my point of view, threads are not free nor are they useful in large numbers. You only want to launch as many as you need and no more. It is actually better to under-thread than over-thread.

Kinda agree with that, though not convinced we should change to half the cores by default.

I am that person.

A few more upvotes and happy to reconsider

By the way, why is the minimum value 2 threads? That's the other point in this PR. What if I have a VM with 1 core?

This PR optimizes nothing. It moves the thread count to a better mean value. The current default is the "optimized" version, based on assumptions.

Contributor

@Wauplin Wauplin left a comment


I'm not a fan of the tone of the conversation, but we've discussed it a bit internally and we're fine with this change. Thanks for suggesting and arguing for it.

@Wauplin Wauplin merged commit 2a9efce into huggingface:main Nov 12, 2025
17 of 18 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Qubitium Qubitium deleted the default-workers branch November 12, 2025 15:29