better default for num_workers #3532
Conversation
Hey @Qubitium, thanks for raising the question. So far we've never seen any complaints regarding (almost) maxing the CPU cores. We consider that someone using a high-core-count machine expects those cores to be used. I am therefore not convinced we should change the current rule. Happy to reconsider if we get more reports about that.
A single thread with a large enough file can sufficiently saturate a 10 Gb/s connection, so I have just proven that threads != upload speed. Threads only help if you have lots of very small files. The code I am changing is the default case, not the per-user optimal case, which users can still set themselves. From my point of view, threads are not free, nor are they useful in large numbers. You only want to launch as many as you need and no more. It is actually better to under-thread than over-thread.
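For context on the single-thread claim, a rough back-of-the-envelope sketch; the NVMe figure is an assumption for illustration, not a number from this thread:

```python
# A 10 Gb/s link moves at most 10/8 = 1.25 GB/s, while one modern NVMe
# drive can sustain several GB/s of sequential reads, so a single reader
# thread can already keep the wire full for a large file.
link_gbit_per_s = 10
link_gbyte_per_s = link_gbit_per_s / 8   # 1.25 GB/s ceiling on the wire
nvme_read_gbyte_per_s = 3.5              # assumed typical NVMe sequential read
print(f"wire ceiling: {link_gbyte_per_s} GB/s, one NVMe reader: {nvme_read_gbyte_per_s} GB/s")
```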
I am that person.
This is actually very true... and happening :) We do have users uploading thousands of files to a dataset.
The threads are doing 2 things: hashing files and uploading them. Hashing requires disk I/O + compute time, while uploading requires disk I/O and "internet" I/O. So even if you have a single thread saturating a 10 Gb/s connection, you would still want to compute the hash of the next file in the background. I know the GIL limits this a lot, but without knowing which of disk I/O, outbound I/O, and compute is the limiting factor, you can't make many assumptions.
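To illustrate the overlap argument, here is a minimal sketch, not the actual `huggingface_hub` worker code; `hash_file` and `upload_file` are hypothetical stand-ins for the real worker steps:

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def hash_file(path: str) -> str:
    # Disk I/O + compute: read the file in chunks and hash it.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def upload_file(path: str, digest: str) -> None:
    # Disk I/O + network I/O in the real pipeline; a no-op placeholder here.
    pass

def process(path: str) -> None:
    upload_file(path, hash_file(path))

# Throwaway demo files so the sketch runs standalone.
paths = []
for _ in range(3):
    fd, p = tempfile.mkstemp()
    os.write(fd, os.urandom(1 << 20))
    os.close(fd)
    paths.append(p)

# With more than one worker, file N+1 is hashed (disk + CPU) while file N
# is still "uploading" (network), so the two phases overlap instead of
# running back-to-back.
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(process, paths))
```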
Kinda agree with that, though not convinced we should change to half the cores by default.
A few more upvotes and happy to reconsider.
So the default is optimized for this case? I consider this not the default case. This PR's title says "better default", not "optimal".
Ah. Now you are finally talking about merit instead of blasting about upload speed. Having 100 threads reading from disk also does not speed up disk I/O. It does speed up hashing once the data is loaded, but check any NVMe or Linux disk I/O stack: no I/O path is optimized for 100 random-I/O threads in a normal/default setup, unless you have tons of NVMe drives or clustered NFS/IO. Again, we are talking about the default case. It is likely that the imaginary speed-up your users are seeing is in reality a slowdown. Benchmark it; you will see that launching so many threads will actually hurt your disk I/O. If threads were so good, your database would spawn threads == CPU cores, but they don't, for a reason, and databases deal with exactly the thousands of small I/Os you are talking about. So lots of threads only help when you have lots of very small files.
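In the spirit of "benchmark it", a rough micro-benchmark sketch; `DATA_DIR` is a hypothetical directory of test files, and the OS page cache will flatter repeat runs, so use fresh files or drop caches for honest numbers:

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor

DATA_DIR = "testdata"  # hypothetical: point at a real directory of files

def read_and_hash(path: str) -> str:
    # One worker's unit of disk + CPU work.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

paths = [os.path.join(DATA_DIR, name) for name in os.listdir(DATA_DIR)]
for workers in (1, 2, 4, 8, 16, 32, 64):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(read_and_hash, paths))
    print(f"{workers:3d} workers: {time.perf_counter() - start:.2f}s")
```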
Why the minimum value of 2 threads, btw, which is another point in this PR? What if I have a VM with 1 core? This PR does not chase an optimal value; it just moves the default to `cpu cores // 2` with a minimum of 1.
Wauplin left a comment
I'm not a fan of the tone of the conversation, but we've discussed it a bit internally and we're fine with this change. Thanks for suggesting and arguing for it.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The `num_workers` (threads) spawned by hub appears to be excessive on large-core-count systems, e.g. 96 cores. Having 94 threads doesn't help with upload speed or with hashing, it only hurts, as it assumes no other tasks are running on the system. Also, there is a strange default minimum of `2` threads even on a micro-VM with only 1 core; not sure why 2 is used for a system with 1 CPU core.

This PR changes the default `num_workers` to `cpu cores // 2` with a minimum of 1.
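A sketch of the proposed default as described above (my reading of this PR's description, not the exact diff):

```python
import os

def default_num_workers() -> int:
    # Before (per the discussion above): roughly cpu_count - 2 with a floor
    # of 2, i.e. 94 threads on a 96-core box and 2 threads on a 1-core VM.
    # After: half the cores, with a floor of 1.
    return max(1, (os.cpu_count() or 1) // 2)
```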