@d4l3k commented Mar 20, 2025

By default, Tokio's multi-threaded runtime creates one worker thread per CPU core. On beefy GPU boxes with hundreds of cores, that means hundreds of mostly idle threads, which makes it harder to debug issues via lldb. This change caps the number of worker threads.

This also sets thread names so it's easier to tell which pool each thread belongs to.
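
As a rough illustration of the two ideas in this description (a small, fixed number of threads, each with a recognizable name), here is a minimal standard-library sketch. It does not use Tokio; `spawn_named_pool` and the `demo-pool` prefix are hypothetical names chosen for the example. The change itself uses `worker_threads` and `thread_name` on `tokio::runtime::Builder`.

```rust
use std::thread;

/// Spawn `count` threads named `<prefix>-<index>` and return the name each
/// thread reports for itself. Hypothetical helper, for illustration only.
fn spawn_named_pool(prefix: &str, count: usize) -> Vec<String> {
    let handles: Vec<_> = (0..count)
        .map(|i| {
            thread::Builder::new()
                .name(format!("{prefix}-{i}"))
                .spawn(|| {
                    // Read the name back from inside the spawned thread.
                    let current = thread::current();
                    current.name().unwrap_or("<unnamed>").to_string()
                })
                .expect("failed to spawn thread")
        })
        .collect();

    // Join in spawn order so the returned names are in index order.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("thread panicked"))
        .collect()
}

fn main() {
    for name in spawn_named_pool("demo-pool", 4) {
        println!("{name}");
    }
}
```

Named threads show up with their name in debugger output such as `thread backtrace all`, which is what makes a small, labeled pool easier to inspect than hundreds of anonymous workers.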

Test plan:

$ torchft_lighthouse
$ lldb -p 1234
> thread backtrace all
  thread #243, name = 'torchft-lighths', stop reason = signal SIGSTOP
    frame #0: 0x00007fd7fcd0792d libc.so.6`syscall + 29
    frame #1: 0x00007fd65a51e0a9 _torchft.cpython-312-x86_64-linux-gnu.so`parking_lot::condvar::Condvar::wait_until_internal::ha421dfa1f35f6d78 + 649
    frame #2: 0x00007fd65a50fdee _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::park::Parker::park::h1064fe7e072bea0b + 222
    frame #3: 0x00007fd65a5166da _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::worker::Context::park_timeout::hb6e439a80eb82fbc + 154
    frame #4: 0x00007fd65a515dbf _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::worker::Context::run::h925c9d2cbee36e7e + 2879
    frame #5: 0x00007fd65a502fc4 _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::context::runtime::enter_runtime::h15ac70dde2453af5 + 692
    frame #6: 0x00007fd65a5151fa _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::worker::run::h6967ec6caf4789c9 + 138
    frame #7: 0x00007fd65a4fa357 _torchft.cpython-312-x86_64-linux-gnu.so`_$LT$tokio..runtime..blocking..task..BlockingTask$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::he2363864a5567ead + 135
    frame #8: 0x00007fd65a4fddd3 _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::hdbdd87c38a1334ff + 147
    frame #9: 0x00007fd65a4f3164 _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::task::harness::Harness$LT$T$C$S$GT$::poll::h635b9aa31f062af8 + 180
    frame #10: 0x00007fd65a4f63ff _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::blocking::pool::Inner::run::h40998686924d7eab + 239
    frame #11: 0x00007fd65a4f844e _torchft.cpython-312-x86_64-linux-gnu.so`std::sys::backtrace::__rust_begin_short_backtrace::h71277f9d3c6edc88 + 206
    frame #12: 0x00007fd65a4f8bc2 _torchft.cpython-312-x86_64-linux-gnu.so`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0987e2e9d3ea45b5 + 162
    frame #13: 0x00007fd65a54b52b _torchft.cpython-312-x86_64-linux-gnu.so`std::sys::pal::unix::thread::Thread::new::thread_start::hcdbd1049068002f4 + 43
    frame #14: 0x00007fd7fcc8a3b2 libc.so.6`start_thread + 722
    frame #15: 0x00007fd7fcd0f430 libc.so.6`__clone3 + 48
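
The thread name in the backtrace above, `torchft-lighths`, is exactly 15 bytes long. Linux keeps thread names in a 16-byte buffer (15 bytes plus a terminating NUL), so longer names do not appear in full in tools like lldb. The sketch below approximates that limit for checking a candidate name; `visible_thread_name` is a hypothetical helper, and the real truncation is done by the platform, not by this function.

```rust
/// Linux stores thread names in a 16-byte buffer, so at most 15 bytes are visible.
const MAX_VISIBLE_NAME_BYTES: usize = 15;

/// Return the prefix of `name` that fits in 15 bytes, cutting on a UTF-8
/// character boundary. An approximation for illustration, not the exact
/// platform behavior.
fn visible_thread_name(name: &str) -> &str {
    if name.len() <= MAX_VISIBLE_NAME_BYTES {
        return name;
    }
    let mut end = MAX_VISIBLE_NAME_BYTES;
    // Step back until the cut does not split a multi-byte character.
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    &name[..end]
}

fn main() {
    // Fits exactly, so it is shown unchanged.
    println!("{}", visible_thread_name("torchft-lighths"));
    // Too long, so only the first 15 bytes would be visible.
    println!("{}", visible_thread_name("torchft-lighthouse"));
}
```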

@d4l3k requested a review from H-Huang March 20, 2025 23:41
@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Mar 20, 2025
@d4l3k requested a review from fegin March 20, 2025 23:41
src/lib.rs Outdated
  py.allow_threads(move || {
-     let runtime = Runtime::new()?;
+     let runtime = tokio::runtime::Builder::new_multi_thread()
+         .worker_threads(4)

LGTM, just a nit: do you want this to be configurable, or are we OK with 4 threads in all cases?
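
One way the configurability raised in this comment could look, as a minimal sketch and not what the PR implements: read an optional override and fall back to the hard-coded 4. The environment variable `TORCHFT_WORKER_THREADS` and the helper `resolve_worker_threads` are hypothetical names introduced for this example. Zero is rejected because Tokio's `Builder::worker_threads` panics on 0.

```rust
use std::env;

/// Default used when no override is given; 4 matches the value in the diff.
const DEFAULT_WORKER_THREADS: usize = 4;

/// Parse an optional override, falling back to `default` when the value is
/// missing, not a number, or zero.
fn resolve_worker_threads(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

fn main() {
    // `TORCHFT_WORKER_THREADS` is a hypothetical variable name, not one the
    // project is known to define.
    let raw = env::var("TORCHFT_WORKER_THREADS").ok();
    let workers = resolve_worker_threads(raw.as_deref(), DEFAULT_WORKER_THREADS);

    // The result would then be passed to
    // `tokio::runtime::Builder::worker_threads(workers)`.
    println!("worker threads: {workers}");
}
```

Keeping the parsing in a pure function makes the fallback rules easy to test without touching the process environment.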

  let opt = lighthouse::LighthouseOpt::from_iter(args);
- let rt = Runtime::new()?;
+ let rt = tokio::runtime::Builder::new_multi_thread()
+     .thread_name("torchft-lighths")

Do you want to set the number of threads here as well?

@d4l3k force-pushed the d4l3k/tokio_threads branch from e602c05 to 8fd028c March 21, 2025 17:17
@d4l3k force-pushed the d4l3k/tokio_threads branch from 8fd028c to 4c662fe March 21, 2025 17:20
@d4l3k merged commit 3724f7c into main Mar 21, 2025
7 checks passed
@d4l3k deleted the d4l3k/tokio_threads branch March 21, 2025 17:42
4 participants