
fix: write_env breaks task's affinity#1980

Closed
lixin-wei wants to merge 1 commit into NVIDIA:main from lixin-wei:fix_env_affinity

Conversation

@lixin-wei
Contributor

@lixin-wei lixin-wei commented Apr 2, 2026

When a task is wrapped with write_env(prop{get_scheduler, inline_scheduler{}}), the outer affine_on (added by the outer task's await_transform) was incorrectly skipping the rescheduling step.

@lixin-wei lixin-wei changed the title fix: write_env breaks tasks's affinity fix: write_env breaks task's affinity Apr 2, 2026
@ericniebler
Collaborator

could you pls post some code that exhibits the problem, and tell me what behavior you're expecting? a godbolt link would be great. i think there is more going on here, and i'd like to get to the bottom of it.

@lixin-wei
Contributor Author

lixin-wei commented Apr 2, 2026

@ericniebler You can have a look at the test I wrote in this PR.

In test_task_affinity_with_write_env, I expect that after co_await std::move(task) the coroutine resumes on the run_loop in sync_wait. Before my change, it did not.

  template <ex::scheduler Worker>
  ex::task<int> inner_task(Worker worker)
  {
    CHECK(get_id() == 0);
    int i = co_await ex::starts_on(worker, just_int(42) | ...);
    CHECK(get_id() != 0);
    co_return i;
  }

  template <ex::scheduler Worker>
  ex::task<int> test_task_affinity_with_write_env(Worker worker)
  {
    CHECK(get_id() == 0);
    auto task = inner_task(worker)
              | ex::write_env(ex::prop{ex::get_scheduler, ex::inline_scheduler{}});
    int i = co_await std::move(task);
    CHECK(get_id() == 0); // <-------- I expect it resumes at main thread.
    co_return i;
  }

  TEST_CASE("test task scheduler affinity works with write_env", "[types][task]")
  {
    exec::single_thread_context ctx;
    auto                        t = test_task_affinity_with_write_env(ctx.get_scheduler());
    auto [i]                      = ex::sync_wait(std::move(t)).value();
    CHECK(i == 42);
  }

@lixin-wei
Contributor Author

lixin-wei commented Apr 3, 2026

@ericniebler BTW, if I put the env directly in the task's template param, there will be no problem. So I think it's related to write_env.

@ericniebler
Collaborator

i think what is happening here is easier to see with a second worker thread instead of inline_scheduler. this godbolt has some comments that should clear it up.

https://godbolt.org/z/axzhe6cz4

note that if you use write_env to tell the inner task that it is being started on the inline_scheduler, that scheduler will be used to reschedule at the end of the co_await. the end result is that once the starts_on puts the work on worker1, it stays there.

i don't think there's a bug here.
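
A toy model may help make the point above concrete. The sketch below is not stdexec code: contexts are plain integers, `std::nullopt` stands in for inline_scheduler, and `run_inner_task` is a hypothetical helper that only mimics the "reschedule onto the env's scheduler at the end" behavior described in this comment.

```cpp
#include <cassert>
#include <optional>

// Toy model only: contexts are plain integers, not stdexec schedulers.
using context_id = int;
constexpr context_id main_ctx   = 0;
constexpr context_id worker_ctx = 1;

// std::nullopt is an assumed stand-in for inline_scheduler
// ("complete wherever you happen to be").
using env_scheduler = std::optional<context_id>;

// Hypothetical helper: models a task whose body hops to `worker`
// (like starts_on) and which, on completion, reschedules onto the
// scheduler found in its environment.
// Returns the context on which the task completes.
inline context_id run_inner_task(env_scheduler env_sched, context_id worker)
{
  assert(worker != main_ctx);            // the toy assumes a distinct worker context
  context_id current = worker;           // starts_on(worker, ...) moves the work
  if (env_sched) current = *env_sched;   // hop back onto the env's scheduler
  // With the inline stand-in there is no hop: the task ends where it is.
  return current;
}
```

Under these assumptions, `run_inner_task(main_ctx, worker_ctx)` ends on context 0, while `run_inner_task(std::nullopt, worker_ctx)` ends on context 1, which matches "once the starts_on puts the work on worker1, it stays there".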

@lixin-wei
Contributor Author

lixin-wei commented Apr 4, 2026

@ericniebler thank you for the explanation!

I'm still curious: in the outer task's await_transform, shouldn't it affine_on the inner task back to the main thread? The scheduler it remembered in its promise_type is the main thread, right? write_env should only affect the inner task.

I'd really appreciate it if you could say more🙏


I've attached some statements from the standard for reference:

https://eel.is/c++draft/exec#task.promise-9

template<sender Sender>
auto await_transform(Sender&& sndr) noexcept;
Returns: If same_as<inline_scheduler, scheduler_type> is true returns as_awaitable(std::forward<Sender>(sndr), *this); otherwise returns as_awaitable(affine_on(std::forward<Sender>(sndr), SCHED(*this)), *this).

https://eel.is/c++draft/task.state#4.3

SCHED(prom) is the object initialized with scheduler_type(get_scheduler(get_env(rcvr))) if that expression is valid and scheduler_type() otherwise. If neither of these expressions is valid, the program is ill-formed.

@ericniebler
Collaborator

That's a good question. The reason the outer task doesn't transition back to the main thread after awaiting the inner task is because the inner task claimed to be affine already.
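
As an illustration of the "claimed to be affine already" point, here is a minimal toy model. `sender_model` and `affine_on_model` are hypothetical names, and the real affine_on in stdexec is of course more involved; the sketch only shows why a child that advertises affinity causes the hop back to be elided.

```cpp
#include <cassert>

// Toy model: a "sender" is reduced to where it finishes and what it claims.
struct sender_model
{
  int  completes_on;   // context id the work actually finishes on
  bool claims_affine;  // true if it promises to finish where it was started
};

// Hypothetical model of affine_on(sndr, sched): hop back to `sched` unless
// the child already claims to complete on the context it was started on.
// Returns the context on which the awaiting coroutine resumes.
inline int affine_on_model(sender_model child, int sched)
{
  if (child.claims_affine)
    return child.completes_on;  // hop elided: the claim is trusted
  return sched;                 // explicit hop back to the outer scheduler
}
```

If the inner task claims affinity but actually finished on context 1, the outer coroutine resumes on 1 rather than on its own scheduler 0; a child that makes no such claim is brought back to 0.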

@lixin-wei
Contributor Author

lixin-wei commented Apr 4, 2026

Thanks for the hint! I understand now. So we shouldn't use the env (neither write_env nor the Env template param) to specify a fallback scheduler for a task, right? That is dangerous because it misleads the optimization in affine_on: the scheduler we specify there is not necessarily the one that starts the task.

My actual requirement is to specify a fallback scheduler for a task so that my users can ex::spawn it more easily. It seems I went about it the wrong way. I'll switch to providing a spawn_inline utility function for my users.

Anyway, this PR can be closed. Thanks again for your detailed explanation, @ericniebler! I've learned a lot :)

@ericniebler
Collaborator

this might interest you. we decided last week at the C++ committee meeting that we wanted a separate get_start_scheduler receiver query, and that task and on should use that to establish their starting context instead of get_scheduler. that frees up the get_scheduler query for your use. once i apply that change to stdexec, your original code should work as expected.
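
To picture what separating the two queries buys, consider the toy model below. It is only an assumption-laden sketch: `env_model`, `write_scheduler`, and `resumes_on` are hypothetical names, and the eventual stdexec change may look quite different. The idea is simply that overriding the general-purpose scheduler no longer changes what the task treats as its starting context.

```cpp
#include <cassert>

// Toy environment with two independent entries (context ids as ints).
struct env_model
{
  int scheduler;        // stands in for the get_scheduler query
  int start_scheduler;  // stands in for the proposed get_start_scheduler query
};

// Hypothetical analogue of write_env(prop{get_scheduler, ...}):
// overrides only the general-purpose scheduler entry.
inline env_model write_scheduler(env_model env, int sched)
{
  env.scheduler = sched;
  return env;
}

// In this model the task uses only the start scheduler to decide
// where it resumes, so the override above does not affect it.
inline int resumes_on(const env_model& env)
{
  return env.start_scheduler;
}
```

Here, overriding `scheduler` leaves `resumes_on` unchanged, which is the property the comment describes as freeing up get_scheduler for user code.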

@ericniebler ericniebler closed this Apr 4, 2026