[Sharktank] Llm Task Scheduler #2418
base: main
Conversation
- Add `task_id` to `LlmTaskInput`
Codecov Report

@@           Coverage Diff           @@
##             main    #2418   +/-   ##
=======================================
  Coverage        ?   78.21%
=======================================
  Files           ?      243
  Lines           ?    22786
  Branches        ?        0
=======================================
  Hits            ?    17821
  Misses          ?     4965
  Partials        ?        0
…aione/SHARK-Platform into sharktank-llm-task-scheduler
    start_position: Optional[int] = None


class LlmTask(ABC):
I would move this to the scheduler and generalize it. Most of the basics are fairly use-case agnostic, and it would be easy to write some tests on the scheduler + task.
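For concreteness, a minimal sketch of what a generalized task could look like if it lived next to the scheduler (`TaskInput`, `Task`, and the `run` signature here are illustrative names, not the PR's actual API):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class TaskInput:
    # Hypothetical generalized input: the scheduler only needs an id and an
    # opaque payload; LLM-specific fields (tokens, pages, start_position)
    # would live in a subclass or inside the payload.
    task_id: str
    payload: Any
    start_position: Optional[int] = None


class Task(ABC):
    """Use-case-agnostic unit of work that a scheduler can batch and run."""

    def __init__(self, inputs: List[TaskInput]):
        self.inputs = inputs

    @abstractmethod
    def run(self, *resources: Any) -> Any:
        """Execute the batched work against shared resources (e.g. a cache)."""
```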
    pages=page_ids[i],
)
# Submit prefill task
self._prefill_scheduler.schedule_task(task_input, 1)
This feels wrong - why are you submitting to both prefill and decode?
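For illustration, one shape the handoff could take if a request only ever sits in one scheduler at a time (a sketch; `schedule_task`, `run`, and `with_token` are assumed names, not the PR's API):

```python
class Decoder:
    """Sketch only: prefill first, then hand finished tasks to decode."""

    def generate(self, task_inputs, steps):
        # Each request is prefilled exactly once, so it is tracked for
        # a single schedule.
        for task_input in task_inputs:
            self._prefill_scheduler.schedule_task(task_input, 1)

        # Run prefill to completion, then move each request into decode,
        # tracked for at most `steps` schedules.
        for task_input, first_token in self._prefill_scheduler.run(*self._cache):
            self._decode_scheduler.schedule_task(
                task_input.with_token(first_token), steps
            )
```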
    batch_size=self._decode_bs,
    block_stride=self._block_stride,
)
logits, indices = decode_task.run(*self._cache)
This looks wrong. You make a task, then call run on it, then call the scheduler? You should be making a job and submitting it. The scheduler should basically have a schedule command for each task (a task being a separate request), then a call for completion. Something is off here.
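A rough sketch of the submit-plus-completion shape this comment seems to be asking for (all names are guesses, not the PR's API):

```python
class Scheduler:
    """Minimal sketch: one schedule call per task, one call for completion."""

    def __init__(self):
        self._pending = []

    def schedule(self, job):
        # A job wraps a single request; nothing runs at submit time.
        self._pending.append(job)

    def complete(self, *resources):
        # Run everything that was scheduled and hand results back keyed
        # by task id; the scheduler owns batching and the run loop.
        results = {job.task_id: job.run(*resources) for job in self._pending}
        self._pending.clear()
        return results
```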
last = selection_fn(logits, indices, [0] * len(task_inputs))
for task_input, token in zip(task_inputs, last):
    selections_map[task_input.task_id].append(token)
    self._decode_scheduler.on_task_complete(
This on-task-completion callback is wrong. You should submit, then process the returned results.
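As a usage sketch of submit-then-process, building on the hypothetical `Scheduler` above (`selection_fn`, `selections_map`, and `cache` stand in for the surrounding code):

```python
# Submit every task first; nothing runs yet.
for task_input in task_inputs:
    scheduler.schedule(task_input)

# Then process whatever complete() hands back, with no callback involved.
for task_id, (logits, indices) in scheduler.complete(*cache).items():
    token = selection_fn(logits, indices)
    selections_map[task_id].append(token)
```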
        token,
    )
    if token == eos_token:
        self._decode_scheduler.remove_task(task_input.task_id)
Schedulers should take requests, process them, then be done. You shouldn't be managing them externally.
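One hedged sketch of a scheduler that owns the whole lifecycle, so finished tasks drop out on their own and callers never invoke `remove_task` (`step_fn` and the eos handling are assumptions):

```python
class DecodeScheduler:
    """Sketch: requests enter, get processed to completion, and leave."""

    def __init__(self, eos_token, max_steps):
        self._eos_token = eos_token
        self._max_steps = max_steps

    def run_to_completion(self, tasks, step_fn):
        # step_fn(task) -> next token; assumed to wrap the batched run.
        selections = {task.task_id: [] for task in tasks}
        active = list(tasks)
        for _ in range(self._max_steps):
            if not active:
                break
            next_active = []
            for task in active:
                token = step_fn(task)
                selections[task.task_id].append(token)
                # Finished tasks simply fall out of the loop; no external
                # remove_task bookkeeping is required.
                if token != self._eos_token:
                    next_active.append(task)
            active = next_active
        return selections
```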
def has_pending_tasks(self) -> bool:
    return len(self._queue) > 0


def on_task_complete(self, task_id: str, last_token: int) -> bool:
I am pretty sure this is a bad section. Calling a callback on completion is likely going to produce bad results.
Make enough parts in the scheduler and write some tests for it. I think you will understand the issues with your API if you attempt to use it standalone vs. retrofitting it into the old process.
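A standalone pytest-style test along those lines, written against the hypothetical `Scheduler` sketched earlier rather than the PR's class:

```python
class FakeJob:
    def __init__(self, task_id, value):
        self.task_id = task_id
        self._value = value

    def run(self):
        return self._value


def test_scheduler_runs_each_scheduled_job_once():
    scheduler = Scheduler()
    scheduler.schedule(FakeJob("a", 1))
    scheduler.schedule(FakeJob("b", 2))

    assert scheduler.complete() == {"a": 1, "b": 2}
    # Nothing is pending afterwards; a second completion returns nothing.
    assert scheduler.complete() == {}
```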
Description
- Adds a `Scheduler` into the `vmfb-runner`
- Adds a `submit + run` flow to the `Batcher`
- Each request becomes an `LlmTaskInput`, which gets added to the `Scheduler` with a `count` variable. The `count` var allows the scheduler to determine how long each request should be tracked. For example, the `decode_scheduler` will track each task for at most `steps` number of schedules (see the sketch after this list).
- The `Decoder` submits the tasks to the `Batcher`, then returns the results from calling `run`.
- This should make it easier to extend to other uses of the `Batcher` (i.e. perplexity). Would need to think of a way to handle the fact that PPL needs all of the raw logits, not just the selections, but that should be possible to do. Maybe by moving the `selection` logic back outside of the `Batcher`, and just providing it a callback for where it should submit the logits + indices.
- Split the code into `llm_scheduler.py`, `llm_task.py` and `llm_utils.py`, because the file was getting complex.
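A minimal sketch of how the `count` bookkeeping described above might behave (the real `schedule_task` signature and scheduler internals may differ):

```python
class CountingScheduler:
    """Sketch: each task is eligible for at most `count` schedules."""

    def __init__(self):
        self._remaining = {}  # task_id -> schedules left

    def schedule_task(self, task_input, count):
        # count=1 would suit prefill; count=steps would suit decode.
        self._remaining[task_input.task_id] = count

    def next_batch(self, task_inputs):
        # Decrement each tracked task; drop it once its count reaches zero.
        batch = []
        for task_input in task_inputs:
            left = self._remaining.get(task_input.task_id, 0)
            if left > 0:
                self._remaining[task_input.task_id] = left - 1
                batch.append(task_input)
        return batch
```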
Next Steps
- Implement a `ChunkPrefill` scheduler
- Refactor `llm_utils`, but maintain the same cli commands