
Conversation

@stbaione stbaione commented Oct 3, 2025

Description

  • Introduces a Scheduler into the vmfb-runner
  • Adds a submit + run flow to the Batcher
    • Each request is given a single LlmTaskInput, which is added to the Scheduler along with a count variable.
    • This count variable lets the scheduler determine how long each request should be tracked. For example, the decode_scheduler tracks each task for at most steps schedules.
  • The Decoder submits its tasks to the Batcher, then returns the results from calling run
  • For now, the standalone prefill/decode functions are kept for the other components that use Batcher (i.e. perplexity). We would need a way to handle the fact that PPL needs all of the raw logits, not just the selections, but that should be doable, perhaps by moving the selection logic back outside of Batcher and providing it a callback for where it should submit the logits + indices.
  • Splits the components into llm_scheduler.py, llm_task.py, and llm_utils.py, since the file was getting complex
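The count-based tracking described above could be sketched roughly like this. This is a hypothetical illustration only: names such as `schedule_task`, `next_batch`, and `remove_task` are assumptions, not the actual llm_scheduler.py API.

```python
from dataclasses import dataclass


@dataclass
class LlmTaskInput:
    task_id: str
    tokens: list


class Scheduler:
    """Tracks each task for at most `count` scheduling rounds."""

    def __init__(self):
        self._pending = {}  # task_id -> [task_input, remaining rounds]

    def schedule_task(self, task_input: LlmTaskInput, count: int) -> None:
        # e.g. a decode scheduler would pass count=steps
        self._pending[task_input.task_id] = [task_input, count]

    def has_pending_tasks(self) -> bool:
        return bool(self._pending)

    def next_batch(self) -> list:
        # Hand back every still-tracked task, then decrement its budget.
        batch = [entry[0] for entry in self._pending.values()]
        for task_id in list(self._pending):
            self._pending[task_id][1] -= 1
            if self._pending[task_id][1] <= 0:
                del self._pending[task_id]
        return batch

    def remove_task(self, task_id: str) -> None:
        # Early removal, e.g. when EOS arrives before the budget runs out.
        self._pending.pop(task_id, None)
```

With this shape, a prefill scheduler would pass count=1 (one schedule per request), while a decode scheduler would pass count=steps.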

Next Steps

  • Introduce a ChunkPrefill scheduler
  • Modularize llm_utils, but maintain the same cli commands

@stbaione stbaione requested a review from rsuderman October 3, 2025 17:33
@stbaione stbaione self-assigned this Oct 3, 2025
codecov-commenter commented Oct 3, 2025

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 93.49112% with 11 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@dd21dd0). Learn more about missing BASE report.

Files with missing lines                     Patch %   Lines
sharktank/sharktank/utils/llm_task.py        93.40%    6 Missing ⚠️
sharktank/sharktank/utils/llm_scheduler.py   92.10%    3 Missing ⚠️
sharktank/sharktank/utils/llm_utils.py       95.00%    2 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2418   +/-   ##
=======================================
  Coverage        ?   78.21%           
=======================================
  Files           ?      243           
  Lines           ?    22786           
  Branches        ?        0           
=======================================
  Hits            ?    17821           
  Misses          ?     4965           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.

@stbaione stbaione marked this pull request as ready for review October 3, 2025 17:57
start_position: Optional[int] = None


class LlmTask(ABC):
Collaborator

I would move this to the scheduler and generalize it. Most of the basics are fairly use-case agnostic, and it would be easy to write some tests on the scheduler + task.

pages=page_ids[i],
)
# Submit prefill task
self._prefill_scheduler.schedule_task(task_input, 1)
Collaborator

This feels wrong - why are you submitting to both prefill and decode?

batch_size=self._decode_bs,
block_stride=self._block_stride,
)
logits, indices = decode_task.run(*self._cache)
Collaborator

This looks wrong. You make a task, then call run on it, then call the scheduler? You should be making a job and submitting it. The scheduler should basically have a schedule command for each task (a task being a separate request), then a call for completion. Something is off here.
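The submit-then-complete pattern being suggested might look something like the following. This is a sketch under assumptions: `Job`, `submit`, and the `run`-drains-everything contract are illustrative names, not the actual Batcher API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Job:
    task_id: str
    execute: Callable[[], Any]  # stands in for the real task body


class Batcher:
    def __init__(self):
        self._jobs: List[Job] = []

    def submit(self, job: Job) -> None:
        # Callers only submit; nothing runs yet.
        self._jobs.append(job)

    def run(self) -> Dict[str, Any]:
        # Drain every submitted job and return the results together,
        # so no caller ever invokes a task's run() directly.
        results = {job.task_id: job.execute() for job in self._jobs}
        self._jobs.clear()
        return results
```

The point of the shape is that execution is owned by one place: callers build jobs and submit them, then collect results from a single run call.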

last = selection_fn(logits, indices, [0] * len(task_inputs))
for task_input, token in zip(task_inputs, last):
selections_map[task_input.task_id].append(token)
self._decode_scheduler.on_task_complete(
Collaborator

This on-task-completion callback is wrong. You should submit, then process the returned results.

token,
)
if token == eos_token:
self._decode_scheduler.remove_task(task_input.task_id)
Collaborator

Schedulers should take requests, process them, then be done. You shouldn't be managing them externally.

def has_pending_tasks(self) -> bool:
return len(self._queue) > 0

def on_task_complete(self, task_id: str, last_token: int) -> bool:
Collaborator

I am pretty sure this is a bad section. Calling a callback on completion is likely going to produce bad results.

Collaborator

Build out enough of the scheduler and write some tests for it. I think you will understand the issues with your API if you attempt to use it standalone rather than retrofitting it into the old process.
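A standalone test along these lines might look like the sketch below. The inline `Scheduler` stub and its `schedule_task`/`next_batch` names are assumptions for illustration, not the real llm_scheduler.py interface.

```python
class Scheduler:
    """Minimal stub: tracks each task for at most `count` rounds."""

    def __init__(self):
        self._pending = {}  # task_id -> remaining rounds

    def schedule_task(self, task_id: str, count: int) -> None:
        self._pending[task_id] = count

    def next_batch(self) -> list:
        batch = sorted(self._pending)
        for task_id in batch:
            self._pending[task_id] -= 1
            if self._pending[task_id] <= 0:
                del self._pending[task_id]
        return batch


def test_tasks_expire_after_count_schedules():
    sched = Scheduler()
    sched.schedule_task("a", count=1)
    sched.schedule_task("b", count=3)
    assert sched.next_batch() == ["a", "b"]  # both still tracked
    assert sched.next_batch() == ["b"]       # "a" expired after 1 round
    assert sched.next_batch() == ["b"]
    assert sched.next_batch() == []          # "b" expired after 3 rounds
```

Driving the scheduler this way, with no Batcher or model in the loop, is what exposes whether the completion flow works without external bookkeeping.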
