LLM pipeline implementation #1040
base: master
Conversation
…re pipeline cannot handle an input size larger than the max prefill size
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
…lemented performance benchmark for LLM pipeline
…y input and issue_query only handles output tokens
namespace mobile {

// A method to be called by the backend as soon as the first token is generated (only for token-based benchmarks)
static void FirstTokenCallback(void* context) {
What's the use of `context`?
The context is the set of arguments that gets passed to LoadGen; these are created by the driver and sent to the backend. The backend only needs to pass them on to the callback without reading or modifying them.
@freedomtan to check it.
No description provided.