[sharktank] Llama 3.1 f16 HF import presets #2290
Conversation
"--preset=meta_llama3_1_8b_instruct_f16", | ||
] | ||
) | ||
assert irpa_path.exists() |
We should verify these files in some form instead of only checking that they exist, in case of a bad download or something similar. Maybe have an md5sum that we can compare against?
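For reference, a minimal sketch of what such a check could look like in the test, assuming a digest recorded alongside the preset (the `EXPECTED_IRPA_SHA256` constant is hypothetical, and SHA-256 is used here rather than MD5 purely as an example):

```python
import hashlib
from pathlib import Path

# Hypothetical digest recorded when the preset was added; not a real value.
EXPECTED_IRPA_SHA256 = "<recorded sha256 hex digest>"


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file in chunks so large IRPA files are not read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


assert irpa_path.exists()
assert file_sha256(irpa_path) == EXPECTED_IRPA_SHA256
```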
It is possible that it fails silently in such a way. It would be pretty sad if the HF hub package failed there, since robust downloading is, I would assume, a major goal for it.
I actually have not decided yet what our attitude towards changing IRPA files should be. Should we force a strict manual hash update, meaning that we make the explicit assumption that the file does not change?
It may be a problem, for example, if we add another field to the model config, which would change the IRPA metadata and its hash.
This works for now. But as Ian pointed out, if there are more reliable ways to verify that the IRPA generation was complete/successful, that would be great.
What I think should ultimately happen is to run a model import job before running the CI test jobs.
Another option is to run a nightly job that imports from HF and then uploads to Azure, so that other runners can update their model cache. This has the problem that we may overwrite existing model files with faulty ones if a bug appears. In this scenario a more thorough model validation would be needed before uploading.
    )
)
parser.add_argument(
    "--output-irpa-file",
I would hesitate to add this flag as we currently cannot directly consume it. sharktank expects naming conventions from the GGUF format, not Hugging Face.
Where are we supposed to consume it in GGUF format? The converter would do on-the-fly conversion to our format, which is derived from GGUF. We save only in IRPA. We can't save in GGUF; we can only read it as a sharktank.types.Dataset.
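To illustrate the flow described here, a minimal sketch, assuming `Dataset.load` accepts both GGUF and IRPA paths and `Dataset.save` writes IRPA (the exact method signatures and the file paths are assumptions for illustration, not a confirmed API):

```python
from sharktank.types import Dataset

# Read side: a GGUF file (or an already-imported IRPA) is loaded as a Dataset.
# Assumption: Dataset.load dispatches on the file type.
dataset = Dataset.load("/models/llama3_1_8b_instruct_f16.gguf")

# Write side: the only serialization target is IRPA; there is no GGUF writer.
dataset.save("/models/llama3_1_8b_instruct_f16.irpa")
```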
Add registry for import presets and populate it with Llama 3.1 f16 models.
Add support for referencing a HF dataset during import. This decouples the specification of what needs to be downloaded from how to import the dataset after it is downloaded.
Expand the HF datasets to specify the model files not completely explicitly, but by using filters like `huggingface_hub.snapshot_download`.
Next steps are:
1. Make the CI use this new mechanism.
2. Add importation of models with more complicated transformations like quantization.
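As an illustration of the filter-based file selection mentioned above, a minimal sketch using `huggingface_hub.snapshot_download` with `allow_patterns` (the repo id and patterns are examples, not the exact values used by the presets):

```python
from huggingface_hub import snapshot_download

# Select files by pattern instead of listing every file explicitly.
# Repo id and patterns are illustrative only.
local_dir = snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    allow_patterns=[
        "*.safetensors",
        "config.json",
        "tokenizer.json",
        "tokenizer_config.json",
    ],
)
print(local_dir)
```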