Conversation

@sogartar (Contributor) commented Sep 19, 2025

Add a registry for import presets and populate it with Llama 3.1 f16 models.

Add support for referencing a HF dataset during import. This decouples the specification of what needs to be downloaded from how the dataset is imported after it is downloaded.

Expand the HF datasets so that model files can be specified not fully explicitly, but through filters, in the style of huggingface_hub.snapshot_download.
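
For illustration, here is a minimal sketch of that split, with a dict-shaped preset whose layout is hypothetical (only huggingface_hub.snapshot_download and its allow_patterns parameter are real API):

```python
# Sketch only: a preset records *what* to download (a HF repo plus file
# filters) separately from *how* the result is imported afterwards.
from huggingface_hub import snapshot_download

# Hypothetical preset entry; the actual registry layout may differ.
preset = {
    "repo_id": "meta-llama/Llama-3.1-8B-Instruct",
    # Filters instead of a fully explicit file list, in the style of
    # snapshot_download's allow_patterns.
    "allow_patterns": ["*.safetensors", "config.json", "tokenizer*"],
}

# Step 1: download only the files matching the filters.
local_dir = snapshot_download(
    repo_id=preset["repo_id"],
    allow_patterns=preset["allow_patterns"],
)
# Step 2 (a separate concern): import the downloaded files, e.g. into IRPA.
```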

Next steps are:

  1. Make the CI use this new mechanism.
  2. Add import of models that require more complicated transformations, such as quantization.


github-actions bot commented Sep 19, 2025

Coverage report


Lines missing coverage, by file:

  sharktank/sharktank/utils
    hf.py: 65, 89-100, 115-141, 152-157, 215
    hf_datasets.py: 40-43, 46-47, 65, 68, 88
  sharktank/tests/models/llama
    test_llama.py: 213, 218-231

This report was generated by python-coverage-comment-action

"--preset=meta_llama3_1_8b_instruct_f16",
]
)
assert irpa_path.exists()
Contributor commented:

we should verify these files in some form instead of seeing that they exist in case of a bad download or something. Maybe have an md5sum that we can compare?
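
For example, a minimal sketch of such a check (EXPECTED_MD5 and where it would be stored are hypothetical; only hashlib is real):

```python
# Sketch: verify the downloaded/imported file against a pinned digest instead
# of only checking that it exists. The pinned value would have to be recorded
# somewhere, e.g. alongside the preset definition (hypothetical here).
import hashlib
from pathlib import Path

EXPECTED_MD5 = "<digest pinned with the preset>"  # hypothetical placeholder
irpa_path = Path("model.irpa")  # produced by the import step

def file_md5(path: Path) -> str:
    """Hash the file in 1 MiB chunks to keep memory use bounded."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

assert irpa_path.exists()
assert file_md5(irpa_path) == EXPECTED_MD5
```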

@sogartar (Author) commented Sep 26, 2025

It is possible that it fails silently in such a way. It would be pretty sad if the HF hub package failed there, since I assume robust downloading is one of its major goals.
I actually have not decided yet what our attitude towards changing IRPA files should be. Should we enforce a strict, manually updated file hash? That would make the explicit assumption that the file does not change.
That may be a problem if, for example, we add another field to the model config, which would change the IRPA metadata and therefore its hash.

Collaborator commented:

This works for now. But as Ian pointed out, if there is a more reliable way to verify that the IRPA generation was complete/successful, that would be great.

@sogartar (Author) commented:

What I think should ultimately happen is to run a model import job before running the CI test jobs.

Another option is to run a nightly job that imports from HF and then uploads to Azure, so that other runners can update their model cache. The problem with this is that we may overwrite existing model files with faulty ones if a bug appears, so in this scenario more thorough model validation would be needed before uploading.

)
)
parser.add_argument(
"--output-irpa-file",
Contributor commented:

I would hesitate to add this flag, as we currently cannot directly consume it. sharktank expects naming conventions from the GGUF format, not Hugging Face.

@sogartar (Author) commented Sep 26, 2025

Where are we supposed to consume it in GGUF format? The converter does on-the-fly conversion to our format, which is derived from GGUF. We save only to IRPA. We can't save in GGUF; we can only read it as a sharktank.types.Dataset.
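
A rough sketch of that flow (Dataset.load and save exist in sharktank, though the exact signatures and the paths here are illustrative):

```python
# Sketch: GGUF is read-only for us; reading performs the on-the-fly conversion
# to the sharktank representation, and persistence is IRPA-only.
from sharktank.types import Dataset

dataset = Dataset.load("/path/to/model.gguf")  # illustrative path
dataset.save("/path/to/model.irpa")            # saving targets IRPA
```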

@sogartar requested a review from IanNod on September 26, 2025 22:07
@sogartar force-pushed the users/sogartar/llama3-hf-import-presets branch from 729498e to e18016c on October 7, 2025 14:28