          [CLI] Add hf cache verify
          #3461
        
base: main
| The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. | 
| Hey! Thanks for opening this PR. Here are some high-level thoughts about this feature: 
In the end, the CLI I suggest would look like this:
hf cache verify <repo-id> [--repo-type ...] [--revision ...] [--cache-dir ...] [--token ...] [--local-dir ...] [--fail-on-missing-files] [--fail-on-extra-files]
# Verify main revision of "deepseek-ai/DeepSeek-OCR" in cache
hf cache verify deepseek-ai/DeepSeek-OCR
# Verify specific revision
hf cache verify deepseek-ai/DeepSeek-OCR --revision refs/pr/1
hf cache verify deepseek-ai/DeepSeek-OCR --revision abcdef123
# Verify using private repo
hf cache verify me/private-model --token ...
# Verify dataset
hf cache verify karpathy/fineweb-edu-100b-shuffle --repo-type dataset
# Verify local dir
hf cache verify deepseek-ai/DeepSeek-OCR --local-dir /path/to/repo
Let me know what you think. I might not have thought of all possible use cases so happy to get it challenged ^^ | 
| agh the commit history is messed up since we merged  | 
1611e8d to a3e5a67
  
(haven't reviewed the tests)
Co-authored-by: Lucain <[email protected]>
…rify-checksum-cli
| thanks @Wauplin for the very thorough review! I addressed all your comments and refactored the logic a bit | 
Thanks for the iteration! This time I've checked the tests which look great 🤗
Left a last round of comments but overall looks good :)
thanks @Wauplin for the review! i addressed all your comments, could you have a final look? 🙏
This PR adds a new CLI command that checks cached files against their checksums on the Hub. It verifies all cached revisions for a repo, or specific snapshots if a revision is provided.
Under the hood, it lists remote files for each revision using `list_repo_tree`, maps them to local snapshots, and compares the two sets to find files that are missing locally or on the Hub. Then, for each file, it computes and compares checksums.
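
The flow described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `file_hash` and `verify_snapshot` are hypothetical names, and the `remote` mapping stands in for whatever `list_repo_tree` returns for a revision. It also assumes that regular files are identified by a git-style blob SHA-1 and LFS files by a SHA-256, which may differ from what the actual implementation checks.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path, algo: str) -> str:
    """Hash a file in chunks. `algo` is "sha1-git" or "sha256" (labels made up for this sketch)."""
    if algo == "sha1-git":
        # Git blob hash: SHA-1 over a "blob <size>" header, a NUL byte, then the content.
        hasher = hashlib.sha1(f"blob {path.stat().st_size}\0".encode())
    else:
        hasher = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_snapshot(snapshot_dir, remote: dict) -> dict:
    """Compare a local snapshot folder against remote metadata.

    `remote` maps a repo-relative path to an (algo, expected_hex) pair.
    In a real command this mapping would be built from the Hub listing.
    """
    root = Path(snapshot_dir)
    local = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    remote_paths = set(remote)

    report = {
        "missing": sorted(remote_paths - local),  # on the Hub, not in the snapshot
        "extra": sorted(local - remote_paths),    # in the snapshot, not on the Hub
        "mismatched": [],                         # present on both sides, checksum differs
    }
    for rel_path in sorted(local & remote_paths):
        algo, expected = remote[rel_path]
        if file_hash(root / rel_path, algo) != expected:
            report["mismatched"].append(rel_path)
    return report
```

With a report shaped like this, a checksum mismatch could always be treated as a failure, while the `missing` and `extra` lists would only fail the command when the suggested `--fail-on-missing-files` and `--fail-on-extra-files` flags are passed. That mapping is a guess at the intended behavior, not something stated in the PR description.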