fix checking tokenizers version #2667
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
This PR fixes an issue where GGUF models could not return string representations of special tokens. The fix sets version information on the GGUF tokenizer models so that they are identified as newer than version 2024.5, which enables decoding with special tokens.
Key changes:
- Added version information to GGUF tokenizer models to indicate they are newer than 2024.5
- Fixed the version checking logic so that it handles GGUF models
- Expanded test coverage for special token handling in GGUF models
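For readers following the version check, here is a minimal sketch of a numeric, component-wise version comparison. It is not the code from this PR: `parse_version` and `is_version_at_least` are hypothetical helper names, and treating a missing version string as "older" is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): split "2025.3.0" into {2025, 3, 0}.
// Each component is read up to its first non-digit character.
std::vector<int> parse_version(const std::string& version) {
    std::vector<int> parts;
    std::istringstream stream(version);
    std::string item;
    while (std::getline(stream, item, '.')) {
        int value = 0;
        for (char c : item) {
            if (c < '0' || c > '9') {
                break;
            }
            value = value * 10 + (c - '0');
        }
        parts.push_back(value);
    }
    return parts;
}

// Hypothetical helper (not from the PR): true when `version` >= `minimum`.
// Components are compared as integers; missing components count as 0,
// so an empty version string compares as older than any real version.
bool is_version_at_least(const std::string& version, const std::string& minimum) {
    const std::vector<int> lhs = parse_version(version);
    const std::vector<int> rhs = parse_version(minimum);
    const std::size_t count = std::max(lhs.size(), rhs.size());
    for (std::size_t i = 0; i < count; ++i) {
        const int a = i < lhs.size() ? lhs[i] : 0;
        const int b = i < rhs.size() ? rhs[i] : 0;
        if (a != b) {
            return a > b;
        }
    }
    return true;
}
```

Comparing components as integers avoids the pitfall of plain string comparison, where "2024.10" would sort before "2024.5". Under the assumption above, a model that carries no version string is classified as older, which is the kind of misclassification that setting explicit version information avoids.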
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/cpp/src/tokenizer/tokenizer_impl.cpp | Sets openvino_genai_version for GGUF models and fixes version checking logic |
| src/cpp/src/gguf_utils/gguf_tokenizer.cpp | Adds quote_meta function to properly escape special tokens in regex patterns |
| tests/python_tests/test_gguf_reader.py | Expands test coverage with special token prompts and token validation |
| .github/workflows/windows.yml | Increases timeout for GGUF reader tests from 60 to 100 minutes |
    std::string quote_meta(const std::string& str) {
        std::string result = "(";

        // todo: add also utf validate
        for (char c : str) {
            if (!std::isalnum(c) && c != '_') {
                result += '\\';
            }
            result += c;
        }
        result += ")";
        return result;
    }
Copilot
AI
Sep 9, 2025
Spelling error in comment on line 128: 'utf validate' should be 'UTF validation' or 'UTF-8 validation'.
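One further point that may be worth considering on the same hunk: passing a plain `char` to `std::isalnum` is undefined behavior when the value is negative, which can happen for non-ASCII bytes on platforms where `char` is signed. The sketch below shows the same escaping logic with a cast to `unsigned char`. The name `quote_meta_sketch` is hypothetical, chosen to keep it apart from the PR's function, and `<|endoftext|>` in the usage note is only an illustrative token.

```cpp
#include <cctype>
#include <string>

// Sketch only: same escaping rule as the hunk above (escape every character
// that is not alphanumeric or '_', then wrap the result in a capture group),
// with the argument to std::isalnum converted to unsigned char first.
std::string quote_meta_sketch(const std::string& str) {
    std::string result = "(";
    for (char c : str) {
        const auto uc = static_cast<unsigned char>(c);
        if (!std::isalnum(uc) && c != '_') {
            result += '\\';
        }
        result += c;
    }
    result += ")";
    return result;
}
```

For example, `quote_meta_sketch("<|endoftext|>")` yields `(\<\|endoftext\|\>)`. Whether non-ASCII bytes should be escaped at all is left open here, as in the original todo about UTF validation.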
skip_special_tokens=False
2024.5
Ticket: CVS-172426