-
Notifications
You must be signed in to change notification settings - Fork 300
[JS] Update tokenizer methods #3012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[JS] Update tokenizer methods #3012
Conversation
Signed-off-by: Kirill Suvorov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the JavaScript tokenizer implementation to use Tensor objects from openvino-node and extends the tokenizer API with encode/decode methods. The changes enable direct manipulation of tokenized inputs using Tensor objects rather than raw arrays, improving type safety and integration with the OpenVINO ecosystem.
Key changes:
- Added encode and decode methods to the Tokenizer that work with Tensor objects
- Created a separate TypeScript interface file (tokenizer.ts) for better type definitions
- Integrated openvino-node addon storage in AddonData for cross-module entity access
- Updated tests to verify new tokenizer behavior and aligned imports across test files
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/js/tests/tokenizer.test.js | Added comprehensive tests for new encode/decode methods and removed async markers from synchronous tests |
| src/js/tests/bindings.test.js | Updated import to use named export for LLMPipeline |
| src/js/src/tokenizer.cpp | Implemented encode/decode methods with support for multiple input types and chat template getters/setters |
| src/js/src/helper.cpp | Added conversion helpers for int64_t, vectors of int64_t, Tensor, and TokenizedInputs |
| src/js/src/addon.cpp | Added setOpenvinoAddon function to store openvino-node addon reference |
| src/js/lib/tokenizer.ts | Created TypeScript interface definitions for Tokenizer and related types |
| src/js/lib/pipelines/textEmbeddingPipeline.ts | Updated to use named import for TextEmbeddingPipeline |
| src/js/lib/pipelines/llmPipeline.ts | Moved Tokenizer interface to separate file and updated imports |
| src/js/lib/index.ts | Exported tokenizer types |
| src/js/lib/addon.ts | Integrated openvino-node addon and updated exports |
| src/js/include/tokenizer.hpp | Added method declarations for new tokenizer functionality |
| src/js/include/helper.hpp | Added template specializations for new conversion types |
| src/js/include/addon.hpp | Added openvino_addon reference to AddonData |
| samples/js/text_generation/benchmark_genai.js | Added example usage of encode method to get prompt token size |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/js/src/addon.cpp
Outdated
| exports.Set(class_name, prototype); | ||
| } | ||
|
|
||
| Napi::Value init_ov_addon(const Napi::CallbackInfo& info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Napi::Value init_ov_addon(const Napi::CallbackInfo& info) { | |
| void init_ov_addon(const Napi::CallbackInfo& info) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used as a JavaScript function, so we usually return a Napi::Value even if nothing is returned. I understand your confusion because it is called like init_class, but they have different usages. To be clearer, I will rename this function to set_ov_node.
|
Docs also can be updated with new JS tokenizer functionality - in Node.js Bindings page and in Tokenization Guide (add js code example) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Description
Ticket
CVS-174909
Checklist: