Skip to content

Conversation

@Retribution98
Copy link
Contributor

@Retribution98 Retribution98 commented Nov 13, 2025

Description

  • Using Tensor from openvino-node in Tokenizer encode and decode
  • Extended the Tensor API
  • Added constructors for Tensor
  • Moved Tokenizer to the separate TS file
  • Use BigInt for tokenId
  • Store openvino-node addon in the AddondData to manipulate with entity from the core part
  • Aligned tests with the new API and verify tokenizer and binding behavior.
  • Updated benchmark sample with using Tensor encode
  • Update documentation - https://retribution98.github.io/openvino.genai/

Ticket

CVS-174909

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Signed-off-by: Kirill Suvorov <[email protected]>
Copilot AI review requested due to automatic review settings November 13, 2025 20:47
@github-actions github-actions bot added category: LLM samples GenAI LLM samples category: JS API GenAI JS API labels Nov 13, 2025
@Retribution98
Copy link
Contributor Author

Retribution98 commented Nov 13, 2025

This PR is blocked by both PR#32802 and PR#32850 in openvino repository

@Retribution98 Retribution98 requested review from almilosz and removed request for Copilot November 13, 2025 20:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the JavaScript tokenizer implementation to use Tensor objects from openvino-node and extends the tokenizer API with encode/decode methods. The changes enable direct manipulation of tokenized inputs using Tensor objects rather than raw arrays, improving type safety and integration with the OpenVINO ecosystem.

Key changes:

  • Added encode and decode methods to the Tokenizer that work with Tensor objects
  • Created a separate TypeScript interface file (tokenizer.ts) for better type definitions
  • Integrated openvino-node addon storage in AddonData for cross-module entity access
  • Updated tests to verify new tokenizer behavior and aligned imports across test files

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/js/tests/tokenizer.test.js Added comprehensive tests for new encode/decode methods and removed async markers from synchronous tests
src/js/tests/bindings.test.js Updated import to use named export for LLMPipeline
src/js/src/tokenizer.cpp Implemented encode/decode methods with support for multiple input types and chat template getters/setters
src/js/src/helper.cpp Added conversion helpers for int64_t, vectors of int64_t, Tensor, and TokenizedInputs
src/js/src/addon.cpp Added setOpenvinoAddon function to store openvino-node addon reference
src/js/lib/tokenizer.ts Created TypeScript interface definitions for Tokenizer and related types
src/js/lib/pipelines/textEmbeddingPipeline.ts Updated to use named import for TextEmbeddingPipeline
src/js/lib/pipelines/llmPipeline.ts Moved Tokenizer interface to separate file and updated imports
src/js/lib/index.ts Exported tokenizer types
src/js/lib/addon.ts Integrated openvino-node addon and updated exports
src/js/include/tokenizer.hpp Added method declarations for new tokenizer functionality
src/js/include/helper.hpp Added template specializations for new conversion types
src/js/include/addon.hpp Added openvino_addon reference to AddonData
samples/js/text_generation/benchmark_genai.js Added example usage of encode method to get prompt token size

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

exports.Set(class_name, prototype);
}

Napi::Value init_ov_addon(const Napi::CallbackInfo& info) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Napi::Value init_ov_addon(const Napi::CallbackInfo& info) {
void init_ov_addon(const Napi::CallbackInfo& info) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used as a JavaScript function, so we usually return a Napi::Value even if nothing is returned. I understand your confusion because it is called like init_class, but they have different usages. To be clearer, I will rename this function to set_ov_node.

@yatarkan
Copy link
Contributor

yatarkan commented Nov 14, 2025

Docs also can be updated with new JS tokenizer functionality - in Node.js Bindings page and in Tokenization Guide (add js code example)

@github-actions github-actions bot added the category: GH Pages Docs Github Pages documentation label Nov 14, 2025
Copilot AI review requested due to automatic review settings November 14, 2025 21:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings November 14, 2025 21:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: GH Pages Docs Github Pages documentation category: JS API GenAI JS API category: LLM samples GenAI LLM samples

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants