Open
Commits (37 total; changes shown from 33)
551b5c4
Initial version
AsyaPronina Jul 31, 2025
a2eed7c
Fixes to make pipe functional
AsyaPronina Aug 6, 2025
4b6eabb
Removed sampler, using greedy decoding for initial implementation
AsyaPronina Aug 6, 2025
bf9b826
Fixes after debug
AsyaPronina Aug 6, 2025
b00f972
More fixes for accuracy
AsyaPronina Aug 7, 2025
b572b63
Fixed issue with infer on 1 token after KV-cache trim
AsyaPronina Aug 7, 2025
e9c229e
Handled specifics for launch of models on NPU
AsyaPronina Aug 7, 2025
3003de2
Added perf and sd statistics
AsyaPronina Aug 19, 2025
cfa4856
Removed unneccessary copy of properties
AsyaPronina Aug 20, 2025
b3c32f3
Fixed setting for NPUW target generate model
AsyaPronina Aug 20, 2025
7305238
Fixes for perf metrics
AsyaPronina Aug 25, 2025
77ee8d8
Polishing
AsyaPronina Aug 25, 2025
6c25242
Polishing
AsyaPronina Aug 27, 2025
efb0851
Fixed perf metrics
AsyaPronina Aug 28, 2025
c129caa
Fixed SD metrics
AsyaPronina Aug 28, 2025
9a8ccd7
1) Returned SD metrics by pipeline. 2) Removed NPU constraint in ll…
AsyaPronina Aug 28, 2025
ee1369f
Fixed typos in CB Speculative Decode perf metrics
AsyaPronina Aug 29, 2025
b4840b1
Extended the ManualTimer to be created only once
AsyaPronina Sep 1, 2025
67fa0e5
Renaming pipeline
AsyaPronina Sep 1, 2025
16ee0db
Dispatching of LLM pipeline on NPU to Stateful or StatefulSpeculative…
AsyaPronina Sep 2, 2025
d11ce99
Refixed llm_bench
AsyaPronina Sep 2, 2025
6dbba16
Factory method is StatefulPipeline now
AsyaPronina Sep 8, 2025
a8d1dd0
Removed PA backend constraint for Speculative Decode and added check …
AsyaPronina Sep 9, 2025
62081ed
Removed PA constraint for llm_bench
AsyaPronina Sep 10, 2025
0150e1d
Updated sample to reflect enabled feature
AsyaPronina Sep 10, 2025
7352b1c
Addressed review comments
AsyaPronina Sep 18, 2025
1596460
GenerationConfig.num_assistant_tokens behaviour is specified, added n…
AsyaPronina Sep 22, 2025
4787403
Rewritten NOTE-s
AsyaPronina Sep 22, 2025
9bd43d3
Alignment of behavior between Stateful and ContinuousBatching Specula…
AsyaPronina Sep 23, 2025
90a4124
Fixed review comments
AsyaPronina Sep 23, 2025
1bf82fc
Fixed last comments and added tests
AsyaPronina Sep 29, 2025
f72feae
Fixed new review comments after team discussion
AsyaPronina Oct 1, 2025
057b2e0
Fixed setting of `max_new_tokens` for draft model
AsyaPronina Oct 1, 2025
b885f8e
Used SmolLM2-360M as main model in tests
AsyaPronina Oct 2, 2025
ec10cb7
Added assert on launch of StatefulSpeculativeLLMPipeline with GPU
AsyaPronina Oct 7, 2025
efcbb66
Restrict StatefulSpeculativeLLMPipeline to launch only if NPU specifi…
AsyaPronina Oct 7, 2025
abf71c1
Merge branch 'master' into spec_decode_on_npu
AsyaPronina Oct 8, 2025
15 changes: 11 additions & 4 deletions samples/cpp/text_generation/speculative_decoding_lm.cpp
@@ -13,10 +13,15 @@ int main(int argc, char* argv[]) try {

ov::genai::GenerationConfig config;
config.max_new_tokens = 100;
// Speculative decoding generation parameters like `num_assistant_tokens` and `assistant_confidence_threshold` are mutually excluded
// add parameter to enable speculative decoding to generate `num_assistant_tokens` candidates by draft_model per iteration
config.num_assistant_tokens = 5;
// add parameter to enable speculative decoding to generate candidates by draft_model while candidate probability is higher than `assistant_confidence_threshold`
// Speculative decoding generation parameters like `num_assistant_tokens` and `assistant_confidence_threshold` are mutually exclusive.
// Add this parameter to enable speculative decoding and make the draft_model generate `num_assistant_tokens` candidates per iteration.
// NOTE: The ContinuousBatching backend uses `num_assistant_tokens` as is. The Stateful backend uses a copy of `num_assistant_tokens` as the
// initial value and adjusts it based on the recent number of accepted tokens. If `num_assistant_tokens` is not set, it defaults to `5` for
// both backends.
config.num_assistant_tokens = 4;
// Alternatively, add this parameter to enable speculative decoding and keep generating candidates with the draft_model while the candidate
// probability stays above `assistant_confidence_threshold`.
// NOTE: `assistant_confidence_threshold` is supported only by the ContinuousBatching backend.
// config.assistant_confidence_threshold = 0.4;

std::string main_model_path = argv[1];
@@ -25,6 +30,8 @@ int main(int argc, char* argv[]) try {

// User can run main and draft model on different devices.
// Please, set device for main model in `LLMPipeline` constructor and in in `ov::genai::draft_model` for draft.
// CPU, GPU and NPU can be used. Please be aware that GPU is performant only with the Continuous Batching pipeline, so it is not recommended
// to use it in conjunction with NPU or in configurations where the main model doesn't run in Paged Attention mode.
std::string main_device = "CPU", draft_device = "CPU";

ov::genai::LLMPipeline pipe(
21 changes: 14 additions & 7 deletions samples/python/text_generation/speculative_decoding_lm.py
Collaborator:
The accompanying Accuracy_Performance.xlsx doesn't contain a perf data comparison with greedy decoding. The point of speculative decoding is to outperform the corresponding single-model sampling implementation. If you are going to collect that data, it's worth paying attention to the text generated by greedy decoding: it should match the speculative decoding output exactly (if the device is the same), and you can cover that in tests as well.

Does random sampling work?

I won't be able to review in time, so I'd rely on @as-suvorov, @popovaan, @sbalandi

Contributor Author:
I will share results of validation against the stateful pipeline with greedy decoding. Thanks!

Random sampling doesn't work for now.

Collaborator:
Add a test verifying that optimum-intel's greedily generated text exactly matches the new speculative decoding output. That should cover the alignment with optimum and the case I described earlier.

There are tests validating perf metrics for LLMs: `def test_perf_metrics(generation_config, prompt):` and https://github.com/openvinotoolkit/openvino.genai/blob/master/tests/python_tests/test_continuous_batching.py. Add your case.

Contributor Author:
Done, thanks!!
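
For illustration, a minimal sketch of the equivalence check suggested above, assuming the C++ API used by the speculative decoding sample in this PR; model directories are placeholders and the implicit string conversion of the generate() result is assumed:

// Sketch only: compare plain greedy decoding against speculative decoding on the
// same device; with greedy decoding the two texts are expected to be identical.
#include <cassert>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    const std::string main_model_path = "SmolLM2-360M";   // placeholder directory
    const std::string draft_model_path = "draft_model";   // placeholder directory

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;  // greedy decoding, no sampling parameters set

    // Reference output: greedy decoding with the main model alone.
    ov::genai::LLMPipeline reference_pipe(main_model_path, "CPU");
    std::string expected = reference_pipe.generate("Why is the sky blue?", config);

    // Speculative decoding: the same main model plus a draft model.
    config.num_assistant_tokens = 4;
    ov::genai::LLMPipeline speculative_pipe(
        main_model_path, "CPU", ov::genai::draft_model(draft_model_path, "CPU"));
    std::string actual = speculative_pipe.generate("Why is the sky blue?", config);

    // On the same device the generated texts should match exactly.
    assert(expected == actual);
    return 0;
}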

@@ -1,5 +1,5 @@
#!/usr/bin/env python3
# Copyright (C) 2024 Intel Corporation
# Copyright (C) 2024-2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import argparse
@@ -19,8 +19,10 @@ def main():
args = parser.parse_args()

# User can run main and draft model on different devices.
# Please, set device for main model in `openvino_genai.LLMPipeline` constructor and in openvino_genai.draft_model` for draft.
main_device = 'CPU' # GPU can be used as well
# Please, set device for main model in `openvino_genai.LLMPipeline` constructor and in `openvino_genai.draft_model` for draft.
# CPU, GPU and NPU can be used. Please be aware that GPU is performant only with the Continuous Batching pipeline, so it is not
# recommended to use it in conjunction with NPU or in configurations where the main model doesn't run in Paged Attention mode.
main_device = 'CPU'
draft_device = 'CPU'

draft_model = openvino_genai.draft_model(args.draft_model_dir, draft_device)
@@ -29,10 +31,15 @@ def main():

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 100
# Speculative decoding generation parameters like `num_assistant_tokens` and `assistant_confidence_threshold` are mutually excluded
# add parameter to enable speculative decoding to generate `num_assistant_tokens` candidates by draft_model per iteration
config.num_assistant_tokens = 5
# add parameter to enable speculative decoding to generate candidates by draft_model while candidate probability is higher than `assistant_confidence_threshold`
# Speculative decoding generation parameters like `num_assistant_tokens` and `assistant_confidence_threshold` are mutually exclusive.
# Add this parameter to enable speculative decoding and make the draft_model generate `num_assistant_tokens` candidates per iteration.
# NOTE: The ContinuousBatching backend uses `num_assistant_tokens` as is. The Stateful backend uses a copy of `num_assistant_tokens` as the
# initial value and adjusts it based on the recent number of accepted tokens. If `num_assistant_tokens` is not set, it defaults to `5` for
# both backends.
config.num_assistant_tokens = 4
# Alternatively, add this parameter to enable speculative decoding and keep generating candidates with the draft_model while the candidate
# probability stays above `assistant_confidence_threshold`.
# NOTE: `assistant_confidence_threshold` is supported only by the ContinuousBatching backend.
# config.assistant_confidence_threshold = 0.4

# Since the streamer is set, the results will be printed
3 changes: 3 additions & 0 deletions src/cpp/include/openvino/genai/generation_config.hpp
@@ -273,7 +273,10 @@ operator|(const StructuredOutputConfig::CompoundGrammar& lhs,
*
* Assisting generation parameters:
* @param assistant_confidence_threshold the lower token probability of candidate to be validated by main model in case of dynamic strategy candidates number update.
* NOTE: `assistant_confidence_threshold` is supported only by the ContinuousBatching backend for Speculative Decoding.
* @param num_assistant_tokens the defined candidates number to be generated by draft model/prompt lookup in case of static strategy candidates number update.
* NOTE: The ContinuousBatching backend for Speculative Decoding uses `num_assistant_tokens` as is. The Stateful backend for Speculative Decoding uses a copy of
* `num_assistant_tokens` as the initial value and adjusts it based on the recent number of accepted tokens. If `num_assistant_tokens` is not set, it defaults to `5` for both backends.
* @param max_ngram_size is maximum ngram to use when looking for matches in the prompt.
*
* @param structured_output_config if set, the output will be a string constrained by the specified json_schema, regex, or EBNF grammar.
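
For illustration, a brief sketch of the two mutually exclusive strategies documented above; values are arbitrary examples:

// Sketch only: pick exactly one of the two candidate-generation strategies.
#include "openvino/genai/generation_config.hpp"

int main() {
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;

    // Static strategy: the draft model proposes a fixed number of candidates per step.
    // The ContinuousBatching backend uses the value as is; the Stateful backend treats it
    // as the initial value and adjusts it based on the recent number of accepted tokens.
    config.num_assistant_tokens = 4;

    // Dynamic strategy (ContinuousBatching backend only): keep proposing candidates while
    // their probability stays above the threshold. Do not set both parameters at once;
    // they are mutually exclusive.
    // config.assistant_confidence_threshold = 0.4f;

    return 0;
}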
18 changes: 4 additions & 14 deletions src/cpp/src/continuous_batching/pipeline.cpp
@@ -19,16 +19,6 @@
using namespace ov::genai;

namespace {
ov::genai::ModelDesc
extract_draft_model_from_config(ov::AnyMap& config) {
ov::genai::ModelDesc draft_model;
if (config.find(utils::DRAFT_MODEL_ARG_NAME) != config.end()) {
draft_model = config.at(utils::DRAFT_MODEL_ARG_NAME).as<ov::genai::ModelDesc>();
config.erase(utils::DRAFT_MODEL_ARG_NAME);
}
return draft_model;
}

bool
extract_prompt_lookup_from_config(ov::AnyMap& config) {
bool res = false;
@@ -53,7 +43,7 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline( const std::filesystem::p
const ov::AnyMap& vision_encoder_properties) {
auto start_time = std::chrono::steady_clock::now();
auto properties_without_draft_model = properties;
auto draft_model_desr = extract_draft_model_from_config(properties_without_draft_model);
auto draft_model_desr = utils::extract_draft_model_from_config(properties_without_draft_model);
auto is_prompt_lookup_enabled = extract_prompt_lookup_from_config(properties_without_draft_model);

auto model = utils::read_model(models_path, properties);
@@ -92,7 +82,7 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline(
const ov::AnyMap& properties) {
auto start_time = std::chrono::steady_clock::now();
auto properties_without_draft_model = properties;
auto draft_model_desr = extract_draft_model_from_config(properties_without_draft_model);
auto draft_model_desr = utils::extract_draft_model_from_config(properties_without_draft_model);
auto is_prompt_lookup_enabled = extract_prompt_lookup_from_config(properties_without_draft_model);

auto model = utils::read_model(models_path, properties_without_draft_model);
@@ -133,7 +123,7 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline(
auto start_time = std::chrono::steady_clock::now();

auto properties_without_draft_model = properties;
auto draft_model_desr = extract_draft_model_from_config(properties_without_draft_model);
auto draft_model_desr = utils::extract_draft_model_from_config(properties_without_draft_model);
auto is_prompt_lookup_enabled = extract_prompt_lookup_from_config(properties_without_draft_model);
auto model = utils::singleton_core().read_model(model_str, weights_tensor);

@@ -176,7 +166,7 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline(
auto start_time = std::chrono::steady_clock::now();

auto properties_without_draft_model = properties;
auto draft_model_desr = extract_draft_model_from_config(properties_without_draft_model);
auto draft_model_desr = utils::extract_draft_model_from_config(properties_without_draft_model);
auto is_prompt_lookup_enabled = extract_prompt_lookup_from_config(properties_without_draft_model);
auto model_pair = utils::get_model_weights_pair(models_map, "language");
auto model = utils::singleton_core().read_model(model_pair.first, model_pair.second);
8 changes: 8 additions & 0 deletions src/cpp/src/continuous_batching/timer.hpp
Contributor:
Please review it with the GenAI team

@@ -14,6 +14,8 @@ class ManualTimer {
public:
ManualTimer(const std::string& title) :
m_total(0.),
m_start(),
m_end(),
m_title(title) {
}

@@ -42,6 +44,12 @@ class ManualTimer {
return m_total;
}

void clear() {
m_total = 0.0;
m_start = std::chrono::steady_clock::time_point();
m_end = std::chrono::steady_clock::time_point();
}

~ManualTimer() {
// std::cout << m_title << ": " << m_total / 1e6 << " secs" << std::endl;
}
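
For illustration, a brief sketch of the reuse pattern this change enables: one ManualTimer created per pipeline and cleared between requests instead of re-created. The start()/end() methods and the duration getter are assumed from the existing class; only clear() is new, and run_generation_step() is a hypothetical stand-in for the real work:

// Sketch only: accumulate per-step timings into one timer, then reset it.
#include <cstddef>
#include <iostream>

#include "continuous_batching/timer.hpp"  // include path assumed

static void run_generation_step() { /* placeholder for one inference step */ }

int main() {
    ManualTimer timer("generate step");      // created once, not per step
    for (std::size_t step = 0; step < 8; ++step) {
        timer.start();                       // assumed existing method
        run_generation_step();
        timer.end();                         // assumed: accumulates elapsed time into m_total
    }
    std::cout << timer.get_duration() << " accumulated" << std::endl;  // assumed getter
    timer.clear();                           // new method: reset state before the next request
    return 0;
}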
93 changes: 62 additions & 31 deletions src/cpp/src/llm/pipeline.cpp
@@ -9,10 +9,10 @@
#include "openvino/genai/llm_pipeline.hpp"
#include "openvino/genai/perf_metrics.hpp"

#include "llm/pipeline_static.hpp"
#include "llm/pipeline_stateful.hpp"
#include "llm/pipeline_continuous_batching_adapter.hpp"
#include "speculative_decoding/speculative_decoding_impl.hpp"
#include "speculative_decoding/speculative_decoding_stateful.hpp"
#include "utils.hpp"

namespace ov {
@@ -60,6 +60,47 @@ std::pair<std::string, Any> draft_model(
return { utils::DRAFT_MODEL_ARG_NAME, Any::make<ModelDesc>(model, tokenizer, device, plugin_config, scheduler_config, generation_config) };
}

class StatefulPipeline {
public:
static std::unique_ptr<LLMPipelineImplBase> create(
const std::filesystem::path& models_path,
const ov::genai::Tokenizer& tokenizer,
const std::string& device,
const ov::AnyMap& properties) {
return create(
ov::genai::utils::read_model(models_path, properties),
tokenizer,
device,
properties,
utils::from_config_json_if_exists(models_path));
}

static std::unique_ptr<LLMPipelineImplBase> create(
const std::filesystem::path& models_path,
const std::string& device,
const ov::AnyMap& plugin_config) {
return create(models_path, Tokenizer(models_path, plugin_config), device, plugin_config);
}

static std::unique_ptr<LLMPipelineImplBase> create(
const std::shared_ptr<ov::Model>& model,
const ov::genai::Tokenizer& tokenizer,
const std::string& device,
const ov::AnyMap& properties,
const ov::genai::GenerationConfig& generation_config) {

auto properties_without_draft_model = properties;
auto draft_model_descr = ov::genai::utils::extract_draft_model_from_config(properties_without_draft_model);
if (draft_model_descr.model != nullptr) {
auto main_model_descr = ov::genai::ModelDesc(model, tokenizer, device, properties_without_draft_model, {}, generation_config);
return std::make_unique<StatefulSpeculativeLLMPipeline>(main_model_descr, draft_model_descr);
}

return std::make_unique<StatefulLLMPipeline>(model, tokenizer, device,
properties_without_draft_model, generation_config);
}
};

// Public LLMPipeline

ov::genai::LLMPipeline::LLMPipeline(
@@ -80,14 +121,12 @@ ov::genai::LLMPipeline::LLMPipeline(
auto start_time = std::chrono::steady_clock::now();
auto [properties, attention_backend] = utils::extract_attention_backend(user_properties);

// If CB is invoked explicitly, create CB adapter as is and re-throw in case if internal issues
if (utils::explicitly_requires_paged_attention(user_properties)) {
if (ov::genai::utils::is_npu_requested(device, properties)) {
Contributor:
So we basically have three constructors with different parameters.
The logic looks complex enough that it shouldn't be repeated three times.

We could try constructor delegation, move the logic into a separate function with the most generic parameters, or (least preferable) use a template function with a parameter pack and forwarding.

Contributor Author:
Discussed internally; will create a follow-up refactoring ticket.

m_pimpl = StatefulPipeline::create(models_path, tokenizer, device, properties);
} else if (utils::explicitly_requires_paged_attention(user_properties)) {
// If CB is invoked explicitly, create CB adapter as is and re-throw in case if internal issues
auto [device_properties, scheduler_config] = utils::extract_scheduler_config(properties, utils::get_latency_oriented_scheduler_config());
m_pimpl = std::make_unique<ContinuousBatchingAdapter>(models_path, tokenizer, scheduler_config, device, device_properties);
} else if (device == "NPU") {
m_pimpl = properties.count("STATIC_PIPELINE")
? static_llm::LLMPipelineFactory::create(models_path, tokenizer, properties)
: std::make_unique<StatefulLLMPipeline>(models_path, tokenizer, device, properties);
} else if (attention_backend == PA_BACKEND) {
// try to call CB adapter one more time, but with safe guard to silent exception
try {
@@ -102,7 +141,7 @@ ov::genai::LLMPipeline::LLMPipeline(
}

if (m_pimpl == nullptr) {
m_pimpl = std::make_unique<StatefulLLMPipeline>(models_path, tokenizer, device, properties);
m_pimpl = StatefulPipeline::create(models_path, tokenizer, device, properties);
}

m_pimpl->save_load_time(start_time);
@@ -117,14 +156,12 @@ ov::genai::LLMPipeline::LLMPipeline(

auto [properties, attention_backend] = utils::extract_attention_backend(user_properties);

// If CB is invoked explicitly, create CB adapter as is and re-throw in case if internal issues
if (utils::explicitly_requires_paged_attention(user_properties)) {
if (ov::genai::utils::is_npu_requested(device, properties)) {
m_pimpl = StatefulPipeline::create(models_path, device, properties);
} else if (utils::explicitly_requires_paged_attention(user_properties)) {
// If CB is invoked explicitly, create CB adapter as is and re-throw in case if internal issues
auto [device_properties, scheduler_config] = utils::extract_scheduler_config(properties, utils::get_latency_oriented_scheduler_config());
m_pimpl = std::make_unique<ContinuousBatchingAdapter>(models_path, scheduler_config, device, device_properties);
} else if (device == "NPU") {
m_pimpl = properties.count("STATIC_PIPELINE")
? static_llm::LLMPipelineFactory::create(models_path, properties)
: std::make_unique<StatefulLLMPipeline>(models_path, device, properties);
} else if (attention_backend == PA_BACKEND) {
// try to call CB adapter one more time, but with safe guard to silent exception
try {
@@ -139,7 +176,7 @@ ov::genai::LLMPipeline::LLMPipeline(
}

if (m_pimpl == nullptr) {
m_pimpl = std::make_unique<StatefulLLMPipeline>(models_path, device, properties);
m_pimpl = StatefulPipeline::create(models_path, device, properties);
}

m_pimpl->save_load_time(start_time);
@@ -157,24 +194,18 @@ ov::genai::LLMPipeline::LLMPipeline(

auto [properties, attention_backend] = utils::extract_attention_backend(user_properties);

// If CB is invoked explicitly, create CB adapter as is and re-throw in case if internal issues
if (utils::explicitly_requires_paged_attention(user_properties)) {
if (ov::genai::utils::is_npu_requested(device, properties)) {
m_pimpl = StatefulPipeline::create(
utils::singleton_core().read_model(model_str, weights_tensor),
tokenizer,
device,
properties,
generation_config);
} else if (utils::explicitly_requires_paged_attention(user_properties)) {
// If CB is invoked explicitly, create CB adapter as is and re-throw in case if internal issues
auto [device_properties, scheduler_config] = utils::extract_scheduler_config(properties, utils::get_latency_oriented_scheduler_config());
m_pimpl = std::make_unique<ContinuousBatchingAdapter>(model_str, weights_tensor,
tokenizer, scheduler_config, device, device_properties, generation_config);
} else if (device == "NPU") {
m_pimpl = properties.count("STATIC_PIPELINE")
? static_llm::LLMPipelineFactory::create(
utils::singleton_core().read_model(model_str, weights_tensor),
tokenizer,
properties,
generation_config)
: std::make_unique<StatefulLLMPipeline>(
utils::singleton_core().read_model(model_str, weights_tensor),
tokenizer,
device,
properties,
generation_config);
} else if (attention_backend == PA_BACKEND) {
// try to call CB adapter one more time, but with safe guard to silent exception
try {
@@ -190,7 +221,7 @@
}

if (m_pimpl == nullptr) {
m_pimpl = std::make_unique<StatefulLLMPipeline>(
m_pimpl = StatefulPipeline::create(
utils::singleton_core().read_model(model_str, weights_tensor),
tokenizer,
device,
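
For illustration, a sketch of the user-facing call that now reaches StatefulSpeculativeLLMPipeline through the new dispatch: when NPU is requested for the main model and a draft model is supplied, LLMPipeline goes through StatefulPipeline::create(), which detects the draft model descriptor. Model directories are placeholders:

// Sketch only: speculative decoding with the main model on NPU and the draft on CPU.
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    const std::string main_model_path = "main_model";    // placeholder directory
    const std::string draft_model_path = "draft_model";  // placeholder directory

    ov::genai::LLMPipeline pipe(
        main_model_path,
        "NPU",                                            // main model on NPU
        ov::genai::draft_model(draft_model_path, "CPU")); // draft model on CPU

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_assistant_tokens = 4;

    std::string result = pipe.generate("Why is the sky blue?", config);
    std::cout << result << std::endl;
    return 0;
}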
Contributor:
Please make sure to review it with the GenAI team

@@ -14,6 +14,11 @@ ContinuousBatchingPipeline::ContinuousBatchingForSpeculativeDecodingImpl::Contin
bool is_validation_mode_enabled) {
m_tokenizer = tokenizer;
m_generation_config = generation_config;
if (m_generation_config.assistant_confidence_threshold == 0.f) {
if (m_generation_config.num_assistant_tokens == 0) {
m_generation_config.num_assistant_tokens = default_num_assistant_tokens;
}
}
m_is_validation_mode_enabled = is_validation_mode_enabled;
initialize_pipeline(model, scheduler_config, device, plugin_config);
}
@@ -319,7 +324,7 @@ void ContinuousBatchingPipeline::ContinuousBatchingForSpeculativeDecodingImpl::m
auto pipeline_metrics = get_metrics();
if (num_generated_tokens > 0) {
raw_perf_metrics.m_durations.emplace_back(generation_duration);
raw_perf_metrics.m_inference_durations[0] = MicroSeconds(pipeline_metrics.inference_duration);
raw_perf_metrics.m_inference_durations[0] += MicroSeconds(pipeline_metrics.inference_duration);
raw_perf_metrics.m_batch_sizes.emplace_back(num_generated_tokens);
}

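
For illustration, a simplified sketch of why accumulation (`+=`) is used above rather than assignment, assuming this metric is updated more than once per request; the names are stand-ins for the RawPerfMetrics fields, not the real types:

// Sketch only: accumulate the main-model inference duration across steps instead of
// keeping only the value from the last step.
#include <cstddef>
#include <vector>

int main() {
    double inference_duration_us = 0.0;       // plays the role of m_inference_durations[0]
    std::vector<std::size_t> batch_sizes;     // plays the role of m_batch_sizes

    const double step_inference_us[] = {900.0, 850.0, 820.0};
    const std::size_t step_tokens[] = {3, 2, 4};

    for (std::size_t step = 0; step < 3; ++step) {
        inference_duration_us += step_inference_us[step];  // '=' here would drop earlier steps
        batch_sizes.push_back(step_tokens[step]);
    }
    // inference_duration_us is now 2570.0, the total across the whole request.
    return 0;
}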
@@ -11,6 +11,8 @@
namespace ov::genai {
class ContinuousBatchingPipeline::ContinuousBatchingForSpeculativeDecodingImpl : public ContinuousBatchingPipeline::ContinuousBatchingImpl {
public:
const std::size_t default_num_assistant_tokens = 5;

ContinuousBatchingForSpeculativeDecodingImpl() = default;

ContinuousBatchingForSpeculativeDecodingImpl(const std::shared_ptr<ov::Model>& model,