Commit 458f82d

update model to qwen-8b
Signed-off-by: junq <[email protected]>
1 parent 71250c5 commit 458f82d

1 file changed, 5 insertions(+), 10 deletions(-)


examples/llm-api/llm_kv_cache_offloading.py

@@ -5,16 +5,11 @@
 
 
 def main(args):
-
     prompt_a = (
-        "the following question and four candidate answers (A, B, C and D), choose the best answer."
-        "The following excerpt is from a pamphlet. You will do me the justice to remember, "
-    )
-
-    prompt_b = (
-        "Given the following question and four candidate answers (A, B, C and D), choose the best answer."
-        "The following excerpt is from a pamphlet. You will do me the justice to remember, "
-    )
+        "Returns the per-iterations statistics computed since last call to this method. "
+        "Contains at most iter_stats_max_iterations iterations.")
+    prompt_b = ("Use for skipping decoding step for non generation model, "
+                "and return the batch_output (such as mm_embeddings)")
     max_batch_size = 1
     max_seq_len = 256
 
@@ -24,7 +19,7 @@ def main(args):
 
     sampling_params = SamplingParams(max_tokens=max_seq_len)
 
-    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+    llm = LLM(model="Qwen/Qwen3-8B",
               max_batch_size=max_batch_size,
               max_seq_len=max_seq_len,
               kv_cache_config=KvCacheConfig(enable_block_reuse=True,

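For context, here is a minimal, self-contained sketch of how the changed lines might fit together as a runnable script. It assumes the TensorRT-LLM LLM API (tensorrt_llm.LLM, SamplingParams, and tensorrt_llm.llmapi.KvCacheConfig); since the diff truncates the KvCacheConfig arguments and the argument handling, the host_cache_size value and the simplified main() signature are illustrative assumptions, not part of the commit.

# Minimal sketch of the updated example (assumptions noted above).
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import KvCacheConfig


def main():
    prompt_a = (
        "Returns the per-iterations statistics computed since last call to this method. "
        "Contains at most iter_stats_max_iterations iterations.")
    prompt_b = ("Use for skipping decoding step for non generation model, "
                "and return the batch_output (such as mm_embeddings)")
    max_batch_size = 1
    max_seq_len = 256

    sampling_params = SamplingParams(max_tokens=max_seq_len)

    # enable_block_reuse comes from the diff; host_cache_size is an assumed
    # value (1 GiB) that lets evicted KV blocks spill to host memory.
    llm = LLM(model="Qwen/Qwen3-8B",
              max_batch_size=max_batch_size,
              max_seq_len=max_seq_len,
              kv_cache_config=KvCacheConfig(enable_block_reuse=True,
                                            host_cache_size=1024**3))

    # With max_batch_size=1 the two prompts run back to back; block reuse
    # plus host offloading lets a repeated prompt hit cached KV blocks
    # instead of recomputing them.
    for output in llm.generate([prompt_a, prompt_b], sampling_params):
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()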