fix: cap qwen3-max-thinking max_tokens to provider limit #5885
Conversation
🦋 Changeset detected. Latest commit: c7d5865. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Pull request overview
This PR fixes provider-side validation errors for qwen/qwen3-max-thinking by ensuring max_tokens never exceeds the upstream 32,768 limit, while preserving the existing “20% of context window” clamp behavior for other models.
Changes:
- Add a Qwen3 Max Thinking–specific max output token cap at 32,768 during max-token resolution.
- Keep the existing 20% context-window clamp for all other models (and still apply it first for Qwen3).
- Add unit tests covering both the provider cap case and the lower context-based cap case.
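The resolution order described above can be sketched roughly as follows. This is a hypothetical illustration, not the exact code from `src/shared/api.ts`; the function signature and constant name are assumptions:

```typescript
// Hypothetical sketch of the cap logic; names and structure are assumptions,
// not copied verbatim from src/shared/api.ts.
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768;

function resolveMaxOutputTokens(modelId: string, maxTokens: number, contextWindow: number): number {
	// Existing behavior: clamp explicit maxTokens to 20% of the context window.
	const contextCapped = Math.min(maxTokens, Math.ceil(contextWindow * 0.2));

	// New behavior: qwen3-max-thinking additionally respects the upstream 32,768 limit.
	if (modelId.toLowerCase().includes("qwen3-max-thinking")) {
		return Math.min(contextCapped, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT);
	}
	return contextCapped;
}

// 300k context: the 20% clamp gives 60,000, so the provider cap of 32,768 wins.
console.log(resolveMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 300_000)); // 32768
// 100k context: the 20% clamp gives 20,000, which is below 32,768, so it wins.
console.log(resolveMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 100_000)); // 20000
```

Note that the context clamp is applied first in both branches, so non-Qwen3 models see no behavior change.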
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/shared/api.ts | Adds model-ID detection for qwen3-max-thinking and caps resolved output tokens to 32,768 after applying the existing context-based clamp. |
| src/shared/__tests__/api.spec.ts | Adds tests validating the new Qwen3-specific cap behavior and ensuring the context-based cap still applies when lower. |
```diff
@@ -143,6 +144,8 @@ export const getModelMaxOutputTokens = ({
 		return ANTHROPIC_DEFAULT_MAX_TOKENS
 	}
 
+	const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")
+
 	// If model has explicit maxTokens, clamp it to 20% of the context window
 	// Exception: GPT-5 models should use their exact configured max output tokens
 	if (model.maxTokens) {
@@ -154,8 +157,15 @@ export const getModelMaxOutputTokens = ({
 		return model.maxTokens
 	}
 
 	const contextCappedMaxTokens = Math.min(model.maxTokens, Math.ceil(model.contextWindow * 0.2))
 
+	// qwen3-max-thinking currently rejects values above 32,768 (upstream provider constraint).
+	if (isQwen3MaxThinkingModel) {
+		return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
+	}
```
New Kilocode-specific logic/constants should be marked with // kilocode_change (or a start/end block) to keep future merges with upstream Roo manageable. Please annotate the newly added Qwen3 provider cap constant and the qwen3-max-thinking special-case branch accordingly.
Addressed in 13caa14: added // kilocode_change markers around the new qwen3-specific constant and branch in src/shared/api.ts.
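For illustration, the marker convention discussed here typically looks like the following. The exact placement and label text are assumptions; only the `// kilocode_change` start/end markers themselves are the convention being requested:

```typescript
// kilocode_change start: qwen3-max-thinking provider output cap
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768;
// kilocode_change end
```

Wrapping fork-specific lines this way lets the scripted Roo sync identify and preserve Kilo-only patches during upstream merges.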
```ts
test("should cap qwen3-max-thinking to provider max output limit of 32,768", () => {
	const model: ModelInfo = {
		contextWindow: 300_000,
		supportsPromptCache: false,
		maxTokens: 200_000,
	}

	const result = getModelMaxOutputTokens({
		modelId: "qwen/qwen3-max-thinking",
		model,
		settings: {},
		format: "openrouter",
	})

	// 20% cap would be 60,000, but the model-specific provider cap is 32,768.
	expect(result).toBe(32_768)
})

test("should still honor lower context-based cap for qwen3-max-thinking", () => {
	const model: ModelInfo = {
		contextWindow: 100_000,
		supportsPromptCache: false,
		maxTokens: 200_000,
	}

	const result = getModelMaxOutputTokens({
		modelId: "qwen/qwen3-max-thinking",
		model,
		settings: {},
		format: "openrouter",
	})

	// 20% cap is 20,000, which is lower than 32,768.
	expect(result).toBe(20_000)
})
```
The new qwen3-max-thinking tests add Kilocode-specific behavior but aren’t marked with // kilocode_change comments. Please wrap these new test cases (or annotate the added lines) so downstream merges can distinguish fork changes from upstream Roo.
Addressed in 13caa14: wrapped the two qwen3-max-thinking tests with // kilocode_change start/end markers in src/shared/__tests__/api.spec.ts.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 913a39ae55
```ts
	return ANTHROPIC_DEFAULT_MAX_TOKENS
}

const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")
```
Mark shared src changes with kilocode_change comments
The new qwen3-max-thinking logic is added in src/ without kilocode_change markers. The repository guideline in /workspace/kilocode/AGENTS.md requires all core-extension edits under src/ to be wrapped so upstream fork merges can isolate Kilo-specific patches; leaving this unmarked increases the chance of merge conflicts or accidental overwrites during the scripted Roo sync process.
Done in 13caa14. Added explicit // kilocode_change annotations for the qwen3-max-thinking logic in src/shared/api.ts and corresponding tests.
Added coverage in src/shared/__tests__/api.spec.ts for the qwen3-max-thinking cap behavior. A local `vitest shared/__tests__/api.spec.ts` run passed before push.
All required checks are green. This is ready for maintainer review.
Addressed the inline review feedback in commit 13caa14 by adding // kilocode_change markers around the new qwen3-specific logic and tests. Re-ran `vitest shared/__tests__/api.spec.ts` locally (pass).
Follow-up update: CI is fully green on commit 13caa14 after addressing the inline kilocode_change annotation feedback. Ready for maintainer re-review.
Code Review Summary

Status: No Issues Found | Recommendation: Merge

Clean, well-scoped fix that caps max_tokens for qwen3-max-thinking to the provider limit.

Files Reviewed (3 files)
Summary
Why

`qwen/qwen3-max-thinking` can error with:

```
Range of max_tokens should be [1, 32768]
```

when the computed token value exceeds provider limits.
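Concretely, with the model's 300,000-token context window (a number assumed from the test fixtures above), the pre-fix 20% clamp alone produces a value well outside the accepted range:

```typescript
// Pre-fix behavior: only the 20% context-window clamp applies.
const contextWindow = 300_000;
const computed = Math.ceil(contextWindow * 0.2);

console.log(computed); // 60000
// 60,000 exceeds the provider's accepted range [1, 32768], triggering the error above.
console.log(computed > 32_768); // true
```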
Fixes #5871