fix: cap qwen3-max-thinking max_tokens to provider limit#5885

Merged
kevinvandijk merged 4 commits into Kilo-Org:main from Olusammytee:fix/5871-qwen3-max-thinking-token-cap
Feb 21, 2026
Conversation

@Olusammytee

Summary

  • cap qwen3-max-thinking output tokens at 32,768 in model max-token resolution
  • preserve existing 20% context-window cap behavior for all other models
  • add unit tests for both capped and lower-context scenarios

Why

qwen/qwen3-max-thinking can error with:
Range of max_tokens should be [1, 32768]
when the computed token value exceeds provider limits.

Fixes #5871
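The fix described above can be sketched in isolation. The constant name mirrors the PR, but this standalone helper is illustrative; the real change lives in getModelMaxOutputTokens in src/shared/api.ts:

```typescript
// Illustrative sketch of the capping logic; the helper signature is an
// assumption, not the actual API surface of src/shared/api.ts.
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768

function resolveMaxOutputTokens(modelId: string, maxTokens: number, contextWindow: number): number {
	// Existing behavior: clamp the model's explicit maxTokens to 20% of the context window.
	const contextCappedMaxTokens = Math.min(maxTokens, Math.ceil(contextWindow * 0.2))

	// New behavior: qwen3-max-thinking must also respect the provider's hard 32,768 limit,
	// so the lower of the two caps always wins.
	if (modelId.toLowerCase().includes("qwen3-max-thinking")) {
		return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
	}

	return contextCappedMaxTokens
}

console.log(resolveMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 300_000)) // 32768
console.log(resolveMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 100_000)) // 20000
```

With a 300k context window the 20% clamp (60,000) exceeds the provider limit, so 32,768 wins; with a 100k window the 20% clamp (20,000) is lower and still applies unchanged.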

changeset-bot bot commented Feb 15, 2026

🦋 Changeset detected

Latest commit: c7d5865

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
kilo-code Patch

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes provider-side validation errors for qwen/qwen3-max-thinking by ensuring max_tokens never exceeds the upstream 32,768 limit, while preserving the existing “20% of context window” clamp behavior for other models.

Changes:

  • Add a Qwen3 Max Thinking–specific max output token cap at 32,768 during max-token resolution.
  • Keep the existing 20% context-window clamp for all other models (and still apply it first for Qwen3).
  • Add unit tests covering both the provider cap case and the lower context-based cap case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/shared/api.ts Adds model-ID detection for qwen3-max-thinking and caps resolved output tokens to 32,768 after applying the existing context-based clamp.
src/shared/__tests__/api.spec.ts Adds tests validating the new Qwen3-specific cap behavior and ensuring the context-based cap still applies when lower.

Comment on lines 111 to +165
@@ -143,6 +144,8 @@ export const getModelMaxOutputTokens = ({
 		return ANTHROPIC_DEFAULT_MAX_TOKENS
 	}
 
+	const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")
+
 	// If model has explicit maxTokens, clamp it to 20% of the context window
 	// Exception: GPT-5 models should use their exact configured max output tokens
 	if (model.maxTokens) {
@@ -154,8 +157,15 @@
 		return model.maxTokens
 	}
 
+	const contextCappedMaxTokens = Math.min(model.maxTokens, Math.ceil(model.contextWindow * 0.2))
+
+	// qwen3-max-thinking currently rejects values above 32,768 (upstream provider constraint).
+	if (isQwen3MaxThinkingModel) {
+		return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
+	}

Copilot AI Feb 15, 2026

New Kilocode-specific logic/constants should be marked with // kilocode_change (or a start/end block) to keep future merges with upstream Roo manageable. Please annotate the newly added Qwen3 provider cap constant and the qwen3-max-thinking special-case branch accordingly.

Author

Addressed in 13caa14: added // kilocode_change markers around the new qwen3-specific constant and branch in src/shared/api.ts.

Comment on lines +214 to +248
test("should cap qwen3-max-thinking to provider max output limit of 32,768", () => {
	const model: ModelInfo = {
		contextWindow: 300_000,
		supportsPromptCache: false,
		maxTokens: 200_000,
	}

	const result = getModelMaxOutputTokens({
		modelId: "qwen/qwen3-max-thinking",
		model,
		settings: {},
		format: "openrouter",
	})

	// 20% cap would be 60,000, but the model-specific provider cap is 32,768.
	expect(result).toBe(32_768)
})

test("should still honor lower context-based cap for qwen3-max-thinking", () => {
	const model: ModelInfo = {
		contextWindow: 100_000,
		supportsPromptCache: false,
		maxTokens: 200_000,
	}

	const result = getModelMaxOutputTokens({
		modelId: "qwen/qwen3-max-thinking",
		model,
		settings: {},
		format: "openrouter",
	})

	// 20% cap is 20,000, which is lower than 32,768.
	expect(result).toBe(20_000)
})

Copilot AI Feb 15, 2026

The new qwen3-max-thinking tests add Kilocode-specific behavior but aren’t marked with // kilocode_change comments. Please wrap these new test cases (or annotate the added lines) so downstream merges can distinguish fork changes from upstream Roo.

Author

Addressed in 13caa14: wrapped the two qwen3-max-thinking tests with // kilocode_change start/end markers in src/shared/__tests__/api.spec.ts.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 913a39ae55

		return ANTHROPIC_DEFAULT_MAX_TOKENS
	}

	const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")

P1: Mark shared src changes with kilocode_change comments

The new qwen3-max-thinking logic is added in src/ without kilocode_change markers, but the repository guideline in /workspace/kilocode/AGENTS.md requires all core-extension edits under src/ to be wrapped so upstream fork merges can isolate Kilo-specific patches; leaving this unmarked increases the chance of merge conflicts or accidental overwrite during the scripted Roo sync process.
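For reference, the annotation convention being requested looks roughly like this: multi-line additions get a start/end block, single-line additions get a trailing marker. The code inside the markers is illustrative; the markers themselves are the point:

```typescript
// Sketch of the fork-annotation convention described above; values mirror the PR
// but this snippet is not the actual diff.

// kilocode_change start
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768
// kilocode_change end

const modelId = "qwen/qwen3-max-thinking"
const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking") // kilocode_change
```

Because the markers are plain comments, they survive compilation untouched while letting the scripted Roo sync isolate fork-specific lines during merges.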


Author

Done in 13caa14. Added explicit // kilocode_change annotations for the qwen3-max-thinking logic in src/shared/api.ts and corresponding tests.

@Olusammytee
Author

Added coverage in src/shared/__tests__/api.spec.ts for the qwen3-max-thinking cap behavior. A local vitest run of that spec passed before push.

@Olusammytee
Author

All required checks are green. This is ready for maintainer review.

@Olusammytee
Author

Addressed the inline review feedback in commit 13caa14 by adding // kilocode_change markers around the new qwen3-specific logic and tests. Re-ran vitest on src/shared/__tests__/api.spec.ts locally (pass).

@Olusammytee
Author

Follow-up update: CI is fully green on commit 13caa14 after addressing the inline kilocode_change annotation feedback. Ready for maintainer re-review.

@Olusammytee
Author

Gentle bump for maintainer review when convenient. Latest commit 13caa14 has all required checks green and addresses #5871.

@kilo-code-bot
Contributor

kilo-code-bot bot commented Feb 21, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (3 files)
  • .changeset/tender-otters-pay.md - Changeset for patch release
  • src/shared/api.ts - Added qwen3-max-thinking output token cap (32,768) with proper kilocode_change markers
  • src/shared/__tests__/api.spec.ts - Two test cases covering both cap scenarios (provider limit wins, context-based cap wins)

@kilo-code-bot
Contributor

kilo-code-bot bot commented Feb 21, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Clean, well-scoped fix that caps qwen3-max-thinking output tokens to the upstream provider limit of 32,768. The implementation follows existing patterns (similar to the GPT-5 bypass) and correctly applies the cap as Math.min(contextCappedMaxTokens, 32_768), ensuring the lower of the two limits always wins. Tests cover both the cap-active and cap-inactive scenarios. All kilocode_change markers are properly in place.

Files Reviewed (3 files)
  • .changeset/tender-otters-pay.md - Standard patch changeset
  • src/shared/api.ts - Added QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT constant and model-specific cap logic
  • src/shared/__tests__/api.spec.ts - Two new test cases covering both cap scenarios

Collaborator

@kevinvandijk kevinvandijk left a comment

Thanks!

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API error due to range of max_token for Qwen3 Max Thinking

3 participants