fix: cap qwen3-max-thinking max_tokens to provider limit#5885

Merged
kevinvandijk merged 4 commits into Kilo-Org:main from Olusammytee:fix/5871-qwen3-max-thinking-token-cap
Feb 21, 2026
Conversation

@Olusammytee

Summary

  • cap qwen3-max-thinking output tokens at 32,768 in model max-token resolution
  • preserve existing 20% context-window cap behavior for all other models
  • add unit tests for both capped and lower-context scenarios

Why

qwen/qwen3-max-thinking can error with:
Range of max_tokens should be [1, 32768]
when the computed token value exceeds provider limits.

Fixes #5871
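The fix described above can be sketched in isolation. The constant name mirrors the PR, but this standalone helper is illustrative; the real change lives in getModelMaxOutputTokens in src/shared/api.ts:

```typescript
// Illustrative sketch of the capping logic; the helper signature is an
// assumption, not the actual API surface of src/shared/api.ts.
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768

function resolveMaxOutputTokens(modelId: string, maxTokens: number, contextWindow: number): number {
	// Existing behavior: clamp the model's explicit maxTokens to 20% of the context window.
	const contextCappedMaxTokens = Math.min(maxTokens, Math.ceil(contextWindow * 0.2))

	// New behavior: qwen3-max-thinking must also respect the provider's hard 32,768 limit,
	// so the lower of the two caps always wins.
	if (modelId.toLowerCase().includes("qwen3-max-thinking")) {
		return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
	}

	return contextCappedMaxTokens
}

console.log(resolveMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 300_000)) // 32768
console.log(resolveMaxOutputTokens("qwen/qwen3-max-thinking", 200_000, 100_000)) // 20000
```

With a 300k context window the 20% clamp (60,000) exceeds the provider limit, so 32,768 wins; with a 100k window the 20% clamp (20,000) is lower and still applies unchanged.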

changeset-bot bot commented Feb 15, 2026

🦋 Changeset detected

Latest commit: c7d5865

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
kilo-code Patch

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes provider-side validation errors for qwen/qwen3-max-thinking by ensuring max_tokens never exceeds the upstream 32,768 limit, while preserving the existing “20% of context window” clamp behavior for other models.

Changes:

  • Add a Qwen3 Max Thinking–specific max output token cap at 32,768 during max-token resolution.
  • Keep the existing 20% context-window clamp for all other models (and still apply it first for Qwen3).
  • Add unit tests covering both the provider cap case and the lower context-based cap case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/shared/api.ts Adds model-ID detection for qwen3-max-thinking and caps resolved output tokens to 32,768 after applying the existing context-based clamp.
src/shared/__tests__/api.spec.ts Adds tests validating the new Qwen3-specific cap behavior and ensuring the context-based cap still applies when lower.

Comment on lines 111 to +165
@@ -143,6 +144,8 @@ export const getModelMaxOutputTokens = ({
 		return ANTHROPIC_DEFAULT_MAX_TOKENS
 	}
 
+	const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")
+
 	// If model has explicit maxTokens, clamp it to 20% of the context window
 	// Exception: GPT-5 models should use their exact configured max output tokens
 	if (model.maxTokens) {
@@ -154,8 +157,15 @@
 		return model.maxTokens
 	}
 
+	const contextCappedMaxTokens = Math.min(model.maxTokens, Math.ceil(model.contextWindow * 0.2))
+
+	// qwen3-max-thinking currently rejects values above 32,768 (upstream provider constraint).
+	if (isQwen3MaxThinkingModel) {
+		return Math.min(contextCappedMaxTokens, QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT)
+	}

Copilot AI Feb 15, 2026

New Kilocode-specific logic/constants should be marked with // kilocode_change (or a start/end block) to keep future merges with upstream Roo manageable. Please annotate the newly added Qwen3 provider cap constant and the qwen3-max-thinking special-case branch accordingly.

Author

Addressed in 13caa14: added // kilocode_change markers around the new qwen3-specific constant and branch in src/shared/api.ts.

Comment on lines +214 to +248
test("should cap qwen3-max-thinking to provider max output limit of 32,768", () => {
	const model: ModelInfo = {
		contextWindow: 300_000,
		supportsPromptCache: false,
		maxTokens: 200_000,
	}

	const result = getModelMaxOutputTokens({
		modelId: "qwen/qwen3-max-thinking",
		model,
		settings: {},
		format: "openrouter",
	})

	// 20% cap would be 60,000, but the model-specific provider cap is 32,768.
	expect(result).toBe(32_768)
})

test("should still honor lower context-based cap for qwen3-max-thinking", () => {
	const model: ModelInfo = {
		contextWindow: 100_000,
		supportsPromptCache: false,
		maxTokens: 200_000,
	}

	const result = getModelMaxOutputTokens({
		modelId: "qwen/qwen3-max-thinking",
		model,
		settings: {},
		format: "openrouter",
	})

	// 20% cap is 20,000, which is lower than 32,768.
	expect(result).toBe(20_000)
})

Copilot AI Feb 15, 2026

The new qwen3-max-thinking tests add Kilocode-specific behavior but aren’t marked with // kilocode_change comments. Please wrap these new test cases (or annotate the added lines) so downstream merges can distinguish fork changes from upstream Roo.

Author

Addressed in 13caa14: wrapped the two qwen3-max-thinking tests with // kilocode_change start/end markers in src/shared/__tests__/api.spec.ts.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 913a39ae55

		return ANTHROPIC_DEFAULT_MAX_TOKENS
	}

	const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking")

P1: Mark shared src changes with kilocode_change comments

The new qwen3-max-thinking logic is added in src/ without kilocode_change markers, but the repository guideline in /workspace/kilocode/AGENTS.md requires all core-extension edits under src/ to be wrapped so upstream fork merges can isolate Kilo-specific patches; leaving this unmarked increases the chance of merge conflicts or accidental overwrite during the scripted Roo sync process.
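For reference, the annotation convention being requested looks roughly like this: multi-line additions get a start/end block, single-line additions get a trailing marker. The code inside the markers is illustrative; the markers themselves are the point:

```typescript
// Sketch of the fork-annotation convention described above; values mirror the PR
// but this snippet is not the actual diff.

// kilocode_change start
const QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT = 32_768
// kilocode_change end

const modelId = "qwen/qwen3-max-thinking"
const isQwen3MaxThinkingModel = modelId.toLowerCase().includes("qwen3-max-thinking") // kilocode_change
```

Because the markers are plain comments, they survive compilation untouched while letting the scripted Roo sync isolate fork-specific lines during merges.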


Author

Done in 13caa14. Added explicit // kilocode_change annotations for the qwen3-max-thinking logic in src/shared/api.ts and corresponding tests.

@Olusammytee
Author

Added coverage in src/shared/__tests__/api.spec.ts for the qwen3-max-thinking cap behavior. A local vitest run of that spec passed before push.

@Olusammytee
Author

All required checks are green. This is ready for maintainer review.

@Olusammytee
Author

Addressed the inline review feedback in commit 13caa14 by adding // kilocode_change markers around the new qwen3-specific logic and tests. Re-ran vitest on src/shared/__tests__/api.spec.ts locally (pass).

@Olusammytee
Author

Follow-up update: CI is fully green on commit 13caa14 after addressing the inline kilocode_change annotation feedback. Ready for maintainer re-review.

@Olusammytee
Author

Gentle bump for maintainer review when convenient. Latest commit 13caa14 has all required checks green and addresses #5871.

@kilo-code-bot
Contributor

kilo-code-bot bot commented Feb 21, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (3 files)
  • .changeset/tender-otters-pay.md - Changeset for patch release
  • src/shared/api.ts - Added qwen3-max-thinking output token cap (32,768) with proper kilocode_change markers
  • src/shared/__tests__/api.spec.ts - Two test cases covering both cap scenarios (provider limit wins, context-based cap wins)

@kilo-code-bot
Contributor

kilo-code-bot bot commented Feb 21, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Clean, well-scoped fix that caps qwen3-max-thinking output tokens to the upstream provider limit of 32,768. The implementation follows existing patterns (similar to the GPT-5 bypass) and correctly applies the cap as Math.min(contextCappedMaxTokens, 32_768), ensuring the lower of the two limits always wins. Tests cover both the cap-active and cap-inactive scenarios. All kilocode_change markers are properly in place.

Files Reviewed (3 files)
  • .changeset/tender-otters-pay.md - Standard patch changeset
  • src/shared/api.ts - Added QWEN3_MAX_THINKING_OUTPUT_TOKEN_LIMIT constant and model-specific cap logic
  • src/shared/__tests__/api.spec.ts - Two new test cases covering both cap scenarios

Collaborator

@kevinvandijk kevinvandijk left a comment

Thanks!

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API error due to range of max_token for Qwen3 Max Thinking

3 participants