fix: include tool definition tokens in cost optimization token count#3733

Open
jyoti369 wants to merge 10 commits into archestra-ai:main from jyoti369:fix/cost-optimization-tool-tokens-3423

Conversation

@jyoti369
Contributor

Problem

The cost optimization logic (getOptimizedModel) only counted message tokens when evaluating maxLength rules. Tool definitions were completely ignored in the token count -- a request with 100 tokens of messages but 5000 tokens worth of tool definitions (e.g. many MCP tools with detailed JSON schemas) would be incorrectly routed to a cheaper/smaller model.

Issue: #3423

Changes

cost-optimization.ts

  • Added estimateToolTokens() -- serializes tool names, descriptions, and input schemas, then uses the same chars/4 approximation as the base tokenizer
  • Updated getOptimizedModel() to accept an optional tools parameter and add tool token count to the total before rule evaluation
  • Enhanced logging to show messageTokenCount and toolTokenCount separately

llm-proxy-handler.ts

  • Now passes the actual tools array from requestAdapter.getTools() into the optimizer (previously only passed a hasTools boolean)

cost-optimization.test.ts

  • Added 5 unit tests for estimateToolTokens: empty array, single tool, large vs small schemas, multiple tools accumulation, tools without description

How it works

Before:

tokenCount = tokenizer.countTokens(messages)  // tools ignored

After:

messageTokenCount = tokenizer.countTokens(messages)
toolTokenCount = estimateToolTokens(tools)      // serialize & count tool defs
tokenCount = messageTokenCount + toolTokenCount  // total used for maxLength check

The cost optimization logic only counted message tokens when evaluating
maxLength rules, ignoring tool definitions entirely. Requests with small
messages but large tool schemas (e.g. many MCP tools) could be incorrectly
routed to cheaper models.

Now estimateToolTokens() serializes tool names, descriptions, and input
schemas, using the same chars/4 approximation as the base tokenizer. The
tool token count is added to the message token count before rule evaluation.

Closes archestra-ai#3423
Contributor

@joeyorlando joeyorlando left a comment

hey @jyoti369 👋 thanks for the contribution. Some quick/initial review comments.

@jyoti369 jyoti369 requested a review from joeyorlando April 14, 2026 03:12
Contributor

@joeyorlando joeyorlando left a comment

I think there’s one blocking regression to address before merge.

hasTools changed semantics in llm-proxy-handler.ts. Previously we passed requestAdapter.hasTools(), which reflects whether the incoming request declared any tools at all. The new code derives it from requestAdapter.getTools().length > 0, but getTools() is a normalized subset used for persistence/token estimation and it can intentionally drop some provider-native tools.

For example, the Anthropic adapter filters out non-custom tools like bash / text_editor in getTools(), while hasTools() still reports that the request included tools. That means existing optimization rules that key off hasTools can stop matching for requests that still do have tools.

I’d keep the original hasTools = requestAdapter.hasTools() for rule evaluation, and pass the normalized tools array separately only for token estimation.

Once that’s fixed, the main direction here looks good to me.

(lastly, please address the linting errors on CI)

@jyoti369
Contributor Author

jyoti369 commented Apr 14, 2026

Hey @joeyorlando, good catch! You're right, getTools() only returns the normalized/filtered subset and would drop provider-native tools like Anthropic's bash/text_editor, so deriving hasTools from its length was wrong. Reverted to requestAdapter.hasTools() for rule evaluation, keeping tools = requestAdapter.getTools() only for token estimation. Fix is pushed.

@jyoti369 jyoti369 requested a review from joeyorlando April 14, 2026 18:52