fix: include tool definition tokens in cost optimization token count #3733
jyoti369 wants to merge 10 commits into archestra-ai:main from
Conversation
The cost optimization logic only counted message tokens when evaluating maxLength rules, ignoring tool definitions entirely. Requests with small messages but large tool schemas (e.g. many MCP tools) could be incorrectly routed to cheaper models. Now estimateToolTokens() serializes tool names, descriptions, and input schemas, using the same chars/4 approximation as the base tokenizer. The tool token count is added to the message token count before rule evaluation. Closes archestra-ai#3423
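A rough sketch of the approach described above (the types and function body are a reconstruction from this description, not the actual archestra-ai source):

```typescript
// Illustrative reconstruction: serialize each tool's name, description,
// and input schema, then apply the same chars/4 heuristic as the base
// tokenizer. Field names here are assumptions.
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema?: Record<string, unknown>;
}

function estimateToolTokens(tools: ToolDefinition[]): number {
  let chars = 0;
  for (const tool of tools) {
    chars += tool.name.length;
    chars += tool.description?.length ?? 0;
    if (tool.inputSchema) {
      chars += JSON.stringify(tool.inputSchema).length;
    }
  }
  return Math.ceil(chars / 4); // same chars/4 approximation as messages
}

// The estimate is then added to the message token count before rules run:
// totalTokens = messageTokenCount + estimateToolTokens(tools)
```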
joeyorlando left a comment
hey @jyoti369 👋 thanks for the contribution. Some quick/initial review comments.
I think there’s one blocking regression to address before merge.
hasTools changed semantics in llm-proxy-handler.ts. Previously we passed requestAdapter.hasTools(), which reflects whether the incoming request declared any tools at all. The new code derives it from requestAdapter.getTools().length > 0, but getTools() is a normalized subset used for persistence/token estimation and it can intentionally drop some provider-native tools.
For example, the Anthropic adapter filters out non-custom tools like bash / text_editor in getTools(), while hasTools() still reports that the request included tools. That means existing optimization rules that key off hasTools can stop matching for requests that still do have tools.
I’d keep the original hasTools = requestAdapter.hasTools() for rule evaluation, and pass the normalized tools array separately only for token estimation.
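The suggested split could look roughly like this (the adapter and estimator here are mocked; the names mirror this comment, not the actual archestra-ai API):

```typescript
// Mocked sketch: hasTools() keeps the raw request semantics for rule
// matching, while getTools() (the normalized subset) feeds only the
// token estimation.
interface Tool {
  name: string;
  description?: string;
}

// Mock adapter: the request declared tools, but normalization dropped a
// provider-native tool (e.g. Anthropic's bash / text_editor).
const requestAdapter = {
  hasTools: () => true, // reflects the incoming request as declared
  getTools: (): Tool[] => [{ name: "my_custom_tool" }], // normalized subset
};

// chars/4 heuristic over the normalized tools only
const estimateToolTokens = (tools: Tool[]): number =>
  Math.ceil(
    tools.reduce((c, t) => c + t.name.length + (t.description?.length ?? 0), 0) / 4
  );

const hasTools = requestAdapter.hasTools(); // drives optimization rules
const toolTokenCount = estimateToolTokens(requestAdapter.getTools());
```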
Once that’s fixed, the main direction here looks good to me.
(lastly, please address the linting errors on CI)
Hey @joeyorlando, good catch! You're right: `getTools()` only returns the normalized/filtered subset and would drop provider-native tools like Anthropic's `bash`/`text_editor`, so deriving `hasTools` from its length was wrong. Reverted to `requestAdapter.hasTools()` for rule evaluation, keeping `tools = requestAdapter.getTools()` only for token estimation. Fix is pushed.
Problem
The cost optimization logic (`getOptimizedModel`) only counted message tokens when evaluating `maxLength` rules. Tool definitions were completely ignored in the token count: a request with 100 tokens of messages but 5000 tokens worth of tool definitions (e.g. many MCP tools with detailed JSON schemas) would be incorrectly routed to a cheaper/smaller model.

Issue: #3423
Changes
cost-optimization.ts
- Added `estimateToolTokens()`, which serializes tool names, descriptions, and input schemas, then uses the same `chars/4` approximation as the base tokenizer
- Updated `getOptimizedModel()` to accept an optional `tools` parameter and add the tool token count to the total before rule evaluation
- Logs `messageTokenCount` and `toolTokenCount` separately

llm-proxy-handler.ts
- Passes `requestAdapter.getTools()` into the optimizer (previously only passed a `hasTools` boolean)

cost-optimization.test.ts
- Added tests for `estimateToolTokens`: empty array, single tool, large vs small schemas, multiple tools accumulation, tools without description

How it works
Before:
After:
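A minimal sketch of what the `maxLength` rules compare against before versus after this change, using the 100/5000-token example from the description (numbers and names are illustrative):

```typescript
// Illustrative only: the totals seen by maxLength rule evaluation.
const messageTokenCount = 100;  // small messages
const toolTokenCount = 5000;    // many MCP tools with detailed JSON schemas

// Before: tool definitions were ignored, so rules saw only message
// tokens and could route this request to a cheaper/smaller model.
const beforeTotal = messageTokenCount;

// After: the tool token count is added before rule evaluation.
const afterTotal = messageTokenCount + toolTokenCount;
```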