juliusalba added a commit to juliusalba/agentic-toolcall that referenced this pull request on Apr 7, 2026
CRITICAL fixes:
- scoreModelResults() now accepts a scenarioPool parameter; enterprise and memory suites were silently scoring 0% across all categories because it was hardcoded to base SCENARIOS (audit finding stevibe#1)
- TC-06: Added partial credit for translating 1 of 2 languages; relaxed text matching from exact-string to includesText (finding stevibe#2)
- TC-09: Datetime matching now accepts multiple ISO formats: "2026-03-21T08:00", "08:00:00Z", etc. (finding stevibe#3)
- EC-12: Cron validation now parses 5 fields properly instead of a loose string.includes() that matched false positives (finding stevibe#5)

IMPORTANT fairness fixes:
- TC-03: Models that verify common knowledge with tools now get partial credit (1pt) instead of fail (0pts); removes bias against safety-trained models (finding #9)
- MR-10: Expanded admission phrases from 5 to 13 patterns for "I don't know" detection (finding stevibe#6)
- MR-11: Contradiction detection no longer uses fragile exact string match; uses regex pattern matching (finding stevibe#7)
- EC-02: Cursor detection now searches all arguments, not just URL+body+headers concatenation (finding #10)
- EC-05: Turn ordering now allows same-turn parallel calls with <= instead of strict < (finding #11)
- MR-04: Requires 2 of 3 key facts (Acme, $50M, Series C) instead of just 1 (finding #15)
- EC-03: MCP tool name matching tightened from includes("issue") to specific create-issue variants (finding #19)
- MR-07: Generic queries now return partial results to reward targeted queries (finding #20)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
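To illustrate the EC-12 item, here is a minimal sketch of what a five-field cron check could look like compared with a substring test. It is not the code from the commit: the names isValidCron, isValidCronField and CRON_FIELD_RANGES are hypothetical, and the sketch assumes numeric fields only (no MON or JAN style names, no @daily shortcuts).

```typescript
// Hypothetical helper for illustration only; not taken from the commit.
// Allowed numeric range for each of the 5 cron fields, in order.
const CRON_FIELD_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0, 59], // minute
  [0, 23], // hour
  [1, 31], // day of month
  [1, 12], // month
  [0, 7], // day of week (0 and 7 both mean Sunday)
];

// Validates one field: a comma-separated list of "*", "n", or "a-b",
// each optionally followed by "/step".
function isValidCronField(field: string, min: number, max: number): boolean {
  return field.split(",").every((item) => {
    const parts = item.split("/");
    if (parts.length > 2) return false;
    const range = parts[0] ?? "";
    const step = parts[1];
    // A step, when present, must be a positive integer.
    if (step !== undefined && !(/^\d+$/.test(step) && Number(step) > 0)) {
      return false;
    }
    if (range === "*") return true;
    const bounds = range.split("-");
    if (bounds.length > 2) return false;
    if (!bounds.every((b) => /^\d+$/.test(b))) return false;
    const nums = bounds.map(Number);
    if (nums.some((n) => n < min || n > max)) return false;
    // For "a-b", require a <= b.
    return nums.length === 1 || nums[0]! <= nums[1]!;
  });
}

// Accepts only expressions with exactly 5 whitespace-separated valid fields.
function isValidCron(expr: string): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) return false;
  return CRON_FIELD_RANGES.every(([min, max], i) =>
    isValidCronField(fields[i] ?? "", min, max),
  );
}
```

Under these assumptions, a string such as "run */5 on weekdays" is rejected even though it contains "*/5", which is the kind of false positive a plain includes() check would let through.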
Add OpenCode GO & vLLM Routing
Summary
Added support for two new providers in ToolCall-15:
- opencode - Connect to OpenCode GO agents
- vllm - Connect to local vLLM instances for benchmarking local models
Changes
- lib/models.ts - Added opencode and vllm to the ProviderName type and routing logic
- .env.example - Added OPENCODE_GO_URL, OPENCODE_API_KEY, and VLLM_HOST environment variables
Usage
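As a rough illustration of how the two provider names and the three environment variables listed under Changes might fit together, here is a minimal sketch. Only the provider names and variable names come from this PR description. The resolveProvider function, the ProviderConfig shape, and the rule that opencode requires both a URL and an API key are assumptions, and the union below lists only the two new providers rather than the full ProviderName type in lib/models.ts.

```typescript
// Hypothetical sketch; not the actual routing code in lib/models.ts.
// Only the two providers added by this PR are listed here.
type ProviderName = "opencode" | "vllm";

interface ProviderConfig {
  provider: ProviderName;
  baseUrl: string;
  apiKey?: string;
}

// Environment passed in explicitly so the function stays easy to test;
// in an app this would typically be process.env.
type Env = Record<string, string | undefined>;

function resolveProvider(provider: ProviderName, env: Env): ProviderConfig {
  switch (provider) {
    case "opencode": {
      const baseUrl = env["OPENCODE_GO_URL"];
      const apiKey = env["OPENCODE_API_KEY"];
      // Assumption: both values are required for OpenCode GO.
      if (!baseUrl || !apiKey) {
        throw new Error("opencode requires OPENCODE_GO_URL and OPENCODE_API_KEY");
      }
      return { provider, baseUrl, apiKey };
    }
    case "vllm": {
      const baseUrl = env["VLLM_HOST"];
      // Assumption: no API key is needed for a local vLLM instance.
      if (!baseUrl) {
        throw new Error("vllm requires VLLM_HOST");
      }
      return { provider, baseUrl };
    }
  }
}
```

Failing fast on a missing variable is one possible design choice; it surfaces a misconfigured .env before any benchmark scenario runs.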
OpenCode GO
vLLM
Testing