Fix premature agent termination on invalid done tool calls#1763
Fix premature agent termination on invalid done tool calls#1763
Conversation
|
Greptile SummaryFixes agent prematurely terminating when LLM hallucinates or produces invalid Key changes:
Minor improvement opportunity:
Confidence Score: 5/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
Start([Tool Call Event]) --> CheckToolName{toolName == 'done'?}
CheckToolName -->|No| ProcessNormal[Process as normal tool call]
CheckToolName -->|Yes| CheckDynamic{dynamic == true?}
CheckDynamic -->|Yes| NonTerminal[Non-terminal done call]
CheckDynamic -->|No| CheckInvalid{invalid == true?}
CheckInvalid -->|Yes| NonTerminal
CheckInvalid -->|No| CheckTaskComplete{taskComplete === true?}
CheckTaskComplete -->|No| NonTerminal
CheckTaskComplete -->|Yes| Terminal[Terminal done call]
NonTerminal --> LogSkip[Log: 'Ignoring non-terminal done']
LogSkip --> Continue[Continue to next tool call]
Terminal --> MarkComplete[Set state.completed = true]
MarkComplete --> SetMessage[Set finalMessage with reasoning]
SetMessage --> MapActions[Map to actions]
ProcessNormal --> MapActions
MapActions --> End([Continue execution])
Last reviewed commit: d87b298 |
| it("returns false for dynamic/invalid done calls", () => { | ||
| expect( | ||
| isTerminalDoneToolCall({ | ||
| toolName: "done", | ||
| dynamic: true, | ||
| invalid: true, | ||
| input: { taskComplete: true }, | ||
| }), | ||
| ).toBe(false); |
There was a problem hiding this comment.
Test both dynamic and invalid together - consider separate test cases for each condition to ensure the OR logic works independently
| it("returns false for dynamic/invalid done calls", () => { | |
| expect( | |
| isTerminalDoneToolCall({ | |
| toolName: "done", | |
| dynamic: true, | |
| invalid: true, | |
| input: { taskComplete: true }, | |
| }), | |
| ).toBe(false); | |
| it("returns false for dynamic done calls", () => { | |
| expect( | |
| isTerminalDoneToolCall({ | |
| toolName: "done", | |
| dynamic: true, | |
| input: { taskComplete: true }, | |
| }), | |
| ).toBe(false); | |
| }); | |
| it("returns false for invalid done calls", () => { | |
| expect( | |
| isTerminalDoneToolCall({ | |
| toolName: "done", | |
| invalid: true, | |
| input: { taskComplete: true }, | |
| }), | |
| ).toBe(false); | |
| }); |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
No issues found across 3 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant Agent as V3AgentHandler
participant Validator as isTerminalDoneToolCall
participant Logger as Logger
participant State as Run State
Note over Agent,State: Agent Processing Loop (Step Execution)
Agent->>Agent: LLM generates Tool Calls
loop For each Tool Call
alt toolName is "done"
Agent->>Validator: NEW: isTerminalDoneToolCall(toolCall)
Note right of Validator: Checks taskComplete=true,<br/>!dynamic, and !invalid
Validator-->>Agent: Result (boolean)
alt Result is FALSE (Invalid/Non-Terminal)
Agent->>Logger: NEW: Log "Ignoring non-terminal done tool call"
Agent->>Agent: CHANGED: continue (skip to next tool call)
else Result is TRUE (Valid Termination)
Agent->>State: Set completed = true
Agent->>State: CHANGED: Construct finalMessage from reasoning
end
else other tool calls
Agent->>Agent: Process tool actions normally
end
end
Note over Agent,State: Step Termination Check
Agent->>Validator: NEW: Check if any toolCall is terminal
Validator-->>Agent: boolean
alt isTerminal OR maxSteps reached
Agent-->>Agent: Exit Agent Loop
else
Agent-->>Agent: Continue to next step
end
Summary
doneas terminal only when it is non-dynamic, non-invalid, and explicitlytaskComplete: truedonecalls in the step handler so hallucinated/invalid done calls do not mark the run completedonecall detectionValidation
pnpm --filter @browserbasehq/stagehand run build:esmpnpm --filter @browserbasehq/stagehand run test:core -- packages/core/dist/esm/tests/unit/done-tool-call.test.jspnpm --filter @browserbasehq/stagehand run typecheckSummary by cubic
Fixes premature agent termination by ignoring invalid or dynamic done tool calls. A run now ends only when a done call is non-dynamic, non-invalid, and input.taskComplete is true.
Written for commit 73f51d1. Summary will update on new commits. Review in cubic