Skip to content

Fix premature agent termination on invalid done tool calls#1763

Draft
shrey150 wants to merge 2 commits intomainfrom
codex/fix-invalid-done-termination
Draft

Fix premature agent termination on invalid done tool calls#1763
shrey150 wants to merge 2 commits intomainfrom
codex/fix-invalid-done-termination

Conversation

@shrey150
Copy link
Contributor

@shrey150 shrey150 commented Feb 26, 2026

Summary

  • treat done as terminal only when it is non-dynamic, non-invalid, and explicitly taskComplete: true
  • ignore non-terminal done calls in the step handler so hallucinated/invalid done calls do not mark the run complete
  • add unit tests covering terminal/non-terminal done call detection

Validation

  • pnpm --filter @browserbasehq/stagehand run build:esm
  • pnpm --filter @browserbasehq/stagehand run test:core -- packages/core/dist/esm/tests/unit/done-tool-call.test.js
  • pnpm --filter @browserbasehq/stagehand run typecheck

Summary by cubic

Fixes premature agent termination by ignoring invalid or dynamic done tool calls. A run now ends only when a done call is non-dynamic, non-invalid, and input.taskComplete is true.

  • Bug Fixes
    • Added isTerminalDoneToolCall to validate terminal done calls.
    • Updated v3AgentHandler to skip non-terminal done calls, only mark runs complete on terminal ones, and build finalMessage from collected reasoning.
    • Adjusted step termination check to use the new validator and added unit and integration tests.

Written for commit 73f51d1. Summary will update on new commits. Review in cubic

@changeset-bot
Copy link

changeset-bot bot commented Feb 26, 2026

⚠️ No Changeset found

Latest commit: 73f51d1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

Fixes agent prematurely terminating when LLM hallucinates or produces invalid done tool calls. The PR introduces isTerminalDoneToolCall() to distinguish between valid terminal done calls (non-dynamic, non-invalid, with taskComplete: true) and hallucinated/invalid ones that should be ignored.

Key changes:

  • New utility isTerminalDoneToolCall() validates done calls with strict checks for taskComplete === true and absence of dynamic/invalid flags
  • Handler now skips non-terminal done calls in the main loop instead of marking the run as complete
  • handleStop() updated to only stop on terminal done calls
  • Unit tests cover major scenarios (non-done tools, dynamic/invalid flags, false taskComplete, non-object input)

Minor improvement opportunity:

  • Test coverage could be enhanced by testing dynamic and invalid flags separately to verify the OR logic works independently

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Clean, focused fix with proper type safety and test coverage. The logic correctly addresses the stated problem of premature termination. The only minor gap is test coverage for testing dynamic/invalid flags independently, which doesn't impact functionality.
  • No files require special attention

Important Files Changed

Filename Overview
packages/core/lib/v3/agent/utils/doneToolCall.ts New utility function correctly validates terminal done calls with proper null/type safety
packages/core/lib/v3/handlers/v3AgentHandler.ts Handler correctly integrates terminal done call check, skipping invalid/dynamic done calls
packages/core/tests/unit/done-tool-call.test.ts Good test coverage with minor gap: dynamic and invalid properties tested together rather than separately

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([Tool Call Event]) --> CheckToolName{toolName == 'done'?}
    CheckToolName -->|No| ProcessNormal[Process as normal tool call]
    CheckToolName -->|Yes| CheckDynamic{dynamic == true?}
    
    CheckDynamic -->|Yes| NonTerminal[Non-terminal done call]
    CheckDynamic -->|No| CheckInvalid{invalid == true?}
    
    CheckInvalid -->|Yes| NonTerminal
    CheckInvalid -->|No| CheckTaskComplete{taskComplete === true?}
    
    CheckTaskComplete -->|No| NonTerminal
    CheckTaskComplete -->|Yes| Terminal[Terminal done call]
    
    NonTerminal --> LogSkip[Log: 'Ignoring non-terminal done']
    LogSkip --> Continue[Continue to next tool call]
    
    Terminal --> MarkComplete[Set state.completed = true]
    MarkComplete --> SetMessage[Set finalMessage with reasoning]
    SetMessage --> MapActions[Map to actions]
    
    ProcessNormal --> MapActions
    MapActions --> End([Continue execution])
Loading

Last reviewed commit: d87b298

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +14 to +22
it("returns false for dynamic/invalid done calls", () => {
expect(
isTerminalDoneToolCall({
toolName: "done",
dynamic: true,
invalid: true,
input: { taskComplete: true },
}),
).toBe(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test both dynamic and invalid together - consider separate test cases for each condition to ensure the OR logic works independently

Suggested change
it("returns false for dynamic/invalid done calls", () => {
expect(
isTerminalDoneToolCall({
toolName: "done",
dynamic: true,
invalid: true,
input: { taskComplete: true },
}),
).toBe(false);
it("returns false for dynamic done calls", () => {
expect(
isTerminalDoneToolCall({
toolName: "done",
dynamic: true,
input: { taskComplete: true },
}),
).toBe(false);
});
it("returns false for invalid done calls", () => {
expect(
isTerminalDoneToolCall({
toolName: "done",
invalid: true,
input: { taskComplete: true },
}),
).toBe(false);
});

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant Agent as V3AgentHandler
    participant Validator as isTerminalDoneToolCall
    participant Logger as Logger
    participant State as Run State

    Note over Agent,State: Agent Processing Loop (Step Execution)

    Agent->>Agent: LLM generates Tool Calls
    
    loop For each Tool Call
        alt toolName is "done"
            Agent->>Validator: NEW: isTerminalDoneToolCall(toolCall)
            Note right of Validator: Checks taskComplete=true,<br/>!dynamic, and !invalid
            Validator-->>Agent: Result (boolean)

            alt Result is FALSE (Invalid/Non-Terminal)
                Agent->>Logger: NEW: Log "Ignoring non-terminal done tool call"
                Agent->>Agent: CHANGED: continue (skip to next tool call)
            else Result is TRUE (Valid Termination)
                Agent->>State: Set completed = true
                Agent->>State: CHANGED: Construct finalMessage from reasoning
            end
        else other tool calls
            Agent->>Agent: Process tool actions normally
        end
    end

    Note over Agent,State: Step Termination Check

    Agent->>Validator: NEW: Check if any toolCall is terminal
    Validator-->>Agent: boolean
    alt isTerminal OR maxSteps reached
        Agent-->>Agent: Exit Agent Loop
    else
        Agent-->>Agent: Continue to next step
    end
Loading

@shrey150 shrey150 marked this pull request as draft February 26, 2026 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant