Fix premature agent termination on invalid done tool calls by shrey150 · Pull Request #1763 · browserbase/stagehand

shrey150 · 2026-02-26T19:23:48Z

Summary

treat done as terminal only when it is non-dynamic, non-invalid, and explicitly taskComplete: true
ignore non-terminal done calls in the step handler so hallucinated/invalid done calls do not mark the run complete
add unit tests covering terminal/non-terminal done call detection

Validation

pnpm --filter @browserbasehq/stagehand run build:esm
pnpm --filter @browserbasehq/stagehand run test:core -- packages/core/dist/esm/tests/unit/done-tool-call.test.js
pnpm --filter @browserbasehq/stagehand run typecheck

Summary by cubic

Fixes premature agent termination by ignoring invalid or dynamic done tool calls. A run now ends only when a done call is non-dynamic, non-invalid, and input.taskComplete is true.

Bug Fixes
- Added isTerminalDoneToolCall to validate terminal done calls.
- Updated v3AgentHandler to skip non-terminal done calls, only mark runs complete on terminal ones, and build finalMessage from collected reasoning.
- Adjusted step termination check to use the new validator and added unit and integration tests.

^{Written for commit 73f51d1. Summary will update on new commits. Review in cubic}

changeset-bot · 2026-02-26T19:23:55Z

⚠️ No Changeset found

Latest commit: 73f51d1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

greptile-apps · 2026-02-26T19:28:15Z

Greptile Summary

Fixes agent prematurely terminating when LLM hallucinates or produces invalid done tool calls. The PR introduces isTerminalDoneToolCall() to distinguish between valid terminal done calls (non-dynamic, non-invalid, with taskComplete: true) and hallucinated/invalid ones that should be ignored.

Key changes:

New utility isTerminalDoneToolCall() validates done calls with strict checks for taskComplete === true and absence of dynamic/invalid flags
Handler now skips non-terminal done calls in the main loop instead of marking the run as complete
handleStop() updated to only stop on terminal done calls
Unit tests cover major scenarios (non-done tools, dynamic/invalid flags, false taskComplete, non-object input)

Minor improvement opportunity:

Test coverage could be enhanced by testing dynamic and invalid flags separately to verify the OR logic works independently

Confidence Score: 5/5

This PR is safe to merge with minimal risk
Clean, focused fix with proper type safety and test coverage. The logic correctly addresses the stated problem of premature termination. The only minor gap is test coverage for testing dynamic/invalid flags independently, which doesn't impact functionality.
No files require special attention

Important Files Changed

Filename	Overview
packages/core/lib/v3/agent/utils/doneToolCall.ts	New utility function correctly validates terminal done calls with proper null/type safety
packages/core/lib/v3/handlers/v3AgentHandler.ts	Handler correctly integrates terminal done call check, skipping invalid/dynamic done calls
packages/core/tests/unit/done-tool-call.test.ts	Good test coverage with minor gap: dynamic and invalid properties tested together rather than separately

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([Tool Call Event]) --> CheckToolName{toolName == 'done'?}
    CheckToolName -->|No| ProcessNormal[Process as normal tool call]
    CheckToolName -->|Yes| CheckDynamic{dynamic == true?}
    
    CheckDynamic -->|Yes| NonTerminal[Non-terminal done call]
    CheckDynamic -->|No| CheckInvalid{invalid == true?}
    
    CheckInvalid -->|Yes| NonTerminal
    CheckInvalid -->|No| CheckTaskComplete{taskComplete === true?}
    
    CheckTaskComplete -->|No| NonTerminal
    CheckTaskComplete -->|Yes| Terminal[Terminal done call]
    
    NonTerminal --> LogSkip[Log: 'Ignoring non-terminal done']
    LogSkip --> Continue[Continue to next tool call]
    
    Terminal --> MarkComplete[Set state.completed = true]
    MarkComplete --> SetMessage[Set finalMessage with reasoning]
    SetMessage --> MapActions[Map to actions]
    
    ProcessNormal --> MapActions
    MapActions --> End([Continue execution])

_{Last reviewed commit: d87b298}

greptile-apps

_{3 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-26T19:28:19Z

packages/core/tests/unit/done-tool-call.test.ts

+  it("returns false for dynamic/invalid done calls", () => {
+    expect(
+      isTerminalDoneToolCall({
+        toolName: "done",
+        dynamic: true,
+        invalid: true,
+        input: { taskComplete: true },
+      }),
+    ).toBe(false);


Test both dynamic and invalid together - consider separate test cases for each condition to ensure the OR logic works independently

Suggested change

it("returns false for dynamic/invalid done calls", () => {

expect(

isTerminalDoneToolCall({

toolName: "done",

dynamic: true,

invalid: true,

input: { taskComplete: true },

}),

).toBe(false);

it("returns false for dynamic done calls", () => {

expect(

isTerminalDoneToolCall({

toolName: "done",

dynamic: true,

input: { taskComplete: true },

}),

).toBe(false);

});

it("returns false for invalid done calls", () => {

expect(

isTerminalDoneToolCall({

toolName: "done",

invalid: true,

input: { taskComplete: true },

}),

).toBe(false);

});

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

cubic-dev-ai

No issues found across 3 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant Agent as V3AgentHandler
    participant Validator as isTerminalDoneToolCall
    participant Logger as Logger
    participant State as Run State

    Note over Agent,State: Agent Processing Loop (Step Execution)

    Agent->>Agent: LLM generates Tool Calls
    
    loop For each Tool Call
        alt toolName is "done"
            Agent->>Validator: NEW: isTerminalDoneToolCall(toolCall)
            Note right of Validator: Checks taskComplete=true,<br/>!dynamic, and !invalid
            Validator-->>Agent: Result (boolean)

            alt Result is FALSE (Invalid/Non-Terminal)
                Agent->>Logger: NEW: Log "Ignoring non-terminal done tool call"
                Agent->>Agent: CHANGED: continue (skip to next tool call)
            else Result is TRUE (Valid Termination)
                Agent->>State: Set completed = true
                Agent->>State: CHANGED: Construct finalMessage from reasoning
            end
        else other tool calls
            Agent->>Agent: Process tool actions normally
        end
    end

    Note over Agent,State: Step Termination Check

    Agent->>Validator: NEW: Check if any toolCall is terminal
    Validator-->>Agent: boolean
    alt isTerminal OR maxSteps reached
        Agent-->>Agent: Exit Agent Loop
    else
        Agent-->>Agent: Continue to next step
    end

Fix premature agent termination on invalid done tool calls

d87b298

greptile-apps bot reviewed Feb 26, 2026

View reviewed changes

cubic-dev-ai bot reviewed Feb 26, 2026

View reviewed changes

shrey150 marked this pull request as draft February 26, 2026 19:31

Add integration test for invalid done tool call handling

73f51d1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix premature agent termination on invalid done tool calls#1763

Fix premature agent termination on invalid done tool calls#1763
shrey150 wants to merge 2 commits intomainfrom
codex/fix-invalid-done-termination

shrey150 commented Feb 26, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

changeset-bot bot commented Feb 26, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Feb 26, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 26, 2026

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

shrey150 commented Feb 26, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Summary by cubic

Uh oh!

changeset-bot bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

greptile-apps bot commented Feb 26, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

shrey150 commented Feb 26, 2026 •

edited by cubic-dev-ai bot

Loading

changeset-bot bot commented Feb 26, 2026 •

edited

Loading