Skip to content

fix: allow manual page control for agent() when parameter included#1785

Open
koennjb wants to merge 5 commits intobrowserbase:mainfrom
koennjb:koennjb/thread-page-through-tools
Open

fix: allow manual page control for agent() when parameter included#1785
koennjb wants to merge 5 commits intobrowserbase:mainfrom
koennjb:koennjb/thread-page-through-tools

Conversation

@koennjb
Copy link

@koennjb koennjb commented Mar 5, 2026

why

The agent() method accepts a page parameter through options, which follows the pattern of act, extract, observe - but regardless all agent tools use awaitActivePage().

This causes issues for customers that manually control pages and run multiple agents in parallel in multiple tabs. As the active page changes during navigations, clicks etc - the other agents get very confused because the page is constantly changing state while it's trying to use tools itself. Documentation reads as it's intended for the agent to allow manual control the same way: https://docs.stagehand.dev/v3/best-practices/using-multiple-tabs#the-stagehand-page.

what changed

The page parameter from agent options is passed through to all tool factories so the page acted on can be resolved from that if provided, otherwise awaitActivePage as before.
This includes CUA agents and cached agent replays. One question I have is if it seems right to do this for cache replay.

test plan

I have a reproduction end to end test in our application that integrates with Stagehand and shows the "cross talk" issue when running parallel agents in one browser.

Tested via:

  • Unit tests for page resolution in tools, unit tests for cache replay actions
  • Built stagehand locally, installed into our project repo
  • Re-running our end to end tests and confirmed that manually provided pages are used and no more cross talk occurs

Summary by cubic

Allows agent({ page }) to explicitly control which tab a tool acts on across the whole agent flow, including CUA and cache replay. Fixes cross-talk when running multiple agents in parallel and makes replays deterministic on the chosen page.

  • Bug Fixes

    • Forward the page option through all tool factories and v3 act/extract/observe calls; tools resolve the provided page instead of awaitActivePage().
    • Update V3AgentHandler and V3CuaAgentHandler to use the explicit page for URLs, screenshots, cursor overlay, and client updates.
    • AgentCache now accepts a page and replays goto, scroll, navback, keys, act, and fillForm on that page; supports streaming and non-streaming replay.
    • v3.ts wires the page through agent execution paths and sets the active page when provided.
    • Added unit tests covering page resolution in tools and cache replay.
  • Refactors

    • Introduced AgentToolFactoryOptions and a resolvePage helper; standardized tool signatures to accept a single options object. When no page is provided, behavior falls back to awaitActivePage().

Written for commit 2a02805. Summary will update on new commits. Review in cubic

@changeset-bot
Copy link

changeset-bot bot commented Mar 5, 2026

⚠️ No Changeset found

Latest commit: 2a02805

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@koennjb koennjb marked this pull request as ready for review March 5, 2026 01:07
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 23 files

Confidence score: 3/5

  • There is some concrete regression risk here: in packages/core/lib/v3/cache/AgentCache.ts, replayAgentActStep not forwarding a manually supplied page in the no-actions fallback can route replay to the wrong page context.
  • packages/core/lib/v3/agent/utils/resolvePage.ts introduces relative ESM imports without .js extensions, which conflicts with the repo convention and could cause module resolution/runtime issues in ESM environments.
  • Given two medium-severity, high-confidence findings with user/runtime impact potential, this is not a merge-blocker by default but does carry meaningful risk until addressed.
  • Pay close attention to packages/core/lib/v3/cache/AgentCache.ts and packages/core/lib/v3/agent/utils/resolvePage.ts - page-target consistency during replay and ESM import resolution need verification.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/agent/utils/resolvePage.ts">

<violation number="1" location="packages/core/lib/v3/agent/utils/resolvePage.ts:1">
P2: New relative ESM imports omit required `.js` extensions, violating repository import convention and risking ESM resolution issues.</violation>
</file>

<file name="packages/core/lib/v3/cache/AgentCache.ts">

<violation number="1" location="packages/core/lib/v3/cache/AgentCache.ts:178">
P2: `replayAgentActStep` does not forward the manually supplied `page` in its no-actions fallback, causing inconsistent page targeting during cache replay.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant U as User Code
    participant V3 as V3 Class
    participant H as Agent Handler (V3/CUA)
    participant C as Agent Cache
    participant TF as Tool Factory / Tools
    participant R as resolvePage (Util)
    participant B as Browser Context / Page

    Note over U,B: Manual Page Control Flow (Prevents Cross-talk)

    U->>V3: agent({ instruction, page: customPage })
    V3->>V3: CHANGED: normalizeToV3Page(customPage)
    
    alt NEW: Cache Replay Path
        V3->>C: tryReplay(context, llm, page)
        C->>R: NEW: resolvePage(v3, page)
        R-->>C: Target Page
        C->>B: Perform deterministic actions on Target Page
        C-->>V3: Cached Result
    end

    V3->>H: NEW: Instantiate Handler with explicit page
    
    H->>TF: NEW: createAgentTools(v3, { ..., page })
    Note over TF: Each tool factory (act, click, goto, etc.)<br/>now holds a reference to the target page.

    loop Agent Reasoning Loop
        H->>TF: execute(toolInput)
        TF->>R: NEW: resolvePage(v3, page)
        
        alt page is provided
            R-->>TF: Return target page
        else page is undefined
            R->>B: awaitActivePage()
            B-->>R: Current active tab
            R-->>TF: Return active tab (Legacy behavior)
        end

        TF->>B: Perform Browser Action (click, goto, etc.)
        Note right of B: Interaction is locked to the<br/>resolved page, avoiding cross-talk.
        
        B-->>TF: Result / Screenshot
        TF-->>H: Tool Output
        
        H->>H: NEW: resolvePage() for URL/Screenshot updates
        H->>B: Get current URL or take screenshot
    end

    H-->>V3: Final Result
    V3-->>U: AgentResult
Loading

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR fixes cross-talk between parallel agent() calls by threading an explicit page option all the way from agent({ page }) through the handler layer, every tool factory, and cache replay — so each agent's tools operate on their assigned tab rather than whatever awaitActivePage() happens to return.

Key changes:

  • Introduced AgentToolFactoryOptions (new types.ts) and a resolvePage utility to centralize the "prefer explicit page, fall back to active page" pattern.
  • All 14+ tool factories were refactored from positional arguments to an options object, receiving page and using resolvePage in their execute callbacks.
  • V3AgentHandler and V3CuaAgentHandler store this.page and use it for URL reads, cursor overlay, screenshot emission, and tool creation.
  • AgentCache.tryReplay / tryReplayAsStream propagate page through all replay step types (goto, scroll, navback, keys, act, fillForm).

Two issues warrant attention:

  • resolvePage.ts uses bare (non-.js) import paths, which breaks Node.js ESM module resolution and is inconsistent with the rest of the codebase.
  • V3AgentHandler re-implements resolvePage logic locally instead of delegating to the shared utility (unlike V3CuaAgentHandler), creating unnecessary duplication.

Confidence Score: 4/5

  • The core fix is sound and well-implemented, but a build-breaking ESM import issue in resolvePage.ts and code duplication in v3AgentHandler.ts need to be addressed before merging.
  • The architectural approach is correct and comprehensive — all tool factories and handlers properly thread the explicit page. However, two issues reduce confidence: (1) resolvePage.ts uses bare imports without .js extensions, which breaks Node.js ESM module resolution and will cause build failures; (2) v3AgentHandler duplicates the resolvePage logic instead of using the shared utility, reducing maintainability. Both are straightforward fixes but necessary before merge.
  • packages/core/lib/v3/agent/utils/resolvePage.ts (ESM import extensions) and packages/core/lib/v3/handlers/v3AgentHandler.ts (code duplication)

Sequence Diagram

sequenceDiagram
    participant Caller as Caller (parallel tabs)
    participant V3 as v3.ts agent()
    participant Handler as V3AgentHandler / V3CuaAgentHandler
    participant Tools as Tool Factories (click, goto, screenshot…)
    participant ResolvePage as resolvePage()
    participant Context as V3Context

    Caller->>V3: agent({ page: explicitPage }).execute(instruction)
    V3->>V3: normalizeToV3Page(explicitPage) → resolvedPage
    V3->>Context: setActivePage(resolvedPage) [still mutates global state]
    V3->>Handler: new Handler(..., resolvedPage)
    Handler->>Handler: stores this.page = resolvedPage

    Handler->>Handler: resolvePage() → this.page ?? awaitActivePage()
    Handler->>Context: (URL for system prompt via resolvePage)
    Handler->>Tools: createAgentTools(v3, { page: resolvedPage, … })

    Note over Tools,ResolvePage: Each tool execute() callback:
    Tools->>ResolvePage: resolvePage(v3, options.page)
    ResolvePage-->>Tools: returns explicitPage (no awaitActivePage call)
    Tools->>explicitPage: click / goto / screenshot / scroll / …

    Note over V3,Context: Cache replay path also threads resolvedPage
    V3->>V3: agentCache.tryReplay(cacheContext, llmClient, resolvedPage)
    V3-->>Caller: AgentResult
Loading

Last reviewed commit: e7e47e9

Comment on lines +103 to 105
private async resolvePage(): Promise<Page> {
return this.page ?? (await this.v3.context.awaitActivePage());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v3AgentHandler defines its own private resolvePage() method, while v3CuaAgentHandler imports and uses the shared utility from ../agent/utils/resolvePage.js. This inconsistency means any future changes to the resolution logic must be made in two places.

Consider importing the shared utility for consistency:

private async resolvePage(): Promise<Page> {
  return resolvePage(this.v3, this.page);
}

This aligns with how v3CuaAgentHandler handles it and the broader refactoring in this PR toward standardized page handling.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@koennjb koennjb force-pushed the koennjb/thread-page-through-tools branch from d8ee045 to 2a02805 Compare March 5, 2026 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant