fix: allow manual page control for agent() when parameter included#1785
fix: allow manual page control for agent() when parameter included#1785koennjb wants to merge 5 commits intobrowserbase:mainfrom
Conversation
|
There was a problem hiding this comment.
2 issues found across 23 files
Confidence score: 3/5
- There is some concrete regression risk here: in
packages/core/lib/v3/cache/AgentCache.ts,replayAgentActStepnot forwarding a manually suppliedpagein the no-actions fallback can route replay to the wrong page context. packages/core/lib/v3/agent/utils/resolvePage.tsintroduces relative ESM imports without.jsextensions, which conflicts with the repo convention and could cause module resolution/runtime issues in ESM environments.- Given two medium-severity, high-confidence findings with user/runtime impact potential, this is not a merge-blocker by default but does carry meaningful risk until addressed.
- Pay close attention to
packages/core/lib/v3/cache/AgentCache.tsandpackages/core/lib/v3/agent/utils/resolvePage.ts- page-target consistency during replay and ESM import resolution need verification.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/lib/v3/agent/utils/resolvePage.ts">
<violation number="1" location="packages/core/lib/v3/agent/utils/resolvePage.ts:1">
P2: New relative ESM imports omit required `.js` extensions, violating repository import convention and risking ESM resolution issues.</violation>
</file>
<file name="packages/core/lib/v3/cache/AgentCache.ts">
<violation number="1" location="packages/core/lib/v3/cache/AgentCache.ts:178">
P2: `replayAgentActStep` does not forward the manually supplied `page` in its no-actions fallback, causing inconsistent page targeting during cache replay.</violation>
</file>
Architecture diagram
sequenceDiagram
participant U as User Code
participant V3 as V3 Class
participant H as Agent Handler (V3/CUA)
participant C as Agent Cache
participant TF as Tool Factory / Tools
participant R as resolvePage (Util)
participant B as Browser Context / Page
Note over U,B: Manual Page Control Flow (Prevents Cross-talk)
U->>V3: agent({ instruction, page: customPage })
V3->>V3: CHANGED: normalizeToV3Page(customPage)
alt NEW: Cache Replay Path
V3->>C: tryReplay(context, llm, page)
C->>R: NEW: resolvePage(v3, page)
R-->>C: Target Page
C->>B: Perform deterministic actions on Target Page
C-->>V3: Cached Result
end
V3->>H: NEW: Instantiate Handler with explicit page
H->>TF: NEW: createAgentTools(v3, { ..., page })
Note over TF: Each tool factory (act, click, goto, etc.)<br/>now holds a reference to the target page.
loop Agent Reasoning Loop
H->>TF: execute(toolInput)
TF->>R: NEW: resolvePage(v3, page)
alt page is provided
R-->>TF: Return target page
else page is undefined
R->>B: awaitActivePage()
B-->>R: Current active tab
R-->>TF: Return active tab (Legacy behavior)
end
TF->>B: Perform Browser Action (click, goto, etc.)
Note right of B: Interaction is locked to the<br/>resolved page, avoiding cross-talk.
B-->>TF: Result / Screenshot
TF-->>H: Tool Output
H->>H: NEW: resolvePage() for URL/Screenshot updates
H->>B: Get current URL or take screenshot
end
H-->>V3: Final Result
V3-->>U: AgentResult
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging
@cubic-dev-aiwith guidance or docs links (includingllms.txt) - Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Greptile SummaryThis PR fixes cross-talk between parallel Key changes:
Two issues warrant attention:
Confidence Score: 4/5
Sequence DiagramsequenceDiagram
participant Caller as Caller (parallel tabs)
participant V3 as v3.ts agent()
participant Handler as V3AgentHandler / V3CuaAgentHandler
participant Tools as Tool Factories (click, goto, screenshot…)
participant ResolvePage as resolvePage()
participant Context as V3Context
Caller->>V3: agent({ page: explicitPage }).execute(instruction)
V3->>V3: normalizeToV3Page(explicitPage) → resolvedPage
V3->>Context: setActivePage(resolvedPage) [still mutates global state]
V3->>Handler: new Handler(..., resolvedPage)
Handler->>Handler: stores this.page = resolvedPage
Handler->>Handler: resolvePage() → this.page ?? awaitActivePage()
Handler->>Context: (URL for system prompt via resolvePage)
Handler->>Tools: createAgentTools(v3, { page: resolvedPage, … })
Note over Tools,ResolvePage: Each tool execute() callback:
Tools->>ResolvePage: resolvePage(v3, options.page)
ResolvePage-->>Tools: returns explicitPage (no awaitActivePage call)
Tools->>explicitPage: click / goto / screenshot / scroll / …
Note over V3,Context: Cache replay path also threads resolvedPage
V3->>V3: agentCache.tryReplay(cacheContext, llmClient, resolvedPage)
V3-->>Caller: AgentResult
Last reviewed commit: e7e47e9 |
| private async resolvePage(): Promise<Page> { | ||
| return this.page ?? (await this.v3.context.awaitActivePage()); | ||
| } |
There was a problem hiding this comment.
v3AgentHandler defines its own private resolvePage() method, while v3CuaAgentHandler imports and uses the shared utility from ../agent/utils/resolvePage.js. This inconsistency means any future changes to the resolution logic must be made in two places.
Consider importing the shared utility for consistency:
private async resolvePage(): Promise<Page> {
return resolvePage(this.v3, this.page);
}This aligns with how v3CuaAgentHandler handles it and the broader refactoring in this PR toward standardized page handling.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
d8ee045 to
2a02805
Compare
why
The
agent()method accepts apageparameter through options, which follows the pattern ofact,extract,observe- but regardless all agent tools useawaitActivePage().This causes issues for customers that manually control pages and run multiple agents in parallel in multiple tabs. As the active page changes during navigations, clicks etc - the other agents get very confused because the page is constantly changing state while it's trying to use tools itself. Documentation reads as it's intended for the agent to allow manual control the same way: https://docs.stagehand.dev/v3/best-practices/using-multiple-tabs#the-stagehand-page.
what changed
The
pageparameter from agent options is passed through to all tool factories so the page acted on can be resolved from that if provided, otherwiseawaitActivePageas before.This includes CUA agents and cached agent replays. One question I have is if it seems right to do this for cache replay.
test plan
I have a reproduction end to end test in our application that integrates with Stagehand and shows the "cross talk" issue when running parallel agents in one browser.
Tested via:
Summary by cubic
Allows agent({ page }) to explicitly control which tab a tool acts on across the whole agent flow, including CUA and cache replay. Fixes cross-talk when running multiple agents in parallel and makes replays deterministic on the chosen page.
Bug Fixes
Refactors
Written for commit 2a02805. Summary will update on new commits. Review in cubic