core/tui: non-blocking MCP startup #6334
base: main
Conversation
💡 Codex Review
codex/codex-rs/exec/src/lib.rs, lines 332 to 336 in d54213e:

```rust
let initial_prompt_task_id = conversation
    .submit(Op::UserTurn {
        items,
        cwd: default_cwd,
        approval_policy: default_approval_policy,
```
Session initialization now just spawns spawn_startup_job (see codex-rs/core/src/codex.rs lines 548–617), so the McpConnectionManager remains empty until the background job finishes. The codex-exec CLI (and any other non-TUI client that uses ConversationManager) still calls conversation.submit(Op::UserTurn { … }) immediately after SessionConfigured is received (lines 332–336 above) and never waits for the new McpStartupUpdate/McpStartupComplete events. When that turn reaches run_turn (codex-rs/core/src/codex.rs line 1871), it reads mcp_connection_manager.list_all_tools().await, which is still empty, so the router exposes zero MCP tools. Because CLI invocations usually consist of a single turn, repo_map, file_context, etc. simply never run anymore. The first CLI submission needs to block or be queued until the MCP startup job finishes (or wait for McpStartupComplete) to avoid losing all MCP functionality outside the TUI.
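The gating the review asks for could look roughly like this: drain events until the startup-complete event arrives, buffering per-server updates along the way, and only then submit the first turn. This is a hedged, std-only sketch with stand-in `Event` and channel types, not the actual codex-rs async API.

```rust
use std::sync::mpsc;

// Stand-in for the real event enum; variant names mirror the PR.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    McpStartupUpdate(String),
    McpStartupComplete,
    TaskStarted,
}

/// Block until `McpStartupComplete` arrives, returning the startup
/// updates observed. The caller submits its first `UserTurn` only
/// after this returns, so the tool router is never empty.
fn wait_for_startup(rx: &mpsc::Receiver<Event>) -> Vec<String> {
    let mut updates = Vec::new();
    while let Ok(event) = rx.recv() {
        match event {
            Event::McpStartupUpdate(name) => updates.push(name),
            Event::McpStartupComplete => break,
            _ => {} // ignore unrelated events while gating
        }
    }
    updates
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(Event::McpStartupUpdate("repo_map".into())).unwrap();
    tx.send(Event::McpStartupUpdate("file_context".into())).unwrap();
    tx.send(Event::McpStartupComplete).unwrap();
    let updates = wait_for_startup(&rx);
    println!("{updates:?}"); // prints ["repo_map", "file_context"]
}
```

The real code would select on an async receiver rather than block a thread, but the shape is the same: no submission until the complete event fires.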
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```rust
// Start MCP servers in the background and stream startup updates.
match spawn_startup_job(
    config.mcp_servers.clone(),
    config.mcp_oauth_credentials_store_mode,
    sess.services.mcp_connection_manager.clone(),
```
Prevent turns from starting before MCPs are ready
Session creation now instantiates an empty McpConnectionManager and immediately spawns spawn_startup_job (this hunk) instead of awaiting MCP clients. Nothing in core blocks submissions until that job finishes, yet other callers—e.g. the JSON‑RPC server’s SendUserTurn handler in codex-rs/app-server/src/codex_message_processor.rs and the RMCP integration tests—still call conversation.submit(Op::UserTurn { … }) as soon as the session is configured. run_turn snapshots the available tools at the top (codex-rs/core/src/codex.rs lines 1871‑1874), so if startup is still in flight those turns see an empty tool list and cannot invoke any MCP tools. This regresses every non‑TUI/CLI client: the first request after connecting silently runs without MCP integrations whenever startup takes a few seconds. We need to block/queue user turns until McpStartupComplete or gate run_turn so that MCP tools are always ready before executing the first task.
Co-authored-by: pakrym-oai <[email protected]>
codex-rs/exec/src/lib.rs (outdated):

```rust
});
let mut error_seen = false;
while let Some(event) = rx.recv().await {
    if matches!(event.msg, EventMsg::McpStartupComplete(_))
```
Should this be a different/more generic event?
Feels almost like another SessionConfigured.
```rust
McpStartupUpdate(McpStartupUpdateEvent),

/// Aggregate MCP startup completion summary.
McpStartupComplete(McpStartupCompleteEvent),
```
It's pretty non-obvious that all consumers must wait for McpStartupComplete now before doing anything.
Can you add an integration test to ensure it always fires?
@pakrym-oai instead, I added code so that consumers don't need to wait on this :)
Ultimately I'd like session initialization to be refactored so it can be broken down into its various parts (e.g. it should be possible to navigate history before MCPs are booted), but I think that change is outside the scope of this PR.
```rust
if has_active_turn {
    self.abort_all_tasks(TurnAbortReason::Interrupted).await;
} else {
    self.cancel_mcp_startup().await;
```
why do we need to cancel mcp startup on control c? Do we ever cancel it today?
If an MCP takes a long time to start, it should be possible to interrupt it so queued messages can be sent.
This muddles interrupt op semantics. Interrupt today is specific to a running turn; now it can also interrupt other things that happen in the background.
```rust
    cancellation_token: CancellationToken,
) -> CodexResult<TurnRunResult> {
    let mcp_tools = sess.services.mcp_connection_manager.list_all_tools();
    sess.wait_for_mcp_startup(cancellation_token.child_token())
```
can we encapsulate this wait in list_all_tools?
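The suggestion above could be sketched like this: the manager itself blocks `list_all_tools` until the startup job has marked it ready, so callers like `run_turn` need no separate `wait_for_mcp_startup` call. This is a hedged, std-only stand-in (Condvar instead of async notification), not the actual codex-rs types.

```rust
use std::sync::{Arc, Condvar, Mutex};

// Illustrative stand-in for McpConnectionManager.
struct ToolManager {
    inner: Mutex<(bool, Vec<String>)>, // (startup finished, tool names)
    ready_cv: Condvar,
}

impl ToolManager {
    fn new() -> Arc<Self> {
        Arc::new(Self {
            inner: Mutex::new((false, Vec::new())),
            ready_cv: Condvar::new(),
        })
    }

    /// Called by the startup job once all servers have connected.
    fn mark_ready(&self, tools: Vec<String>) {
        let mut guard = self.inner.lock().unwrap();
        *guard = (true, tools);
        self.ready_cv.notify_all();
    }

    /// Blocks until startup completes, then returns the tool list,
    /// so the wait is encapsulated here rather than in run_turn.
    fn list_all_tools(&self) -> Vec<String> {
        let guard = self.inner.lock().unwrap();
        let guard = self
            .ready_cv
            .wait_while(guard, |(ready, _)| !*ready)
            .unwrap();
        guard.1.clone()
    }
}

fn main() {
    let mgr = ToolManager::new();
    let worker = Arc::clone(&mgr);
    std::thread::spawn(move || worker.mark_ready(vec!["repo_map".into()]));
    println!("{:?}", mgr.list_all_tools()); // prints ["repo_map"]
}
```

In the async codebase this would be a `tokio::sync::Notify` or watch channel rather than a Condvar, but the interface question is the same: the wait lives inside the manager.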
```rust
}
loop {
    tokio::select! {
        _ = cancellation_token.cancelled() => return Err(CodexErr::Interrupted),
```
consider or_cancel()
```rust
        .parse_tool_name(tool_name)
}

async fn wait_for_mcp_startup(&self, cancellation_token: CancellationToken) -> CodexResult<()> {
```
I'd love to push more of this into connection manager and have some simpler interface that codex can use.
```rust
///
/// The server name originates from the keys of the `mcp_servers` map in
/// the user configuration.
struct McpConnectionManagerInner {
```
For other services we store the Arc directly in SessionServices. Why wrap here?
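One plausible answer to the wrapping question, hedged as a guess at the design intent: the startup job needs its own handle to the manager while the session keeps one too, and hiding the `Arc` inside the type keeps that shared ownership an implementation detail. A minimal stand-in (types here are illustrative, not the real codex-rs ones):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Placeholder for the real per-server client type.
type Client = String;

struct McpConnectionManagerInner {
    clients: HashMap<String, Client>,
}

/// Wrapper style (as in this hunk): cloning the manager clones the
/// Arc, so the background startup job and the session share state
/// without SessionServices knowing about the Arc.
#[derive(Clone)]
struct McpConnectionManager {
    inner: Arc<Mutex<McpConnectionManagerInner>>,
}

impl McpConnectionManager {
    fn new() -> Self {
        Self {
            inner: Arc::new(Mutex::new(McpConnectionManagerInner {
                clients: HashMap::new(),
            })),
        }
    }

    fn add_client(&self, name: &str, client: Client) {
        self.inner.lock().unwrap().clients.insert(name.to_string(), client);
    }

    fn len(&self) -> usize {
        self.inner.lock().unwrap().clients.len()
    }
}

fn main() {
    // Both handles see the same underlying map.
    let session_handle = McpConnectionManager::new();
    let startup_job_handle = session_handle.clone();
    startup_job_handle.add_client("repo_map", "client".into());
    println!("{}", session_handle.len()); // prints 1
}
```

Storing `Arc<Mutex<Inner>>` directly in SessionServices, as the other services do, would work identically; the wrapper only buys a tidier call surface.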
```rust
Some(res) = join_set.join_next() => {
    match res {
        Ok((server_name, StartupOutcome::Ready { managed })) => {
            manager.add_client(server_name.clone(), managed).await;
```
what's the point of adding clients one by one? should we set all of them at once?
we don't create the ManagedClient until tools are listed, because ManagedClient includes the tool list.
```rust
    manager: McpConnectionManager,
    tx_event: Sender<Event>,
    auth_entries: HashMap<String, McpAuthStatusEntry>,
) -> Result<McpStartupJobHandle> {
```
can this return the entire McpConnectionManagerInner struct? So we don't have to go through a ton of locks while initializing.
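The single-handoff alternative suggested above could look like this sketch: the startup job accumulates clients in a plain local map (no locking at all), and the caller installs the whole map with one locked operation at the end. Names and types are stand-ins, and note the thread below explains why the PR streams clients one by one instead (per-server progress events).

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Illustrative stand-in for the real managed-client type.
#[derive(Clone, Debug)]
struct ManagedClient {
    tools: Vec<String>,
}

#[derive(Default)]
struct McpConnectionManager {
    clients: Arc<Mutex<HashMap<String, ManagedClient>>>,
}

/// Startup job: connect each server and collect results locally,
/// touching no shared state while servers come up.
fn run_startup(servers: &[&str]) -> HashMap<String, ManagedClient> {
    servers
        .iter()
        .map(|name| {
            let client = ManagedClient { tools: vec![format!("{name}/tool")] };
            (name.to_string(), client)
        })
        .collect()
}

impl McpConnectionManager {
    /// Install every ready client in one locked operation instead of
    /// acquiring the lock once per server.
    fn install_all(&self, ready: HashMap<String, ManagedClient>) {
        *self.clients.lock().unwrap() = ready;
    }

    fn client_count(&self) -> usize {
        self.clients.lock().unwrap().len()
    }
}

fn main() {
    let manager = McpConnectionManager::default();
    manager.install_all(run_startup(&["repo_map", "file_context"]));
    println!("{}", manager.client_count()); // prints 2
}
```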
```rust
manager.add_client(server_name.clone(), managed).await;
summary.ready.push(server_name.clone());
inflight.remove(&server_name);
let _ = emit_update(
```
can we not ignore these errors?
```rust
    server_name: String,
    transport: McpServerTransportConfig,
    store_mode: OAuthCredentialsStoreMode,
    startup_timeout: Duration,
```
can this be handled via cancel_token?
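Folding the timeout into cancellation, as suggested, could look like this: derive a child cancel signal that trips either when the parent is cancelled or when the deadline passes, so per-server startup takes one cancel handle instead of a separate `startup_timeout`. This is a std-only stand-in for `tokio_util::sync::CancellationToken`, with hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

// Minimal cancellation flag standing in for a real token.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Child flag that trips when the parent is cancelled OR the timeout
/// elapses; cancelling the child never affects the parent.
fn with_timeout(parent: &CancelFlag, timeout: Duration) -> CancelFlag {
    let child = CancelFlag::default();
    let (p, c) = (parent.clone(), child.clone());
    thread::spawn(move || {
        let deadline = Instant::now() + timeout;
        while Instant::now() < deadline && !p.is_cancelled() {
            thread::sleep(Duration::from_millis(5));
        }
        c.cancel();
    });
    child
}

/// Stand-in for a slow server handshake that polls for cancellation
/// (the real async code would select on the token instead).
fn start_server(cancel: &CancelFlag) -> Result<&'static str, &'static str> {
    for _ in 0..50 {
        if cancel.is_cancelled() {
            return Err("startup cancelled or timed out");
        }
        thread::sleep(Duration::from_millis(10));
    }
    Ok("ready")
}

fn main() {
    let root = CancelFlag::default();
    // ~500 ms handshake against a 100 ms deadline: the child trips.
    let per_server = with_timeout(&root, Duration::from_millis(100));
    println!("{:?}", start_server(&per_server));
    assert!(!root.is_cancelled()); // root is unaffected by the child
}
```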
pakrym-oai left a comment:
Feels like cancellation and startup in general can be greatly simplified in mcp_connection_manager. It would be nice to push more of the startup-tracking logic from codex into mcp_connection_manager.
This makes MCP startup not block TUI startup. Messages sent while MCPs are booting will be queued.
Screen.Recording.2025-11-06.at.11.53.24.AM.mov
Fixes #6317