Skip to content

Conversation

@nornagon-openai
Copy link
Collaborator

@nornagon-openai nornagon-openai commented Nov 6, 2025

This makes MCP startup not block TUI startup. Messages sent while MCPs are booting will be queued.

Screen.Recording.2025-11-06.at.11.53.24.AM.mov

Fixes #6317

@nornagon-openai nornagon-openai requested a review from gpeal November 6, 2025 19:54
Copy link
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

let initial_prompt_task_id = conversation
.submit(Op::UserTurn {
items,
cwd: default_cwd,
approval_policy: default_approval_policy,

P1 Badge Queue CLI prompt until MCP startup completes

Session initialization now just spawns spawn_startup_job (see codex-rs/core/src/codex.rs lines 548‑617) so the McpConnectionManager remains empty until the background job finishes. The codex-exec CLI (and any other non‑TUI client that uses ConversationManager) still calls conversation.submit(Op::UserTurn { … }) immediately after SessionConfigured is received (lines 332‑336 below) and never waits for the new McpStartupUpdate/McpStartupComplete events. When that turn reaches run_turn (codex-rs/core/src/codex.rs line 1871) it reads mcp_connection_manager.list_all_tools().await, which is still empty, so the router exposes zero MCP tools. Because CLI invocations usually consist of a single turn, repo_map, file_context, etc. simply never run anymore. We now need to block or queue the first CLI submission until the MCP startup job finishes (or wait for McpStartupComplete) to avoid losing all MCP functionality outside the TUI.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@nornagon-openai
Copy link
Collaborator Author

@codex review

Copy link
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +591 to +595
// Start MCP servers in the background and stream startup updates.
match spawn_startup_job(
config.mcp_servers.clone(),
config.mcp_oauth_credentials_store_mode,
sess.services.mcp_connection_manager.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prevent turns from starting before MCPs are ready

Session creation now instantiates an empty McpConnectionManager and immediately spawns spawn_startup_job (this hunk) instead of awaiting MCP clients. Nothing in core blocks submissions until that job finishes, yet other callers—e.g. the JSON‑RPC server’s SendUserTurn handler in codex-rs/app-server/src/codex_message_processor.rs and the RMCP integration tests—still call conversation.submit(Op::UserTurn { … }) as soon as the session is configured. run_turn snapshots the available tools at the top (codex-rs/core/src/codex.rs lines 1871‑1874), so if startup is still in flight those turns see an empty tool list and cannot invoke any MCP tools. This regresses every non‑TUI/CLI client: the first request after connecting silently runs without MCP integrations whenever startup takes a few seconds. We need to block/queue user turns until McpStartupComplete or gate run_turn so that MCP tools are always ready before executing the first task.

Useful? React with 👍 / 👎.

@etraut-openai etraut-openai added the oai-pr PRs posted by Codex team members label Nov 6, 2025
});
let mut error_seen = false;
while let Some(event) = rx.recv().await {
if matches!(event.msg, EventMsg::McpStartupComplete(_))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a different/more generic event?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels almost like another SessionCOnfigured

McpStartupUpdate(McpStartupUpdateEvent),

/// Aggregate MCP startup completion summary.
McpStartupComplete(McpStartupCompleteEvent),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty non-obvious that all consumers must wait for McpStartupComplete now before doing anything.

Can you add an integration test to ensure it always fires?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pakrym-oai instead, i added code so that consumers don't need to wait on this :)

ultimately i'd like for session initialization to be refactored so it can be broken down into its various parts (e.g. it should be possible to navigate history before MCPs are booted), but I think that change escapes the scope of this PR.

if has_active_turn {
self.abort_all_tasks(TurnAbortReason::Interrupted).await;
} else {
self.cancel_mcp_startup().await;
Copy link
Collaborator

@pakrym-oai pakrym-oai Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to cancel mcp startup on control c? Do we ever cancel it today?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if an mcp takes a long time to start, it should be possible to interrupt it to let messages send.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This muddles interrupt op semantics. Interrup today is specific to a running a running turn. Now it can interrupt other things that happen in the background.

cancellation_token: CancellationToken,
) -> CodexResult<TurnRunResult> {
let mcp_tools = sess.services.mcp_connection_manager.list_all_tools();
sess.wait_for_mcp_startup(cancellation_token.child_token())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we encapsulate this wait in list_all_tools ?

}
loop {
tokio::select! {
_ = cancellation_token.cancelled() => return Err(CodexErr::Interrupted),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider or_cancel()

.parse_tool_name(tool_name)
}

async fn wait_for_mcp_startup(&self, cancellation_token: CancellationToken) -> CodexResult<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to push more of this into connection manager and have some simpler interface that codex can use.

///
/// The server name originates from the keys of the `mcp_servers` map in
/// the user configuration.
struct McpConnectionManagerInner {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other services we store the Arc directly in SessionServices. Why wrap here?

Some(res) = join_set.join_next() => {
match res {
Ok((server_name, StartupOutcome::Ready { managed })) => {
manager.add_client(server_name.clone(), managed).await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the point of adding clients one by one? should we set all of them at once?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't create the ManagedClient until tools are listed, because ManagedClient includes the tool list.

manager: McpConnectionManager,
tx_event: Sender<Event>,
auth_entries: HashMap<String, McpAuthStatusEntry>,
) -> Result<McpStartupJobHandle> {
Copy link
Collaborator

@pakrym-oai pakrym-oai Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this return the entire McpConnectionManagerInner struct? So we don't have to go through a ton of locks while initializing.

manager.add_client(server_name.clone(), managed).await;
summary.ready.push(server_name.clone());
inflight.remove(&server_name);
let _ = emit_update(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not ignore these errors?

server_name: String,
transport: McpServerTransportConfig,
store_mode: OAuthCredentialsStoreMode,
startup_timeout: Duration,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be handled via cancel_token ?

Copy link
Collaborator

@pakrym-oai pakrym-oai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like cancellation and the startup in general can be greatly simplified in mcp_connection_manager.

Would be nice to push more logic around startup tracking from codex into mcp_connection_manager

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oai-pr PRs posted by Codex team members

Projects

None yet

Development

Successfully merging this pull request may close these issues.

codex-cli 0.55.0 opens in a zombie state

4 participants