[Frontend] Add MCP tool streaming support to Responses API #30192

daniel-salib · 2025-12-07T04:46:48Z

Purpose
This change enables streaming support for MCP tools when using GPT OSS. It extends the harmony utilities and response serving infrastructure to handle tool streaming, allowing tool calls and their results to be incrementally streamed back to clients rather than returned as a single batch.

Test Plan
curl -X POST "http://localhost:8000/v1/responses" -H "Content-Type: application/json" -H "Authorization: Bearer dummy-api-key" -d '{
"model": "default",
"input": "Multiply 123*456 using the mcp.code_interpreter tool.",
"tools": [{
"type": "mcp",
"server_label": "code_interpreter",
"headers": {"test": "test"},
"server_url": "IGNORED"
}],
"stream": true,
"enable_response_messages": true
}'
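
For reference, the same request can be assembled from Python with only the standard library. This is a minimal sketch, assuming the server from the curl command is listening on localhost:8000; `build_request` and `stream_lines` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request

# Assumed endpoint and key, copied from the curl command above.
URL = "http://localhost:8000/v1/responses"
API_KEY = "dummy-api-key"


def build_request(prompt: str) -> urllib.request.Request:
    """Build the streaming Responses API request (hypothetical helper)."""
    payload = {
        "model": "default",
        "input": prompt,
        "tools": [
            {
                "type": "mcp",
                "server_label": "code_interpreter",
                "headers": {"test": "test"},
                "server_url": "IGNORED",
            }
        ],
        "stream": True,
        "enable_response_messages": True,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request):
    """Yield decoded SSE lines from the server as they arrive."""
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\n")
```

Iterating over `stream_lines(build_request(...))` should print the same `event:` / `data:` pairs as the curl call, provided a server is running.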
Test Result
event: response.created
data: {"response":{"id":"resp_634aa3735d374e609c59128a4ca4c9ff","created_at":1762333234,"incomplete_details":null,"instructions":null,"metadata":null,"model":"default","object":"response","output":[],"parallel_tool_calls":true,"temperature":1.0,"tool_choice":"auto","tools":[{"server_label":"code_interpreter","type":"mcp","allowed_tools":null,"authorization":null,"connector_id":null,"headers":{"test":"test"},"require_approval":null,"server_description":null,"server_url":"IGNORED"}],"top_p":1.0,"background":false,"max_output_tokens":130895,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"reasoning":null,"service_tier":"auto","status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"input_messages":null,"output_messages":null},"sequence_number":0,"type":"response.created"}

event: response.in_progress
data: {"response":{"id":"resp_634aa3735d374e609c59128a4ca4c9ff","created_at":1762333234,"incomplete_details":null,"instructions":null,"metadata":null,"model":"default","object":"response","output":[],"parallel_tool_calls":true,"temperature":1.0,"tool_choice":"auto","tools":[{"server_label":"code_interpreter","type":"mcp","allowed_tools":null,"authorization":null,"connector_id":null,"headers":{"test":"test"},"require_approval":null,"server_description":null,"server_url":"IGNORED"}],"top_p":1.0,"background":false,"max_output_tokens":130895,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"reasoning":null,"service_tier":"auto","status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"input_messages":null,"output_messages":null},"sequence_number":1,"type":"response.in_progress"}

event: response.output_item.added
data: {"item":{"id":"msg_91e0b5be583e4ac38cfe7d55f025def7","summary":[],"type":"reasoning","content":null,"encrypted_content":null,"status":"in_progress"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}

event: response.reasoning_part.added
data: {"content_index":0,"item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"part":{"text":"","type":"reasoning_text"},"sequence_number":3,"type":"response.reasoning_part.added"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"We","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":4,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" need","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":5,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" to","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":6,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" compute","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":7,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" ","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":8,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"123","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":9,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"*","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":10,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"456","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":11,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":12,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" Use","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":13,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" python","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":14,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":15,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.done
data: {"content_index":-1,"item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":16,"text":"We need to compute 123*456. Use python.","type":"response.reasoning_text.done"}

event: response.reasoning_part.done
data: {"content_index":-1,"item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"part":{"text":"We need to compute 123*456. Use python.","type":"reasoning_text"},"sequence_number":17,"type":"response.reasoning_part.done"}

event: response.output_item.done
data: {"item":{"id":"msg_91e0b5be583e4ac38cfe7d55f025def7","summary":[],"type":"reasoning","content":[{"text":"We need to compute 123*456. Use python.","type":"reasoning_text"}],"encrypted_content":null,"status":"completed"},"output_index":0,"sequence_number":18,"type":"response.output_item.done"}

event: response.output_item.added
data: {"item":{"id":"mcp_4e15766739ed49a1860e8d7b377348d7","arguments":"","name":"python","server_label":"code_interpreter","type":"mcp_call","approval_request_id":null,"error":null,"output":null,"status":"in_progress","call_id":"mcp_f38d222820be4db7ba44e4b7e63b0c0f"},"output_index":1,"sequence_number":19,"type":"response.output_item.added"}

event: response.mcp_call.in_progress
data: {"item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":20,"type":"response.mcp_call.in_progress"}

event: response.mcp_call_arguments.delta
data: {"delta":"123","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":21,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.delta
data: {"delta":"*","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":22,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.delta
data: {"delta":"456","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":23,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.delta
data: {"delta":"\n","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":24,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.done
data: {"arguments":"123*456\n","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":25,"type":"response.mcp_call_arguments.done","name":"python"}

event: response.mcp_call.completed
data: {"item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":26,"type":"response.mcp_call.completed"}

event: response.output_item.done
data: {"item":{"id":"mcp_4e15766739ed49a1860e8d7b377348d7","arguments":"123*456\n","name":"python","server_label":"code_interpreter","type":"mcp_call","approval_request_id":null,"error":null,"output":null,"status":"completed","call_id":"mcp_13e054b550474cd5aa66c71aefcebf00"},"output_index":1,"sequence_number":27,"type":"response.output_item.done"}

event: response.output_item.added
data: {"item":{"id":"msg_5e2d50b2c1704e9eb848d78929716445","content":[],"role":"assistant","status":"in_progress","type":"message"},"output_index":2,"sequence_number":28,"type":"response.output_item.added"}

event: response.content_part.added
data: {"content_index":0,"item_id":"msg_5e2d50b2c1704e9eb848d78929716445","output_index":2,"part":{"annotations":[],"text":"","type":"output_text","logprobs":[]},"sequence_number":29,"type":"response.content_part.added"}

event: response.output_text.delta
data: {"content_index":0,"delta":"The","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":30,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" product","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":31,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" of","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":32,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" \\(","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":33,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"123","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":34,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" \\","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":35,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"times","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":36,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" ","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":37,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"456","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":38,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"\\","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":39,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":")","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":40,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" is","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":41,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" **","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":42,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"56","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":43,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":",","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":44,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"088","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":45,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"**","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":46,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":".","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":47,"type":"response.output_text.delta"}

event: response.output_text.done
data: {"content_index":-1,"item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":48,"text":"The product of \\(123 \\times 456\\) is **56,088**.","type":"response.output_text.done"}

event: response.content_part.done
data: {"content_index":-1,"item_id":"msg_5e2d50b2c1704e9eb848d78929716445","output_index":2,"part":{"annotations":[],"text":"The product of \\(123 \\times 456\\) is **56,088**.","type":"output_text","logprobs":null},"sequence_number":49,"type":"response.content_part.done"}

event: response.output_item.done
data: {"item":{"id":"msg_5e2d50b2c1704e9eb848d78929716445","content":[{"annotations":[],"text":"The product of \\(123 \\times 456\\) is **56,088**.","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"},"output_index":2,"sequence_number":50,"type":"response.output_item.done"}

event: response.completed
data: {"response":{"id":"resp_634aa3735d374e609c59128a4ca4c9ff","created_at":1762333234,"incomplete_details":null,
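
One way to sanity-check a log like the one above is to confirm that the streamed deltas for each item concatenate to the text carried by the matching `*.done` event. The sketch below assumes the log is available as plain SSE text; `parse_sse` and `check_deltas_match_done` are hypothetical helpers, and the sample uses a shortened item id (`mcp_1`) and trimmed payloads rather than the full events.

```python
import json
from collections import defaultdict


def parse_sse(text):
    """Yield (event_name, data_dict) pairs from a block of SSE text."""
    event_name = None
    for line in text.splitlines():
        if line.startswith("event: "):
            event_name = line[len("event: "):]
        elif line.startswith("data: ") and event_name is not None:
            yield event_name, json.loads(line[len("data: "):])
            event_name = None


def check_deltas_match_done(events):
    """Map item_id -> True when its joined deltas equal the *.done payload."""
    buffers = defaultdict(str)
    results = {}
    for name, data in events:
        item_id = data.get("item_id")
        if name.endswith(".delta"):
            buffers[item_id] += data["delta"]
        elif name == "response.mcp_call_arguments.done":
            results[item_id] = buffers[item_id] == data["arguments"]
        elif name.endswith("_text.done"):
            results[item_id] = buffers[item_id] == data["text"]
    return results


# Abbreviated sample (raw string so the JSON "\n" escape is preserved).
SAMPLE = r"""
event: response.mcp_call_arguments.delta
data: {"delta":"123","item_id":"mcp_1"}

event: response.mcp_call_arguments.delta
data: {"delta":"*","item_id":"mcp_1"}

event: response.mcp_call_arguments.delta
data: {"delta":"456","item_id":"mcp_1"}

event: response.mcp_call_arguments.delta
data: {"delta":"\n","item_id":"mcp_1"}

event: response.mcp_call_arguments.done
data: {"arguments":"123*456\n","item_id":"mcp_1","name":"python"}
"""

print(check_deltas_match_done(parse_sse(SAMPLE)))
```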

Signed-off-by: Daniel Salib <[email protected]>

gemini-code-assist

Code Review

This pull request introduces comprehensive support for Model Context Protocol (MCP) tools within the OpenAI API response framework. Key changes include adding McpCall as a new response output item type and corresponding streaming events (response.mcp_call_arguments.delta, response.mcp_call_arguments.done, response.mcp_call.in_progress, response.mcp_call.completed). The harmony_utils.py parser has been refactored to treat any non-built-in, non-function recipient in Harmony messages as an MCP call, replacing previous ValueError exceptions for unknown recipients. New utility functions _parse_mcp_recipient and _parse_mcp_call were added to handle the parsing of MCP recipients and creation of McpCall objects, including support for dotted recipients (e.g., repo_browser.list). Built-in tools like 'python', 'browser', and 'container' are now explicitly handled as reasoning output rather than generic MCP calls. The serving_responses.py module was updated to emit the new MCP streaming events and correctly distinguish between function calls, built-in tools, and generic MCP tools during streaming. Test cases were significantly expanded to cover basic MCP call parsing, dotted recipients, differentiation between MCP and function/built-in calls, and multi-turn streaming interactions with MCP tools, including code_interpreter via MCP. A review comment highlighted and provided a fix for issues in a new test case, test_mcp_tool_calling_streaming_types, specifically addressing an overly strict assertion and incorrect conditional logic in event handling, ensuring proper validation of the MCP streaming event sequence.
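
To illustrate what handling a dotted recipient such as repo_browser.list could look like, here is a minimal sketch. It is not the PR's _parse_mcp_recipient; `split_mcp_recipient` and `McpRecipient` are hypothetical names, and the treatment of undotted recipients is an assumption made only for this example.

```python
from typing import NamedTuple


class McpRecipient(NamedTuple):
    server_label: str
    tool_name: str


def split_mcp_recipient(recipient: str) -> McpRecipient:
    """Split 'server.tool' at the first dot (hypothetical stand-in)."""
    server_label, sep, tool_name = recipient.partition(".")
    if not sep:
        # Assumption: an undotted recipient names both the server and the tool.
        return McpRecipient(recipient, recipient)
    return McpRecipient(server_label, tool_name)


print(split_mcp_recipient("repo_browser.list"))
```

Splitting at the first dot only means any further dots stay in the tool name, which may or may not match the real parser.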

gemini-code-assist · 2025-12-07T04:49:04Z

tests/entrypoints/openai/test_response_api_mcp_tools.py

+    async for event in stream_response:
+        assert "mcp_call" in event.type
+
+        if event.type == "response.created":
+            stack_of_event_types.append(event.type)
+        elif event.type == "response.completed":
+            assert stack_of_event_types[-1] == pairs_of_event_types[event.type]
+            stack_of_event_types.pop()
+        if (
+            event.type.endswith("added")
+            or event.type == "response.mcp_call.in_progress"
+        ):
+            stack_of_event_types.append(event.type)
+        elif event.type.endswith("delta"):
+            if stack_of_event_types[-1] == event.type:
+                continue
+            stack_of_event_types.append(event.type)
+        elif event.type.endswith("done") or event.type == "response.mcp_call.completed":
+            assert stack_of_event_types[-1] == pairs_of_event_types[event.type]
+            stack_of_event_types.pop()
+
+    assert len(stack_of_event_types) == 0


The logic in this test has a couple of issues:

The assertion assert "mcp_call" in event.type on line 245 is too strict. The event stream includes many events unrelated to MCP calls (e.g., response.created, response.reasoning_text.delta), which will cause this assertion to fail.

The event handling logic uses an if followed by an elif chain, and then another if (line 252). This second if should be an elif to form a single conditional block. Otherwise, an event matching endswith("added") will be processed by the second if block, and then the subsequent elif blocks for delta and done will be skipped for that event, which is not the intended logic for pairing events.

I've suggested a fix that addresses both points by introducing a flag to check for MCP events and correcting the conditional logic.

    mcp_event_seen = False
    stack_of_event_types = []
    async for event in stream_response:
        if "mcp_call" in event.type:
            mcp_event_seen = True

        if event.type == "response.created":
            stack_of_event_types.append(event.type)
        elif event.type == "response.completed":
            assert stack_of_event_types.pop() == pairs_of_event_types[event.type]
        elif (
            event.type.endswith("added")
            or event.type == "response.mcp_call.in_progress"
        ):
            stack_of_event_types.append(event.type)
        elif event.type.endswith("delta"):
            if not stack_of_event_types or stack_of_event_types[-1] != event.type:
                stack_of_event_types.append(event.type)
        elif event.type.endswith("done") or event.type == "response.mcp_call.completed":
            assert stack_of_event_types.pop() == pairs_of_event_types[event.type]

    assert mcp_event_seen, "No MCP call events were observed in the stream."
    assert len(stack_of_event_types) == 0
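
The stack-based pairing in this suggestion can be exercised offline against a recorded list of event types. The following is a sketch under stated assumptions: `check_event_pairing` is a hypothetical synchronous rewrite of the loop, and `PAIRS` is a guess at what pairs_of_event_types contains for the MCP item, since the test's actual mapping is not shown here.

```python
def check_event_pairing(event_types, pairs):
    """Return True if an MCP event was seen; raise AssertionError on a bad pairing."""
    mcp_event_seen = False
    stack = []
    for event_type in event_types:
        if "mcp_call" in event_type:
            mcp_event_seen = True

        if event_type == "response.created":
            stack.append(event_type)
        elif event_type == "response.completed":
            assert stack and stack.pop() == pairs[event_type]
        elif event_type.endswith("added") or event_type == "response.mcp_call.in_progress":
            stack.append(event_type)
        elif event_type.endswith("delta"):
            # Consecutive deltas of the same type share a single stack entry.
            if not stack or stack[-1] != event_type:
                stack.append(event_type)
        elif event_type.endswith("done") or event_type == "response.mcp_call.completed":
            assert stack and stack.pop() == pairs[event_type]

    assert not stack, f"unclosed events: {stack}"
    return mcp_event_seen


# Assumed closing -> opening map (hypothetical; mirrors the event names in the log).
PAIRS = {
    "response.completed": "response.created",
    "response.output_item.done": "response.output_item.added",
    "response.mcp_call_arguments.done": "response.mcp_call_arguments.delta",
    "response.mcp_call.completed": "response.mcp_call.in_progress",
}

# Event types for the MCP item, in the order they appear in the Test Result.
EVENTS = [
    "response.created",
    "response.in_progress",
    "response.output_item.added",
    "response.mcp_call.in_progress",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.done",
    "response.mcp_call.completed",
    "response.output_item.done",
    "response.completed",
]

print(check_event_pairing(EVENTS, PAIRS))
```

Note that response.in_progress matches none of the branches and is simply ignored, as in the suggested loop.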

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2025-12-07T04:50:27Z

tests/entrypoints/openai/test_response_api_mcp_tools.py

+    stack_of_event_types = []
+    async for event in stream_response:
+        assert "mcp_call" in event.type


Fix blanket mcp_call assertion in streaming test

In test_mcp_tool_calling_streaming_types every streamed event is immediately asserted to contain the substring "mcp_call" before any dispatch logic runs. The Responses API stream always begins with response.created (and response.in_progress) which do not include that substring, so this assertion fails on the first event and the rest of the test logic never executes. As written the test cannot pass even when the streaming implementation is correct; the assertion needs to be limited to the MCP events it is intended to check.
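
A narrower check along these lines would filter the stream down to MCP events first and only then assert on them. This is a sketch, not the test's code: `mcp_event_sequence` is a hypothetical helper, and `EXPECTED_MCP_ORDER` is taken from the event order seen in the PR's Test Result.

```python
# Order of MCP events observed in the PR's Test Result.
EXPECTED_MCP_ORDER = [
    "response.mcp_call.in_progress",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.done",
    "response.mcp_call.completed",
]


def mcp_event_sequence(event_types):
    """Keep only MCP events and collapse consecutive repeats (e.g. many deltas)."""
    collapsed = []
    for event_type in event_types:
        if "mcp_call" not in event_type:
            continue  # response.created, reasoning deltas, etc. are not checked
        if not collapsed or collapsed[-1] != event_type:
            collapsed.append(event_type)
    return collapsed


stream = [
    "response.created",
    "response.in_progress",
    "response.output_item.added",
    "response.mcp_call.in_progress",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.done",
    "response.mcp_call.completed",
    "response.output_item.done",
    "response.completed",
]
assert mcp_event_sequence(stream) == EXPECTED_MCP_ORDER
```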

chatgpt-codex-connector · 2025-12-07T04:50:27Z

vllm/entrypoints/openai/serving_responses.py

+            return False
+
+        # Function calls have "functions." prefix
+        # Everything else is an MCP tool
+        return not recipient.startswith("functions.")


Don’t classify built‑in tools as MCP during streaming

The helper _is_mcp_tool_by_namespace now treats any recipient that is not functions.* as an MCP tool. In the Harmony streaming path this flag is used to route tool calls into the MCP branch, so built‑in recipients such as "python", "browser", or "container" now emit response.mcp_call* events instead of the expected response.code_interpreter_call*/built‑in events. A request streaming with tools=[{"type": "code_interpreter"}] will therefore deliver only MCP events, breaking API semantics and diverging from the non‑streaming parsing logic that keeps built‑ins separate from MCP connectors.
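
One possible shape for a check that keeps built-ins apart from MCP connectors is a three-way classification instead of a boolean. The sketch below is an assumption for illustration only: `classify_recipient` and `BUILTIN_ROOTS` are hypothetical names, and the built-in set is copied from the names mentioned in this comment rather than from vLLM.

```python
from typing import Optional

# Assumed built-in recipient roots, taken from the names in the review comment.
BUILTIN_ROOTS = frozenset({"python", "browser", "container"})


def classify_recipient(recipient: Optional[str]) -> str:
    """Return 'none', 'function', 'builtin', or 'mcp' for a Harmony recipient."""
    if not recipient:
        return "none"
    if recipient.startswith("functions."):
        return "function"
    # Compare only the part before the first dot, so "browser.search" is built-in.
    root = recipient.partition(".")[0]
    if root in BUILTIN_ROOTS:
        return "builtin"
    return "mcp"


for name in ("functions.get_weather", "python", "browser.search", "repo_browser.list"):
    print(name, "->", classify_recipient(name))
```

In the PR's own test output, `python` reached through an MCP tool with server_label code_interpreter is streamed as an mcp_call, so a real check would likely also need to know how the tool was declared in the request; this sketch does not model that.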

mergify · 2025-12-07T04:50:49Z

Hi @daniel-salib, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

ApostaC · 2025-12-08T21:49:21Z

cc @robertgshaw2-redhat @chaunceyjiang Can you help review this PR? Thanks!

daniel-salib added 2 commits December 6, 2025 17:25

[Frontend] Add MCP type support infrastructure to Responses API

8f24ea8

Signed-off-by: Daniel Salib <[email protected]>

[Frontend] Add MCP tool streaming support to Responses API

6907c90

Signed-off-by: Daniel Salib <[email protected]>

daniel-salib requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang and robertgshaw2-redhat as code owners December 7, 2025 04:46

mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Dec 7, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Dec 7, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Dec 7, 2025

gemini-code-assist bot reviewed Dec 7, 2025

chatgpt-codex-connector bot reviewed Dec 7, 2025
