Contributor

@daniel-salib daniel-salib commented Dec 7, 2025

Purpose
This change enables streaming support for MCP tools when using GPT-OSS. It extends the Harmony utilities and the Responses serving infrastructure to handle tool streaming, so that tool calls and their results are streamed back to clients incrementally rather than returned as a single batch.

Test Plan
    curl -X POST "http://localhost:8000/v1/responses" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer dummy-api-key" \
      -d '{
        "model": "default",
        "input": "Multiply 123*456 using the mcp.code_interpreter tool.",
        "tools": [{
          "type": "mcp",
          "server_label": "code_interpreter",
          "headers": {"test": "test"},
          "server_url": "IGNORED"
        }],
        "stream": true,
        "enable_response_messages": true
      }'
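
For anyone who prefers to drive the same check from Python, the request above could be built roughly as follows. This is a minimal sketch using only the standard library; `build_request` and `iter_sse_lines` are hypothetical helper names, not part of this PR, and the URL and API key are simply the values from the curl command.

```python
import json
import urllib.request

# Same request body as the curl command in the test plan.
PAYLOAD = {
    "model": "default",
    "input": "Multiply 123*456 using the mcp.code_interpreter tool.",
    "tools": [{
        "type": "mcp",
        "server_label": "code_interpreter",
        "headers": {"test": "test"},
        "server_url": "IGNORED",
    }],
    "stream": True,
    "enable_response_messages": True,
}


def build_request(base_url="http://localhost:8000", api_key="dummy-api-key"):
    """Build (but do not send) the POST request for /v1/responses."""
    return urllib.request.Request(
        url=f"{base_url}/v1/responses",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def iter_sse_lines(request):
    """Send the request and yield the decoded lines of the event stream."""
    with urllib.request.urlopen(request) as resp:
        for raw in resp:
            yield raw.decode("utf-8").rstrip("\r\n")
```

Calling `iter_sse_lines(build_request())` against a running server should yield the `event:` and `data:` lines shown in the test result.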
Test Result
event: response.created
data: {"response":{"id":"resp_634aa3735d374e609c59128a4ca4c9ff","created_at":1762333234,"incomplete_details":null,"instructions":null,"metadata":null,"model":"default","object":"response","output":[],"parallel_tool_calls":true,"temperature":1.0,"tool_choice":"auto","tools":[{"server_label":"code_interpreter","type":"mcp","allowed_tools":null,"authorization":null,"connector_id":null,"headers":{"test":"test"},"require_approval":null,"server_description":null,"server_url":"IGNORED"}],"top_p":1.0,"background":false,"max_output_tokens":130895,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"reasoning":null,"service_tier":"auto","status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"input_messages":null,"output_messages":null},"sequence_number":0,"type":"response.created"}

event: response.in_progress
data: {"response":{"id":"resp_634aa3735d374e609c59128a4ca4c9ff","created_at":1762333234,"incomplete_details":null,"instructions":null,"metadata":null,"model":"default","object":"response","output":[],"parallel_tool_calls":true,"temperature":1.0,"tool_choice":"auto","tools":[{"server_label":"code_interpreter","type":"mcp","allowed_tools":null,"authorization":null,"connector_id":null,"headers":{"test":"test"},"require_approval":null,"server_description":null,"server_url":"IGNORED"}],"top_p":1.0,"background":false,"max_output_tokens":130895,"max_tool_calls":null,"previous_response_id":null,"prompt":null,"reasoning":null,"service_tier":"auto","status":"in_progress","text":null,"top_logprobs":null,"truncation":"disabled","usage":null,"user":null,"input_messages":null,"output_messages":null},"sequence_number":1,"type":"response.in_progress"}

event: response.output_item.added
data: {"item":{"id":"msg_91e0b5be583e4ac38cfe7d55f025def7","summary":[],"type":"reasoning","content":null,"encrypted_content":null,"status":"in_progress"},"output_index":0,"sequence_number":2,"type":"response.output_item.added"}

event: response.reasoning_part.added
data: {"content_index":0,"item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"part":{"text":"","type":"reasoning_text"},"sequence_number":3,"type":"response.reasoning_part.added"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"We","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":4,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" need","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":5,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" to","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":6,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" compute","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":7,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" ","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":8,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"123","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":9,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"*","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":10,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":"456","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":11,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":12,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" Use","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":13,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":" python","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":14,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.delta
data: {"content_index":0,"delta":".","item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":15,"type":"response.reasoning_text.delta"}

event: response.reasoning_text.done
data: {"content_index":-1,"item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"sequence_number":16,"text":"We need to compute 123*456. Use python.","type":"response.reasoning_text.done"}

event: response.reasoning_part.done
data: {"content_index":-1,"item_id":"msg_91e0b5be583e4ac38cfe7d55f025def7","output_index":0,"part":{"text":"We need to compute 123*456. Use python.","type":"reasoning_text"},"sequence_number":17,"type":"response.reasoning_part.done"}

event: response.output_item.done
data: {"item":{"id":"msg_91e0b5be583e4ac38cfe7d55f025def7","summary":[],"type":"reasoning","content":[{"text":"We need to compute 123*456. Use python.","type":"reasoning_text"}],"encrypted_content":null,"status":"completed"},"output_index":0,"sequence_number":18,"type":"response.output_item.done"}

event: response.output_item.added
data: {"item":{"id":"mcp_4e15766739ed49a1860e8d7b377348d7","arguments":"","name":"python","server_label":"code_interpreter","type":"mcp_call","approval_request_id":null,"error":null,"output":null,"status":"in_progress","call_id":"mcp_f38d222820be4db7ba44e4b7e63b0c0f"},"output_index":1,"sequence_number":19,"type":"response.output_item.added"}

event: response.mcp_call.in_progress
data: {"item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":20,"type":"response.mcp_call.in_progress"}

event: response.mcp_call_arguments.delta
data: {"delta":"123","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":21,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.delta
data: {"delta":"*","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":22,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.delta
data: {"delta":"456","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":23,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.delta
data: {"delta":"\n","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":24,"type":"response.mcp_call_arguments.delta"}

event: response.mcp_call_arguments.done
data: {"arguments":"123*456\n","item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":25,"type":"response.mcp_call_arguments.done","name":"python"}

event: response.mcp_call.completed
data: {"item_id":"mcp_4e15766739ed49a1860e8d7b377348d7","output_index":1,"sequence_number":26,"type":"response.mcp_call.completed"}

event: response.output_item.done
data: {"item":{"id":"mcp_4e15766739ed49a1860e8d7b377348d7","arguments":"123*456\n","name":"python","server_label":"code_interpreter","type":"mcp_call","approval_request_id":null,"error":null,"output":null,"status":"completed","call_id":"mcp_13e054b550474cd5aa66c71aefcebf00"},"output_index":1,"sequence_number":27,"type":"response.output_item.done"}

event: response.output_item.added
data: {"item":{"id":"msg_5e2d50b2c1704e9eb848d78929716445","content":[],"role":"assistant","status":"in_progress","type":"message"},"output_index":2,"sequence_number":28,"type":"response.output_item.added"}

event: response.content_part.added
data: {"content_index":0,"item_id":"msg_5e2d50b2c1704e9eb848d78929716445","output_index":2,"part":{"annotations":[],"text":"","type":"output_text","logprobs":[]},"sequence_number":29,"type":"response.content_part.added"}

event: response.output_text.delta
data: {"content_index":0,"delta":"The","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":30,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" product","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":31,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" of","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":32,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" \\(","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":33,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"123","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":34,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" \\","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":35,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"times","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":36,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" ","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":37,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"456","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":38,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"\\","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":39,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":")","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":40,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" is","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":41,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":" **","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":42,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"56","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":43,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":",","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":44,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"088","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":45,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":"**","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":46,"type":"response.output_text.delta"}

event: response.output_text.delta
data: {"content_index":0,"delta":".","item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":47,"type":"response.output_text.delta"}

event: response.output_text.done
data: {"content_index":-1,"item_id":"msg_5e2d50b2c1704e9eb848d78929716445","logprobs":[],"output_index":2,"sequence_number":48,"text":"The product of \\(123 \\times 456\\) is **56,088**.","type":"response.output_text.done"}

event: response.content_part.done
data: {"content_index":-1,"item_id":"msg_5e2d50b2c1704e9eb848d78929716445","output_index":2,"part":{"annotations":[],"text":"The product of \\(123 \\times 456\\) is **56,088**.","type":"output_text","logprobs":null},"sequence_number":49,"type":"response.content_part.done"}

event: response.output_item.done
data: {"item":{"id":"msg_5e2d50b2c1704e9eb848d78929716445","content":[{"annotations":[],"text":"The product of \\(123 \\times 456\\) is **56,088**.","type":"output_text","logprobs":null}],"role":"assistant","status":"completed","type":"message"},"output_index":2,"sequence_number":50,"type":"response.output_item.done"}

event: response.completed
data: {"response":{"id":"resp_634aa3735d374e609c59128a4ca4c9ff","created_at":1762333234,"incomplete_details":null,
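
To sanity-check a transcript like the one above, a client could parse the `event:`/`data:` pairs and confirm that the concatenated `response.mcp_call_arguments.delta` payloads equal the `arguments` field of the matching `response.mcp_call_arguments.done` event. The sketch below is illustrative only: `parse_sse` and `collect_mcp_arguments` are hypothetical helpers, and the sample uses an abbreviated item id (`mcp_1`) rather than the real one.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload_dict) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for line in lines:
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
        elif line == "" and data_parts:
            # A blank line terminates one event.
            yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = None, []
    if data_parts:  # stream ended without a trailing blank line
        yield event_name, json.loads("\n".join(data_parts))


def collect_mcp_arguments(events):
    """Return (joined deltas per item_id, final arguments per item_id)."""
    partial, final = {}, {}
    for name, payload in events:
        if name == "response.mcp_call_arguments.delta":
            item = payload["item_id"]
            partial[item] = partial.get(item, "") + payload["delta"]
        elif name == "response.mcp_call_arguments.done":
            final[payload["item_id"]] = payload["arguments"]
    return partial, final


# Abbreviated copy of the four argument deltas and the done event.
SAMPLE = [
    r'event: response.mcp_call_arguments.delta',
    r'data: {"delta":"123","item_id":"mcp_1"}',
    r'',
    r'event: response.mcp_call_arguments.delta',
    r'data: {"delta":"*","item_id":"mcp_1"}',
    r'',
    r'event: response.mcp_call_arguments.delta',
    r'data: {"delta":"456","item_id":"mcp_1"}',
    r'',
    r'event: response.mcp_call_arguments.delta',
    r'data: {"delta":"\n","item_id":"mcp_1"}',
    r'',
    r'event: response.mcp_call_arguments.done',
    r'data: {"arguments":"123*456\n","item_id":"mcp_1","name":"python"}',
    r'',
]

partial, final = collect_mcp_arguments(parse_sse(SAMPLE))
assert partial == final
```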

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces comprehensive support for Model Context Protocol (MCP) tools within the OpenAI API response framework. Key changes include adding McpCall as a new response output item type and corresponding streaming events (response.mcp_call_arguments.delta, response.mcp_call_arguments.done, response.mcp_call.in_progress, response.mcp_call.completed).

The harmony_utils.py parser has been refactored to treat any non-built-in, non-function recipient in Harmony messages as an MCP call, replacing previous ValueError exceptions for unknown recipients. New utility functions _parse_mcp_recipient and _parse_mcp_call were added to handle the parsing of MCP recipients and creation of McpCall objects, including support for dotted recipients (e.g., repo_browser.list). Built-in tools like 'python', 'browser', and 'container' are now explicitly handled as reasoning output rather than generic MCP calls. The serving_responses.py module was updated to emit the new MCP streaming events and correctly distinguish between function calls, built-in tools, and generic MCP tools during streaming.

Test cases were significantly expanded to cover basic MCP call parsing, dotted recipients, differentiation between MCP and function/built-in calls, and multi-turn streaming interactions with MCP tools, including code_interpreter via MCP. A review comment highlighted and provided a fix for issues in a new test case, test_mcp_tool_calling_streaming_types, specifically addressing an overly strict assertion and incorrect conditional logic in event handling, ensuring proper validation of the MCP streaming event sequence.
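
To illustrate what handling a dotted recipient such as repo_browser.list might look like, here is a minimal sketch. It is not the PR's `_parse_mcp_recipient`; `split_mcp_recipient` is a hypothetical name, and returning `None` for the server label of an undotted recipient is an assumption made for this example (in the streamed test output, the label for `python` comes from the request's tool declaration instead).

```python
from typing import Optional, Tuple


def split_mcp_recipient(recipient: str) -> Tuple[Optional[str], str]:
    """Split 'server.tool' into (server_label, tool_name).

    Assumption of this sketch: a recipient without a dot carries no
    server label, so the label must be resolved elsewhere.
    """
    server_label, sep, tool_name = recipient.partition(".")
    if not sep:
        return None, recipient
    return server_label, tool_name


print(split_mcp_recipient("repo_browser.list"))
print(split_mcp_recipient("python"))
```

Only the first dot is treated as a separator here, so a tool name that itself contains dots stays intact.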

Comment on lines +244 to +265
    async for event in stream_response:
        assert "mcp_call" in event.type

        if event.type == "response.created":
            stack_of_event_types.append(event.type)
        elif event.type == "response.completed":
            assert stack_of_event_types[-1] == pairs_of_event_types[event.type]
            stack_of_event_types.pop()
        if (
            event.type.endswith("added")
            or event.type == "response.mcp_call.in_progress"
        ):
            stack_of_event_types.append(event.type)
        elif event.type.endswith("delta"):
            if stack_of_event_types[-1] == event.type:
                continue
            stack_of_event_types.append(event.type)
        elif event.type.endswith("done") or event.type == "response.mcp_call.completed":
            assert stack_of_event_types[-1] == pairs_of_event_types[event.type]
            stack_of_event_types.pop()

    assert len(stack_of_event_types) == 0

high

The logic in this test has a couple of issues:

  1. The assertion assert "mcp_call" in event.type on line 245 is too strict. The event stream includes many events unrelated to MCP calls (e.g., response.created, response.reasoning_text.delta), which will cause this assertion to fail.
  2. The event handling logic uses an if followed by an elif chain, and then another if (line 252). This second if should be an elif to form a single conditional block. As written, every event is evaluated against both chains, so an event already handled by the response.created or response.completed branch is tested again against the added, delta, and done branches, which is not the intended logic for pairing events.

I've suggested a fix that addresses both points by introducing a flag to check for MCP events and correcting the conditional logic.

    mcp_event_seen = False
    stack_of_event_types = []
    async for event in stream_response:
        if "mcp_call" in event.type:
            mcp_event_seen = True

        if event.type == "response.created":
            stack_of_event_types.append(event.type)
        elif event.type == "response.completed":
            assert stack_of_event_types.pop() == pairs_of_event_types[event.type]
        elif (
            event.type.endswith("added")
            or event.type == "response.mcp_call.in_progress"
        ):
            stack_of_event_types.append(event.type)
        elif event.type.endswith("delta"):
            if not stack_of_event_types or stack_of_event_types[-1] != event.type:
                stack_of_event_types.append(event.type)
        elif event.type.endswith("done") or event.type == "response.mcp_call.completed":
            assert stack_of_event_types.pop() == pairs_of_event_types[event.type]

    assert mcp_event_seen, "No MCP call events were observed in the stream."
    assert len(stack_of_event_types) == 0
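
To try the suggested logic outside the test suite, a synchronous adaptation could look like the sketch below. The `PAIRS` table is a hypothetical stand-in for `pairs_of_event_types`, which is not shown in this thread, and the sample stream lists the event types from the PR's test result with repeated deltas shortened.

```python
# Hypothetical pairing table: closing event type -> opening event type.
PAIRS = {
    "response.completed": "response.created",
    "response.output_item.done": "response.output_item.added",
    "response.reasoning_part.done": "response.reasoning_part.added",
    "response.reasoning_text.done": "response.reasoning_text.delta",
    "response.content_part.done": "response.content_part.added",
    "response.output_text.done": "response.output_text.delta",
    "response.mcp_call_arguments.done": "response.mcp_call_arguments.delta",
    "response.mcp_call.completed": "response.mcp_call.in_progress",
}


def check_event_pairing(event_types, pairs=PAIRS):
    """Return True if an MCP event was seen; raise if events are badly nested."""
    mcp_event_seen = False
    stack = []
    for event_type in event_types:
        if "mcp_call" in event_type:
            mcp_event_seen = True

        if event_type == "response.created":
            stack.append(event_type)
        elif event_type == "response.completed":
            assert stack.pop() == pairs[event_type]
        elif (
            event_type.endswith("added")
            or event_type == "response.mcp_call.in_progress"
        ):
            stack.append(event_type)
        elif event_type.endswith("delta"):
            # Consecutive deltas of the same type share one stack entry.
            if not stack or stack[-1] != event_type:
                stack.append(event_type)
        elif event_type.endswith("done") or event_type == "response.mcp_call.completed":
            assert stack.pop() == pairs[event_type]
    assert not stack, f"unclosed events: {stack}"
    return mcp_event_seen


STREAM = [
    "response.created",
    "response.in_progress",
    "response.output_item.added",
    "response.reasoning_part.added",
    "response.reasoning_text.delta",
    "response.reasoning_text.delta",
    "response.reasoning_text.done",
    "response.reasoning_part.done",
    "response.output_item.done",
    "response.output_item.added",
    "response.mcp_call.in_progress",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.done",
    "response.mcp_call.completed",
    "response.output_item.done",
    "response.output_item.added",
    "response.content_part.added",
    "response.output_text.delta",
    "response.output_text.done",
    "response.content_part.done",
    "response.output_item.done",
    "response.completed",
]

assert check_event_pairing(STREAM)
```

Note that response.in_progress matches none of the branches and is ignored, which is why it does not need an entry in the pairing table.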

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +243 to +245
    stack_of_event_types = []
    async for event in stream_response:
        assert "mcp_call" in event.type

P1: Fix blanket mcp_call assertion in streaming test

In test_mcp_tool_calling_streaming_types every streamed event is immediately asserted to contain the substring "mcp_call" before any dispatch logic runs. The Responses API stream always begins with response.created (and response.in_progress) which do not include that substring, so this assertion fails on the first event and the rest of the test logic never executes. As written the test cannot pass even when the streaming implementation is correct; the assertion needs to be limited to the MCP events it is intended to check.
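
One way to limit the check to MCP events, sketched under the assumption that the stream contains a single MCP call, is to filter the event types first and then compare the collapsed sequence against the expected order. `mcp_sequence` and `EXPECTED_MCP_SEQUENCE` are hypothetical names introduced for this example.

```python
from itertools import groupby

# Expected order of MCP event types for one call, as seen in the PR's test result.
EXPECTED_MCP_SEQUENCE = [
    "response.mcp_call.in_progress",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.done",
    "response.mcp_call.completed",
]


def mcp_sequence(event_types):
    """Keep only MCP event types and collapse runs of identical types."""
    mcp_only = [t for t in event_types if "mcp_call" in t]
    return [t for t, _ in groupby(mcp_only)]


stream = [
    "response.created",
    "response.in_progress",
    "response.output_item.added",
    "response.mcp_call.in_progress",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.delta",
    "response.mcp_call_arguments.done",
    "response.mcp_call.completed",
    "response.output_item.done",
    "response.completed",
]

assert mcp_sequence(stream) == EXPECTED_MCP_SEQUENCE
```

Non-MCP events such as response.created pass through untouched, so the first event no longer trips the assertion.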

Comment on lines +749 to +753
        return False

    # Function calls have "functions." prefix
    # Everything else is an MCP tool
    return not recipient.startswith("functions.")

P1: Don’t classify built‑in tools as MCP during streaming

The helper _is_mcp_tool_by_namespace now treats any recipient that is not functions.* as an MCP tool. In the Harmony streaming path this flag is used to route tool calls into the MCP branch, so built‑in recipients such as "python", "browser", or "container" now emit response.mcp_call* events instead of the expected response.code_interpreter_call*/built‑in events. A request streaming with tools=[{"type": "code_interpreter"}] will therefore deliver only MCP events, breaking API semantics and diverging from the non‑streaming parsing logic that keeps built‑ins separate from MCP connectors.
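
A namespace check that keeps built-ins separate could be sketched as follows. This is not the PR's `_is_mcp_tool_by_namespace`: `classify_recipient` and `BUILTIN_RECIPIENTS` are hypothetical, the built-in set is taken from the names mentioned in this comment, and the sketch deliberately ignores which tools the request declared (for example, an mcp tool with server_label code_interpreter).

```python
# Built-in recipient roots named in this review; treating them by their
# first dotted segment is an assumption of this sketch.
BUILTIN_RECIPIENTS = frozenset({"python", "browser", "container"})


def classify_recipient(recipient: str) -> str:
    """Return 'function', 'builtin', or 'mcp' for a Harmony recipient."""
    if recipient.startswith("functions."):
        return "function"
    root = recipient.split(".", 1)[0]
    if root in BUILTIN_RECIPIENTS:
        return "builtin"
    return "mcp"


for name in ("functions.get_weather", "python", "browser.search", "repo_browser.list"):
    print(name, "->", classify_recipient(name))
```

A real implementation would probably also need the request's tool list to decide whether `python` should stream as a built-in call or as an MCP call.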

mergify bot commented Dec 7, 2025

Hi @daniel-salib, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Collaborator

ApostaC commented Dec 8, 2025

cc @robertgshaw2-redhat @chaunceyjiang Can you help review this PR? Thanks!

Labels

frontend, gpt-oss

Projects

Status: To Triage
