🚀 The feature, motivation and pitch
In August, after GPT-OSS was released, a lot of work went into enabling MCP tool calling in vLLM. GPT-OSS is the first model that supported this, but it is a special case: GPT-OSS ships with OpenAI harmony (https://github.com/openai/harmony/) as its own parser, whereas most models ship with a chat template (e.g. MiniMax M2: https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/chat_template.jinja), which we use via the _preprocess_chat() function.
This feature request focuses on MCP support for non-harmony models -- basically any model that uses a chat template.
At a high level, MCP works as follows in the responses API:
- a client provides an input
- the input is converted to tokens via _preprocess_chat()
- the main generate_with_builtin_tools loop drives the token generation / tool calling cycle
- tokens are generated by the engine and stored in our context
- if a tool call is needed, we run https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_engine.py#L1281
- via render_for_completion(), we generate new tokens, and the loop continues until no more tool calls are needed
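The loop above can be sketched roughly as follows. Everything here (Context, parse_tool_call, run_tool, the engine callable, the "CALL:" syntax) is a hypothetical stand-in for illustration, not actual vLLM code:

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Accumulates the prompt, model outputs, and tool outputs across turns."""
    messages: list = field(default_factory=list)


def parse_tool_call(text: str):
    """Hypothetical parser: treat output starting with 'CALL:' as a tool call."""
    if text.startswith("CALL:"):
        return text[len("CALL:"):]
    return None


def run_tool(arg: str) -> str:
    """Stand-in for an MCP tool invocation (e.g. the python tool)."""
    return str(eval(arg))  # toy evaluator, e.g. "2*3" -> "6"


def generate_with_builtin_tools(engine, prompt: str, max_turns: int = 5) -> Context:
    ctx = Context(messages=[prompt])
    for _ in range(max_turns):
        # render_for_completion(): flatten the context back into a prompt
        rendered = "\n".join(ctx.messages)
        output = engine(rendered)      # engine generates new tokens
        ctx.messages.append(output)    # stored in our context
        call = parse_tool_call(output)
        if call is None:               # no more tool calls -> loop ends
            return ctx
        # tool output is appended to the context, and the loop continues
        ctx.messages.append(run_tool(call))
    return ctx
```

With a fake engine that first emits a tool call and then a final answer, the context ends up holding the prompt, the call, the tool output, and the final message in order.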
Example
For an example MCP client / server, see #29798.
MiniMax M2 serve command
```shell
VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS=web_search_preview,container,code_interpreter \
VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT=1 \
vllm serve MiniMaxAI/MiniMax-M2 \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --enable-auto-tool-choice \
  --trust-remote-code \
  --tool-server=localhost:8081/container,localhost:8081/browser,localhost:8081/python
```
client request
```shell
curl -X POST "http://localhost:8000/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-api-key" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2",
    "input": "Multiply 64548*15151 using the python tool.",
    "tools": [
      {
        "type": "mcp",
        "server_label": "code_interpreter",
        "headers": {"test": "test"},
        "server_url": "IGNORED"
      }
    ]
  }'
```
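The same request can also be built from Python with only the standard library (the URL, dummy API key, and payload mirror the curl command above; uncomment the last line to actually hit a running server):

```python
import json
import urllib.request

payload = {
    "model": "MiniMaxAI/MiniMax-M2",
    "input": "Multiply 64548*15151 using the python tool.",
    "tools": [
        {
            "type": "mcp",
            "server_label": "code_interpreter",
            "headers": {"test": "test"},
            "server_url": "IGNORED",
        }
    ],
}
req = urllib.request.Request(
    "http://localhost:8000/v1/responses",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer dummy-api-key",
    },
)
# resp = json.load(urllib.request.urlopen(req))  # requires the server above
```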
response
```json
{
  "id": "resp_a42bc867864795cd",
  "created_at": 1764137463,
  "incomplete_details": null,
  "instructions": null,
  "metadata": null,
  "model": "moonshotai/Kimi-K2-Thinking",
  "object": "response",
  "output": [
    {
      "id": "rs_a59c0ff3d139f3ad",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": " The user wants me to multiply two numbers: 64548 and 15151. I should use the Python tool to compute this accurately.\n\nLet me set up the calculation. I'll use the arithmetic multiplication operator (*) in Python. ",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "lol",
      "arguments": "{\"code\": \"result = 64548 * 15151\\nresult\", \"restart\": false}",
      "name": "code_interpreter",
      "server_label": "code_interpreter",
      "type": "mcp_call",
      "approval_request_id": null,
      "error": null,
      "output": "977966748\n",
      "status": "completed"
    },
    {
      "id": "rs_818e3eeeb7e9efa7",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": " The result of multiplying 64548 by 15151 is **977,966,748**. ",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_bf62d1a50301381c",
      "content": [
        {
          "annotations": [],
          "text": " The result of multiplying 64548 by 15151 is **977,966,748**.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [
    {
      "server_label": "code_interpreter",
      "type": "mcp",
      "allowed_tools": null,
      "authorization": null,
      "connector_id": null,
      "headers": {
        "test": "test"
      },
      "require_approval": null,
      "server_description": null,
      "server_url": "IGNORED"
    }
  ],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": 261990,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "reasoning": null,
  "service_tier": "auto",
  "status": "completed",
  "text": null,
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 154,
    "input_tokens_details": {
      "cached_tokens": 64,
      "input_tokens_per_turn": [],
      "cached_tokens_per_turn": []
    },
    "output_tokens": 121,
    "output_tokens_details": {
      "reasoning_tokens": 0,
      "tool_output_tokens": 0,
      "output_tokens_per_turn": [],
      "tool_output_tokens_per_turn": []
    },
    "total_tokens": 275
  },
  "user": null,
  "input_messages": null,
  "output_messages": null
}
```
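A client consuming this response walks the output list and filters by item type. A minimal sketch, using a trimmed-down copy of the response above:

```python
# Trimmed copy of the response's "output" list shown above.
response = {
    "output": [
        {
            "type": "reasoning",
            "content": [{"text": "...", "type": "reasoning_text"}],
        },
        {
            "type": "mcp_call",
            "name": "code_interpreter",
            "output": "977966748\n",
        },
        {
            "type": "message",
            "content": [
                {
                    "text": "The result of multiplying 64548 by 15151 is **977,966,748**.",
                    "type": "output_text",
                }
            ],
        },
    ]
}

# Collect every MCP tool result in order.
tool_outputs = [
    item["output"] for item in response["output"] if item["type"] == "mcp_call"
]

# Pull the first assistant-visible text from the final message item.
final_text = next(
    part["text"]
    for item in response["output"] if item["type"] == "message"
    for part in item["content"] if part["type"] == "output_text"
)
```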
What we've done so far
- Minor fixes / feature additions: [responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug #29555, [responsesAPI][2] parse ResponseFunctionToolCallOutputItem #29383, [responsesAPI][1] refactor construct_input_messages #29359, [Frontend] split append tool output #28333
- Set up the ResponsesParser class with a ParsableContext. These two PRs allow models that use a chat template (MiniMax M2, Kimi K2 Thinking, Qwen3) to call the python MCP tool: [responsesAPI][3] ResponsesParser to set up non harmony MCP #29413, [responsesAPI][5] ResponsesParser with tools for full MCP python loop #29798
- Built support for the browser & container tools: [responsesAPI][7] Browser, Container MCP tools for non harmony models #29989
What's next
- Build proper logging in the ResponsesParser
- Add support for passing input_messages / output_messages in the response (similar to [responsesAPI] support input output messages for non harmony models #29549) to help debug the ParsableContext. I have a WIP PR in [responsesAPI][6] input/output messages for ResponsesParser qandrew/vllm#12
- Right now we store internal state in responses API form, since it converts easily to the chat template and is more expressive than chat completions (it has reasoning, etc.). However, the responses API assumes a single output item carries only one of reasoning / tool call / output, while in some models a single turn can contain both reasoning and a tool call. We need to fix this mapping properly.
- Build support for generic MCP tools, for both harmony and non harmony models
- support streaming for non harmony MCP calls
- support other entrypoints such as Anthropic MessagesAPI
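For the state-mapping bullet above, a toy sketch of what the required split could look like: one parsed model turn fans out into multiple response output items, since each item holds only one kind of content. All names and the dict shapes here are hypothetical:

```python
def split_turn(turn: dict) -> list[dict]:
    """Map one parsed model turn (which may mix reasoning, a tool call,
    and final text) onto a list of single-purpose response output items."""
    items = []
    if turn.get("reasoning"):
        items.append({"type": "reasoning", "content": turn["reasoning"]})
    if turn.get("tool_call"):
        items.append({"type": "mcp_call", "arguments": turn["tool_call"]})
    if turn.get("text"):
        items.append({"type": "message", "content": turn["text"]})
    return items
```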
If anyone is interested in helping here please let us know!
cc @chaunceyjiang @yeqcharlotte @daniel-salib @alecsolder @heheda12345 @mgoin @njhill