Skip to content
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# PRD: BUG-T9 — Orphaned Web UI server process blocks port after MCP client disconnect or config change

**Task ID:** BUG-T9
**Priority:** P1
**Phase:** Known Issues / Bug Tracker
**Status:** Planned
**Dependencies:** None

## Objective Summary

When an MCP client disconnects, the wrapper can remain alive because the main loop blocks on `output_queue.get()` while `xcrun mcpbridge` still owns stdout. This leaves the Web UI thread running and the configured port (for example `8080`) occupied by an orphaned process. The objective is to make wrapper shutdown deterministic when stdin reaches EOF by actively terminating the upstream bridge with a bounded timeout, so the stdout reader emits EOF and the main loop exits. The change must preserve normal request handling, avoid regressing existing broker/web-ui flows, and keep behavior cross-platform for supported Python environments.

## Deliverables

- Lifecycle update in `src/mcpbridge_wrapper/bridge.py` and `src/mcpbridge_wrapper/__main__.py` to react to stdin closure.
- New unit tests covering stdin-EOF shutdown behavior and forced-kill fallback behavior.
- Validation report documenting quality gates and bug-specific verification.

## Success Criteria and Acceptance Tests

1. Wrapper requests graceful upstream shutdown immediately after stdin reaches EOF.
2. If upstream does not exit within a configured grace window, wrapper escalates to force kill.
3. Main loop exits without hanging once EOF-driven shutdown is triggered.
4. Existing behavior for normal request/response handling is unchanged.
5. Unit tests prove:
- EOF callback path is executed.
- Graceful-then-force termination logic is invoked as expected.
- Existing bridge forwarding tests still pass.

## Test-First Plan

1. Add failing unit test in `tests/unit/test_bridge.py` proving stdin forwarder emits a closure callback when stdin iteration ends.
2. Add failing unit test in `tests/unit/test_main.py` proving main wires an EOF callback that triggers bridge termination logic.
3. Add failing unit test for timeout escalation path (terminate then kill) via mocked bridge process.
4. Implement lifecycle changes in bridge/main modules.
5. Re-run targeted tests, then full required quality gates.

## TODO Plan

### Phase 1: EOF Signaling Hook

- **Inputs:** current `run_stdin_forwarder()` behavior, bug report root cause.
- **Outputs:** optional `on_stdin_closed` callback support and tests.
- **Verification:** callback invoked once on EOF and on write-error exit paths.

### Phase 2: Deterministic Upstream Termination

- **Inputs:** bridge subprocess handle, EOF signal from Phase 1.
- **Outputs:** helper that performs terminate -> wait(timeout) -> kill fallback.
- **Verification:** unit tests assert `terminate()` is called first and `kill()` only after timeout.

### Phase 3: Main-Loop Integration and Regression Safety

- **Inputs:** `main()` threading and cleanup flow.
- **Outputs:** EOF callback wiring in `main()` and regression tests.
- **Verification:** tests confirm no hang path and existing metrics/audit flow remains intact.

## Decision Points and Constraints

- Prefer a thread-safe callback approach over polling (`os.getppid`) to minimize runtime overhead and platform complexity.
- Keep shutdown idempotent: multiple EOF/error signals must not spam termination calls.
- Preserve current `cleanup_bridge()` finalizer behavior; new logic should unblock normal cleanup rather than replacing it.
- Avoid changing user-facing CLI flags in this task.

## Notes

- Update BUG-T9 status and resolution checklist in `SPECS/Workplan.md` after implementation.
- If behavior changes meaningfully for operators, add a short troubleshooting note about automatic stale-process cleanup.

---
**Archived:** 2026-02-25
**Verdict:** PASS
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
# Validation Report — BUG-T9

**Task:** BUG-T9 — Orphaned Web UI server process blocks port after MCP client disconnect or config change
**Date:** 2026-02-25
**Verdict:** PASS

## Summary

Implemented stdin-EOF shutdown signaling and bounded upstream termination (`terminate -> grace wait -> kill fallback`) so orphaned wrapper processes no longer keep Web UI ports bound after client disconnect.

## Quality Gates

### 1. Test Suite

Command:
```bash
PYTHONPATH=src pytest
```

Result: **PASS**
Key output: `659 passed, 5 skipped, 2 warnings`

### 2. Lint

Command:
```bash
ruff check src/
```

Result: **PASS**
Key output: `All checks passed!`

### 3. Type Check

Command:
```bash
mypy src/
```

Result: **PASS**
Key output: `Success: no issues found in 18 source files`

### 4. Coverage

Command:
```bash
PYTHONPATH=src pytest --cov=src/mcpbridge_wrapper --cov-report=term-missing
```

Result: **PASS**
Total coverage: **91.52%** (required: >= 90%)

## Bug-Specific Verification

- `run_stdin_forwarder()` now supports `on_stdin_closed` callback and invokes it on stdin EOF / forwarding termination.
- `main()` wires a one-shot stdin-closed callback that triggers bounded bridge termination.
- `terminate_bridge_process()` behavior validated for:
- already-exited process (no-op),
- graceful exit after SIGTERM,
- SIGKILL fallback after grace timeout.

## Artifacts Updated

- `src/mcpbridge_wrapper/bridge.py`
- `src/mcpbridge_wrapper/__main__.py`
- `tests/unit/test_bridge.py`
- `tests/unit/test_main.py`
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
## REVIEW REPORT — BUG-T9 Orphaned Web UI Process Lifecycle

**Scope:** origin/main..HEAD
**Files:** 9

### Summary Verdict
- [x] Approve
- [ ] Approve with comments
- [ ] Request changes
- [ ] Block

### Critical Issues
- None.

### Secondary Issues
- None.

### Architectural Notes
- The stdin-closure callback plus bounded terminate/kill fallback is a pragmatic fix for the orphaned process path while preserving existing final cleanup semantics.
- Callback idempotence via `threading.Event` in `main()` prevents repeated termination attempts if multiple closure/error signals occur.

### Tests
- Unit coverage expanded for forwarder EOF callback behavior and terminate escalation behavior.
- Main-loop wiring tests assert callback registration and one-shot termination trigger.
- Quality gates verified in validation report with overall coverage at 91.52% (>= 90%).

### Next Steps
- No actionable review findings.
- FOLLOW-UP phase should be skipped for this task.
6 changes: 5 additions & 1 deletion SPECS/ARCHIVE/INDEX.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
# mcpbridge-wrapper Tasks Archive

**Last Updated:** 2026-02-26 (BUG-T18_Error_Breakdown_full_width_layout_fix)
**Last Updated:** 2026-02-25 (BUG-T9_Orphaned_Web_UI_server_process_blocks_port_after_MCP_client_disconnect_or_config_change)

## Archived Tasks

| Task ID | Folder | Archived | Verdict |
|---------|--------|----------|---------|
| BUG-T18 | [BUG-T18_Error_Breakdown_full_width_layout_fix/](BUG-T18_Error_Breakdown_full_width_layout_fix/) | 2026-02-26 | PASS |
| BUG-T9 | [BUG-T9_Orphaned_Web_UI_server_process_blocks_port_after_MCP_client_disconnect_or_config_change/](BUG-T9_Orphaned_Web_UI_server_process_blocks_port_after_MCP_client_disconnect_or_config_change/) | 2026-02-25 | PASS |
| BUG-T20 | [BUG-T20_Session_Timeline_can_show_negative_duration_due_to_incorrect_entry_ordering/](BUG-T20_Session_Timeline_can_show_negative_duration_due_to_incorrect_entry_ordering/) | 2026-02-25 | PASS |
| BUG-T19 | [BUG-T19_Audit_Log_and_Session_Timeline_are_inconsistent_with_tool_charts_in_multi_process_runs/](BUG-T19_Audit_Log_and_Session_Timeline_are_inconsistent_with_tool_charts_in_multi_process_runs/) | 2026-02-25 | PASS |
| BUG-T13 | [BUG-T13_Per-Tool_Latency_Statistics_does_not_show_params_when_capture_params_is_false/](BUG-T13_Per-Tool_Latency_Statistics_does_not_show_params_when_capture_params_is_false/) | 2026-02-25 | PASS |
Expand Down Expand Up @@ -258,6 +259,7 @@
| [REVIEW_bug_t20_session_timeline_ordering.md](_Historical/REVIEW_bug_t20_session_timeline_ordering.md) | Review report for BUG-T20 |
| [REVIEW_bug_t19_audit_session_consistency.md](_Historical/REVIEW_bug_t19_audit_session_consistency.md) | Review report for BUG-T19 |
| [REVIEW_bug_t13_capture_params_hint.md](_Historical/REVIEW_bug_t13_capture_params_hint.md) | Review report for BUG-T13 |
| [REVIEW_bug_t9_orphaned_webui_process.md](BUG-T9_Orphaned_Web_UI_server_process_blocks_port_after_MCP_client_disconnect_or_config_change/REVIEW_bug_t9_orphaned_webui_process.md) | Review report for BUG-T9 |
| [REVIEW_bug_t11_request_timeline.md](_Historical/REVIEW_bug_t11_request_timeline.md) | Review report for BUG-T11 |

| [REVIEW_bug_t10.md](_Historical/REVIEW_bug_t10.md) | Review report for BUG-T10 |
Expand All @@ -269,6 +271,8 @@
|------|---------|--------|
| 2026-02-26 | BUG-T18 | Archived REVIEW_bug_t18_error_breakdown_layout report |
| 2026-02-26 | BUG-T18 | Archived Error_Breakdown_full_width_layout_fix (PASS) |
| 2026-02-25 | BUG-T9 | Archived REVIEW_bug_t9_orphaned_webui_process report |
| 2026-02-25 | BUG-T9 | Archived Orphaned_Web_UI_server_process_blocks_port_after_MCP_client_disconnect_or_config_change (PASS) |
| 2026-02-25 | BUG-T20 | Archived REVIEW_bug_t20_session_timeline_ordering report |
| 2026-02-25 | BUG-T20 | Archived Session_Timeline_can_show_negative_duration_due_to_incorrect_entry_ordering (PASS) |
| 2026-02-25 | BUG-T19 | Archived REVIEW_bug_t19_audit_session_consistency report |
Expand Down
2 changes: 1 addition & 1 deletion SPECS/INPROGRESS/next.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@

## Recently Archived

- **BUG-T9** — Orphaned Web UI server process blocks port after MCP client disconnect or config change (2026-02-25, PASS)
- **BUG-T18** — Error Breakdown widget must be full width streatched (2026-02-26, PASS)
- **BUG-T20** — Session Timeline can show negative duration due to incorrect entry ordering (2026-02-25, PASS)
- **BUG-T19** — Audit Log and Session Timeline are inconsistent with tool charts in multi-process runs (2026-02-25, PASS)

## Suggested Next Tasks

Expand Down
15 changes: 8 additions & 7 deletions SPECS/Workplan.md
Original file line number Diff line number Diff line change
Expand Up @@ -1224,11 +1224,12 @@ Export audit logs as JSON/CSV (the JSONL file on disk contains the complete hist

---

### BUG-T9: Orphaned Web UI server process blocks port after MCP client disconnect or config change
### BUG-T9: Orphaned Web UI server process blocks port after MCP client disconnect or config change
- **Type:** Bug / Web UI / Process Lifecycle
- **Status:** Open
- **Status:** ✅ Fixed (2026-02-25, PASS)
- **Priority:** P1
- **Discovered:** 2026-02-15
- **Completed:** 2026-02-25
- **Component:** `__main__.py` (main loop), `server.py` (Web UI thread)
- **Affected Clients:** All clients (Cursor, Claude Code, Codex CLI, Zed)

Expand All @@ -1255,11 +1256,11 @@ The main loop in `__main__.py` blocks on `output_queue.get()` waiting for stdout
- New wrapper instances silently skip Web UI startup due to port conflict (per BUG-T6 handling).
- Manual `kill` or `lsof -ti :8080 | xargs kill` is required to recover.

#### Resolution Path (Proposed)
- [ ] Detect parent process death via `os.getppid()` polling or stdin EOF and force `mcpbridge` subprocess termination + clean exit.
- [ ] Add a stdin-watchdog thread: when stdin reaches EOF, send SIGTERM to the `mcpbridge` subprocess and set a timeout (e.g. 5s) before SIGKILL.
- [ ] Alternatively, set `mcpbridge` subprocess to die with parent using platform-specific mechanisms (e.g. `prctl(PR_SET_PDEATHSIG)` on Linux).
- [ ] Add integration test: simulate client disconnect (close stdin), verify process exits within timeout.
#### Resolution Path
- [x] Detect stdin EOF in the forwarder thread and trigger deterministic upstream shutdown.
- [x] Add a bounded termination helper (`SIGTERM` then timeout-backed `SIGKILL` fallback) for stuck upstream processes.
- [x] Wire one-shot stdin-closed callback in `main()` so repeated EOF/error events do not trigger duplicate termination attempts.
- [x] Add automated regression tests for EOF callback invocation and graceful-vs-force shutdown behavior.

#### Related Items
- **BUG-T6** ✅ — Port collision handling (warns but doesn't fix orphans).
Expand Down
34 changes: 33 additions & 1 deletion src/mcpbridge_wrapper/__main__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
"""Entry point for mcpbridge-wrapper."""

import contextlib
import signal
import sys
import threading
import time
from typing import Dict, Optional, Tuple

Expand All @@ -10,6 +12,7 @@
create_bridge,
run_stdin_forwarder,
run_stdout_reader,
terminate_bridge_process,
)
from mcpbridge_wrapper.transform import process_response_line

Expand All @@ -21,6 +24,11 @@
# Guard rail for method-correlation tracking (FU-BUG-T7-1).
MAX_PENDING_METHODS = 1000

# After stdin EOF, allow a short window for in-flight responses before forcing
# upstream termination. This reduces dropped final responses in one-shot usage.
STDIN_EOF_DRAIN_TIMEOUT_SECONDS = 0.25
STDIN_EOF_DRAIN_POLL_INTERVAL_SECONDS = 0.01


def check_xcode_tools_enabled() -> None:
"""Print diagnostic message if Xcode Tools MCP is likely not enabled."""
Expand Down Expand Up @@ -425,6 +433,7 @@ def main() -> int:
# This covers ALL request types (not just tools/call) so that non-tool method
# responses can be normalized to standard JSON-RPC errors (BUG-T7).
pending_methods: Dict[str, str] = {}
stdin_closed = threading.Event()

# Create request handler callback for stdin forwarder
def on_request(line: str) -> None:
Expand Down Expand Up @@ -476,8 +485,31 @@ def on_request(line: str) -> None:
except Exception:
pass

def on_stdin_closed() -> None:
"""Terminate upstream bridge when client stdin reaches EOF."""
if stdin_closed.is_set():
return
stdin_closed.set()

# Forward EOF upstream first so mcpbridge can finish pending responses.
if bridge.stdin is not None:
with contextlib.suppress(BrokenPipeError, OSError, ValueError):
bridge.stdin.close()

drain_deadline = time.monotonic() + STDIN_EOF_DRAIN_TIMEOUT_SECONDS
while bridge.poll() is None and time.monotonic() < drain_deadline:
if not pending_methods:
break
time.sleep(STDIN_EOF_DRAIN_POLL_INTERVAL_SECONDS)

terminate_bridge_process(bridge, grace_period=5.0)

# Start stdin forwarding in a daemon thread (with request tracking)
_ = run_stdin_forwarder(bridge, on_request=on_request)
_ = run_stdin_forwarder(
bridge,
on_request=on_request,
on_stdin_closed=on_stdin_closed,
)

# Start stdout reader in a daemon thread with queue
stdout_thread, output_queue = run_stdout_reader(bridge)
Expand Down
35 changes: 35 additions & 0 deletions src/mcpbridge_wrapper/bridge.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import subprocess
import sys
import threading
import time
from typing import Any, Callable, Generator, List, Optional, Tuple


Expand Down Expand Up @@ -173,11 +174,40 @@ def cleanup_bridge(bridge: subprocess.Popen, timeout: Optional[float] = None) ->
return bridge.returncode


def terminate_bridge_process(
bridge: subprocess.Popen,
grace_period: float = 5.0,
poll_interval: float = 0.05,
) -> None:
"""Request bridge shutdown and escalate to kill if it does not stop in time.

This helper is intended for asynchronous shutdown triggers (for example when
stdin reaches EOF). It sends SIGTERM, waits up to ``grace_period`` using
``poll()``, and sends SIGKILL as a fallback if still running.
"""
if bridge.poll() is not None:
return

with contextlib.suppress(ProcessLookupError, OSError):
bridge.terminate()

timeout_seconds = max(0.0, grace_period)
deadline = time.monotonic() + timeout_seconds

while bridge.poll() is None and time.monotonic() < deadline:
time.sleep(max(0.0, poll_interval))

if bridge.poll() is None:
with contextlib.suppress(ProcessLookupError, OSError):
bridge.kill()


def run_stdin_forwarder(
bridge: subprocess.Popen,
metrics: Optional[Any] = None,
audit: Optional[Any] = None,
on_request: Optional[Callable[[str], None]] = None,
on_stdin_closed: Optional[Callable[[], None]] = None,
) -> threading.Thread:
"""
Start a daemon thread that forwards stdin to bridge stdin.
Expand All @@ -191,6 +221,7 @@ def run_stdin_forwarder(
metrics: Optional metrics collector for tracking requests
audit: Optional audit logger for logging requests
on_request: Optional callback(line) called for each request line
on_stdin_closed: Optional callback() called when stdin forwarding ends

Returns:
The Thread object (daemon thread)
Expand All @@ -216,6 +247,10 @@ def forward_loop() -> None:
except (BrokenPipeError, OSError):
# Bridge stdin was closed, exit gracefully
pass
finally:
if on_stdin_closed is not None:
with contextlib.suppress(Exception):
on_stdin_closed()

thread = threading.Thread(target=forward_loop, daemon=True)
thread.start()
Expand Down
Loading