TOML delegator filters (just/mise/task) truncate wrapped tool output, hiding test failures #1065

## Summary

Built-in TOML filters for task-runner delegators (`just`, `mise`, `task`, and the in-flight `poe` from #1062) apply `max_lines = 50` to the **wrapped tool's** raw output. For tasks that wrap pytest / cargo test / eslint / pyright / etc., this routinely cuts off the failure tracebacks and the final summary line, which are exactly the part the LLM needs to act on.

In some cases (e.g. `just`) the filter performs no useful stripping on the wrapped output at all and only contributes the 50-line cap, making it a pure regression compared with running under `RTK_NO_TOML=1`.

## Concrete repro: `rtk just test` wrapping pytest

Given a `Justfile` like:

```just
test:
    pytest -n auto -m 'not integration and not flaky'
```

Real `pytest` output for a project with a few hundred tests and one failure:

```
============================= test session starts ==============================
platform darwin -- Python 3.13.x, pytest-8.x.x, pluggy-1.x.x
rootdir: /Users/me/proj
configfile: pyproject.toml
plugins: xdist-3.x, asyncio-0.21, mock-3.14, ...
8 workers [342 items]
........................................                                 [ 11%]
........................................                                 [ 23%]
........................................                                 [ 35%]
... (35+ more lines of progress dots) ...
=================================== FAILURES ===================================
___________________________ test_pacing_calculation ____________________________
[20-line traceback]
========================== short test summary info =============================
FAILED tests/budget_pacer/test_pacer.py::test_pacing_calculation - AssertionError
======================= 1 failed, 341 passed in 14.52s =========================
```

After `rtk just test` is processed by `src/filters/just.toml`:

1. `strip_ansi` ✓
2. `strip_lines_matching` — matches **nothing** in pytest output (the patterns only match `just --list`'s own help banner)
3. `truncate_lines_at = 150` — harmless
4. **`max_lines = 50`** — keeps the header (~6 lines) + ~44 lines of progress dots, then cuts off
5. The LLM never sees `FAILURES`, the traceback, the FAILED line, or the `1 failed, 341 passed` summary

The result is *worse* than no filter at all: pytest looks like it hung mid-run instead of failing. The agent has no signal to act on.
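To make the failure mode concrete, here is a minimal Rust model of steps 2 to 4 above, run over a synthetic version of the pytest output. It is a sketch, not RTK's code: `generic_pipeline` and `synthetic_pytest_output` are hypothetical names, patterns are plain substrings rather than regexes, `strip_ansi` is omitted because the synthetic text has no escape codes, `"Available recipes:"` stands in for the `just --list` banner patterns, and `max_lines` is assumed to keep the first N lines, as described in the repro.

```rust
/// Hypothetical model of the generic TOML pipeline: strip matching lines,
/// cap each line's width, then keep only the first `max_lines` lines.
fn generic_pipeline(
    input: &str,
    strip_patterns: &[&str],
    truncate_lines_at: usize,
    max_lines: usize,
) -> Vec<String> {
    input
        .lines()
        // strip_lines_matching (substring match here; the real filter uses regexes)
        .filter(|line| !strip_patterns.iter().any(|p| line.contains(*p)))
        // truncate_lines_at: cap each line at N characters
        .map(|line| line.chars().take(truncate_lines_at).collect::<String>())
        // max_lines: keep the head of the output and drop everything after it
        .take(max_lines)
        .collect()
}

/// Builds pytest-like output: a 6-line header, `progress_lines` rows of dots,
/// then an abridged failure section and the final summary line.
fn synthetic_pytest_output(progress_lines: usize) -> String {
    let mut lines: Vec<String> = vec![
        "============================= test session starts ==============================".to_string(),
        "platform darwin -- Python 3.13.x, pytest-8.x.x, pluggy-1.x.x".to_string(),
        "rootdir: /Users/me/proj".to_string(),
        "configfile: pyproject.toml".to_string(),
        "plugins: xdist-3.x, asyncio-0.21, mock-3.14, ...".to_string(),
        "8 workers [342 items]".to_string(),
    ];
    for _ in 0..progress_lines {
        lines.push(".".repeat(40));
    }
    lines.push("=================================== FAILURES ===================================".to_string());
    lines.push("FAILED tests/budget_pacer/test_pacer.py::test_pacing_calculation - AssertionError".to_string());
    lines.push("======================= 1 failed, 341 passed in 14.52s =========================".to_string());
    lines.join("\n")
}
```

With 6 header lines and 44 progress lines, the 50-line cap lands exactly before `FAILURES`, so neither the `FAILED` line nor the `1 failed, 341 passed` summary survives.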

## Why this happens

When `rtk just test` falls into `run_fallback` (`src/main.rs:1054`), the first TOML filter whose `match_command` regex hits is selected — `^just\b`. The TOML pipeline (`src/core/toml_filter.rs`) then applies generic line filtering with no awareness that `just test` actually executed pytest underneath. There is no routing from a delegator's output back to the wrapped tool's dedicated filter.
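The selection step can be modeled as a first-match search over command prefixes. This sketch emulates `^just\b` without the `regex` crate; `TomlFilter`, `matches_prefix_at_word_boundary`, and `find_first_match` are hypothetical stand-ins, not the actual API of `src/core/toml_filter.rs`. What it illustrates is that the only input is the command line `just test`: nothing in it mentions `pytest`, so no pytest-specific filter can be chosen.

```rust
/// Hypothetical, simplified stand-in for a TOML filter entry.
struct TomlFilter {
    name: &'static str,
    /// Plays the role of `match_command = "^<prefix>\b"`.
    command_prefix: &'static str,
}

/// Emulates `^prefix\b`: the command starts with `prefix` and the next
/// character, if any, is not a word character.
fn matches_prefix_at_word_boundary(command: &str, prefix: &str) -> bool {
    match command.strip_prefix(prefix) {
        Some(rest) => rest
            .chars()
            .next()
            .map_or(true, |c| !(c.is_alphanumeric() || c == '_')),
        None => false,
    }
}

/// First match wins; later (possibly more specific) filters are never consulted.
fn find_first_match<'a>(filters: &'a [TomlFilter], command: &str) -> Option<&'a TomlFilter> {
    filters
        .iter()
        .find(|f| matches_prefix_at_word_boundary(command, f.command_prefix))
}
```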

## Affected filters

| File | `max_lines` | Stripping useful for wrapped output? |
|---|---|---|
| `src/filters/just.toml` | 50 | No — patterns only match `just --list` |
| `src/filters/mise.toml` | 50 | Partial — strips `mise install` noise, useless for `mise run <task>` |
| `src/filters/task.toml` | 50 | Partial — strips `task: [name] cmd` headers |
| `src/filters/poe.toml` (#1062) | 50 | Partial — strips `Poe => ` headers |

All four follow the same delegator pattern and share the same architectural problem. `poe` is on track to inherit it via #1062.

## Proposed direction (sketch, not prescriptive)

A **delegator filter type** that, after stripping the wrapper's own preamble, routes the remaining stdout through whatever filter would have applied to the wrapped command. Concretely for `rtk just test` running pytest:

1. TOML filter matches `^just\b`
2. Strip `just`-specific preamble (currently a no-op)
3. Detect or declare the wrapped tool (`pytest`)
4. Re-apply RTK's pytest filter (Rust module, `src/cmds/python/`) to the remaining output
5. Return final filtered output
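A delegator filter type along these lines could be sketched as follows. Everything here is hypothetical: `DelegatorFilter`, the `OutputFilter` function-pointer type, and the two toy filters (a stand-in for the Rust pytest filter and a head-truncating stand-in for the generic pipeline). Detecting the wrapped tool (step 3) is left to the caller, which passes `Some(filter)` when it knows one.

```rust
/// Hypothetical signature shared by all output filters in this sketch.
type OutputFilter = fn(&str) -> String;

/// Hypothetical delegator filter covering steps 1 to 5 above.
struct DelegatorFilter {
    /// Line prefixes printed by the delegator itself, e.g. `Poe => `.
    preamble_prefixes: &'static [&'static str],
    /// Today's generic pipeline, used when no wrapped tool is known.
    fallback: OutputFilter,
}

impl DelegatorFilter {
    fn apply(&self, raw: &str, wrapped: Option<OutputFilter>) -> String {
        match wrapped {
            Some(filter) => {
                // Step 2: drop the delegator's own preamble lines.
                let body: Vec<&str> = raw
                    .lines()
                    .filter(|l| !self.preamble_prefixes.iter().any(|p| l.starts_with(*p)))
                    .collect();
                // Step 4: hand what is left to the wrapped tool's filter.
                filter(&body.join("\n"))
            }
            // Unknown wrapped tool: keep today's behavior unchanged.
            None => (self.fallback)(raw),
        }
    }
}

/// Toy stand-in for the pytest filter: keep FAILED lines and the summary.
fn toy_pytest_filter(out: &str) -> String {
    out.lines()
        .filter(|l| l.starts_with("FAILED") || l.contains(" passed") || l.contains(" failed"))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Toy stand-in for the generic pipeline: head truncation only.
fn toy_head_filter(out: &str) -> String {
    out.lines().take(2).collect::<Vec<_>>().join("\n")
}
```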

### Recommended primary approach: parse the project file

The cleanest path is **parsing the delegator's own project file to resolve task → wrapped command** before the task runs. Each delegator has exactly one well-known config file format:

| Delegator | Project file | Where the wrapped command lives |
|---|---|---|
| `just` | `Justfile` (or `justfile`, `.justfile`) | recipe body lines |
| `mise` | `.mise.toml` / `mise.toml` / `.config/mise.toml` | `[tasks.<name>] run = "..."` |
| `task` | `Taskfile.yml` / `Taskfile.yaml` | `tasks.<name>.cmds` |
| `poe` (#1062) | `pyproject.toml` | `[tool.poe.tasks] <name> = "..."` or `{cmd = "..."}` |
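Locating the project file is a plain lookup over the filenames in the table. A minimal sketch, assuming only the current directory is searched and the table's order is the lookup order; the real tools have their own search rules (for example, `just` also looks in parent directories), which this ignores. `project_file_candidates` and `find_project_file` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Candidate project files per delegator, in the order listed in the table.
fn project_file_candidates(delegator: &str) -> &'static [&'static str] {
    match delegator {
        "just" => &["Justfile", "justfile", ".justfile"],
        "mise" => &[".mise.toml", "mise.toml", ".config/mise.toml"],
        "task" => &["Taskfile.yml", "Taskfile.yaml"],
        "poe" => &["pyproject.toml"],
        _ => &[],
    }
}

/// Returns the first candidate that exists under `dir`, or `None` so the
/// caller can fall back to today's line-stripping behavior.
fn find_project_file(dir: &Path, delegator: &str) -> Option<PathBuf> {
    project_file_candidates(delegator)
        .iter()
        .map(|name| dir.join(name))
        .find(|path| path.is_file())
}
```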

Why this is the right primary path:

1. **Unambiguous and authoritative.** The project file *is* the source of truth for what a task does. No guessing, no parsing tool stdout, no relying on the wrapper to echo the command.
2. **Resolved before execution.** RTK can decide which downstream filter to apply *before* spawning the child, which means it can pick the right `Stdio` strategy (streamed vs. buffered) based on the wrapped tool. This also fixes the streaming-server problem (`uvicorn --reload` etc.) for free — if the resolved command is a streaming server, skip TOML buffering.
3. **No new TOML schema.** No need to add `wraps = ...` to every filter or every task. Filters stay declarative and small.
4. **Resilient to missing project files.** If parsing fails or the file isn't found, fall back to the current line-stripping behavior — strictly no regression.
5. **Works for chained commands.** `just lint-fix` → `ruff check --fix && ruff format` resolves to two commands; RTK picks the dominant filter (or applies sequentially) instead of giving up.
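Points 2 and 5 both start from the resolved command string. A sketch under simplifying assumptions: chains are split on `&&` only (no quoting, `||`, `;`, or pipes), a segment's program is its first whitespace-separated token, and `StdioStrategy`, `split_chain`, `stdio_strategy`, and the one-entry `STREAMING_PROGRAMS` list are hypothetical and illustrative.

```rust
/// How the child's output would be captured (hypothetical names).
#[derive(Debug, PartialEq)]
enum StdioStrategy {
    /// Pass output through live; never buffer a long-running server.
    Streamed,
    /// Buffer the full output so a filter can post-process it.
    Buffered,
}

/// Illustrative list of streaming servers; a real list would need curation.
const STREAMING_PROGRAMS: &[&str] = &["uvicorn"];

/// Splits a resolved task line on `&&`, ignoring quoting and other operators.
fn split_chain(command: &str) -> Vec<&str> {
    command.split("&&").map(str::trim).filter(|s| !s.is_empty()).collect()
}

/// Streams if any segment of the chain starts a known streaming program.
fn stdio_strategy(command: &str) -> StdioStrategy {
    let streams = split_chain(command).iter().any(|segment| {
        segment
            .split_whitespace()
            .next()
            .map_or(false, |prog| STREAMING_PROGRAMS.contains(&prog))
    });
    if streams {
        StdioStrategy::Streamed
    } else {
        StdioStrategy::Buffered
    }
}
```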

Sketch of the flow for `rtk just test`:

```
1. run_fallback sees `just test`
2. Match TOML filter `^just\b` (existing behavior)
3. NEW: parse ./Justfile, find recipe `test`, extract command line `pytest -n auto -m '...'`
4. NEW: classify the resolved command via the existing `find_matching_filter` / Clap dispatch
5. NEW: if a Rust filter matches (`pytest` → src/cmds/python/), spawn `just test` and pipe output through that filter instead of the generic `just` TOML pipeline
6. NEW: if no specific filter matches, fall back to the current `just` TOML pipeline (today's behavior)
```
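Steps 3 to 6 for `just` could look roughly like this. It is a deliberately small Justfile reader, not a real parser: it looks for a column-0 `name:` header (skipping `:=` assignments), takes the indented lines that follow, and strips just's `@` / `-` line prefixes. Shebang recipes, line continuations, imports, and blank lines inside recipes are not handled. `justfile_recipe_body`, `route_recipe`, `Route`, and the `DEDICATED` list are hypothetical; in RTK the classification step would go through the existing `find_matching_filter` / Clap dispatch instead.

```rust
/// Programs that (hypothetically) have a dedicated Rust filter in RTK.
const DEDICATED: &[&str] = &["pytest", "cargo"];

#[derive(Debug, PartialEq)]
enum Route {
    /// Pipe the recipe's output through the wrapped tool's dedicated filter.
    Dedicated(String),
    /// No dedicated filter found: keep today's generic `just` TOML pipeline.
    GenericToml,
}

/// True for a column-0 recipe header such as `test:` or `test arg: build`.
fn is_recipe_header(line: &str, recipe: &str) -> bool {
    if line.starts_with(char::is_whitespace) || line.contains(":=") {
        return false;
    }
    // The name is the first token, ended by ':' or by whitespace (parameters).
    let name = line.split(|c: char| c == ':' || c.is_whitespace()).next();
    name == Some(recipe) && line.contains(':')
}

/// Returns the recipe's body lines with indentation and `@` / `-` prefixes removed.
fn justfile_recipe_body(justfile: &str, recipe: &str) -> Option<Vec<String>> {
    let lines: Vec<&str> = justfile.lines().collect();
    let header = lines.iter().position(|l| is_recipe_header(l, recipe))?;
    let body: Vec<String> = lines[header + 1..]
        .iter()
        .take_while(|l| l.starts_with(' ') || l.starts_with('\t'))
        .map(|l| l.trim().trim_start_matches(|c: char| c == '@' || c == '-').to_string())
        .collect();
    Some(body)
}

fn route_recipe(justfile: &str, recipe: &str) -> Route {
    // Missing recipe: keep today's behavior, so this is never a regression.
    let Some(body) = justfile_recipe_body(justfile, recipe) else {
        return Route::GenericToml;
    };
    for line in &body {
        // A recipe line's program is its first whitespace-separated token.
        if let Some(prog) = line.split_whitespace().next() {
            if DEDICATED.contains(&prog) {
                return Route::Dedicated(prog.to_string());
            }
        }
    }
    Route::GenericToml
}
```

Any failure to find the recipe, or a recipe whose commands have no dedicated filter, yields `Route::GenericToml`, which is the "strictly no regression" property from point 4 above.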

The new logic is additive: existing filter behavior is the fallback, so it's a strict improvement.

### Alternative detection paths (for reference, not recommended as primary)

- **Parse the delegator's own stdout.** Some delegators echo the command they're about to run (`task: [build] go build ./...`, `Poe => pytest ...`). This works after the fact but can't inform `Stdio` choices, and the echo isn't reliable: `just`, for example, prints recipe lines to stderr by default but skips lines prefixed with `@`.
- **Declarative TOML config.** Add `wraps = "pytest"` per filter. Requires authors to manually map every task — doesn't scale across projects with custom recipes.
- **Output-format heuristics.** Sniff for `===== test session starts =====` etc. Fragile and order-dependent.
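For comparison, the stdout-echo path amounts to recognizing a handful of line formats. This sketch handles only the two formats quoted above; `echoed_command` is a hypothetical helper, and the echo formats of `task` and `poe` are taken from this issue's examples rather than checked against each tool's documentation.

```rust
/// Recovers the wrapped command from a delegator's echo line, if present.
fn echoed_command(line: &str) -> Option<&str> {
    // poe: `Poe => pytest ...`
    if let Some(rest) = line.strip_prefix("Poe => ") {
        return Some(rest.trim());
    }
    // task: `task: [build] go build ./...`; skip past the closing `] `
    if let Some(rest) = line.strip_prefix("task: [") {
        let (_task_name, cmd) = rest.split_once("] ")?;
        return Some(cmd.trim());
    }
    None
}
```

Because these lines only appear once the child is already running, the result can't influence the `Stdio` choice made at spawn time.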

These can supplement the primary path (e.g. as fallbacks for unparseable project files), but should not be the main mechanism.

### Other open questions

- **Where does routing live?** A second pass through `find_matching_filter` after parsing the project file would suffice, but it breaks the current "one filter per command" mental model in `src/core/toml_filter.rs`. Probably wants its own dispatch helper.
- **Caching.** Parsing the project file on every invocation adds startup cost. A cheap mtime-based cache (parse once, invalidate when the file changes) should keep RTK within its <10ms startup target.
- **Trust boundary.** `Justfile`/`Taskfile`/`.mise.toml`/`pyproject.toml` are checked into the repo and are *not* under RTK's existing `.rtk/filters.toml` trust gate. Reading them to *decide which RTK filter to apply* is safe (no execution, no replace/match_output rules), but worth noting in SECURITY.md so it's intentional.
- **Interaction with the rewrite hook.** `just` / `mise` / `task` / `poe` aren't in `src/discover/rules.rs` `RULES`, so they're never auto-rewritten — only invoked when an agent explicitly types `rtk just test`. The fix should not change rewrite behavior.
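A sketch of the mtime-based invalidation rule from the caching question. `MtimeCache` and `get_or_parse` are hypothetical names. Because every `rtk` invocation is a fresh process, an in-memory map like this one only illustrates the rule; a real cache would have to persist entries (for example on disk) keyed the same way. Filesystems with coarse mtime resolution can also miss an edit made within the same tick, so a size or content-hash check may be worth adding.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical cache: re-parse a file only when its modification time changes.
struct MtimeCache<T> {
    entries: HashMap<PathBuf, (SystemTime, T)>,
}

impl<T: Clone> MtimeCache<T> {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the cached value if the mtime is unchanged, else parses and stores it.
    fn get_or_parse(&mut self, path: &Path, parse: impl FnOnce(&str) -> T) -> io::Result<T> {
        let mtime = fs::metadata(path)?.modified()?;
        if let Some((cached_mtime, value)) = self.entries.get(path) {
            if *cached_mtime == mtime {
                return Ok(value.clone());
            }
        }
        let text = fs::read_to_string(path)?;
        let value = parse(text.as_str());
        self.entries.insert(path.to_path_buf(), (mtime, value.clone()));
        Ok(value)
    }
}
```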

## Interim workarounds (please document if delegator routing is out of scope)

1. **Don't go through the delegator.** Call the wrapped tool directly: `rtk pytest -n auto -m '...'` instead of `rtk just test`. This is the right answer today but isn't documented anywhere — agents will reach for the task-runner alias.
2. **Project-local override** in `.rtk/filters.toml` with a much higher `max_lines` (e.g. 500). This trades meaningful compression for output that is at least not actively harmful, and requires a per-user `rtk trust`.
3. **Bypass.** `RTK_NO_TOML=1 rtk just test` or `rtk proxy just test`.

## Why this matters

The whole point of RTK is to give agents *useful* compressed output. Truncating before the FAILED summary line silently inverts that goal — the agent sees a passing-looking truncated run and moves on, when in reality the build is broken. This is the worst failure mode for an LLM proxy.

Happy to take a stab at the fix if there's agreement on direction (parsing the project file, with today's pipeline as the fallback, seems lowest-risk to me). Flagging the architectural question first since it affects four filters and would shape how future delegator filters are written.

cc related: #1062 (poe filter) — would inherit the same fix for free.