Summary
Built-in TOML filters for task-runner delegators — just, mise, task, and the in-flight poe (#1062) — apply max_lines = 50 to the wrapped tool's raw output. For tasks that wrap pytest / cargo test / eslint / pyright / etc., this routinely truncates the summary line and failure tracebacks — exactly the part the LLM needs to act on.
In some cases the filter also performs zero useful stripping on the wrapped output and only contributes the 50-line cap, making it a pure regression vs. running with RTK_NO_TOML=1.
Concrete repro: rtk just test wrapping pytest
Given a Justfile like:
test:
    pytest -n auto -m 'not integration and not flaky'
Real pytest output for a project with a few hundred tests and one failure:
============================= test session starts ==============================
platform darwin -- Python 3.13.x, pytest-8.x.x, pluggy-1.x.x
rootdir: /Users/me/proj
configfile: pyproject.toml
plugins: xdist-3.x, asyncio-0.21, mock-3.14, ...
8 workers [342 items]
........................................ [ 11%]
........................................ [ 23%]
........................................ [ 35%]
... (35+ more lines of progress dots) ...
=================================== FAILURES ===================================
___________________________ test_pacing_calculation ____________________________
[20-line traceback]
========================== short test summary info =============================
FAILED tests/budget_pacer/test_pacer.py::test_pacing_calculation - AssertionError
======================= 1 failed, 341 passed in 14.52s =========================
After rtk just test is processed by src/filters/just.toml:
- strip_ansi ✓
- strip_lines_matching: matches nothing in pytest output (the patterns only match just --list's own help banner)
- truncate_lines_at = 150: harmless
- max_lines = 50: keeps the header (~6 lines) + ~44 lines of progress dots, then cuts off
- The LLM never sees FAILURES, the traceback, the FAILED line, or the 1 failed, 341 passed summary
The result is worse than no filter at all: pytest looks like it hung mid-run instead of failing. The agent has no signal to act on.
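The truncation in the repro can be reproduced in isolation. The following is a minimal sketch, assuming the cap keeps the head of the output; `cap_head` and `synthetic_pytest_output` are hypothetical helpers written for this illustration and are not RTK functions.

```rust
/// Keep only the first `max_lines` lines of `output`.
/// Assumption: the cap keeps the head of the output, as the repro suggests.
fn cap_head(output: &str, max_lines: usize) -> String {
    output.lines().take(max_lines).collect::<Vec<_>>().join("\n")
}

/// Build a synthetic pytest-like transcript: a 6-line header,
/// `progress_lines` rows of dots, then the failure tail.
fn synthetic_pytest_output(progress_lines: usize) -> String {
    let mut lines: Vec<String> = vec![
        "===== test session starts =====".to_string(),
        "platform darwin -- Python 3.13.x".to_string(),
        "rootdir: /Users/me/proj".to_string(),
        "configfile: pyproject.toml".to_string(),
        "plugins: xdist-3.x".to_string(),
        "8 workers [342 items]".to_string(),
    ];
    for _ in 0..progress_lines {
        lines.push(".".repeat(40));
    }
    lines.push("===== FAILURES =====".to_string());
    lines.push(
        "FAILED tests/budget_pacer/test_pacer.py::test_pacing_calculation - AssertionError"
            .to_string(),
    );
    lines.push("===== 1 failed, 341 passed in 14.52s =====".to_string());
    lines.join("\n")
}
```

With 50 progress rows and a cap of 50, the capped text ends on a row of dots: the header and 44 progress rows survive, and every line that mentions the failure is dropped.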
Why this happens
When rtk just test falls into run_fallback (src/main.rs:1054), the first TOML filter whose match_command regex hits is selected — ^just\b. The TOML pipeline (src/core/toml_filter.rs) then applies generic line filtering with no awareness that just test actually executed pytest underneath. There is no routing from a delegator's output back to the wrapped tool's dedicated filter.
Affected filters
| File | max_lines | Stripping useful for wrapped output? |
| --- | --- | --- |
| src/filters/just.toml | 50 | No (patterns only match just --list) |
| src/filters/mise.toml | 50 | Partial (strips mise install noise; useless for mise run <task>) |
| src/filters/task.toml | 50 | Partial (strips task: [name] cmd headers) |
| src/filters/poe.toml (#1062) | 50 | Partial (strips Poe => headers) |
All four follow the same delegator pattern and share the same architectural problem. poe is on track to inherit it via #1062.
Proposed direction (sketch, not prescriptive)
A delegator filter type that, after stripping the wrapper's own preamble, routes the remaining stdout through whatever filter would have applied to the wrapped command. Concretely for rtk just test running pytest:
- TOML filter matches ^just\b
- Strip just-specific preamble (currently a no-op)
- Detect or declare the wrapped tool (pytest)
- Re-apply RTK's pytest filter (Rust module, src/cmds/python/) to the remaining output
- Return final filtered output
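The steps above can be sketched as a two-stage pipeline. This is an illustration only: `Filter`, `strip_preamble`, `pytest_like_filter`, `lookup_filter`, and `delegate` are hypothetical names, and the stand-in pytest filter merely drops progress rows; RTK's real pytest filter in src/cmds/python/ is not reproduced here.

```rust
/// Hypothetical: a filter is a function from raw output to filtered output.
type Filter = fn(&str) -> String;

/// Stand-in for the wrapper-specific step: drop lines that belong to the
/// delegator itself (here, any line starting with `prefix`).
fn strip_preamble(output: &str, prefix: &str) -> String {
    output
        .lines()
        .filter(|l| !l.starts_with(prefix))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Stand-in for a dedicated pytest filter: drop progress rows such as
/// `........ [ 50%]` and keep everything else.
fn pytest_like_filter(output: &str) -> String {
    output
        .lines()
        .filter(|l| !(l.starts_with('.') && l.trim_end().ends_with("%]")))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Hypothetical registry lookup from wrapped tool name to its filter.
fn lookup_filter(tool: &str) -> Option<Filter> {
    match tool {
        "pytest" => Some(pytest_like_filter as Filter),
        _ => None,
    }
}

/// Strip the delegator preamble, then route the rest through the wrapped
/// tool's filter if one is known; otherwise return the stripped body.
fn delegate(output: &str, preamble_prefix: &str, wrapped_tool: Option<&str>) -> String {
    let body = strip_preamble(output, preamble_prefix);
    match wrapped_tool.and_then(lookup_filter) {
        Some(filter) => filter(&body),
        None => body,
    }
}
```

When no wrapped tool is known, `delegate` degrades to preamble stripping only, which mirrors the "existing behavior as fallback" idea.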
Recommended primary approach: parse the project file
The cleanest path is parsing the delegator's own project file to resolve task → wrapped command before the task runs. Each delegator has exactly one well-known config file format:
| Delegator | Project file | Where the wrapped command lives |
| --- | --- | --- |
| just | Justfile (or justfile, .justfile) | recipe body lines |
| mise | .mise.toml / mise.toml / .config/mise.toml | [tasks.<name>] run = "..." |
| task | Taskfile.yml / Taskfile.yaml | tasks.<name>.cmds |
| poe (#1062) | pyproject.toml | [tool.poe.tasks] <name> = "..." or {cmd = "..."} |
Why this is the right primary path:
- Unambiguous and authoritative. The project file is the source of truth for what a task does. No guessing, no parsing tool stdout, no relying on the wrapper to echo the command.
- Resolved before execution. RTK can decide which downstream filter to apply before spawning the child, which means it can pick the right Stdio strategy (streamed vs. buffered) based on the wrapped tool. This also fixes the streaming-server problem (uvicorn --reload etc.) for free: if the resolved command is a streaming server, skip TOML buffering.
- No new TOML schema. No need to add wraps = ... to every filter or every task. Filters stay declarative and small.
- Resilient to missing project files. If parsing fails or the file isn't found, fall back to the current line-stripping behavior, so there is strictly no regression.
- Works for chained commands. just lint-fix → ruff check --fix && ruff format resolves to two commands; RTK picks the dominant filter (or applies sequentially) instead of giving up.
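For the chained-command case, one possible reading of "dominant filter" is the first program in the chain that has a dedicated filter. The sketch below assumes that policy; `programs_in_chain` and `first_known_program` are hypothetical helpers, and the splitting is deliberately naive (no quoting, `;`, `||`, pipes, or environment-variable prefixes).

```rust
/// Split a shell-like command line on `&&` and return the program name
/// (first whitespace-separated token) of each part.
/// Naive on purpose: quoting and other operators are not handled.
fn programs_in_chain(command_line: &str) -> Vec<&str> {
    command_line
        .split("&&")
        .filter_map(|part| part.split_whitespace().next())
        .collect()
}

/// Hypothetical policy: pick the first program in the chain that appears in
/// `known`, the list of tools that have a dedicated filter.
fn first_known_program<'a>(command_line: &'a str, known: &[&str]) -> Option<&'a str> {
    programs_in_chain(command_line)
        .into_iter()
        .find(|p| known.contains(p))
}
```

Applying filters sequentially instead would need the output to be attributable to each command in the chain, which this sketch does not attempt.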
Sketch of the flow for rtk just test:
1. run_fallback sees `just test`
2. Match TOML filter `^just\b` (existing behavior)
3. NEW: parse ./Justfile, find recipe `test`, extract command line `pytest -n auto -m '...'`
4. NEW: classify the resolved command via the existing `find_matching_filter` / Clap dispatch
5. NEW: if a Rust filter matches (`pytest` → src/cmds/python/), spawn `just test` and pipe output through that filter instead of the generic `just` TOML pipeline
6. NEW: if no specific filter matches, fall back to the current `just` TOML pipeline (today's behavior)
The new logic is additive: existing filter behavior is the fallback, so it's a strict improvement.
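Step 3 of the flow could start from something like the sketch below. It is a naive line-based lookup, not a full parser for the Justfile grammar: attributes, variables, interpolation, line continuations, and blank lines inside a recipe body are all ignored. `is_recipe_header`, `resolve_recipe`, and `first_program` are hypothetical names.

```rust
/// True if `line` looks like the header of `recipe`: the name at column 0,
/// followed by `:` directly or by parameters and then a `:`.
/// Variable assignments (`name := value`) are rejected.
fn is_recipe_header(line: &str, recipe: &str) -> bool {
    match line.strip_prefix(recipe) {
        Some(rest) => {
            let starts_ok = rest.starts_with(':') || rest.starts_with(char::is_whitespace);
            starts_ok && rest.contains(':') && !rest.trim_start().starts_with(":=")
        }
        None => false,
    }
}

/// Return the body lines of `recipe`, trimmed and with a leading `@`
/// (quiet marker) removed, or `None` if the recipe is not found.
fn resolve_recipe(justfile: &str, recipe: &str) -> Option<Vec<String>> {
    let mut lines = justfile.lines();
    while let Some(line) = lines.next() {
        if is_recipe_header(line, recipe) {
            // The body is the run of indented lines right after the header.
            let body = lines
                .by_ref()
                .take_while(|l| l.starts_with(' ') || l.starts_with('\t'))
                .map(|l| l.trim().trim_start_matches('@').to_string())
                .collect();
            return Some(body);
        }
    }
    None
}

/// Program name of the first body line, used to pick a downstream filter.
fn first_program(body: &[String]) -> Option<&str> {
    body.first()?.split_whitespace().next()
}
```

If `resolve_recipe` returns `None` or an empty body, the caller would keep today's generic TOML pipeline, which matches step 6.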
Alternative detection paths (for reference, not recommended as primary)
- Parse the delegator's own stdout. Some delegators echo the command they're about to run (task: [build] go build ./..., Poe => pytest ...). Works after the fact but can't inform Stdio choices, and not all delegators echo (just doesn't by default).
- Declarative TOML config. Add wraps = "pytest" per filter. Requires authors to manually map every task, which doesn't scale across projects with custom recipes.
- Output-format heuristics. Sniff for ===== test session starts ===== etc. Fragile and order-dependent.
These can supplement the primary path (e.g. as fallbacks for unparseable project files), but should not be the main mechanism.
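As a fallback, the stdout-echo path could look like the following sketch. The two line formats are taken only from the examples quoted above (task: [name] cmd and Poe => cmd); whether the real tools always emit exactly these shapes is an assumption, and `echoed_command` is a hypothetical helper.

```rust
/// Extract the echoed command from a delegator output line, if the line
/// matches one of the two assumed echo formats.
fn echoed_command(line: &str) -> Option<&str> {
    if let Some(rest) = line.strip_prefix("Poe => ") {
        return Some(rest.trim());
    }
    if let Some(rest) = line.strip_prefix("task: [") {
        // Skip past the closing bracket of the task name.
        let (_, cmd) = rest.split_once("] ")?;
        return Some(cmd.trim());
    }
    None
}
```

Because the echo only appears once the child is already running, this path can pick a filter but cannot influence how the child's output is captured.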
Other open questions
- Where does routing live? A second pass through find_matching_filter after parsing the project file would suffice, but it breaks the current "one filter per command" mental model in src/core/toml_filter.rs. Probably wants its own dispatch helper.
- Caching. Project file parsing on every invocation adds startup cost. A cheap mtime-based cache (parse once, invalidate when the file changes) keeps RTK under the 10 ms startup target.
- Trust boundary. Justfile/Taskfile/.mise.toml/pyproject.toml are checked into the repo and are not under RTK's existing .rtk/filters.toml trust gate. Reading them to decide which RTK filter to apply is safe (no execution, no replace/match_output rules), but worth noting in SECURITY.md so it's intentional.
- Interaction with the rewrite hook. just / mise / task / poe aren't in src/discover/rules.rs RULES, so they're never auto-rewritten; they are only invoked when an agent explicitly types rtk just test. The fix should not change rewrite behavior.
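The invalidation rule behind the caching idea can be sketched as follows. `MtimeCache` is a hypothetical type, and it lives in memory only: a CLI that exits after each command would need to persist the entries somewhere (for example on disk) to benefit across invocations, which is not shown here.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical cache: remembers the mtime seen at parse time together with
/// the parsed value, and re-parses only when the mtime differs.
struct MtimeCache<T> {
    entries: HashMap<PathBuf, (SystemTime, T)>,
    /// Number of times the parse closure actually ran (for illustration).
    parses: usize,
}

impl<T: Clone> MtimeCache<T> {
    fn new() -> Self {
        Self { entries: HashMap::new(), parses: 0 }
    }

    /// Return the cached value if the file's mtime is unchanged, otherwise
    /// read the file, run `parse`, and store the result.
    fn get_or_parse(&mut self, path: &Path, parse: impl FnOnce(&str) -> T) -> io::Result<T> {
        let mtime = fs::metadata(path)?.modified()?;
        if let Some((cached_mtime, value)) = self.entries.get(path) {
            if *cached_mtime == mtime {
                return Ok(value.clone());
            }
        }
        let text = fs::read_to_string(path)?;
        let value = parse(&text);
        self.parses += 1;
        self.entries.insert(path.to_path_buf(), (mtime, value.clone()));
        Ok(value)
    }
}
```

Note that mtime comparison can miss edits that land within the filesystem's timestamp granularity; a size or content hash check would be a stricter alternative.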
Interim workarounds (please document if delegator routing is out of scope)
- Don't go through the delegator. Call the wrapped tool directly: rtk pytest -n auto -m '...' instead of rtk just test. This is the right answer today but isn't documented anywhere, so agents will reach for the task-runner alias.
- Project-local override in .rtk/filters.toml with a much higher max_lines (e.g. 500). Trades meaningful compression for output that is at least not actively harmful, and requires a per-user rtk trust.
- Bypass. RTK_NO_TOML=1 just test or rtk proxy just test.
Why this matters
The whole point of RTK is to give agents useful compressed output. Truncating before the FAILED summary line silently inverts that goal — the agent sees a passing-looking truncated run and moves on, when in reality the build is broken. This is the worst failure mode for an LLM proxy.
Happy to take a stab at the fix if there's agreement on direction (project-file parsing with the current TOML pipeline as the fallback seems lowest-risk to me, in line with the recommendation above). Flagging the architectural question first since it affects four filters and would shape how future delegator filters are written.
cc related: #1062 (poe filter) — would inherit the same fix for free.