
fix: clear stale Telegram CAS bindings before reuse #91

Open
kesslerio wants to merge 4 commits into pwrdrvr:main from kesslerio:fix/stale-cas-bindings

kesslerio commented Apr 11, 2026

Fixes #94.

What

  • clear a saved CAS binding when a status read or a new turn discovers that the bound thread is no longer usable
  • stop cas_status and turn startup from restoring a binding that was already cleared during that check
  • remove sibling resume-thread callbacks for the same conversation once one resume choice moves into approval or succeeds
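A minimal TypeScript sketch of the first two bullets, assuming a simple in-memory store. `CasBinding`, `BindingStore`, `checkBinding`, and `casStatus` are hypothetical names, not the plugin's actual code, and the "snapshot written back after the check" failure is only one plausible way a cleared binding gets restored. The idea is that the status path only ever writes back the binding returned by the check, never a copy taken before it.

```typescript
// Hypothetical binding record; the real plugin's fields are not shown in this PR.
interface CasBinding {
  conversationId: string;
  threadId: string;
  lastCheckedAt: number;
}

// Hypothetical in-memory store keyed by Telegram conversation/topic id.
class BindingStore {
  private readonly bindings = new Map<string, CasBinding>();

  get(conversationId: string): CasBinding | undefined {
    return this.bindings.get(conversationId);
  }

  set(binding: CasBinding): void {
    this.bindings.set(binding.conversationId, binding);
  }

  clear(conversationId: string): void {
    this.bindings.delete(conversationId);
  }
}

// Checks the bound thread and clears the binding if it is no longer usable.
// Returns the binding only when it survived the check.
async function checkBinding(
  store: BindingStore,
  conversationId: string,
  isThreadUsable: (threadId: string) => Promise<boolean>,
): Promise<CasBinding | undefined> {
  const binding = store.get(conversationId);
  if (binding === undefined) return undefined;
  if (!(await isThreadUsable(binding.threadId))) {
    store.clear(conversationId);
    return undefined;
  }
  return binding;
}

// Status read. The buggy shape would take a snapshot before the check and
// call store.set(snapshot) afterwards, silently restoring a cleared binding.
async function casStatus(
  store: BindingStore,
  conversationId: string,
  isThreadUsable: (threadId: string) => Promise<boolean>,
  now: number,
): Promise<string> {
  const binding = await checkBinding(store, conversationId, isThreadUsable);
  if (binding === undefined) {
    return "unbound"; // nothing is written back after a clear
  }
  store.set({ ...binding, lastCheckedAt: now });
  return `bound to ${binding.threadId}`;
}
```

Turn startup would follow the same rule: take the binding from the check's return value instead of holding on to an earlier copy.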

Why

A resumed Telegram topic could still look bound yet behave as if it were dead. In the April 11 case, the thread itself was still resumable. What had drifted was the binding state around it.

That showed up in two places:

  • stale bindings could survive status reads and turn setup longer than they should
  • repeated /cas_resume attempts could leave old resume callbacks behind for the same topic

This patch cleans up both parts. It keeps the controller from reusing dead bindings, and it clears old resume choices once a topic starts binding again.
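The resume-choice cleanup could look roughly like the following sketch. `ResumeCallback` and `ResumeCallbackRegistry` are hypothetical; the sketch assumes each pending /cas_resume choice is keyed by a callback id and tagged with its conversation, and that the chosen callback itself stays registered until its own flow finishes.

```typescript
// Hypothetical record for one pending /cas_resume choice (e.g. an inline button).
interface ResumeCallback {
  callbackId: string;
  conversationId: string;
  threadId: string;
}

class ResumeCallbackRegistry {
  private readonly pending = new Map<string, ResumeCallback>();

  register(callback: ResumeCallback): void {
    this.pending.set(callback.callbackId, callback);
  }

  has(callbackId: string): boolean {
    return this.pending.has(callbackId);
  }

  // Call once the chosen callback moves into approval or succeeds.
  // Removes every other pending choice for the same conversation and
  // returns how many were dropped; other conversations are untouched.
  removeSiblings(chosen: ResumeCallback): number {
    let removed = 0;
    // Deleting entries from a Map inside forEach is well-defined in JavaScript.
    this.pending.forEach((callback, callbackId) => {
      if (
        callback.conversationId === chosen.conversationId &&
        callbackId !== chosen.callbackId
      ) {
        this.pending.delete(callbackId);
        removed += 1;
      }
    });
    return removed;
  }
}
```

Scoping the sweep to one conversation means a resume in one topic never cancels a pending choice in another.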

Tests

  • pnpm test src/controller.test.ts
  • pnpm test src/state.test.ts
  • pnpm typecheck
  • pnpm test

AI Assistance

I used Codex to inspect local gateway and plugin state, trace the binding lifecycle, implement the fix, and run the tests above.

kesslerio changed the title from "fix: clear stale CAS bindings before status and turns" to "fix: clear stale Telegram CAS bindings before reuse" on Apr 12, 2026
kesslerio (Author)

I did another pass on the current branch, not the earlier narrower diff.

One thing still worries me.

startTurn() now preflights every bound turn with readThreadState(). If that read fails for a generic RPC reason, the controller throws before it even tries to continue on the existing thread. Before this PR, the turn would still attempt to start with existingThreadId.

The tests cover the stale-binding case (no rollout found), but not the broader behavior change for ordinary read failures. I think this should go one of two ways:

  • fall back to attempting startTurn when the read error is not a confirmed stale binding, or
  • add a targeted test that proves aborting the turn on generic readThreadState failures is intentional

That is the only real thing I would still patch before merge.

kesslerio (Author)

Patched the preflight issue I called out in the last review pass.

The controller still clears a binding when the thread is clearly gone. That part stays the same.

What changed is the generic failure path. If readThreadState() blows up for something ordinary like an RPC timeout, we now log it and still try startTurn() with the saved sessionKey and threadId instead of aborting the turn before it starts.

I also added a regression for that case.
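A minimal sketch of the patched preflight, under stated assumptions: `SavedBinding`, `TurnDeps`, `isConfirmedStale`, and `startBoundTurn` are hypothetical names, treating a "no rollout found" message as the only confirmed stale signal is an assumption based on the stale-binding test mentioned above, and starting without a thread id after a clear is also an assumption. Only a confirmed stale read clears the binding; any other readThreadState() failure is logged and the turn still starts on the saved thread.

```typescript
// Hypothetical saved binding; the comment above mentions a saved sessionKey and threadId.
interface SavedBinding {
  sessionKey: string;
  threadId: string;
}

// Hypothetical dependencies injected into the controller for this sketch.
interface TurnDeps {
  readThreadState(threadId: string): Promise<void>;
  clearBinding(): void;
  startTurn(sessionKey: string, threadId: string | undefined): Promise<string>;
  log(message: string): void;
}

// Assumption: a confirmed stale binding surfaces as a "no rollout found" error.
function isConfirmedStale(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /no rollout found/i.test(message);
}

async function startBoundTurn(binding: SavedBinding, deps: TurnDeps): Promise<string> {
  try {
    await deps.readThreadState(binding.threadId);
  } catch (error) {
    if (isConfirmedStale(error)) {
      // Thread is clearly gone: clear the binding and do not reuse its thread id.
      // Starting without a thread id here is an assumption of this sketch.
      deps.clearBinding();
      return deps.startTurn(binding.sessionKey, undefined);
    }
    // Generic failure such as an RPC timeout: log it, then fall through and
    // still try the saved thread instead of aborting the turn.
    deps.log(`readThreadState failed; continuing with saved thread: ${String(error)}`);
  }
  return deps.startTurn(binding.sessionKey, binding.threadId);
}
```

Keeping the classifier narrow is the design choice here: anything not positively identified as stale is treated like the pre-PR behavior, so a flaky read cannot block a turn on a healthy thread.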

Checks I reran after the patch:

  • pnpm test src/controller.test.ts
  • pnpm typecheck
  • pnpm test

Development

Successfully merging this pull request may close these issues.

bug: resumed Telegram topics can quietly lose their CAS binding

2 participants