Skip to content

[Critical] Agent session unpredictably terminates due to context window overflow on long-running tasks, leading to loss of work #3997

@jroth55

Description

@jroth55

What version of Codex is running?

0.39.0

Which model were you using?

gpt-5-codex high

What platform is your computer?

Darwin 23.5.0 arm64 arm

What steps can reproduce the bug?

The bug is reliably reproduced by engaging the agent in any complex, long-running software development task that requires it to build up a large conversational history. This includes, but is not limited to, full-stack implementation, comprehensive code audits, and large-scale refactoring.

  1. Initiate a large-scale task in a non-trivial codebase (e.g., "architect, plan and fully implement all in task_list.md" or "identify all errors in main.py").
  2. Allow the agent to work. The agent will accumulate a large context by reading files, executing commands, generating code, and reasoning through its steps. This process often takes 15–30+ minutes.
  3. Wait for failure. At an unpredictable point, the agent's next action will trigger a fatal error as the total size of the conversational payload exceeds the model's context window limit. The client's retry mechanism will fail, and the session is terminated.

What is the expected behavior?

The agent should be able to handle long-running tasks without catastrophic failure. The system should possess an intelligent, automatic context management mechanism that ensures session continuity.

The expected behavior is a seamless workflow where, upon reaching a context threshold, the agent:

  1. Proactively and autonomously summarizes its current state.
  2. Resets its conversational history to free up the context window.
  3. Loads the summary back into its working memory.
  4. Continues the user's pending task without interruption or loss of state.

The user should experience at most a brief pause with an informational message (e.g., “Compacting session history…”), not a fatal error that requires them to manually restart and re-establish context.

What do you see instead?

The session abruptly terminates with a series of failed retries, citing that the input exceeds the context window.

■ stream disconnected before completion: Your input exceeds the context window of this model. Please adjust your input and try again.

This is a critical failure because it erases all of the valuable, hard-earned context the agent has built. The user's only recourse is to manually piece together the last known state and start a new session, which is highly inefficient, error-prone, and discourages the use of the agent for its most powerful, high-value tasks.

Additional information

This is not a minor issue; it is the primary blocker to using Codex CLI for professional, real-world software engineering. The most valuable tasks (architecture, greenfield implementation, deep refactoring) are precisely the ones that are most likely to trigger this bug.

Note: While there is a manual /compact command, this cannot be done during a long-running task and I don't have confidence it is capturing the right items when it does compact. I suggest a solution below:

Proposed Solution: Automated Inline “Context Bridge” Compaction

The solution is to implement a proactive, client-side workflow that intelligently manages the agent's context. This system, called “Context Compaction,” creates a structured “Context Bridge” to transfer the agent's state across a session reset.

Phase 1: Monitor & Trigger

  • The client continuously monitors the token count of the conversational history.
  • When the count exceeds a threshold (e.g., 80% of the model's limit), it intercepts the next user request and initiates compaction.

Phase 2: Pin State

  • The client captures the ground truth of the workspace:

    • Git State: commit, branch.
    • Working Set: git status determines if the tree is dirty. If so, a git diff of the last active file is captured as a mandatory Active Patch.
    • Anchor Registry: A client-managed registry of deterministic anchors (TYPE:YYYYMMDD:SEQ) and their one-line descriptions is finalized.

Phase 3: Generate Context Bridge

The following is the exact generator prompt.

Developer: You are an expert, stateful software development agent. Your mission is to analyze the provided session history and pinned state, and generate a high-fidelity, lossless Context Bridge document. This document ensures seamless and zero-context-loss resumption of work. Strictly adhere to the Markdown template below at all times.

Before starting, begin with a concise checklist (3-7 bullets) outlining the key steps you will take to construct the Context Bridge document, such as analyzing session history, extracting relevant state, validating mandatory fields, and finalizing the document structure.

**CRITICAL INSTRUCTIONS:**
1. **Anchor Integrity:** Use the anchor format `TYPE:YYYYMMDD:SEQ` (e.g., `DEC:20231027:001`). Every anchor referenced in the document must be defined in the `Anchor Registry`.
2. **State Fidelity:** The `Working Tree` section is authoritative for file state. Include the full, applicable `diff` patch for the `active_file` only.
3. **Secret Redaction:** Never include secret values. Instead, display only the secret name and its status (`<set|missing>`). If any secret is referenced, set `bridge_redacted: true`.
4. **Mandatory Fields & Validation:** The following fields are mandatory: `Goal`, `Working Tree`, `Branch/Commit`, `Unsaved Changes`, and `Next Steps`. If any information is missing, synthesize a conservative placeholder (e.g., for Next Steps: "Review the code and determine next action.") and set `bridge_incomplete: true`.
5. **Resumption Directive:** The `Next Steps` section must give precise, prioritized tasks. If `unsaved_changes: yes`, the immediate step must be to apply the `active_patch`.
6. **Task Dependencies:** When representing tasks in `Next Steps` or `Task Progress`, indicate dependencies explicitly (e.g., `Depends on: <Task N>` or similar notation) to clarify prerequisite relationships between tasks. Ensure the correct execution order is clear.
7. **Comprehensive Task Recording:** Record every single task that was completed and every task that is not yet completed in the Context Bridge. For each task, fully state its details so that the agent resuming work can understand the existing and remaining plan. For each completed and remaining task, include the associated acceptance criteria to ensure continuity and clarity of the plan.

After completing each major section or edit, validate the result in 1-2 lines: confirm accuracy, completeness, and adherence to the template, and self-correct if validation fails.

---
**PINNED STATE:**
[Client injects git status, full working tree diff for active file, list of other modified files, key env vars, etc.]
---
**ANCHOR REGISTRY (Source of Truth):**
[Client injects all known anchors and their 1-line descriptions.]
---
**CONTEXT BRIDGE TEMPLATE (Output in this exact structure):**
```markdown
bridge_incomplete: <true|false>
bridge_redacted: <true|false>

## Context Bridge

### Anchor Registry
- [[CL:YYYYMMDD:SEQ]]: <1-line summary of change>
- [[DEC:YYYYMMDD:SEQ]]: <1-line summary of decision>
- [[BUG:YYYYMMDD:SEQ]]: <1-line summary of bug>
- [[FAIL:YYYYMMDD:SEQ]]: <1-line summary of failed attempt>
- [[NOTE:YYYYMMDD:SEQ]]: <Important note>
- [[WARN:YYYYMMDD:SEQ]]: <Potential risk or gotcha>

### Original Request
**Goal**: <Full original user request>  
**Approach**: <High-level strategy>  
**Acceptance Criteria**:
- [ ] AC1: <Testable outcome>

### Project Map
**Structure**: <Project organization>  
**Entry Points**: <Main files/functions>  
**Dependencies**: <Key libs/services/APIs>  
**Constraints**: <Limitations or requirements>

### Complete Change Log
**Added Files**:
- [[CL:YYYYMMDD:SEQ]]: `path/to/file` — Why: <Reason>. What: <Purpose>.
**Modified Files**:
- [[CL:YYYYMMDD:SEQ]]: `path/to/file` — Why: <Reason>. What: <Summary>.

### Task Progress
**Completed**:
- Task: <Finished task> — <Outcome>
  - Acceptance Criteria: <Criteria for this completed task>
**In Progress**:
- Current: <Task>
  - Doing: <Active work>
  - Blocked By: <Blocking issue>
  - Success Check: <Verification method>
  - Depends on: <Specify prerequisite tasks, if any>
  - Acceptance Criteria: <Criteria for this task>
**Remaining**:
- Task N: <Pending task>
  - Depends on: <Specify prerequisite tasks, if any>
  - Acceptance Criteria: <Criteria for this task>

### Key Discoveries, Decisions, & Risks
**Architecture**:
- [[DEC:YYYYMMDD:SEQ]]: Found: <Discovery> — Decided: <Approach>
**Issues Found**:
- [[BUG:YYYYMMDD:SEQ]]: `file:line`: <Bug> — <Fix status>
**Failed Attempts**:
- [[FAIL:YYYYMMDD:SEQ]]: Tried: <Attempt> — Failed because: <Reason>
**Notes & Warnings**:
- [[NOTE:YYYYMMDD:SEQ]]: <Observation>
- [[WARN:YYYYMMDD:SEQ]]: <Risk or behavior>

### Current Working State
**Branch/Commit**: <Git info>  
**Last Command Run**:
```bash
<Last command>
```
**Working Tree**:
- `active_file`: `<path/to/primary/file.js:line>`
- `active_patch`:
```diff
--- a/path/to/primary/file.js
+++ b/path/to/primary/file.js
@@ -start,len +start,len @@
... diff ...
```
- `other_modified_files`: [`path/to/other/file.py`]
- `added_files`: [`path/to/new/file.go`]
- `deleted_files`: [`path/to/old/file.rs`]
- `unsaved_changes`: <yes|no>

### Environment & Config
**Required Vars**: VAR_NAME="needed"  
**Secrets/Keys**: KEY_NAME=<set|missing>  
**Config Files**: path/to/config.yml: status

### External Context
**Documentation Consulted**: [[DOC:YYYYMMDD:SEQ]] <Source>

### Useful Commands
```bash
# Frequently used and useful commands
<command> # purpose: <purpose>
```

### Verification Commands
```bash
# Commands to test Acceptance Criteria
<command> # expected: <result>
```

### Next Steps (Prioritized)
1. **Immediate**: <If unsaved_changes=yes, "Apply the active_patch to restore the working state of `active_file`". Otherwise, specify next single concrete action.>
   - Success Check: <How to verify>
   - Depends on: <Specify prerequisite tasks, if any>
2. **Next**: <Follow-up task>
   - Success Check: <How to verify>
   - Depends on: <Specify prerequisite tasks, if any>
```
---
**FULL SESSION HISTORY (For analysis to complete above fields):**
```
[Client provides entire raw session history here.]
```

## Output Format
Your output must follow the structured Markdown template above exactly.

- All fields in the template are required. If information is unavailable, synthesize a minimal, conservative placeholder and set `bridge_incomplete` or `bridge_redacted` accordingly.
- Use Markdown list syntax where needed.
- Anchor references must exactly match those in the Anchor Registry.
- For each file modification, use absolute or workspace-relative paths; for diffs, include only that for the specified active file.
- Secrets are always represented as name and `<set|missing>` only.
- Output Boolean/status fields as lowercase, unquoted (`true` or `false`).
- Example output has been provided above for reference.

Phase 4: Validate Bridge

  • The client performs a series of non-negotiable checks on the generated bridge:

    • Token Count: Is it within the hard limit?
    • Mandatory Fields: Are key sections like Current Working State present?
    • State Integrity: Do the Commit and Unsaved Changes status in the bridge match the pinned state?
    • Referential Integrity: Do all [[...]] anchors have a definition?
  • If validation fails, the process aborts with an error, preventing the loading of a corrupt state.

Phase 5: Reset & Seed

  • The client clears its local conversation history.
  • It constructs a new seed prompt containing the “Instruction Manual” (from our analysis) and the validated Context Bridge.
  • If Unsaved Changes: yes, the client's first instruction to the new agent is to apply the Active Patch.

Phase 6: Resume

  • The client sends the user's original, held-over prompt to the newly bootstrapped agent.

This comprehensive, automated solution would fix the bug permanently, making Codex CLI a robust and reliable tool for professional, long-duration development tasks.

An additional future enhancement could include integrating the tasks remaining with the plan/todo list tool.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingenhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions