feat(core): track migration progress over time with audit history #23

## Motivation

Currently each audit run overwrites `audit.json`, losing historical data. Teams need temporal tracking to:

- See violation trends over time (improving vs. regressing)
- Identify which files/rules are getting better or worse
- Track progress toward migration goals
- Correlate audit runs with commits, threads, and milestones
- Measure migration velocity and estimate completion

## Proposed Solution

Introduce immutable audit snapshots with metadata, summaries, and diffs to enable progress tracking.

### Directory Structure

```
.amp/effect-migrate/
├── audits/
│   ├── manifest.json          # Append-only index of all runs
│   ├── metrics.jsonl          # One line per run for fast trending
│   └── runs/
│       └── <run_id>/
│           ├── meta.json      # Run metadata (commit, CI, actor, context)
│           ├── audit.json.gz  # Raw audit (gzipped)
│           ├── summary.json   # Aggregated metrics
│           ├── diff.json      # Delta vs parent run
│           └── fingerprints.json.gz  # Optional cache
├── audit.json                 # Latest (symlink or copy for compatibility)
├── index.json
└── threads.json
```

### Run ID Format

```
<ISO8601-timestamp>--<commit-sha7>--<short-uuid>
Example: 2025-01-16T12:34:56Z--abc1234--p9k3f2
```
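
A minimal sketch of how such an ID could be assembled. The helper name `makeRunId` and the use of six random hex characters in place of the "short-uuid" part are assumptions for illustration, not part of the proposal.

```typescript
import { randomBytes } from "node:crypto"

// Hypothetical helper: builds "<ISO8601-timestamp>--<commit-sha7>--<short-id>".
export function makeRunId(now: Date, commitSha: string, suffix?: string): string {
  // toISOString() yields e.g. 2025-01-16T12:34:56.000Z; drop the milliseconds.
  const timestamp = now.toISOString().replace(/\.\d{3}Z$/, "Z")
  const sha7 = commitSha.slice(0, 7)
  // 3 random bytes -> 6 hex characters; stands in for the "short-uuid" segment.
  const shortId = suffix ?? randomBytes(3).toString("hex")
  return `${timestamp}--${sha7}--${shortId}`
}
```

Because the UTC timestamp comes first and has a fixed width, run IDs sort chronologically under a plain string sort. One caveat: `:` is not allowed in Windows file names, so a run ID used as a directory name may need the colons replaced on that platform.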

### Run Metadata (meta.json)

```json
{
  "schema_version": "1.0",
  "run_id": "2025-01-16T12:34:56Z--abc1234--p9k3f2",
  "timestamp": "2025-01-16T12:34:56Z",
  "parent_run_id": "2025-01-15T18:01:02Z--89de321--m1n2o3",
  "tool_version": "0.1.0",
  "commit": {
    "sha": "abc1234...",
    "branch": "feature/migrate",
    "repo": "org/repo"
  },
  "ci": {
    "provider": "github",
    "run_url": "https://...",
    "job_id": "123456"
  },
  "actor": {
    "user": "alice",
    "email": "alice@example.com"
  },
  "context": {
    "pr": 987,
    "milestone": "M3",
    "thread_ids": ["t-abc123..."],
    "tags": ["pilot", "phase-1"]
  }
}
```

### Violation Fingerprinting

Compute stable fingerprints to track "same violation" across runs:

```typescript
fingerprint = sha1(
  rule_id + "|" + 
  rel_path + "|" + 
  start_line + ":" + start_col + "-" + end_line + ":" + end_col + "|" + 
  normalize(message)
)
```

**Normalization:** Strip dynamic numbers/paths from messages for stability.
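
The formula above could be implemented roughly as follows. The `Violation` interface and the exact normalization rules (path-like tokens become `<path>`, digit runs become `<n>`) are assumptions; the proposal only says that dynamic numbers and paths should be stripped.

```typescript
import { createHash } from "node:crypto"

// Field names mirror the formula above; the interface itself is an assumption.
interface Violation {
  rule_id: string
  rel_path: string
  start_line: number
  start_col: number
  end_line: number
  end_col: number
  message: string
}

// Assumed normalization: replace tokens containing a slash, then digit runs,
// then collapse whitespace.
export function normalize(message: string): string {
  return message
    .replace(/(?:\.{0,2}\/)?(?:[\w.-]+\/)+[\w.-]+/g, "<path>")
    .replace(/\d+/g, "<n>")
    .replace(/\s+/g, " ")
    .trim()
}

// sha1 over "rule_id|rel_path|start-end|normalized message", hex encoded.
export function fingerprint(v: Violation): string {
  const location = `${v.start_line}:${v.start_col}-${v.end_line}:${v.end_col}`
  const key = [v.rule_id, v.rel_path, location, normalize(v.message)].join("|")
  return createHash("sha1").update(key).digest("hex")
}
```

Since the location is part of the hashed key, a violation that only moves to a different line gets a new fingerprint and will show up in a diff as one fixed plus one new entry.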

### Summary (summary.json)

Fast-access aggregated metrics without re-scanning raw audit:

```json
{
  "schema_version": "1.0",
  "run_id": "...",
  "totals": {
    "violations": 3210,
    "files_with_violations": 456
  },
  "by_rule": {
    "no-async-await": 230,
    "no-barrel-imports": 31
  },
  "by_severity": {
    "error": 950,
    "warning": 2260
  },
  "by_dir": {
    "src/": 2100,
    "packages/core/": 600,
    "tests/": 510
  },
  "top_files": [
    { "path": "src/api/handler.ts", "count": 120 }
  ],
  "time_to_run_ms": 52234
}
```
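
As a rough illustration, the counters in `summary.json` can be derived in a single pass over the raw violations. The `AuditViolation` shape and the `summarize` helper are assumptions; `by_dir` and `time_to_run_ms` are left out because the proposal does not say how directories are bucketed or how timing is captured.

```typescript
// Assumed minimal shape of one raw violation.
interface AuditViolation {
  rule_id: string
  path: string
  severity: "error" | "warning"
}

interface Summary {
  totals: { violations: number; files_with_violations: number }
  by_rule: Record<string, number>
  by_severity: Record<string, number>
  top_files: { path: string; count: number }[]
}

const bump = (counts: Record<string, number>, key: string): void => {
  counts[key] = (counts[key] ?? 0) + 1
}

// Single pass over the violations, then rank files by violation count.
export function summarize(violations: AuditViolation[], topN = 10): Summary {
  const byRule: Record<string, number> = {}
  const bySeverity: Record<string, number> = {}
  const byFile: Record<string, number> = {}
  for (const v of violations) {
    bump(byRule, v.rule_id)
    bump(bySeverity, v.severity)
    bump(byFile, v.path)
  }
  const topFiles = Object.entries(byFile)
    .map(([path, count]) => ({ path, count }))
    // Highest count first; ties broken by path so output is deterministic.
    .sort((a, b) => b.count - a.count || (a.path < b.path ? -1 : 1))
    .slice(0, topN)
  return {
    totals: { violations: violations.length, files_with_violations: Object.keys(byFile).length },
    by_rule: byRule,
    by_severity: bySeverity,
    top_files: topFiles
  }
}
```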

### Diff (diff.json)

Delta relative to parent run:

```json
{
  "schema_version": "1.0",
  "run_id": "...",
  "parent_run_id": "...",
  "delta": {
    "new": [
      {
        "fingerprint": "abc123...",
        "rule_id": "no-async-await",
        "path": "src/new-file.ts",
        "location": { "start": { "line": 10, "column": 5 } }
      }
    ],
    "fixed": [ /* ... */ ],
    "unchanged_count": 2980
  },
  "by_rule_delta": {
    "no-async-await": { "new": 5, "fixed": 10, "net": -5 }
  },
  "by_file_delta": {
    "src/api/handler.ts": { "new": 2, "fixed": 0, "net": 2 }
  }
}
```
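
One way to compute this delta is a set comparison on fingerprints. The sketch below assumes each violation already carries its fingerprint and that fingerprints are unique within a run; `diffRuns` and the `Fingerprinted` type are illustrative names.

```typescript
interface Fingerprinted {
  fingerprint: string
  rule_id: string
  path: string
}

interface Delta {
  new: number
  fixed: number
  net: number
}

// Increment the "new" or "fixed" counter for a key and keep net in sync.
function tally(table: Record<string, Delta>, key: string, kind: "new" | "fixed"): void {
  const entry = table[key] ?? { new: 0, fixed: 0, net: 0 }
  entry[kind] += 1
  entry.net = entry.new - entry.fixed
  table[key] = entry
}

export function diffRuns(parent: Fingerprinted[], current: Fingerprinted[]) {
  const parentSet = new Set(parent.map((v) => v.fingerprint))
  const currentSet = new Set(current.map((v) => v.fingerprint))
  // New: present now, absent in the parent. Fixed: the reverse.
  const added = current.filter((v) => !parentSet.has(v.fingerprint))
  const fixed = parent.filter((v) => !currentSet.has(v.fingerprint))

  const byRule: Record<string, Delta> = {}
  const byFile: Record<string, Delta> = {}
  for (const v of added) {
    tally(byRule, v.rule_id, "new")
    tally(byFile, v.path, "new")
  }
  for (const v of fixed) {
    tally(byRule, v.rule_id, "fixed")
    tally(byFile, v.path, "fixed")
  }
  return {
    delta: { new: added, fixed, unchanged_count: current.length - added.length },
    by_rule_delta: byRule,
    by_file_delta: byFile
  }
}
```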

### Manifest & Metrics for Fast Queries

**manifest.json** (append-only):
```json
[
  {
    "run_id": "2025-01-16T12:34:56Z--abc1234--p9k3f2",
    "timestamp": "2025-01-16T12:34:56Z",
    "parent_run_id": "...",
    "commit_sha": "abc1234",
    "pr": 987,
    "milestone": "M3",
    "tags": ["pilot"]
  }
]
```

**metrics.jsonl** (one line per run):
```jsonl
{"run_id":"...","ts":"...","total":3210,"error":950,"warn":2260,"new":42,"fixed":81,"net":-39}
```
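
A sketch of producing and reading `metrics.jsonl` rows, assuming `total` is the sum of the two severities and `net` is `new - fixed` (both hold for the sample line above). The helper names and the `MetricsInput` shape are hypothetical.

```typescript
interface MetricsInput {
  run_id: string
  ts: string
  by_severity: { error: number; warning: number }
  new_count: number
  fixed_count: number
}

interface MetricsRow {
  run_id: string
  ts: string
  total: number
  error: number
  warn: number
  new: number
  fixed: number
  net: number
}

// Serialize one run as a single JSON line (no trailing newline).
export function toMetricsLine(m: MetricsInput): string {
  const row: MetricsRow = {
    run_id: m.run_id,
    ts: m.ts,
    total: m.by_severity.error + m.by_severity.warning,
    error: m.by_severity.error,
    warn: m.by_severity.warning,
    new: m.new_count,
    fixed: m.fixed_count,
    net: m.new_count - m.fixed_count
  }
  return JSON.stringify(row)
}

// Parse the whole file content, skipping blank lines.
export function parseMetrics(jsonl: string): MetricsRow[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as MetricsRow)
}
```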

### Query Patterns

1. **Trending overall violations:**
   ```bash
   tail -n 100 metrics.jsonl | jq -c '{ts, total, error, warn}'
   ```

2. **Trend by rule:**
   ```bash
   # Per-rule counts live in each run's summary.json (by_rule), not in metrics.jsonl
   jq -c '{run_id, count: (.by_rule["no-async-await"] // 0)}' runs/*/summary.json
   ```

3. **Top regressing files:**
   ```bash
   # Sum net deltas across last 10 diff.json files
   ```

4. **Progress to goal:**
   ```bash
   jq -c 'select(.total < 100)' metrics.jsonl | head -n 1
   ```

5. **Correlate with threads/commits:**
   ```bash
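   # manifest.json is one array (no -s); total is joined from metrics.jsonl, thread_ids live in each run's meta.json
   jq --slurpfile m metrics.jsonl 'map(. as $r | {timestamp, commit_sha, pr, milestone, total: ($m[] | select(.run_id == $r.run_id) | .total)})' manifest.json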
   ```
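
For the "top regressing files" pattern, a small in-process aggregation may be easier than shell. The sketch below assumes the `diff.json` files have already been parsed and are ordered oldest to newest; `hotFiles` and `RunDiff` are illustrative names.

```typescript
interface FileDelta {
  new: number
  fixed: number
  net: number
}

// Only the part of diff.json this query needs.
interface RunDiff {
  by_file_delta: Record<string, FileDelta>
}

// Sum net deltas per file over the last `window` runs and return the files
// that regressed overall (net > 0), worst first.
export function hotFiles(diffs: RunDiff[], window = 10, limit = 10): { path: string; net: number }[] {
  const totals = new Map<string, number>()
  for (const diff of diffs.slice(-window)) {
    for (const [path, delta] of Object.entries(diff.by_file_delta)) {
      totals.set(path, (totals.get(path) ?? 0) + delta.net)
    }
  }
  return Array.from(totals, ([path, net]) => ({ path, net }))
    .filter((file) => file.net > 0)
    .sort((a, b) => b.net - a.net)
    .slice(0, limit)
}
```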

### Write Flow

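1. Run audit and generate `audit.json`
2. Create `run_id` and `runs/<run_id>/` directory
3. Load `parent_run_id` from manifest (latest entry; absent on the first run)
4. Write `meta.json` with commit/CI/actor/context and `parent_run_id`
5. Gzip and write `audit.json.gz`
6. Compute fingerprints and `summary.json`
7. Compute `diff.json` vs parent (skipped when there is no parent)
8. Append to `manifest.json` and `metrics.jsonl`
9. Update the top-level `audit.json` to the latest run (an uncompressed copy; a symlink only works if the run also keeps an uncompressed `audit.json`, since `audit.json.gz` is not readable as plain JSON)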
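
The filesystem side of this flow might look like the sketch below. It covers only directory creation, parent lookup, `meta.json`, the gzipped audit, the manifest, and the latest copy; summary, diff, and metrics are omitted. The function name `recordRun`, its parameters, and the reduced `ManifestEntry` shape are assumptions, and `root` stands for the `.amp/effect-migrate` directory.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs"
import { join } from "node:path"
import { gzipSync } from "node:zlib"

// Reduced manifest entry; the proposal's entries also carry pr, milestone, tags.
interface ManifestEntry {
  run_id: string
  timestamp: string
  parent_run_id: string | null
  commit_sha: string
}

export function recordRun(
  root: string,
  runId: string,
  timestamp: string,
  commitSha: string,
  auditJson: string
): ManifestEntry {
  const auditsDir = join(root, "audits")
  const runDir = join(auditsDir, "runs", runId)
  mkdirSync(runDir, { recursive: true })

  // The parent is the latest manifest entry, if any.
  const manifestPath = join(auditsDir, "manifest.json")
  const manifest: ManifestEntry[] = existsSync(manifestPath)
    ? (JSON.parse(readFileSync(manifestPath, "utf8")) as ManifestEntry[])
    : []
  const parent = manifest[manifest.length - 1]

  const entry: ManifestEntry = {
    run_id: runId,
    timestamp,
    parent_run_id: parent?.run_id ?? null,
    commit_sha: commitSha
  }
  writeFileSync(join(runDir, "meta.json"), JSON.stringify(entry, null, 2))
  writeFileSync(join(runDir, "audit.json.gz"), gzipSync(auditJson, { level: 9 }))

  // manifest.json is a JSON array, so "append" here means read, push, rewrite.
  manifest.push(entry)
  writeFileSync(manifestPath, JSON.stringify(manifest, null, 2))

  // Keep an uncompressed copy at the old location for existing consumers.
  writeFileSync(join(root, "audit.json"), auditJson)
  return entry
}
```

Rewriting `manifest.json` on every run is not atomic; a real implementation would likely write to a temporary file and rename it, or guard against concurrent runs some other way.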

### Performance & Size Constraints

- Always gzip raw `audit.json` (level 6-9)
- Summaries/diffs/metrics stay small
- Retention policy: Keep last N raw audits (configurable), but keep all metrics/summaries
- Optional sharding: `audits/2025-01/runs/...`
- Optional: Cache fingerprints per run to speed up diffing
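
The retention rule can be expressed as a pure selection over run IDs, leaving the actual deletion of `audit.json.gz` files to the caller. The name `rawAuditsToPrune` is hypothetical, and the sketch relies on run IDs starting with a fixed-width UTC timestamp so that a plain string sort is chronological.

```typescript
// Returns the run IDs whose raw audit.json.gz may be deleted, oldest first.
// Summaries, diffs, and metrics for those runs are kept.
export function rawAuditsToPrune(runIds: string[], keepLast: number): string[] {
  const sorted = [...runIds].sort()
  const keep = Math.max(0, keepLast)
  return sorted.slice(0, Math.max(0, sorted.length - keep))
}
```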

### CLI Helpers (Future)

```bash
effect-migrate history ls [--limit 10]
effect-migrate history trend [--rule <id>] [--window 30]
effect-migrate history diff <run_a> <run_b>
effect-migrate history hotfiles [--window 10]
```

## Implementation Phases

### Phase 1: Basic History (Small)
- [ ] Create run directory structure
- [ ] Generate run IDs
- [ ] Write `meta.json`
- [ ] Append to `manifest.json`
- [ ] Keep `audit.json` as latest symlink/copy

### Phase 2: Summaries & Metrics (Medium)
- [ ] Implement fingerprinting for violations
- [ ] Generate `summary.json` per run
- [ ] Write `metrics.jsonl` for fast trending
- [ ] Gzip raw audits

### Phase 3: Diffs & Analysis (Medium)
- [ ] Compute `diff.json` vs parent
- [ ] Track new/fixed/unchanged violations
- [ ] By-rule and by-file deltas

### Phase 4: Query Tools (Large)
- [ ] CLI commands for trending
- [ ] Visualization helpers
- [ ] Retention management
- [ ] Optional: SQLite/DuckDB for advanced queries

## Advanced Path (Future)

If file-based queries become too slow or complex:

- **SQLite/DuckDB:** Ingest runs into local DB
- **Tables:** runs, violations, summaries, deltas
- **Indexes:** By rule, file, fingerprint
- **Complex queries:** Multi-dimensional slicing, joins

## Estimated Effort

- **Phase 1:** Small (≤1 hour) - Basic immutable snapshots
- **Phase 2:** Medium (1-3 hours) - Summaries and metrics
- **Phase 3:** Medium (1-3 hours) - Diffs and deltas
- **Phase 4:** Large (1-2 days) - Full tooling and analytics

## Acceptance Criteria

- [ ] Each audit creates immutable snapshot with unique run_id
- [ ] Manifest and metrics files enable fast trending
- [ ] Diffs track new/fixed violations vs parent run
- [ ] Latest audit.json maintained for backward compatibility
- [ ] Configurable retention policy
- [ ] Documentation for query patterns

## Related

- #21 (Effect/TypeScript version tracking)
- #22 (Auto-add threads from Amp)
- Metrics command already exists for current state
