feat(core): track migration progress over time with audit history #23

@aridyckovsky

Motivation

Currently each audit run overwrites audit.json, losing historical data. Teams need temporal tracking to:

  • See violation trends over time (improving vs. regressing)
  • Identify which files/rules are getting better or worse
  • Track progress toward migration goals
  • Correlate audit runs with commits, threads, and milestones
  • Measure migration velocity and estimate completion

Proposed Solution

Introduce immutable audit snapshots with metadata, summaries, and diffs to enable progress tracking.

Directory Structure

.amp/effect-migrate/
├── audits/
│   ├── manifest.json          # Append-only index of all runs
│   ├── metrics.jsonl          # One line per run for fast trending
│   └── runs/
│       └── <run_id>/
│           ├── meta.json      # Run metadata (commit, CI, actor, context)
│           ├── audit.json.gz  # Raw audit (gzipped)
│           ├── summary.json   # Aggregated metrics
│           ├── diff.json      # Delta vs parent run
│           └── fingerprints.json.gz  # Optional cache
├── audit.json                 # Latest (symlink or copy for compatibility)
├── index.json
└── threads.json

Run ID Format

<ISO8601-timestamp>--<commit-sha7>--<short-uuid>
Example: 2025-01-16T12:34:56Z--abc1234--p9k3f2
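A minimal sketch of how such an ID could be built and split again. The names `makeRunId` and `parseRunId` are hypothetical, and the six-character base-36 suffix is only an assumption modelled on the example above. Note that colons are not allowed in Windows file names, so a port that uses the run ID as a directory name there would need a substitute character.

```typescript
// Hypothetical helper: builds "<ISO8601-timestamp>--<commit-sha7>--<short-id>".
function makeRunId(
  now: Date,
  commitSha: string,
  // Assumption: a 6-character base-36 suffix stands in for the "short-uuid".
  shortId: string = Math.random().toString(36).slice(2, 8).padEnd(6, "0"),
): string {
  // Date#toISOString yields e.g. "2025-01-16T12:34:56.000Z"; drop the milliseconds.
  const timestamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  const sha7 = commitSha.slice(0, 7);
  return `${timestamp}--${sha7}--${shortId}`;
}

// Splits a run ID back into its three parts; returns null if the shape does not match.
function parseRunId(
  runId: string,
): { timestamp: string; sha7: string; shortId: string } | null {
  const parts = runId.split("--");
  if (parts.length !== 3) return null;
  const [timestamp, sha7, shortId] = parts;
  return { timestamp, sha7, shortId };
}
```

Because the ID starts with a UTC timestamp in a fixed-width format, sorting run IDs lexicographically also sorts them chronologically.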

Run Metadata (meta.json)

{
  "schema_version": "1.0",
  "run_id": "2025-01-16T12:34:56Z--abc1234--p9k3f2",
  "timestamp": "2025-01-16T12:34:56Z",
  "parent_run_id": "2025-01-15T18:01:02Z--89de321--m1n2o3",
  "tool_version": "0.1.0",
  "commit": {
    "sha": "abc1234...",
    "branch": "feature/migrate",
    "repo": "org/repo"
  },
  "ci": {
    "provider": "github",
    "run_url": "https://...",
    "job_id": "123456"
  },
  "actor": {
    "user": "alice",
    "email": "alice@example.com"
  },
  "context": {
    "pr": 987,
    "milestone": "M3",
    "thread_ids": ["t-abc123..."],
    "tags": ["pilot", "phase-1"]
  }
}
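As an illustration of where the commit, ci, and actor fields might come from, here is a sketch that reads GitHub Actions' default environment variables. The function name `githubContextFromEnv` is hypothetical, and the mapping of `GITHUB_JOB` to `job_id` is an assumption: that variable holds the job's key from the workflow file, not a numeric ID like the one in the example above.

```typescript
type Env = Record<string, string | undefined>;

interface GithubContext {
  commit: { sha: string; branch: string; repo: string };
  ci: { provider: "github"; run_url: string; job_id: string };
  actor: { user: string };
}

// Returns meta.json fragments when running under GitHub Actions, otherwise null.
function githubContextFromEnv(env: Env): GithubContext | null {
  if (env.GITHUB_ACTIONS !== "true") return null;
  const server = env.GITHUB_SERVER_URL;
  const repo = env.GITHUB_REPOSITORY;
  const runId = env.GITHUB_RUN_ID;
  const sha = env.GITHUB_SHA;
  if (!server || !repo || !runId || !sha) return null;
  return {
    // GITHUB_REF_NAME is the short ref name; on pull_request events it is not a branch name.
    commit: { sha, branch: env.GITHUB_REF_NAME ?? "", repo },
    ci: {
      provider: "github",
      run_url: `${server}/${repo}/actions/runs/${runId}`,
      job_id: env.GITHUB_JOB ?? "", // assumption: job key, not a numeric ID
    },
    actor: { user: env.GITHUB_ACTOR ?? "" },
  };
}
```

In a Node process this would be called as `githubContextFromEnv(process.env)`. The email, pr, milestone, thread_ids, and tags fields have no equivalent default variable and would need another source.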

Violation Fingerprinting

Compute stable fingerprints to track "same violation" across runs:

fingerprint = sha1(
  rule_id + "|" + 
  rel_path + "|" + 
  start_line + ":" + start_col + "-" + end_line + ":" + end_col + "|" + 
  normalize(message)
)

Normalization: Strip dynamic numbers/paths from messages for stability.
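The formula above could be sketched with Node's built-in crypto module as follows. The `Violation` shape and the exact normalization rules (path-like tokens become `<path>`, digit runs become `<n>`) are assumptions for illustration; the issue does not specify them.

```typescript
import { createHash } from "node:crypto";

// Assumed violation shape; field names are illustrative.
interface Violation {
  ruleId: string;
  relPath: string;
  start: { line: number; column: number };
  end: { line: number; column: number };
  message: string;
}

// Assumed normalization: replace path-like tokens first, then digit runs,
// then collapse whitespace.
function normalizeMessage(message: string): string {
  return message
    .replace(/(?:\.{0,2}\/)?[\w.-]+(?:\/[\w.-]+)+/g, "<path>")
    .replace(/\d+/g, "<n>")
    .replace(/\s+/g, " ")
    .trim();
}

// sha1(rule_id | rel_path | start-end span | normalized message), hex-encoded.
function fingerprint(v: Violation): string {
  const span = `${v.start.line}:${v.start.column}-${v.end.line}:${v.end.column}`;
  const input = [v.ruleId, v.relPath, span, normalizeMessage(v.message)].join("|");
  return createHash("sha1").update(input, "utf8").digest("hex");
}
```

Since the source span is part of the hashed input, any edit that shifts a violation to a different line produces a new fingerprint, and the diff would then report it as one fixed plus one new violation.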

Summary (summary.json)

Fast-access aggregated metrics without re-scanning raw audit:

{
  "schema_version": "1.0",
  "run_id": "...",
  "totals": {
    "violations": 3210,
    "files_with_violations": 456
  },
  "by_rule": {
    "no-async-await": 230,
    "no-barrel-imports": 31
  },
  "by_severity": {
    "error": 950,
    "warning": 2260
  },
  "by_dir": {
    "src/": 2100,
    "packages/core/": 600,
    "tests/": 510
  },
  "top_files": [
    { "path": "src/api/handler.ts", "count": 120 }
  ],
  "time_to_run_ms": 52234
}
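A sketch of how these aggregates could be derived in one pass over the raw violations. The `AuditViolation` shape and the helper names are hypothetical, and grouping `by_dir` on the first path segment is an assumption (the example above mixes one- and two-level directories). `time_to_run_ms` is left out because it would be measured by the audit runner.

```typescript
// Assumed minimal violation shape for aggregation.
interface AuditViolation {
  ruleId: string;
  path: string;
  severity: "error" | "warning";
}

// Counts items per key.
function countBy<T>(items: T[], key: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const k = key(item);
    counts[k] = (counts[k] ?? 0) + 1;
  }
  return counts;
}

// Assumption: group by the first path segment, e.g. "src/api/x.ts" -> "src/".
function topDir(path: string): string {
  const slash = path.indexOf("/");
  return slash === -1 ? "./" : path.slice(0, slash + 1);
}

function summarize(runId: string, violations: AuditViolation[], topN = 10) {
  const byFile = countBy(violations, (v) => v.path);
  const topFiles = Object.entries(byFile)
    .map(([path, count]) => ({ path, count }))
    // Highest count first; ties broken by path for deterministic output.
    .sort((a, b) => b.count - a.count || a.path.localeCompare(b.path))
    .slice(0, topN);
  return {
    schema_version: "1.0",
    run_id: runId,
    totals: {
      violations: violations.length,
      files_with_violations: Object.keys(byFile).length,
    },
    by_rule: countBy(violations, (v) => v.ruleId),
    by_severity: countBy(violations, (v) => v.severity),
    by_dir: countBy(violations, (v) => topDir(v.path)),
    top_files: topFiles,
  };
}
```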

Diff (diff.json)

Delta relative to parent run:

{
  "schema_version": "1.0",
  "run_id": "...",
  "parent_run_id": "...",
  "delta": {
    "new": [
      {
        "fingerprint": "abc123...",
        "rule_id": "no-async-await",
        "path": "src/new-file.ts",
        "location": { "start": { "line": 10, "column": 5 } }
      }
    ],
    "fixed": [ /* ... */ ],
    "unchanged_count": 2980
  },
  "by_rule_delta": {
    "no-async-await": { "new": 5, "fixed": 10, "net": -5 }
  },
  "by_file_delta": {
    "src/api/handler.ts": { "new": 2, "fixed": 0, "net": 2 }
  }
}
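The delta can be computed as a set difference over fingerprints. The sketch below is one way to do it; the names `diffRuns` and `bump` are hypothetical, and it assumes fingerprints are unique within a run.

```typescript
interface FingerprintedViolation {
  fingerprint: string;
  rule_id: string;
  path: string;
}

interface Delta {
  new: number;
  fixed: number;
  net: number;
}

// Increments one counter for a key and keeps net = new - fixed in sync.
function bump(table: Record<string, Delta>, key: string, field: "new" | "fixed"): void {
  let entry = table[key];
  if (entry === undefined) {
    entry = { new: 0, fixed: 0, net: 0 };
    table[key] = entry;
  }
  entry[field] += 1;
  entry.net = entry.new - entry.fixed;
}

function diffRuns(parent: FingerprintedViolation[], current: FingerprintedViolation[]) {
  const parentPrints = new Set(parent.map((v) => v.fingerprint));
  const currentPrints = new Set(current.map((v) => v.fingerprint));
  // New: present now, absent in the parent. Fixed: present in the parent, absent now.
  const added = current.filter((v) => !parentPrints.has(v.fingerprint));
  const fixed = parent.filter((v) => !currentPrints.has(v.fingerprint));

  const byRule: Record<string, Delta> = {};
  const byFile: Record<string, Delta> = {};
  for (const v of added) {
    bump(byRule, v.rule_id, "new");
    bump(byFile, v.path, "new");
  }
  for (const v of fixed) {
    bump(byRule, v.rule_id, "fixed");
    bump(byFile, v.path, "fixed");
  }
  return {
    delta: { new: added, fixed, unchanged_count: current.length - added.length },
    by_rule_delta: byRule,
    by_file_delta: byFile,
  };
}
```

A negative `net` means more violations were fixed than introduced for that rule or file.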

Manifest & Metrics for Fast Queries

manifest.json (append-only):

[
  {
    "run_id": "2025-01-16T12:34:56Z--abc1234--p9k3f2",
    "timestamp": "2025-01-16T12:34:56Z",
    "parent_run_id": "...",
    "commit_sha": "abc1234",
    "pr": 987,
    "milestone": "M3",
    "tags": ["pilot"]
  }
]

metrics.jsonl (one line per run):

{"run_id":"...","ts":"...","total":3210,"error":950,"warn":2260,"new":42,"fixed":81,"net":-39}
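For illustration, a metrics line could be serialized and read back like this. The `MetricsLine` type mirrors the keys in the sample line above; the helper names are hypothetical.

```typescript
interface MetricsLine {
  run_id: string;
  ts: string;
  total: number;
  error: number;
  warn: number;
  new: number;
  fixed: number;
  net: number;
}

// Serializes one run as a single JSON line; net is derived as new - fixed.
function toMetricsLine(m: Omit<MetricsLine, "net">): string {
  const line: MetricsLine = { ...m, net: m.new - m.fixed };
  return JSON.stringify(line) + "\n";
}

// Parses a metrics.jsonl document, skipping blank lines.
function parseMetrics(jsonl: string): MetricsLine[] {
  return jsonl
    .split("\n")
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as MetricsLine);
}
```

Unlike manifest.json, which is a JSON array and has to be rewritten to add an entry, a JSONL file can be extended by appending a line, which is what makes it a good fit for one record per run.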

Query Patterns

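  1. Trending overall violations:

    tail -n 100 metrics.jsonl | jq -c '{ts, total, error, warn}'
  2. Trend by rule (assumes each metrics.jsonl line also carries per-rule counts under a rules key, which the sample line above omits):

    jq -s 'map({ts, count: .rules["no-async-await"]})' metrics.jsonl
  3. Top regressing files:

    # Sum net deltas per file across the last 10 diff.json files (run from audits/)
    ls -d runs/*/ | tail -n 10 | sed 's|$|diff.json|' | xargs jq -s 'map(.by_file_delta | to_entries[]) | group_by(.key) | map({path: .[0].key, net: (map(.value.net) | add)}) | sort_by(-.net)'
  4. Progress to goal (first run with fewer than 100 violations):

    jq -c 'select(.total < 100)' metrics.jsonl | head -n 1
  5. Correlate with threads/commits (manifest.json is already an array, so no -s is needed; thread_ids live in meta.json and totals in metrics.jsonl, both joinable by run_id):

    jq 'map({timestamp, commit_sha, pr, milestone, tags})' manifest.json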
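Beyond finding the first run that already meets a goal, the same metrics can give a rough estimate of how many more runs it would take. The sketch below uses a plain linear extrapolation over a window of runs; the function name `estimateRunsToGoal` and the choice of a linear model are assumptions, not part of the proposal.

```typescript
interface TrendPoint {
  ts: string;
  total: number;
}

// Estimates how many further runs are needed to reach `goal`, assuming the
// average per-run change over the window continues. Returns null when the
// window is too short or the trend is flat or regressing.
function estimateRunsToGoal(points: TrendPoint[], goal: number): number | null {
  if (points.length < 2) return null;
  const first = points[0];
  const last = points[points.length - 1];
  if (last.total <= goal) return 0;
  // Average change in total per run (negative means improving).
  const perRun = (last.total - first.total) / (points.length - 1);
  if (perRun >= 0) return null;
  return Math.ceil((last.total - goal) / -perRun);
}
```

This counts runs, not days; converting to calendar time would require the spacing of the `ts` values.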

Write Flow

  1. Run audit and generate audit.json
  2. Load parent_run_id from manifest (latest entry), since meta.json records it
  3. Create run_id and runs/<run_id>/ directory
  4. Write meta.json with commit/CI/actor/context
  5. Gzip and write audit.json.gz
  6. Compute fingerprints and summary.json
  7. Compute diff.json vs parent
  8. Append to manifest.json and metrics.jsonl
  9. Update audit.json to the latest run (a symlink only works if it targets an uncompressed file; otherwise write an uncompressed copy)
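The snapshot-writing part of this flow might look like the following sketch, which covers the manifest lookup, meta.json, the gzipped raw audit, and the manifest update, but not summaries or diffs. The function name `writeSnapshot` and its parameters are hypothetical. It reads the parent run before writing meta.json because meta.json records parent_run_id, and it rewrites the manifest through a temporary file, since a JSON array cannot be extended by a plain append.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { gzipSync } from "node:zlib";

interface ManifestEntry {
  run_id: string;
  timestamp: string;
  parent_run_id: string | null;
  commit_sha: string;
}

// Hypothetical helper: writes one immutable run directory and updates the manifest.
function writeSnapshot(
  auditsDir: string,
  run: { run_id: string; timestamp: string; commit_sha: string },
  rawAudit: unknown,
): ManifestEntry {
  const manifestPath = path.join(auditsDir, "manifest.json");
  const manifest: ManifestEntry[] = fs.existsSync(manifestPath)
    ? (JSON.parse(fs.readFileSync(manifestPath, "utf8")) as ManifestEntry[])
    : [];

  // The parent is the latest manifest entry; the first run has none.
  const parentRunId = manifest.length > 0 ? manifest[manifest.length - 1].run_id : null;
  const entry: ManifestEntry = { ...run, parent_run_id: parentRunId };

  const runDir = path.join(auditsDir, "runs", run.run_id);
  fs.mkdirSync(runDir, { recursive: true });

  const meta = {
    schema_version: "1.0",
    run_id: run.run_id,
    timestamp: run.timestamp,
    parent_run_id: parentRunId,
    commit: { sha: run.commit_sha },
  };
  fs.writeFileSync(path.join(runDir, "meta.json"), JSON.stringify(meta, null, 2));
  fs.writeFileSync(
    path.join(runDir, "audit.json.gz"),
    gzipSync(JSON.stringify(rawAudit), { level: 9 }),
  );

  // Write the new manifest to a temp file, then rename it into place so
  // readers never observe a partially written array.
  const tmpPath = `${manifestPath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify([...manifest, entry], null, 2));
  fs.renameSync(tmpPath, manifestPath);
  return entry;
}
```

Concurrent writers would still race on the manifest; a real implementation would likely need a lock or a per-run manifest fragment.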

Performance & Size Constraints

  • Always gzip raw audit.json (level 6-9)
  • Summaries/diffs/metrics stay small
  • Retention policy: Keep last N raw audits (configurable), but keep all metrics/summaries
  • Optional sharding: audits/2025-01/runs/...
  • Optional: Cache fingerprints per run to speed up diffing
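The retention and sharding bullets can be sketched as two small pure functions. Both names are hypothetical, and the monthly shard key is only inferred from the `audits/2025-01/runs/...` example.

```typescript
// Given run IDs ordered oldest first, returns the runs whose raw
// audit.json.gz may be deleted so that only the last `keepLast` remain.
// Summaries, diffs, and metrics are never selected for removal here.
function rawAuditsToPrune(runIdsOldestFirst: string[], keepLast: number): string[] {
  if (!Number.isInteger(keepLast) || keepLast < 0) {
    throw new RangeError("keepLast must be a non-negative integer");
  }
  const cut = Math.max(0, runIdsOldestFirst.length - keepLast);
  return runIdsOldestFirst.slice(0, cut);
}

// Assumption: shard by the "YYYY-MM" prefix of the run ID's timestamp.
function shardFor(runId: string): string {
  return runId.slice(0, 7);
}
```

Keeping the selection logic free of file-system calls makes the retention rule easy to unit test before wiring it to actual deletion.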

CLI Helpers (Future)

effect-migrate history ls [--limit 10]
effect-migrate history trend [--rule <id>] [--window 30]
effect-migrate history diff <run_a> <run_b>
effect-migrate history hotfiles [--window 10]

Implementation Phases

Phase 1: Basic History (Small)

  • Create run directory structure
  • Generate run IDs
  • Write meta.json
  • Append to manifest.json
  • Keep audit.json as latest symlink/copy

Phase 2: Summaries & Metrics (Medium)

  • Implement fingerprinting for violations
  • Generate summary.json per run
  • Write metrics.jsonl for fast trending
  • Gzip raw audits

Phase 3: Diffs & Analysis (Medium)

  • Compute diff.json vs parent
  • Track new/fixed/unchanged violations
  • By-rule and by-file deltas

Phase 4: Query Tools (Large)

  • CLI commands for trending
  • Visualization helpers
  • Retention management
  • Optional: SQLite/DuckDB for advanced queries

Advanced Path (Future)

If file-based queries become too slow or complex:

  • SQLite/DuckDB: Ingest runs into local DB
  • Tables: runs, violations, summaries, deltas
  • Indexes: By rule, file, fingerprint
  • Complex queries: Multi-dimensional slicing, joins

Estimated Effort

  • Phase 1: Small (≤1 hour) - Basic immutable snapshots
  • Phase 2: Medium (1-3 hours) - Summaries and metrics
  • Phase 3: Medium (1-3 hours) - Diffs and deltas
  • Phase 4: Large (1-2 days) - Full tooling and analytics

Acceptance Criteria

  • Each audit creates immutable snapshot with unique run_id
  • Manifest and metrics files enable fast trending
  • Diffs track new/fixed violations vs parent run
  • Latest audit.json maintained for backward compatibility
  • Configurable retention policy
  • Documentation for query patterns


Labels

  • amp:audit: Audit context and structured violation output
  • amp:metrics: Metrics context output for Amp
  • pkg:cli: Issues related to @effect-migrate/cli package
  • pkg:core: Issues related to @effect-migrate/core package
  • priority:high: High priority
  • type:feature: New feature or request
