- Status: Accepted (v1 subset shipped in 0.2.3)
- Date: 2026-06-12
- Deciders: PatrickJS
- Index: Design decisions
Shipped in 0.2.3 per Option A: per-file digest manifests (inputs.json) persisted with every cache entry, per-task last-passing baseline pointers (pruned by gc when their entries go), failure context packs under .async/runs/<run-id>/context/ (redacted 4 KiB log tail, repro command, digest-only input diff, baselineMissing when no pass is recorded), and explain <task> --diff-inputs / explain --run <run-id>. See api.md for the reference and registered claims. The claims cross-reference shipped with a narrower heuristic than decision 2 described: packs name claims whose registered test titles appear in the failing log, rather than mapping test files to tasks. Not yet shipped: packs for downgraded cache hits (consequences “revisit” list).
When a task fails, the run record answers what happened: execution.json has status, attempts, cache key, timings, and error; summary.md is the human view; logs/<task>.log holds output. What an agent (or a tired human) actually needs to start fixing is narrower and partly missing:
- computeTaskCacheKey streams every input file into a single rolling sha256; per-file digests are computed in passing and thrown away. “The key changed” is recorded; why it changed cannot be reconstructed without re-hashing against a baseline, and no baseline is stored either.
- Nothing links a failure to the claims in tests/claims.json whose tests the failing task runs.

An agent diagnosing a failure today must re-read the repo to rediscover all of this, which is slow, token-expensive, and exactly the kind of work a pipeline that already walks every input file should do once and persist.
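To make the "computed in passing and thrown away" point concrete, here is a minimal sketch of a hash pass that keeps the per-file digests next to the rolling key. It is not the project's computeTaskCacheKey: the names hashInputs, InputFile, and HashedInputs are hypothetical, files are passed in memory instead of being streamed from disk, and the rolling hash is fed path plus digest instead of raw file bytes.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: a real walk would stream files from disk.
interface InputFile {
  path: string; // path relative to the task root
  content: string; // file contents, held in memory here for brevity
}

interface HashedInputs {
  key: string; // rolling digest over every input
  manifest: Record<string, string>; // relative path -> per-file sha256 (hex)
}

function hashInputs(files: InputFile[]): HashedInputs {
  const rolling = createHash("sha256");
  const manifest: Record<string, string> = {};

  // Sort by path so the key does not depend on walk order.
  const sorted = [...files].sort((a, b) =>
    a.path < b.path ? -1 : a.path > b.path ? 1 : 0,
  );

  for (const file of sorted) {
    const digest = createHash("sha256").update(file.content).digest("hex");
    manifest[file.path] = digest; // kept instead of discarded
    // Assumption: the key is derived from (path, digest) pairs.
    rolling.update(file.path).update("\0").update(digest).update("\n");
  }

  return { key: rolling.digest("hex"), manifest };
}
```

The manifest returned here is the kind of data inputs.json would hold; the extra cost at hash time is one object entry per file.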
Forces: execution records carry schemaVersion: 1 and additive fields are non-breaking; logs are size-bounded and secret-redacted; run records are auto-pruned (ASYNC_PIPELINE_KEEP_RUNS); input walks already touch every file, so digest persistence is nearly free at hash time but not free at storage time.
Persist per-file input digests, and emit a bounded, machine-readable failure context pack per failed task.
1. A per-file digest manifest, inputs.json: relative path → content digest for the task’s resolved input files (the same walk and exclusions the cache key uses). Stored per cache entry rather than per run, because digests are a property of the keyed input state; runs reference them.
2. A failure context pack at .async/runs/<run-id>/context/<task>.json containing: task id and failing step, exit code and error, reproduction command (async-pipeline run-task <task>), bounded log tail (redacted, capped well below the log cap), input diff versus the task’s most recent passing cache entry (added/removed/changed paths; digests only, never contents), dependency fingerprints that changed, and, when tests/claims.json exists, the claim ids whose tests name the failing task’s test files.
3. Extend explain, don’t add commands. explain <task> --diff-inputs answers “what changed since this last passed” on demand; explain --run <run-id> --format json returns the context pack. No new top-level command.

Option A: per-file digest manifests (chosen)

| Dimension | Assessment |
|---|---|
| Complexity | Low-medium — data already in hand at hash time |
| Storage cost | One manifest per cache entry; pruned by existing gc |
| Token efficiency | High — diff is precomputed, pack is bounded |
| Schema risk | Additive only (schemaVersion unchanged) |
Pros: answers “what changed” exactly, from data the pipeline already computes; baseline (last passing entry) is well-defined; useful to humans (--diff-inputs) independent of any agent.
Cons: manifest write on every cache-entry creation; “last passing entry” can be gc’d, degrading the diff to “no baseline” (pack must say so explicitly).
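The digest-only diff, including the explicit no-baseline case named in the cons, can be sketched as a comparison of two manifests. This is an illustration under assumptions: diffInputs, Manifest, and InputDiff are hypothetical names, and only the field names added, removed, changed, and baselineMissing come from this record.

```typescript
// relative path -> content digest, as stored in inputs.json
type Manifest = Record<string, string>;

interface InputDiff {
  baselineMissing: boolean; // true when no passing entry is available
  added: string[];
  removed: string[];
  changed: string[];
}

// Own-property check, so a file named "constructor" is not mistaken for present.
const has = (m: Manifest, path: string): boolean =>
  Object.prototype.hasOwnProperty.call(m, path);

function diffInputs(current: Manifest, baseline: Manifest | undefined): InputDiff {
  if (baseline === undefined) {
    // The last passing entry was never recorded or has been gc'd:
    // say so explicitly instead of reporting an empty diff.
    return { baselineMissing: true, added: [], removed: [], changed: [] };
  }

  const added: string[] = [];
  const changed: string[] = [];
  for (const path of Object.keys(current).sort()) {
    if (!has(baseline, path)) added.push(path);
    else if (baseline[path] !== current[path]) changed.push(path);
  }
  const removed = Object.keys(baseline)
    .filter((path) => !has(current, path))
    .sort();

  return { baselineMissing: false, added, removed, changed };
}
```

Only paths appear in the result; digests are compared but never emitted, and file contents are never read.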

Option B: re-hash on demand, persisting nothing new

| Dimension | Assessment |
|---|---|
| Complexity | Low |
| Storage cost | Zero |
| Token efficiency | Medium — answer arrives, but slowly and only if asked |
| Schema risk | None |
Pros: no storage growth; no new write paths.
Cons: requires a stored baseline anyway (you cannot diff against a state you didn’t record), so this collapses into “store at least the last passing manifest”, which is Option A with fewer guarantees; re-hashing the working tree races against the user editing files post-failure.

Option C: content snapshots per cache entry

| Dimension | Assessment |
|---|---|
| Complexity | Medium |
| Storage cost | High — copies of inputs per entry |
| Token efficiency | Highest (exact patches reconstructable) |
| Schema risk | Additive but heavy |
Pros: enables exact “show me the change” without git.
Cons: duplicates what git already does in the repos this targets; storage blowup; secret-bearing input files would be copied into .async/, expanding the redaction surface for marginal gain.
B demonstrates that some persistence is unavoidable — a diff needs a baseline — so the real choice is digest manifests (A) versus content snapshots (C). Digests answer the operative question (“which files moved”) at path-list cost and leave content reconstruction to git, which is present in every intended deployment. C’s only unique capability, exact content diffs without git, is not worth copying potentially secret-bearing inputs into the store.
The claims cross-reference is the most repo-specific piece: it makes the pack say “this failure breaks promise X” rather than “exit code 1”. It is also optional by design — pipelines without a claims registry simply omit the field — so the feature stays general.
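A rough sketch of the optional cross-reference follows, using the heuristic that shipped in 0.2.3 (claims whose registered test titles appear in the failing log). The registry shape is an assumption: Claim, its id and testTitles fields, and claimsNamedInLog are hypothetical, and the real tests/claims.json schema may differ.

```typescript
// Hypothetical registry entry; not the actual tests/claims.json schema.
interface Claim {
  id: string;
  testTitles: string[]; // titles of the tests registered for this claim
}

// Returns the ids of claims named in the failing log, or undefined when the
// pipeline has no claims registry so the caller can omit the field.
function claimsNamedInLog(
  claims: Claim[] | undefined,
  failingLog: string,
): string[] | undefined {
  if (claims === undefined) return undefined;
  return claims
    .filter((claim) =>
      claim.testTitles.some(
        // Skip empty titles: "" is a substring of every log.
        (title) => title !== "" && failingLog.includes(title),
      ),
    )
    .map((claim) => claim.id);
}
```

Substring matching is deliberately crude: it needs no mapping from test files to tasks, at the cost of missing tests whose titles never reach the log.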
Consequences:

- explain becomes the debugging entrypoint for humans too.
- New on-disk artifacts (inputs.json, context/); gc and auto-prune must account for them; docs and the execution-record schema section need the additive fields specified.

Action items:

1. Keep the per-file digests from computeTaskCacheKey’s walk and write inputs.json per cache entry.
2. Extend explain with --diff-inputs and --run JSON output; document in api.md.
3. Teach gc and run-pruning about manifests and packs.
4. PROMISE: tests; CHANGELOG entry.
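As an illustration of what a bounded pack might carry, the sketch below assembles a few of the fields this record lists and caps the log tail at 4 KiB. The field names, ContextPack, logTail, and buildPack are hypothetical and are not the shipped schema (api.md is the reference). The log is assumed to be redacted already, and the input diff and dependency fingerprints are left out for brevity.

```typescript
// Hypothetical pack shape; field names are illustrative only.
interface ContextPack {
  task: string;
  step: string;
  exitCode: number | null;
  error: string;
  repro: string;
  logTail: string;
  claims?: string[]; // omitted when there is no claims registry
}

const TAIL_BYTES = 4096; // 4 KiB

// Take the last maxBytes of the UTF-8 encoded log, then drop the first
// (probably partial) line so the tail starts on a line boundary.
function logTail(log: string, maxBytes: number = TAIL_BYTES): string {
  const bytes = new TextEncoder().encode(log);
  if (bytes.length <= maxBytes) return log;
  const tail = new TextDecoder().decode(bytes.slice(bytes.length - maxBytes));
  const newline = tail.indexOf("\n");
  return newline === -1 ? tail : tail.slice(newline + 1);
}

function buildPack(input: {
  task: string;
  step: string;
  exitCode: number | null;
  error: string;
  redactedLog: string; // assumption: secrets were removed upstream
  claims?: string[];
}): ContextPack {
  const pack: ContextPack = {
    task: input.task,
    step: input.step,
    exitCode: input.exitCode,
    error: input.error,
    repro: `async-pipeline run-task ${input.task}`,
    logTail: logTail(input.redactedLog),
  };
  if (input.claims !== undefined) pack.claims = input.claims;
  return pack;
}
```

Serializing the result with JSON.stringify gives one small document per failed task, which is the property that keeps the pack cheap for an agent to read.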