`agent()` as a First-Class Task Step

Status: Accepted (v1 subset shipped in 0.2.2)
Date: 2026-06-12
Deciders: PatrickJS
Index: Design decisions
Shipped in 0.2.2: the `agents` profile block, the `agent()` step, prompt/transcript evidence under `.async/runs/<run-id>/agents/` with secret redaction, artifact cache semantics (profile id + model + prompt in the key, command path excluded), and `env.var(...)` selection; see api.md for the reference and registered claims.

Shipped in 0.2.4: `stdoutTo` propose-only artifacts, the `doctor` missing-outputs warning (decision 5, second half, surfaced through `doctor` rather than `definePipeline` so metadata reads stay silent), and the canonical mocked example, `examples/agent-claims-repair` (action item 5).

Boundary amendment (decision 3): the per-step default-deny policy for an agent’s own tool calls is not enforceable from outside the adapter. The pipeline governs the adapter spawn (executor, env, redaction, evidence), but another program’s internal tool use can only be restricted by that program’s own permission surface (e.g. `claude -p` print mode denying writes, or explicit `--allowedTools` flags in the profile command). The honest contract: declare permission flags in the profile command where the adapter supports them, prefer propose-only outputs (`stdoutTo`) over agent-side writes, and treat sandbox selection as the hard isolation boundary. Decision 3’s original wording overstated what a runner can promise; this record now claims only what it enforces.
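To make the redacted transcript evidence mentioned above concrete, here is a minimal sketch of scrubbing known secret values before a JSONL line is written. The `TranscriptEntry` shape, the `redact` and `toTranscriptLine` helpers, and the `[REDACTED]` marker are all assumptions for illustration; the shipped format is whatever api.md documents.

```typescript
// Hypothetical transcript entry; the real on-disk schema may differ.
interface TranscriptEntry {
  role: "prompt" | "response";
  text: string;
}

// Replace every occurrence of each known secret value with a fixed marker.
// Empty strings are skipped so they cannot match at every position.
function redact(text: string, secrets: readonly string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue;
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}

// Serialize one entry as a single JSONL line with secrets removed.
function toTranscriptLine(entry: TranscriptEntry, secrets: readonly string[]): string {
  return JSON.stringify({ ...entry, text: redact(entry.text, secrets) }) + "\n";
}

const line = toTranscriptLine(
  { role: "response", text: "Using token sk-test-123 to connect" },
  ["sk-test-123"],
);
console.log(line.trim());
// {"role":"response","text":"Using token [REDACTED] to connect"}
```

Value-based replacement like this only catches secrets the runner already knows about, which is one reason to keep credentials in task env rather than in prompts or command lines.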
Task steps today are `sh` template strings, deferred `sh((ctx) => ...)` callbacks, and runtime function steps. Nothing in the step model knows what an “agent” is. Teams that want a model in the loop (generate a migration, draft a fix, summarize a diff) shell out via plain `sh`, which silently bypasses three things this project otherwise guarantees:

- Command policy: a `sh`-wrapped agent invocation is just another shell step; the agent’s own tool calls are invisible to policy.
- Evidence: no prompt or transcript is recorded under `.async/runs/`.
- Explicit cache behavior: a `sh` step satisfies the letter of this and defeats its purpose; two identical runs produce different artifacts with identical keys, and nothing marks the task as model-derived.

Forces: keep `definePipeline` metadata-only (importing a pipeline must never invoke a model); keep zero runtime dependencies; keep secrets out of stored output; stay agent-CLI-agnostic (Claude Code today, anything tomorrow).
Add an agent() step constructor to the config surface, executed through an agent adapter port with command policy enforcement and transcript capture. Sketch (illustrative, not final API):
```ts
import { agent, definePipeline, sh, task } from "@async/pipeline";

export default definePipeline({
  name: "app",
  agents: {
    claude: { command: ["claude", "-p"], model: "claude-sonnet-4-6" }
  },
  tasks: {
    "draft-migration": task({
      inputs: ["schema/**/*.sql"],
      outputs: ["migrations/next.sql"],
      cache: true,
      run: agent({
        use: "claude",
        prompt: "Write the SQL migration that reconciles schema/ with migrations/.",
        commands: { fallback: "deny", rules: [/* explicit allows */] }
      })
    }),
    "verify-migration": task({
      dependsOn: ["draft-migration"],
      run: sh`pnpm migrate:dry-run`
    })
  }
});
```
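The `commands` block in the sketch above implies a fallback-plus-rules model. The following is a minimal sketch of how such a policy might be defaulted and checked, assuming a hypothetical `CommandPolicy` shape whose rules are argv prefixes; the real rule syntax is not specified in this record. It only covers commands the pipeline itself spawns, not tool calls an agent makes internally.

```typescript
type Fallback = "allow" | "deny";

// Hypothetical policy shape: each rule allows any command whose argv
// starts with the given prefix.
interface CommandPolicy {
  fallback: Fallback;
  rules: string[][];
}

// Defaults per step kind, following the asymmetry this record describes:
// shell steps fall back to allow; agent steps fall back to deny but may
// still run the adapter command itself.
function defaultPolicy(kind: "sh" | "agent", adapterCommand: string[] = []): CommandPolicy {
  return kind === "sh"
    ? { fallback: "allow", rules: [] }
    : { fallback: "deny", rules: [adapterCommand] };
}

// A command is allowed when some non-empty rule is a prefix of its argv;
// otherwise the fallback decides.
function isAllowed(policy: CommandPolicy, argv: string[]): boolean {
  const matched = policy.rules.some(
    (rule) => rule.length > 0 && rule.every((part, i) => argv[i] === part),
  );
  return matched || policy.fallback === "allow";
}

const agentPolicy = defaultPolicy("agent", ["claude", "-p"]);
console.log(isAllowed(agentPolicy, ["claude", "-p", "hello"]));  // true
console.log(isAllowed(agentPolicy, ["rm", "-rf", "."]));         // false
console.log(isAllowed(defaultPolicy("sh"), ["rm", "-rf", "."])); // true
```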
Decisions bundled here:
- `agent()` creates step metadata. `metadata`, `list`, `graph`, and `explain` describe agent steps without invoking anything, exactly like deferred `sh` callbacks today.
- A plain `sh` step defaults to `fallback: allow`; an `agent()` step with no `commands` block gets `fallback: deny` plus the adapter command itself. The asymmetry is deliberate: a human wrote the shell step, a model improvises its tool calls.
- Prompts and transcripts are written to `.async/runs/<run-id>/agents/<task>.jsonl`, bounded and redacted by the same machinery as task logs.
- Agent tasks should declare `outputs`; validation warns when they do not (an agent task without outputs amounts to unverifiable side effects).
- A downstream verification task (`verify-migration` above) is the recommended consumer of any agent task. Enforcement (e.g. requiring it) is deferred until real usage shows where it helps versus annoys.
- `env` sources. Profile fields and the `use:` selection accept `env.var(...)` like any env value, resolved at run time from `process.env` locally and rendered as $ in generated workflows. The recommended pattern selects among declared profiles (`use: env.var("ASYNC_AGENT", { default: "claude" })`, or a `--agent` flag mirroring `--sandbox`) rather than injecting raw command strings, so every candidate stays inspectable in metadata. The resolved adapter id and model id enter the cache key; the adapter’s binary path never does, consistent with the rule that cache keys exclude absolute machine paths. A mock profile (`model: "mock"`) therefore keys separately from a real one, so a CI mock can neither replay nor poison real artifacts. Credentials ride `env.secret(...)` in task env, never the command line, inheriting the existing redaction promise.

Option A: `agent()` step + adapter port (proposed)

| Dimension | Assessment |
|---|---|
| Complexity | Medium — new step kind, adapter port, policy default flip |
| Dependency cost | None — adapters are command templates |
| Safety | Policy-enforced, transcripted, sandboxable per existing --sandbox |
| Cache semantics | Explicit and documented |
Pros: agent execution inherits every existing boundary; evidence model extends naturally; agent-CLI-agnostic.
Cons: grows the frozen-at-1.0 config surface; cache-key composition for prompts needs careful spec (prompt templates referencing ctx must resolve before keying, like deferred sh).
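The cache-key rule can be sketched as follows: hash the profile id, the model, and the prompt after templating has been resolved, and leave the adapter's binary path out. This is a minimal sketch under stated assumptions; `AgentKeyParts` and `agentKeyFragment` are hypothetical names, and the real `computeTaskCacheKey` may combine these fields differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs to the agent portion of a task cache key.
interface AgentKeyParts {
  profileId: string;      // e.g. "claude" or "mock"
  model: string;          // e.g. "claude-sonnet-4-6"
  resolvedPrompt: string; // prompt after any ctx-based templating has been resolved
  commandPath: string;    // absolute path of the adapter binary; deliberately NOT hashed
}

function agentKeyFragment(parts: AgentKeyParts): string {
  // JSON-encode a fixed-order tuple so field boundaries are unambiguous.
  const payload = JSON.stringify([parts.profileId, parts.model, parts.resolvedPrompt]);
  return createHash("sha256").update(payload).digest("hex");
}

const prompt = "Write the SQL migration that reconciles schema/ with migrations/.";
const real = agentKeyFragment({
  profileId: "claude", model: "claude-sonnet-4-6",
  resolvedPrompt: prompt, commandPath: "/usr/local/bin/claude",
});
const moved = agentKeyFragment({
  profileId: "claude", model: "claude-sonnet-4-6",
  resolvedPrompt: prompt, commandPath: "/opt/tools/claude",
});
const mock = agentKeyFragment({
  profileId: "mock", model: "mock",
  resolvedPrompt: prompt, commandPath: "/usr/local/bin/claude",
});

console.log(real === moved); // true: the binary path does not affect the key
console.log(real === mock);  // false: a mock profile keys separately
```

Encoding the fields as a JSON tuple rather than concatenating them avoids collisions where a model name and a prompt prefix could otherwise blur together.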
Option B: `sh`

| Dimension | Assessment |
|---|---|
| Complexity | Zero |
| Dependency cost | Zero |
| Safety | Policy sees the launch command only; agent tool calls unbounded |
| Cache semantics | Accidental — nondeterminism hidden behind a normal-looking step |
Pros: works today; no API growth. Cons: every guarantee this package markets (inspectable boundaries, evidence, explicit cache) is silently absent exactly where it matters most; no transcript; no redaction of model output.
Option C: `@async/pipeline-agent` package wrapping a model SDK

| Dimension | Assessment |
|---|---|
| Complexity | High — SDK version churn, per-vendor surface |
| Dependency cost | Contained to opt-in package, but real |
| Safety | Strong (API-level tool gating) but vendor-coupled |
| Cache semantics | Same questions as A |
Pros: richer control than CLI spawning (structured tool calls, token budgets). Cons: picks vendors; duplicates what agent CLIs already do; the core still needs the step type for metadata, so this is A plus an SDK, not instead of A.
The real decision is B versus A: whether agent invocation is visible to the model of the pipeline. Everything this project claims — inspectable commands, evidence under .async/, explicit cache behavior — argues that an execution class with different trust properties must be a different step kind. Option C is a later refinement of A’s adapter port, not a competitor: if a team wants SDK-level control, the adapter interface is where it plugs in.
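As a rough illustration of the adapter port idea, the sketch below defines a small interface that a CLI-spawning adapter, an SDK-backed adapter, or a mock could all implement, plus selection among declared adapters by id. Every name here (`AgentAdapter`, `AgentRequest`, `AgentResult`, `selectAdapter`) is hypothetical and not the package's actual API.

```typescript
// Hypothetical request and result shapes for one agent invocation.
interface AgentRequest {
  prompt: string;
  model: string;
}

interface AgentResult {
  stdout: string;
  exitCode: number;
}

// Hypothetical adapter port: anything that can turn a request into a result.
interface AgentAdapter {
  id: string;
  run(request: AgentRequest): Promise<AgentResult>;
}

// A mock adapter so CI can exercise the step without calling a model.
const mockAdapter: AgentAdapter = {
  id: "mock",
  async run(request) {
    return { stdout: `-- mock output for: ${request.prompt}`, exitCode: 0 };
  },
};

// Select among declared adapters by id. `requested` would typically come
// from an env value such as ASYNC_AGENT; unknown ids fail loudly rather
// than falling through to an arbitrary command string.
function selectAdapter(
  adapters: Record<string, AgentAdapter>,
  requested: string | undefined,
  fallbackId: string,
): AgentAdapter {
  const id = requested ?? fallbackId;
  const adapter = adapters[id];
  if (!adapter) throw new Error(`Unknown agent profile: ${id}`);
  return adapter;
}

const chosen = selectAdapter({ mock: mockAdapter }, undefined, "mock");
chosen
  .run({ prompt: "draft a migration", model: "mock" })
  .then((result) => console.log(result.exitCode, result.stdout));
```

Selecting by id keeps every candidate inspectable in metadata, which is the same reason this record recommends choosing among declared profiles over injecting raw commands.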
The riskiest piece of A is cache semantics. Replaying a cached artifact that a model produced is correct under this package’s own definition (key = inputs + steps + dependencies), but humans may expect “agent task” to mean “fresh thinking each run”. The mitigation is the explicit rule plus --force already existing for exactly this intent.
Consequences:

- … “an `agent()` task with a particular prompt and policy” instead of bespoke machinery.
- `agent()`, `agents`, and adapter config join the surface that must stop moving; see Path to 1.0.

Action items:

1. `agents` config block in pipeline-core (types + validation, no execution).
2. pipeline-node: spawn through command executor, transcript capture, redaction, policy default flip.
3. `computeTaskCacheKey` for agent steps.
4. `tests/claims.json` with `PROMISE:` tests; CHANGELOG entry; regenerate sync surfaces.
5. Canonical mocked example under `examples/` exercised by `release:check`, mocking the agent CLI via `command.mock(...)` so CI needs no model.