Built 26/04/17 09:39, commit 8de3d61

Agent Harness Design

Summary

Agent harness design is the practice of adding just enough orchestration around a model to keep long-running work coherent, verifiable, and revisable, while continuously re-testing whether each piece of scaffolding is still necessary.

Core Patterns

  • A planner expands short prompts into a richer product or task spec so the implementation agent does not under-scope the job.
  • A generator performs the substantive build work, usually against explicit contracts or structured deliverables.
  • An evaluator checks the output with independent criteria and tooling, producing actionable feedback instead of self-congratulatory review.
  • Structured artifacts and handoff files preserve state across long runs, context resets, or agent boundaries; these can stay lightweight, such as session logs, PRDs, codemaps, or review notes.
  • A minimal monolithic loop can be a legitimate harness too: keep one fresh-context agent focused on one important item per pass, and externalize memory into repo artifacts instead of adding premature orchestration.
  • Repository legibility is part of the harness, not a separate concern: plans, docs, tools, and review loops all shape what the agent can reliably do.
  • Subagents and session controls help only when the work decomposes cleanly; otherwise they add token and coordination overhead faster than they add leverage.
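The planner/generator/evaluator split with externalized memory can be sketched as a small loop. This is a hypothetical illustration, not any specific framework's API: the `plan`, `build`, and `evaluate` functions and the `Artifact` handoff file stand in for real model calls and repo artifacts.

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    """Externalized memory (spec, output, review notes) that survives
    context resets and agent boundaries."""
    spec: str = ""
    output: str = ""
    review_notes: list[str] = field(default_factory=list)

def plan(prompt: str) -> str:
    # Planner: expand a short prompt into a richer spec so the
    # implementation step does not under-scope the job.
    return f"SPEC: {prompt}; must include tests and docs"

def build(spec: str) -> str:
    # Generator: substantive work against the explicit spec (stub here).
    return f"BUILD({spec})"

def evaluate(output: str, spec: str) -> tuple[bool, str]:
    # Evaluator: independent criterion, not self-review. Here the
    # criterion is simply "the output addresses the original task".
    task = spec.split("SPEC: ")[1].split(";")[0]
    ok = task in output
    return ok, "ok" if ok else "output does not address the spec"

def run_harness(prompt: str, max_passes: int = 3) -> Artifact:
    art = Artifact(spec=plan(prompt))
    for _ in range(max_passes):
        art.output = build(art.spec)
        ok, note = evaluate(art.output, art.spec)
        art.review_notes.append(note)  # feedback persists across passes
        if ok:
            break
    return art

art = run_harness("add retry logic to the uploader")
```

Because state lives in the `Artifact` rather than in any one agent's context, a fresh-context agent can pick up the same loop mid-run, which is the minimal monolithic variant described above.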

Evaluation Lessons

  • Self-evaluation is systematically lenient; separate evaluators are easier to calibrate toward skepticism.
  • Evaluation works better when subjective judgments are translated into concrete criteria.
  • Interactive verification tools matter because screenshots and static inspection miss behavioral defects.
  • Verification loops get stronger when cheap graders, transcript reading, pass-rate metrics, and human review are treated as distinct layers instead of one all-purpose check.
  • When older benchmarks saturate, open-ended real-world tasks plus a final confirmation pass become a better capability readout than repeated replay exercises.
  • High-throughput agent teams often converge on many small PRs, squash-heavy merge policies, and explicit review agents because the merge queue becomes the real bottleneck before raw implementation speed does.
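The layering point can be made concrete with a toy verification stack, assuming three invented layers: a cheap objective grader, an aggregate pass-rate metric, and a human-escalation rule. All names and thresholds are illustrative.

```python
def cheap_grader(output: str) -> bool:
    # Layer 1: fast, objective gate with concrete criteria,
    # instead of a lenient "looks good" self-review.
    return bool(output) and "TODO" not in output

def pass_rate(outputs: list[str]) -> float:
    # Layer 2: aggregate metric across runs, not a single anecdote.
    results = [cheap_grader(o) for o in outputs]
    return sum(results) / len(results)

def needs_human_review(output: str, rate: float) -> bool:
    # Layer 3: escalate to transcript reading / human review only
    # when the cheaper layers are failing or inconclusive.
    return not cheap_grader(output) or rate < 0.8

runs = ["fix applied", "fix applied", "TODO: finish"]
rate = pass_rate(runs)
```

Keeping the layers distinct makes each one separately tunable: the cheap grader can be made stricter without touching the escalation threshold, and vice versa.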

Simplification Heuristic

  • Every harness component encodes an assumption about what the base model cannot yet do well.
  • As models improve, previously essential constructs such as sprint decomposition or repeated QA passes may become unnecessary overhead.
  • Simplification should be methodical: remove one component at a time and inspect what quality or reliability was lost.
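The one-at-a-time removal heuristic amounts to a simple ablation loop. In this sketch the component set and the `quality` scores are invented stand-ins for real (expensive) evaluation runs; the point is only the shape of the procedure.

```python
BASELINE = {"planner", "generator", "evaluator", "sprint_decomposition"}

def quality(components: set[str]) -> float:
    # Stand-in for an evaluation run; a fixed table for illustration.
    score = 0.5
    if "generator" in components:
        score += 0.30
    if "evaluator" in components:
        score += 0.15
    if "planner" in components:
        score += 0.05
    # "sprint_decomposition" adds nothing for this (stronger) model,
    # modeling a construct that has become unnecessary overhead.
    return score

def ablate(components: set[str], tolerance: float = 0.01) -> set[str]:
    # Remove one component at a time; keep the removal only if
    # measured quality drops by no more than the tolerance.
    kept = set(components)
    base = quality(kept)
    for c in sorted(components):
        trial = kept - {c}
        if base - quality(trial) <= tolerance:
            kept = trial
            base = quality(kept)
    return kept

minimal = ablate(BASELINE)
```

Running the loop drops only the component whose removal costs nothing, which is exactly the "inspect what was lost" discipline the heuristic calls for.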

Environment Design Lessons

  • Harness quality depends on how much of the product and runtime are directly legible to the agent.
  • Short routing documents plus indexed deeper documentation scale better than one giant instruction file.
  • Architecture boundaries, custom lints, and repo-local plans are part of the control system that keeps autonomous work coherent.
  • Hooks, commands, and reusable skills are also part of the environment layer because they can enforce repeated checks without making every prompt longer.
  • Model upgrades can require harness retuning too: stronger literal instruction following, better file-system memory, higher-resolution vision, and new effort or review controls all change how prompts, budgets, and verification loops should be set.
  • Harnesses can also be shipped as reusable repository bundles with scripts, skills, and plugin metadata, not just reconstructed from prose guidance.
  • Hosted meta-harnesses are a real design option too: some teams should own only the task contract and environment policy, while buying the loop, session durability, and tool runtime as managed primitives.
  • Team-facing managed-agent platforms add another environment layer beyond repo-local scaffolding: issue boards, daemon-attached runtimes, runtime routing, and assignable agent identities can all become part of the harness surface that coordinates long-running work.
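The routing-document idea can be sketched as a tiny topic index: a short always-loaded router maps topics to deeper documents the agent opens on demand. The file names and topics here are invented for illustration.

```python
# Short routing document: topic -> deeper doc, instead of one giant
# instruction file that is always in context.
ROUTER = {
    "testing": "docs/testing.md",
    "architecture": "docs/architecture.md",
    "release": "docs/release.md",
}

def route(task: str) -> list[str]:
    # The agent loads only the documents whose topics the task
    # touches, keeping the always-loaded instructions short.
    lowered = task.lower()
    return [path for topic, path in ROUTER.items() if topic in lowered]

docs = route("Update the architecture notes and testing guide")
```

Hooks and reusable skills fit the same layer: like the router, they move repeated guidance out of every prompt and into the environment.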

Interface Design Lessons

  • Stable interfaces can outlast one concrete harness implementation, just as operating-system abstractions outlast hardware generations.
  • Session durability should be modeled separately from the model's active context window, so recovery and context management do not collapse into one irreversible mechanism.
  • Decoupling the brain, hands, and session makes failure recovery, security boundaries, and scaling behavior easier to reason about than a single all-in-one container.
  • Treat credentials as structurally outside the sandbox where generated code runs; this is stronger than assuming the model will always respect narrower scopes.
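The credential point can be illustrated with a broker pattern, a hypothetical sketch rather than any particular platform's design: generated code running in the sandbox requests a capability by name, and the secret is resolved only on the broker's side, so there is no scope for the model to leak or misuse it.

```python
# Lives outside the sandbox; the value is a placeholder, not a real token.
SECRETS = {"github": "ghp_example_token"}

class Broker:
    """Proxies privileged calls so sandboxed code never holds raw secrets.
    This makes the boundary structural rather than a scope the model is
    trusted to respect."""

    def __init__(self, allowed: set[str]):
        self.allowed = allowed

    def call(self, capability: str, request: str) -> str:
        if capability not in self.allowed:
            raise PermissionError(f"{capability} not granted")
        _token = SECRETS[capability]  # resolved broker-side only
        return f"performed {request} (authenticated)"

broker = Broker(allowed={"github"})
result = broker.call("github", "open PR")
```

Because the broker sits outside the sandbox, swapping the model ("brain") or the execution container ("hands") leaves the security boundary untouched, which is the decoupling argument in the bullet above.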

When This Matters

  • Long-running coding tasks where coherence degrades over time.
  • Subjective domains such as design, where quality must be made gradable.
  • Product builds that need both ambitious planning and skeptical final verification.
  • Repositories where agents are expected to open, review, and merge changes with limited human intervention.
