Skip to content
Built 26/04/19 09:34commit 1ac42b7

Anthropic Harness Design For Long-Running Application Development

中文 | English

Summary

This source describes how Anthropic evolved a long-running coding harness from simple decomposition into a planner-generator-evaluator architecture, with the strongest gains coming from explicit evaluation, structured handoff artifacts, and periodic simplification as model capabilities improve.

Source

Key Contributions

  • Recasts multi-agent coding harnesses in generator-evaluator terms, with a planner added to expand underspecified prompts into product specs.
  • Argues that self-evaluation is weak by default and that a separate skeptical evaluator is easier to tune than a self-critical generator.
  • Distinguishes context resets from compaction: resets solve context anxiety more cleanly, but add orchestration cost.
  • Shows that scaffolding should be treated as temporary and load-bearing assumptions should be re-tested as models improve.
  • Makes verification concrete through evaluator tooling, sprint contracts, and thresholded grading criteria.

Strongest Claims

  • Planner, generator, and evaluator roles create better long-running coding outcomes than a solo agent on tasks near the model's capability boundary.
  • Structured artifacts and explicit handoffs matter because long-running work loses coherence over time.
  • Evaluators are not universally required; they are worth the cost only when the task sits beyond what the current model handles reliably on its own.
  • Harnesses should become simpler when newer models absorb responsibilities that the scaffold previously had to supply.

Practical Implications For This Vault

  • Source ingestion should preserve structured artifacts when a source describes an operating method rather than just a concept.
  • Topic pages about agent systems should capture not only architecture but also when each layer stops being worth its complexity.
  • Lint passes should look for sources that imply missing canonical topics, not just broken links.