Built 26/04/17 09:31, commit 4c9ce40
How To Make Agents Run Longer
Summary
To make an agent run for long periods without drifting, optimize the environment around it rather than only changing the prompt: break work into explicit roles or artifacts, make the repository and runtime legible, and add control loops that let the agent verify and recover continuously.
Practical Checklist
- Give the agent a clear planning surface: a planner, spec, sprint contract, or execution artifact that turns a vague goal into explicit deliverables.
- Separate doing from judging when quality matters: use an evaluator, QA pass, or independent review loop instead of relying on self-evaluation alone.
- Preserve state in durable artifacts: plans, decision logs, handoff files, and repo-local docs keep long runs coherent across resets or long sessions.
- Separate durable session state from the live context window when possible, so recovery does not depend on one irreversible compaction strategy.
- Make the app legible: expose UI state, logs, metrics, traces, and repeatable startup paths so the agent can inspect behavior directly.
- Keep the repository navigable: use a short AGENTS.md as a map and push deeper guidance into indexed docs rather than one huge instruction file.
- Enforce invariants mechanically: linters, structural tests, architecture boundaries, and remediation-friendly error messages reduce drift.
- Treat cleanup as part of the loop: recurring refactors and slop cleanup stop bad patterns from compounding over multi-hour or multi-day work.
- Re-test the harness itself: as models improve, remove scaffolding that no longer pays for its complexity.
- If building the harness is not your core job, consider using a managed meta-harness instead of owning the full loop, sandbox, and tool-runtime stack yourself.
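As one illustration of the "preserve state in durable artifacts" item, the sketch below keeps run state in a repo-local JSON file so a fresh session can resume without relying on the live context window. It is a minimal sketch under stated assumptions, not a prescribed format: the `RunState` fields and the names `save_state`, `load_state`, and `next_deliverable` are hypothetical and do not come from any of the listed sources.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RunState:
    """Hypothetical durable state for a long run; the field names are illustrative."""
    goal: str
    deliverables: List[str] = field(default_factory=list)  # explicit items from the plan
    completed: List[str] = field(default_factory=list)     # items that already passed review
    decisions: List[str] = field(default_factory=list)     # short decision-log entries


def save_state(state: RunState, path: str) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(state), handle, indent=2)
    os.replace(tmp_path, path)


def load_state(path: str) -> RunState:
    """Rebuild the state from disk at the start of a new session."""
    with open(path, encoding="utf-8") as handle:
        return RunState(**json.load(handle))


def next_deliverable(state: RunState) -> Optional[str]:
    """Return the first planned item that has not been completed, or None when done."""
    for item in state.deliverables:
        if item not in state.completed:
            return item
    return None
```

A resumed session would call `load_state`, ask `next_deliverable` what to work on, and call `save_state` after each reviewed step. The state lives on disk rather than in the conversation, so compaction or a reset does not erase it.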
What Usually Fails
- One giant prompt or instruction file that mixes every rule together.
- Asking the same agent to build, grade, and bless its own work.
- Keeping key context in chat, Slack, or human memory instead of in the repository.
- Letting the agent operate on a black box with no direct access to runtime signals.
- Adding harness complexity without checking whether it is still load-bearing.
Rule Of Thumb
If an agent cannot sustain a long run, the fix is often not “reason harder” but “make the task easier to inspect, verify, and resume.” Long-running performance comes from control systems, not just longer prompts.
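The control-system idea can be sketched as a small loop in which one callable builds and a separate callable judges, with the judge's feedback fed into the next attempt. This is a hedged illustration only: `run_with_review`, `Attempt`, and the stand-in builder and evaluator are hypothetical names, and in a real harness the two callables would wrap an agent call and an independent check such as a test suite or QA pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    number: int
    output: str
    passed: bool
    feedback: str


def run_with_review(
    build: Callable[[str], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> List[Attempt]:
    """Alternate building and independent evaluation until a pass or the attempt cap."""
    history: List[Attempt] = []
    feedback = ""
    for number in range(1, max_attempts + 1):
        output = build(feedback)              # the builder sees only the last feedback
        passed, feedback = evaluate(output)   # the evaluator never grades its own work
        history.append(Attempt(number, output, passed, feedback))
        if passed:
            break
    return history


def demo_builder(feedback: str) -> str:
    """Stand-in builder: adds tests only after being told they are missing."""
    return "feature with tests" if "tests" in feedback else "feature"


def demo_evaluator(output: str) -> Tuple[bool, str]:
    """Stand-in evaluator: passes only when the output mentions tests."""
    if "tests" in output:
        return True, "ok"
    return False, "missing tests"


history = run_with_review(demo_builder, demo_evaluator)
```

The returned history doubles as a log of what was tried and why it was rejected. The attempt cap keeps a stuck run from looping indefinitely.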
Sources
- Codex Operating Practices
- Agent Harness Design
- Agent-First Repositories
- Anthropic Harness Design For Long-Running Application Development
- Claude Managed Agents Overview
- Scaling Managed Agents: Decoupling The Brain From The Hands
- OpenAI Harness Engineering In An Agent-First World