How To Make Agents Run Longer

中文 | English

Summary

To make an agent run for long periods without drifting, optimize the environment around it rather than only changing the prompt: break work into explicit roles or artifacts, make the repository and runtime legible, and add control loops that let the agent verify and recover continuously.

Practical Checklist

Give the agent a clear planning surface: a planner, spec, sprint contract, or execution artifact that turns a vague goal into explicit deliverables.
Separate doing from judging when quality matters: use an evaluator, QA pass, or independent review loop instead of relying on self-evaluation alone.
Preserve state in durable artifacts: plans, decision logs, handoff files, and repo-local docs keep long runs coherent across resets or long sessions.
Separate durable session state from the live context window when possible, so recovery does not depend on one irreversible compaction strategy.
Make the app legible: expose UI state, logs, metrics, traces, and repeatable startup paths so the agent can inspect behavior directly.
Keep the repository navigable: use a short AGENTS.md as a map and push deeper guidance into indexed docs rather than one huge instruction file.
Enforce invariants mechanically: linters, structural tests, architecture boundaries, and remediation-friendly error messages reduce drift.
Treat cleanup as part of the loop: recurring refactors and slop cleanup stop bad patterns from compounding over multi-hour or multi-day work.
Re-test the harness itself: as models improve, remove scaffolding that no longer pays for its complexity.
If building the harness is not your core job, consider using a managed meta-harness instead of owning the full loop, sandbox, and tool-runtime stack yourself.

What Usually Fails

One giant prompt or instruction file that mixes every rule together.
Asking the same agent to build, grade, and bless its own work.
Keeping key context in chat, Slack, or human memory instead of in the repository.
Letting the agent operate on a black box with no direct access to runtime signals.
Adding harness complexity without checking whether it is still load-bearing.

Rule Of Thumb

If an agent cannot run long, the problem is often not “reason harder” but “make the task easier to inspect, verify, and resume.” Long-running performance comes from control systems, not just longer prompts.

claude-code

everything-claude-code

claude-mythos-preview

multica-ai

piebald-ai

voltagent

awesome-design-md

design-md

codex

skills

llm-wiki

anthropic

claude-code

everything-claude-code

claude-mythos-preview

cocoon-ai

dbreunig

everyinc

garrytan

github

multica-ai

piebald-ai

voltagent

karpathy

openai

codex

configuration

skills

ralph

shanraisshan

How To Make Agents Run Longer

Summary

Practical Checklist

What Usually Fails

Rule Of Thumb

Sources

everything-claude-code

awesome-design-md

design-md

skills

llm-wiki

claude-code

everything-claude-code

claude-mythos-preview

multica-ai

piebald-ai

voltagent

codex

configuration

skills

How To Make Agents Run Longer ​

Summary ​

Practical Checklist ​

What Usually Fails ​

Rule Of Thumb ​

Sources ​

Related Pages ​

How To Make Agents Run Longer

Summary

Practical Checklist

What Usually Fails

Rule Of Thumb

Sources

Related Pages