Built 26/04/17 09:08, commit f8ff6f9

Codex Operating Practices

Summary

Codex works best when it is treated as a configurable teammate rather than a single-turn assistant: task context is structured up front, durable guidance lives in repo and user config, repeated work is packaged into skills or automations, and higher-autonomy execution surfaces are used with explicit permission and verification boundaries.

Task Framing And Durable Guidance

  • Start prompts with a clear goal, the relevant local context, explicit constraints, and a concrete "done when" condition.
  • For ambiguous or multi-step work, plan first through plan mode, interview-style clarification, or a durable execution-plan artifact.
  • Move stable instructions out of ad hoc prompts and into AGENTS.md, code_review.md, skills, or checked-in repo docs.
  • For frontend-heavy work, keep visual rules in repo-local design contracts such as DESIGN.md instead of repeatedly restating taste in task prompts.
  • Keep AGENTS.md short and operational: it should orient the agent, not duplicate every deeper document inline.
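
The four prompt parts named in the first bullet can be sketched as a small structure that refuses to render without a goal or a completion condition. This is only an illustration: `TaskBrief`, its field names, and the file paths in the example are hypothetical and are not part of Codex.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for the four parts of a task prompt."""
    goal: str
    done_when: str
    context: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Refuse to render a prompt that lacks a goal or a completion condition.
        if not self.goal.strip() or not self.done_when.strip():
            raise ValueError("a task brief needs a goal and a done-when condition")
        lines = [f"Goal: {self.goal.strip()}"]
        if self.context:
            lines.append("Context:")
            lines.extend(f"- {item}" for item in self.context)
        if self.constraints:
            lines.append("Constraints:")
            lines.extend(f"- {item}" for item in self.constraints)
        lines.append(f"Done when: {self.done_when.strip()}")
        return "\n".join(lines)


# Example values are made up for illustration.
brief = TaskBrief(
    goal="Fix the flaky retry test in the payments client",
    done_when="the test suite passes three times in a row",
    context=["tests/test_retry.py", "src/payments/client.py"],
    constraints=["do not change public function signatures"],
)
prompt_text = brief.render()
print(prompt_text)
```

Anything that would be repeated across many such briefs is a candidate for AGENTS.md or a skill rather than for the `constraints` list.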

Configuration Surfaces

  • Use ~/.codex/config.toml for personal defaults and .codex/config.toml for trusted project-scoped behavior.
  • Treat CLI flags, --config, and profiles as higher-precedence overrides for one-off or named operating modes.
  • approval_policy, sandbox_mode, and web_search are not cosmetic settings; they define how much Codex can do and how much external risk it takes on.
  • requirements.toml matters in managed environments because it can pin or forbid security-sensitive options that users cannot override.
  • The sample and reference docs are best used as searchable maps: copy only the keys you need rather than pasting a monolithic config.
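
The layering described above can be sketched as a plain dictionary merge. The key names `approval_policy` and `sandbox_mode` come from the text; the example values, the exact ordering between project config, profiles, and CLI overrides, and the representation of requirements.toml as a dictionary of pinned values are all assumptions made for illustration, so check the reference docs for the real rules.

```python
def resolve_config(user, project, profile=None, cli_overrides=None, pinned=None):
    """Merge config layers from lowest to highest precedence (assumed order),
    then apply managed pins that no layer may override."""
    effective = {}
    for layer in (user, project, profile or {}, cli_overrides or {}):
        effective.update(layer)

    pinned = pinned or {}
    # Record which requested values were replaced by a managed pin.
    overridden = {
        key: effective[key]
        for key in pinned
        if key in effective and effective[key] != pinned[key]
    }
    effective.update(pinned)
    return effective, overridden


# Stand-ins for parsed ~/.codex/config.toml and .codex/config.toml (values are illustrative).
user_config = {"approval_policy": "on-request", "sandbox_mode": "read-only"}
project_config = {"sandbox_mode": "workspace-write"}

effective, overridden = resolve_config(
    user_config,
    project_config,
    cli_overrides={"approval_policy": "never"},  # one-off override
    pinned={"sandbox_mode": "read-only"},        # hypothetical managed requirement
)
print(effective)
print(overridden)
```

Returning the overridden values alongside the effective config makes it easier to explain why a requested setting did not take effect in a managed environment.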

Capability Packaging

  • Use MCP when the needed context lives outside the repo and changes often enough that pasted instructions will drift.
  • Turn recurring repo-local workflows into skills once the method stabilizes; package them as plugins only when they need broader distribution or bundled integrations.
  • Named command families are useful when a team wants the whole loop to stay legible: ideation, planning, implementation, review, and knowledge-compounding steps can be exposed as explicit operations instead of being folded into one catch-all prompt habit.
  • Keep hooks, slash commands, skills, MCP budgets, and memory workflows as distinct operating surfaces; treating them as one undifferentiated "config" layer makes debugging and reuse harder.
  • The llm-wiki skill is a concrete example of a domain-specific operating bundle: one maintainer spec plus narrow helper scripts can turn a markdown repository into a repeatable long-running workflow.
  • Third-party harness repos can package Codex as a first-class surface, not just a compatibility note: AGENTS.md, project config.toml, role definitions, and skill layout can all be shipped alongside Claude-facing assets while preserving one shared operating philosophy.
  • Ecosystem playbooks are starting to ship Codex alongside other agent clients rather than as an afterthought: plugin-conversion CLIs, shared workflow catalogs, and "best-practice" repos increasingly treat Codex as a peer execution surface.
  • Bridge plugins matter as a separate interoperability layer: a foreign harness can expose Codex as native slash commands for review, delegation, background-job control, and even blocking review gates without forking the underlying Codex runtime.
  • Prompt-surface catalogs are useful adjacent artifacts: even when they target Claude Code rather than Codex directly, they show that mature operator tooling increasingly treats built-in prompt layers, slash-command behavior, memory routines, and safety logic as inspectable runtime surfaces.
  • Assembly analyses matter alongside prompt catalogs: they show how output style, tool availability, subagents, skills, memory systems, MCP instructions, git state, and other runtime conditions determine which prompt fragments actually appear in a live session.
  • Team distribution matters as much as packaging syntax: global install paths, repo bootstrap commands, and named specialist-role entry points make it easier to reproduce one operating model across many developers instead of only on one power user's laptop.
  • Use subagents when the work decomposes cleanly into bounded parallel lanes; otherwise they mainly add context, token, and coordination overhead.
  • Use automations only after a workflow is predictable by hand; skills define the method, while automations define the schedule.
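
One rough way to test whether work "decomposes cleanly" into parallel lanes is to check that no two lanes touch the same files. The sketch below assumes that file ownership is a reasonable proxy for independence, which is a simplification introduced here; the lane names and paths are hypothetical.

```python
from itertools import combinations


def overlapping_lanes(lanes):
    """Return (lane_a, lane_b, shared_files) for every pair of lanes that share a file."""
    conflicts = []
    for (name_a, files_a), (name_b, files_b) in combinations(sorted(lanes.items()), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts


# Hypothetical plan: each lane lists the files a subagent would edit.
planned_lanes = {
    "api": {"src/api.py", "src/models.py"},
    "ui": {"web/app.tsx"},
    "db": {"src/models.py"},
}

conflicts = overlapping_lanes(planned_lanes)
for lane_a, lane_b, shared in conflicts:
    print(f"{lane_a} and {lane_b} both touch {sorted(shared)}")
```

An empty result suggests the lanes are candidates for subagents; any conflict is a hint to merge those lanes or run them in sequence.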

Execution And Verification Surfaces

  • Interactive sessions are the default surface for exploratory or iterative work because they keep the approval loop visible.
  • Slash commands are the interactive control plane for live sessions: they let you adjust model, permissions, planning mode, compaction, review, and diagnostics without restarting.
  • codex exec is the narrow, scriptable surface for CI, scheduled runs, and pipeline steps that need JSONL events, schema-constrained outputs, or resumable non-interactive sessions.
  • The implementation repository shows that these surfaces are not just abstractions on paper: app-server request routing, thread lifecycle, command-exec control, and review flows are all represented as explicit runtime subsystems rather than thin CLI conveniences.
  • Non-interactive runs should keep the least permissions needed and preserve verification artifacts, especially when they write files or open follow-up PRs.
  • Treat independent contexts as a verification tool, not only as a concurrency trick: plan review, separate code review, and fresh-window debugging often outperform one overloaded thread.
  • Review and testing are part of the operating model: Codex should not just produce code, but also run checks, inspect diffs, and explain whether the requested outcome was actually met.
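
For the scripted surface, a pipeline step typically builds a `codex exec` command and then reads its JSONL output line by line. The sketch below shows that shape without running the CLI. The `--json` flag and the event fields in `SAMPLE_OUTPUT` are assumptions made for illustration, so confirm both against `codex exec --help` and the actual event stream before relying on them.

```python
import json


def build_exec_command(prompt):
    """Build an argument list for a non-interactive run.
    `codex exec` comes from the text; `--json` is an assumed flag name."""
    return ["codex", "exec", "--json", prompt]


def parse_jsonl(stream_text):
    """Parse JSONL output, skipping blank lines and lines that are not JSON objects."""
    events = []
    for line in stream_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON log lines
        if isinstance(parsed, dict):
            events.append(parsed)
    return events


def count_by_type(events):
    """Count events by their (hypothetical) "type" field."""
    counts = {}
    for event in events:
        key = str(event.get("type", "unknown"))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up event lines standing in for captured output.
SAMPLE_OUTPUT = (
    '{"type": "started"}\n'
    "\n"
    "not json\n"
    '{"type": "message", "text": "hi"}\n'
    '{"type": "message"}\n'
)

command = build_exec_command("Summarize failing tests")
events = parse_jsonl(SAMPLE_OUTPUT)
summary = count_by_type(events)
print(command)
print(summary)
```

Saving the raw JSONL next to the parsed summary keeps a verification artifact for later review, in line with the guidance on non-interactive runs.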

Guardrails

  • Do not stuff durable repo policy into every prompt; keep prompts task-local and keep operating rules versioned.
  • Do not widen sandbox or approval settings until the workflow is understood and the repository is trusted.
  • Do not automate a workflow that still needs manual steering or ambiguous human judgment at every step.
  • Do not keep unrelated tasks in one long thread; keep one thread per coherent unit of work and fork only when the work truly branches.

Sources