Codex Operating Practices

Summary

Codex works best when it is treated as a configurable teammate rather than a single-turn assistant: task context is structured up front, durable guidance lives in repo and user config, repeated work is packaged into skills or automations, and higher-autonomy execution surfaces are used with explicit permission and verification boundaries.

Task Framing And Durable Guidance

Start prompts with a clear goal, the relevant local context, explicit constraints, and a concrete "done when" condition.
For ambiguous or multi-step work, plan first through plan mode, interview-style clarification, or a durable execution-plan artifact.
Move stable instructions out of ad hoc prompts and into AGENTS.md, code_review.md, skills, or checked-in repo docs.
For frontend-heavy work, keep visual rules in repo-local design contracts such as DESIGN.md instead of repeatedly restating taste in task prompts.
Keep AGENTS.md short and operational: it should orient the agent, not duplicate every deeper document inline.
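The four-part prompt shape described above can be made mechanical. What follows is a minimal sketch, not a Codex API: `TaskPrompt`, its field names, and the file paths in the example are all hypothetical, chosen only to show a prompt that cannot be rendered without a "done when" condition.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPrompt:
    """Hypothetical container for the four parts of a task prompt."""

    goal: str
    context: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    done_when: str = ""

    def render(self) -> str:
        # Refuse to render a prompt that has no verifiable finish line.
        if not self.done_when.strip():
            raise ValueError("a 'done when' condition is required")
        lines = [f"Goal: {self.goal}"]
        if self.context:
            lines.append("Context:")
            lines += [f"- {item}" for item in self.context]
        if self.constraints:
            lines.append("Constraints:")
            lines += [f"- {item}" for item in self.constraints]
        lines.append(f"Done when: {self.done_when}")
        return "\n".join(lines)


# Illustrative values only; the paths do not refer to a real repository.
prompt = TaskPrompt(
    goal="Fix the flaky retry test",
    context=["tests/test_retry.py", "src/retry.py"],
    constraints=["Do not change the public API"],
    done_when="the test suite passes 20 consecutive runs",
)
print(prompt.render())
```

Stable rules (style, review policy, design taste) are deliberately absent from this structure; per the guidance above they belong in AGENTS.md or other checked-in docs, not in each task prompt.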

Configuration Surfaces

Use ~/.codex/config.toml for personal defaults and .codex/config.toml for trusted project-scoped behavior.
Treat CLI flags, --config, and profiles as higher-precedence overrides for one-off or named operating modes.
approval_policy, sandbox_mode, and web_search are not cosmetic settings; they define how much Codex can do and how much external risk it takes on.
requirements.toml matters in managed environments because it can pin or forbid security-sensitive options that users cannot override.
The sample configuration and configuration reference docs are best used as searchable maps: copy only the keys you need rather than pasting a monolithic config.
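The layering described above can be pictured as a merge from lowest to highest precedence, followed by managed pins that always win. This is a sketch under stated assumptions, not Codex's actual resolution logic: the exact ordering between layers, the flat shallow merge, and the `apply_pins` helper are illustrative, and the real requirements.toml schema is not modeled here. The setting values are examples only.

```python
def resolve(user, project, profile=None, cli_overrides=None):
    """Merge config layers from lowest to highest precedence.

    Assumed order (illustrative): user file, project file, profile, CLI overrides.
    Later layers overwrite earlier ones key by key (shallow merge only).
    """
    merged = {}
    for layer in (user, project, profile or {}, cli_overrides or {}):
        merged.update(layer)
    return merged


def apply_pins(config, pins):
    """Hypothetical managed-environment step: pinned keys cannot be overridden."""
    return {**config, **pins}


# Stand-ins for ~/.codex/config.toml and .codex/config.toml, already parsed.
user_config = {"approval_policy": "on-request", "sandbox_mode": "read-only"}
project_config = {"sandbox_mode": "workspace-write"}

# A one-off override, as a CLI flag or --config value might supply.
effective = resolve(user_config, project_config,
                    cli_overrides={"approval_policy": "never"})

# A managed pin forces the sandbox back to its most restrictive value.
effective = apply_pins(effective, {"sandbox_mode": "read-only"})
print(effective)
```

Keeping each layer small makes the result easy to predict, which is the practical argument for copying only the keys you need.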

Capability Packaging

Use MCP when the needed context lives outside the repo and changes often enough that pasted instructions will drift.
Turn recurring repo-local workflows into skills once the method stabilizes; package them as plugins only when they need broader distribution or bundled integrations.
Named command families are useful when a team wants the whole loop to stay legible: ideation, planning, implementation, review, and knowledge-compounding steps can be exposed as explicit operations instead of one undifferentiated prompt habit.
Keep hooks, slash commands, skills, MCP budgets, and memory workflows as distinct operating surfaces; treating them as one undifferentiated "config" layer makes debugging and reuse harder.
The llm-wiki skill is a concrete example of a domain-specific operating bundle: one maintainer spec plus narrow helper scripts can turn a markdown repository into a repeatable long-running workflow.
Third-party harness repos can package Codex as a first-class surface, not just a compatibility note: AGENTS.md, project config.toml, role definitions, and skill layout can all be shipped alongside Claude-facing assets while preserving one shared operating philosophy.
Ecosystem playbooks are starting to ship Codex alongside other agent clients instead of after them: plugin-conversion CLIs, shared workflow catalogs, and “best-practice” repos increasingly treat Codex as one peer execution surface.
Bridge plugins matter as a separate interoperability layer: a foreign harness can expose Codex as native slash commands for review, delegation, background-job control, and even blocking review gates without forking the underlying Codex runtime.
Prompt-surface catalogs are becoming useful adjacent artifacts too: even when they target Claude Code rather than Codex directly, they show that mature operator tooling increasingly treats built-in prompt layers, slash-command behavior, memory routines, and safety logic as inspectable runtime surfaces.
Assembly analyses matter alongside prompt catalogs: they show how output style, tool availability, subagents, skills, memory systems, MCP instructions, git state, and other runtime conditions determine which prompt fragments actually appear in a live session.
Team distribution matters as much as packaging syntax: global install paths, repo bootstrap commands, and named specialist-role entry points make it easier to reproduce one operating model across many developers instead of one power user's laptop.
Use subagents when the work decomposes cleanly into bounded parallel lanes; otherwise they mainly add context, token, and coordination overhead.
Use automations only after a workflow is predictable by hand; skills define the method, while automations define the schedule.
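One rough way to test whether work "decomposes cleanly" before reaching for subagents is to check that the planned lanes do not touch the same files. The sketch below assumes file overlap is a reasonable proxy for coupling, which is an assumption of this example and not a rule from Codex; `overlapping_lanes`, the lane names, and the paths are hypothetical.

```python
from itertools import combinations


def overlapping_lanes(lanes: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of lanes that touch at least one common path."""
    conflicts = []
    for (name_a, paths_a), (name_b, paths_b) in combinations(lanes.items(), 2):
        shared = paths_a & paths_b
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts


# Illustrative plan: each lane lists the files it expects to modify.
lanes = {
    "api": {"src/api/routes.py", "src/api/schema.py"},
    "docs": {"docs/usage.md"},
    "tests": {"tests/test_routes.py", "src/api/schema.py"},
}

conflicts = overlapping_lanes(lanes)
for first, second, shared in conflicts:
    print(f"{first} and {second} both touch: {sorted(shared)}")
```

An empty result suggests the lanes can run in parallel; any conflict is a hint to merge those lanes or sequence them in one thread, since the coordination cost would otherwise outweigh the parallelism.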

Execution And Verification Surfaces

Interactive sessions are the default surface for exploratory or iterative work because they keep the approval loop visible.
Slash commands are the interactive control plane for live sessions: they let you adjust model, permissions, planning mode, compaction, review, and diagnostics without restarting.
codex exec is the narrow, scriptable surface for CI, scheduled runs, and pipeline steps that need JSONL events, schema-constrained outputs, or resumable non-interactive sessions.
The implementation repository shows that these surfaces are not paper abstractions: app-server request routing, thread lifecycle, command-exec control, and review flows are all represented as explicit runtime subsystems rather than thin CLI conveniences.
Non-interactive runs should use the least permissions they need and preserve verification artifacts, especially when they write files or open follow-up PRs.
Treat independent contexts as a verification tool, not only as a concurrency trick: plan review, separate code review, and fresh-window debugging often outperform one overloaded thread.
Review and testing are part of the operating model: Codex should not just produce code, but also run checks, inspect diffs, and explain whether the requested outcome was actually met.
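For pipeline use, the JSONL event stream from a non-interactive run can be kept as a verification artifact and summarized. The sketch below assumes that `codex exec --json` emits one JSON object per line and that events carry a `type` field; confirm both against `codex exec --help` and real output before relying on them. The sample lines are made up for illustration and are not real Codex events.

```python
import json
from collections import Counter
from typing import Iterable

# Assumed invocation; verify the flag with `codex exec --help`.
COMMAND = ["codex", "exec", "--json", "Summarize the failing tests"]


def tally_events(lines: Iterable[str]) -> Counter:
    """Count events by their "type" field.

    Blank lines are skipped; lines that are not valid JSON are counted
    under "<unparsed>" so that they are not silently lost.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparsed>"] += 1
            continue
        if isinstance(event, dict):
            counts[str(event.get("type", "<untyped>"))] += 1
        else:
            counts["<untyped>"] += 1
    return counts


# Made-up sample lines standing in for a real event stream.
sample = [
    '{"type": "example.started"}',
    '{"type": "example.step", "detail": "ran checks"}',
    '{"type": "example.step", "detail": "inspected diff"}',
    "",
    "not json",
]
counts = tally_events(sample)
print(dict(counts))
```

In a real job, the same function could read the standard output of a process started with `subprocess.Popen(COMMAND, stdout=subprocess.PIPE, text=True)`, with the raw stream also written to a file so the run can be audited later.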

Guardrails

Do not stuff durable repo policy into every prompt; keep prompts task-local and keep operating rules versioned.
Do not widen sandbox or approval settings until the workflow is understood and the repository is trusted.
Do not automate a workflow that still needs manual steering or ambiguous human judgment at every step.
Do not keep unrelated tasks in one long thread; keep one thread per coherent unit of work and fork only when the work truly branches.

