Built 26/04/17 09:08, commit f8ff6f9
Codex Operating Practices
Summary
Codex works best when it is treated as a configurable teammate rather than a single-turn assistant: task context is structured up front, durable guidance lives in repo and user config, repeated work is packaged into skills or automations, and higher-autonomy execution surfaces are used with explicit permission and verification boundaries.
Task Framing And Durable Guidance
- Start prompts with a clear goal, the relevant local context, explicit constraints, and a concrete "done when" condition.
- For ambiguous or multi-step work, plan first through plan mode, interview-style clarification, or a durable execution-plan artifact.
- Move stable instructions out of ad hoc prompts and into `AGENTS.md`, `code_review.md`, skills, or checked-in repo docs.
- For frontend-heavy work, keep visual rules in repo-local design contracts such as `DESIGN.md` instead of repeatedly restating taste in task prompts.
- Keep `AGENTS.md` short and operational: it should orient the agent, not duplicate every deeper document inline.
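The framing checklist above can be sketched as a small prompt builder that refuses to emit a prompt without a "done when" condition. This is a minimal Python sketch: `TaskPrompt` and its fields are hypothetical names, not part of any Codex API, and the example task is invented.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPrompt:
    """Hypothetical container for the four parts of a well-framed task."""

    goal: str
    context: list[str] = field(default_factory=list)      # files, logs, links
    constraints: list[str] = field(default_factory=list)  # what must not change
    done_when: list[str] = field(default_factory=list)    # verifiable exit criteria

    def render(self) -> str:
        # Refuse to emit a prompt without a concrete finish line.
        if not self.done_when:
            raise ValueError("a task prompt needs at least one 'done when' condition")
        parts = [f"Goal: {self.goal}"]
        for title, items in (
            ("Context", self.context),
            ("Constraints", self.constraints),
            ("Done when", self.done_when),
        ):
            if items:  # skip empty sections instead of printing bare headers
                parts.append(title + ":\n" + "\n".join(f"- {item}" for item in items))
        return "\n\n".join(parts)


prompt = TaskPrompt(
    goal="Fix the flaky retry test",
    context=["tests/test_retry.py", "CI log from the last failing run"],
    constraints=["Do not change the public client API"],
    done_when=["The test passes five consecutive local runs"],
)
print(prompt.render())
```

Nothing repo-wide (style rules, test commands) appears in the rendered prompt; in this sketch those stay in `AGENTS.md`, so the prompt remains task-local.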
Configuration Surfaces
- Use `~/.codex/config.toml` for personal defaults and `.codex/config.toml` for trusted project-scoped behavior.
- Treat CLI flags, `--config`, and profiles as higher-precedence overrides for one-off or named operating modes.
- `approval_policy`, `sandbox_mode`, and `web_search` are not cosmetic settings; they define how much Codex can do and how much external risk it takes on.
- `requirements.toml` matters in managed environments because it can pin or forbid security-sensitive options that users cannot override.
- The sample and reference docs are best used as searchable maps: copy only the keys you need rather than pasting a monolithic config.
Capability Packaging
- Use MCP when the needed context lives outside the repo and changes often enough that pasted instructions will drift.
- Turn recurring repo-local workflows into skills once the method stabilizes; package them as plugins only when they need broader distribution or bundled integrations.
- Named command families are useful when a team wants the whole loop to stay legible: ideation, planning, implementation, review, and knowledge-compounding steps can be exposed as explicit operations instead of one undifferentiated prompt habit.
- Keep hooks, slash commands, skills, MCP budgets, and memory workflows as distinct operating surfaces; treating them as one undifferentiated "config" layer makes debugging and reuse harder.
- The `llm-wiki` skill is a concrete example of a domain-specific operating bundle: one maintainer spec plus narrow helper scripts can turn a markdown repository into a repeatable long-running workflow.
- Third-party harness repos can package Codex as a first-class surface, not just a compatibility note: `AGENTS.md`, project `config.toml`, role definitions, and skill layout can all be shipped alongside Claude-facing assets while preserving one shared operating philosophy.
- Ecosystem playbooks are starting to ship Codex alongside other agent clients instead of after them: plugin-conversion CLIs, shared workflow catalogs, and "best-practice" repos increasingly treat Codex as one peer execution surface.
- Bridge plugins matter as a separate interoperability layer: a foreign harness can expose Codex as native slash commands for review, delegation, background-job control, and even blocking review gates without forking the underlying Codex runtime.
- Prompt-surface catalogs are becoming useful adjacent artifacts too: even when they target Claude Code rather than Codex directly, they show that mature operator tooling increasingly treats built-in prompt layers, slash-command behavior, memory routines, and safety logic as inspectable runtime surfaces.
- Prompt-assembly analyses matter alongside prompt catalogs: they show how output style, tool availability, subagents, skills, memory systems, MCP instructions, git state, and other runtime conditions determine which prompt fragments actually appear in a live session.
- Team distribution matters as much as packaging syntax: global install paths, repo bootstrap commands, and named specialist-role entry points make it easier to reproduce one operating model across many developers instead of one power user's laptop.
- Use subagents when the work decomposes cleanly into bounded parallel lanes; otherwise they mainly add context, token, and coordination overhead.
- Use automations only after a workflow is predictable by hand; skills define the method, while automations define the schedule.
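One rough heuristic for the subagent guidance in the list above: describe each lane as the set of paths it will touch, and fan out only when those sets are disjoint. This is a hypothetical Python sketch; `run_lanes` and `fake_subagent` are illustrative names, a thread pool stands in for whatever actually spawns subagents, and path overlap is only one proxy for "decomposes cleanly" (shared design decisions can couple lanes too).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable


def lanes_are_disjoint(lanes: dict[str, set[str]]) -> bool:
    """True when no two lanes claim the same path, i.e. the work splits cleanly."""
    return all(not (a & b) for a, b in combinations(lanes.values(), 2))


def run_lanes(
    lanes: dict[str, set[str]],
    worker: Callable[[str, set[str]], str],
    max_workers: int = 4,
) -> tuple[str, dict[str, str]]:
    """Fan out only when lanes are independent; otherwise stay in one thread."""
    if lanes_are_disjoint(lanes):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {name: pool.submit(worker, name, paths) for name, paths in lanes.items()}
            return "parallel", {name: fut.result() for name, fut in futures.items()}
    # Overlapping lanes would need coordination, so run them sequentially instead.
    return "sequential", {name: worker(name, paths) for name, paths in lanes.items()}


def fake_subagent(name: str, paths: set[str]) -> str:
    # Stand-in for delegating a lane to a real subagent.
    return f"{name}: {len(paths)} file(s) reviewed"


mode, results = run_lanes(
    {"api": {"src/api.py"}, "docs": {"docs/usage.md"}},
    fake_subagent,
)
print(mode, results)
```

Falling back to a single thread, rather than raising, mirrors the point that overlapping work mostly adds coordination overhead when split.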
Execution And Verification Surfaces
- Interactive sessions are the default surface for exploratory or iterative work because they keep the approval loop visible.
- Slash commands are the interactive control plane for live sessions: they let you adjust model, permissions, planning mode, compaction, review, and diagnostics without restarting.
- `codex exec` is the narrow, scriptable surface for CI, scheduled runs, and pipeline steps that need JSONL events, schema-constrained outputs, or resumable non-interactive sessions.
- The implementation repository shows that these surfaces are not paper abstractions: app-server request routing, thread lifecycle, command-exec control, and review flows are all represented as explicit runtime subsystems rather than thin CLI conveniences.
- Non-interactive runs should keep the least permissions needed and preserve verification artifacts, especially when they write files or open follow-up PRs.
- Treat independent contexts as a verification tool, not only as a concurrency trick: plan review, separate code review, and fresh-window debugging often outperform one overloaded thread.
- Review and testing are part of the operating model: Codex should not just produce code, but also run checks, inspect diffs, and explain whether the requested outcome was actually met.
Guardrails
- Do not stuff durable repo policy into every prompt; keep prompts task-local and keep operating rules versioned.
- Do not widen sandbox or approval settings until the workflow is understood and the repository is trusted.
- Do not automate a workflow that still needs manual steering or ambiguous human judgment at every step.
- Do not keep unrelated tasks in one long thread; keep one thread per coherent unit of work and fork only when the work truly branches.
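The second and third guardrails can be written as explicit gates instead of habits. In this hypothetical Python sketch, the ordering uses the three `sandbox_mode` values from the Codex config docs, while `may_change_sandbox`, `ready_to_automate`, and the three-clean-runs threshold are illustrative assumptions, not Codex features.

```python
# Sandbox modes ordered from least to most access.
SANDBOX_ORDER = ["read-only", "workspace-write", "danger-full-access"]


def may_change_sandbox(current: str, requested: str, *, repo_trusted: bool, workflow_understood: bool) -> bool:
    """Narrowing (or no change) is always allowed; widening requires both preconditions."""
    if SANDBOX_ORDER.index(requested) <= SANDBOX_ORDER.index(current):
        return True
    return repo_trusted and workflow_understood


def ready_to_automate(manual_runs: list[bool], min_clean_runs: int = 3) -> bool:
    """manual_runs[i] is True if hand-run i needed mid-run steering (hypothetical log)."""
    recent = manual_runs[-min_clean_runs:]
    return len(recent) == min_clean_runs and not any(recent)


print(may_change_sandbox("workspace-write", "read-only", repo_trusted=False, workflow_understood=False))
print(may_change_sandbox("read-only", "danger-full-access", repo_trusted=True, workflow_understood=False))
print(ready_to_automate([True, False, False, False]))
```

The automation gate looks only at the most recent runs, so an early run that needed steering does not block a workflow that has since stabilized.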
Sources
- OpenAI Harness Engineering In An Agent-First World
- Affaan Mustafa Claude Code Shorthand Guide
- Affaan Mustafa Claude Code Longform Guide
- Codex LLM Wiki Skill
- Codex Best Practices
- Codex Config Basics
- Codex Advanced Configuration
- Codex Configuration Reference
- Codex Sample Configuration
- Codex CLI Slash Commands
- Codex Non-Interactive Mode
- Codex Agent Skills
- Codex Subagents
- Codex Plugin For Claude Code
- GitHub OpenAI Codex Repository
- Everything Claude Code GitHub Repository
- VoltAgent Awesome DESIGN.md
- Compound Engineering Plugin
- gstack
- Claude Code Best Practice Repository
- Claude Code Best Practice Tips Compendium
- Claude Code System Prompts Repository
- How Claude Code Builds a System Prompt
- Codex CLI Best Practice
- Karpathy Claude Coding Thread