中文 | English
LLM Wiki
Use this skill to operate a repository-backed markdown wiki in the spirit of Karpathy's LLM Wiki gist:
- raw source substance stays stable even when the raw directory is reorganized
- the wiki is the persistent, compounding artifact
- index.md and log.md are first-class parts of the system
- AGENTS.md tells the model how to maintain the wiki
- the vault itself is maintained as a git repository
The important rule is division of labor:
- the model does the reading, synthesis, page creation, page revision, contradiction handling, and cross-linking
- scripts only help with vault discovery, vault bootstrap, structural lint, and safe path refactors
Do not turn ingest, query, or synthesis into rigid scripts unless the user explicitly asks for that.
Maintainer Posture
Act like the maintainer of a living wiki, not like a one-shot summarizer.
That means:
- prefer integration over isolated note-taking
- prefer revising canonical pages over creating duplicate pages
- prefer preserving durable knowledge in markdown over leaving it trapped in chat history
- prefer making disagreements and uncertainty explicit over silently flattening them away
- prefer intentional git history so the vault's evolution stays inspectable
- prefer synchronized English and Chinese sibling pages over ad hoc mixed-language pages
- prefer deliberate taxonomy refactors over letting the wiki degrade into flat, oversized folders
- when asked for routine cleanup or directory organization, check for obvious raw/ or wiki/ taxonomy drift before settling for hygiene-only edits
Root Resolution
Before doing wiki work, resolve the vault root in this order:
- --root <path>
- LLM_WIKI_ROOT
- the root value in ~/.llm-wiki/config.json
- nearest ancestor containing raw/, wiki/, AGENTS.md, index.md, and log.md
Use:
python3 .codex/skills/llm-wiki/scripts/resolve_vault.py
Print the full resolved config:
python3 .codex/skills/llm-wiki/scripts/resolve_vault.py --format json
The only config file is the global defaults file:
~/.llm-wiki/config.json
Use it for:
- root: the llm-wiki content directory
- default directory and file names
- scheduler settings for automated maintenance
Do not put machine-oriented config inside each vault unless the user explicitly asks for that.
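The resolution order above can be sketched in a few lines of Python. This is an illustrative sketch, not the contents of resolve_vault.py: the function name resolve_root and its parameters are hypothetical, and it assumes the global config is a JSON object with a top-level root key.

```python
import json
import os
from pathlib import Path

# Markers named in the text; a directory holding all of them counts as a vault root.
VAULT_MARKERS = ("raw", "wiki", "AGENTS.md", "index.md", "log.md")


def resolve_root(cli_root=None, env=None, config_path=None, start=None):
    """Return the vault root using the four-step order above, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. Explicit --root argument.
    if cli_root:
        return Path(cli_root).expanduser()
    # 2. LLM_WIKI_ROOT environment variable.
    if env.get("LLM_WIKI_ROOT"):
        return Path(env["LLM_WIKI_ROOT"]).expanduser()
    # 3. "root" key in the global config file (assumed to be a JSON object).
    config_path = Path(config_path or "~/.llm-wiki/config.json").expanduser()
    if config_path.is_file():
        root = json.loads(config_path.read_text(encoding="utf-8")).get("root")
        if root:
            return Path(root).expanduser()
    # 4. Nearest ancestor that contains every marker.
    here = Path(start or Path.cwd()).resolve()
    for candidate in (here, *here.parents):
        if all((candidate / name).exists() for name in VAULT_MARKERS):
            return candidate
    return None
```

Earlier steps win over later ones, so an explicit --root always overrides the environment variable and the config file.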
Automation
The global config may include either:
- a legacy scheduler object for one periodic maintenance task
- a scheduler defaults object plus a schedulers object for multiple named maintenance tasks
Recommended per-task fields:
- enabled
- interval_minutes
- label
- codex_path
- log_dir
- lock_file
- state_file
- max_runtime_minutes
- prompt
Recommended split:
- a frequent content-maintenance task that integrates new raw material and repairs high-priority drift
- a daily taxonomy-maintenance task that checks raw/ and wiki/ directory structure against the refactor heuristics and only lands moves when the structure is genuinely drifting
- keep the taxonomy task narrower than a general cleanup pass: it should prefer one high-value subtree extraction over a broad, uniform re-nesting of every family in sight
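As a rough illustration of the two config shapes, the sketch below turns either one into a per-task settings map. It is an assumption that named tasks override the shared scheduler defaults key by key; the helper name effective_tasks, the fallback task name "default", and the interval values are hypothetical and not taken from the helper scripts.

```python
def effective_tasks(config):
    """Return {task_name: settings} for either config shape (hypothetical helper)."""
    defaults = config.get("scheduler", {})
    named = config.get("schedulers")
    if not named:
        # Legacy shape: one periodic task described directly by "scheduler".
        # The task name "default" is made up for this sketch.
        return {"default": dict(defaults)} if defaults else {}
    # Multi-task shape: each named task overrides the shared defaults (assumed semantics).
    return {name: {**defaults, **overrides} for name, overrides in named.items()}


# Example values are illustrative only.
example_config = {
    "scheduler": {"enabled": True, "max_runtime_minutes": 20},
    "schedulers": {
        "content_patrol": {"interval_minutes": 30},
        "taxonomy_patrol": {"interval_minutes": 1440, "max_runtime_minutes": 45},
    },
}
```

With this split, the frequent content task and the daily taxonomy task can share settings such as enabled while keeping their own intervals and prompts.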
Use these helper scripts:
python3 .codex/skills/llm-wiki/scripts/manage_launch_agent.py install
python3 .codex/skills/llm-wiki/scripts/manage_launch_agent.py status
python3 .codex/skills/llm-wiki/scripts/manage_launch_agent.py run-now
python3 .codex/skills/llm-wiki/scripts/manage_launch_agent.py uninstall
python3 .codex/skills/llm-wiki/scripts/manage_launch_agent.py status taxonomy_patrol
python3 .codex/skills/llm-wiki/scripts/manage_launch_agent.py run-now content_patrol
Automation should run a full maintenance pass:
- detect newly added raw files
- detect raw PDF sources that still need markdown conversion, translation, or wiki integration, and choose the short-PDF or long-PDF path explicitly
- repair broken links or translation drift
- refresh index and log files
- commit material changes
- run the repository's local verification or build command before pushing when the pass changed anything user-visible
Do not force a taxonomy refactor into the same cadence as content ingestion. When the user wants both kinds of automation, prefer separate named tasks with different prompts and intervals.
When a human asks for a routine cleanup pass, do not interpret that as only .gitignore, cache, or build-artifact hygiene. First inspect whether one or two source families are already clear enough that a narrow taxonomy refactor would improve navigability immediately. Only fall back to hygiene-only cleanup when no such refactor is justified.
Automation must also avoid overlap:
- if a previous scheduled run is still active, the next run should exit without starting a second Codex process
- use a lock file so overlapping intervals do not corrupt the vault or race on git state
- persist the last run state so timeouts and failures stay visible in status
- if the previous run timed out or failed, feed its summary back into the next run so maintenance can continue instead of restarting blindly
- keep the prompt narrow and feed the run a structured context block rather than a long free-form instruction wall
- make the final pre-push step explicit in the prompt: if the repo has a local build or verification command, scheduled maintenance should run it locally before committing and pushing
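The lock-file and run-state rules above can be sketched as follows. This is a minimal sketch under stated assumptions, not the logic inside manage_launch_agent.py: the function names and the state file layout (status, summary, started_at) are hypothetical, and the sketch does not handle runtime limits or stale locks left by a crashed process.

```python
import json
import os
import time
from pathlib import Path


def try_acquire_lock(lock_file):
    """Create the lock file atomically; return False if another run already holds it."""
    try:
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release_lock(lock_file):
    """Remove the lock file if it exists."""
    Path(lock_file).unlink(missing_ok=True)


def run_once(lock_file, state_file, task):
    """Run task() unless a previous run is active, then persist the outcome."""
    if not try_acquire_lock(lock_file):
        return "skipped"  # exit without starting a second process
    state = {"started_at": time.time(), "status": "running"}
    try:
        state["summary"] = task()
        state["status"] = "ok"
    except Exception as exc:  # record the failure so the next run can read it
        state["status"] = "failed"
        state["summary"] = repr(exc)
    finally:
        Path(state_file).write_text(json.dumps(state), encoding="utf-8")
        release_lock(lock_file)
    return state["status"]
```

A later run could read the state file first and pass the previous summary into its prompt context when the last status was not ok.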
Bilingual Rules
This wiki is bilingual in English and Chinese.
- English pages use the canonical unsuffixed path.
- Chinese pages use sibling .zh.md files.
- Markdown raw source translations use sibling .zh.md files in raw/.
- Raw translation siblings must stay faithful to the source text; do not turn raw/*.zh.md into summary notes, abridged rewrites, or editorial synthesis.
- Visual raw assets such as screenshots can be shared across languages without duplicate .zh image siblings.
- Human-facing repo docs such as README.md and files under docs/ also use sibling .zh.md files.
- index.md and index.zh.md are the bilingual root catalogs.
- Every durable page in wiki/, including source pages, should have both language variants.
- Put a language switch link directly under the title on both variants.
- Put a language switch link near the top of markdown raw source siblings as well.
- When a translated raw file exists, link it from both language variants of the source page.
- When updating a page materially, update its sibling translation in the same pass when feasible.
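The sibling naming rule above is mechanical enough to check in code. The sketch below is illustrative only: zh_sibling and missing_translations are hypothetical names, and the real lint script may apply different rules about which files need a translation.

```python
from pathlib import Path


def zh_sibling(path):
    """Map an English markdown path to its Chinese sibling (foo.md -> foo.zh.md)."""
    path = Path(path)
    if path.name.endswith(".zh.md"):
        return path
    return path.with_name(path.stem + ".zh.md")


def missing_translations(root, subdirs=("wiki", "raw")):
    """List English markdown files under root whose .zh.md sibling is absent."""
    root = Path(root)
    missing = []
    for sub in subdirs:
        base = root / sub
        if not base.is_dir():
            continue
        for page in sorted(base.rglob("*.md")):
            if page.name.endswith(".zh.md"):
                continue  # already a translation
            if not zh_sibling(page).exists():
                missing.append(page.relative_to(root))
    return missing
```

Keeping the sibling next to the canonical page means a directory listing alone shows which pages still lack a translation.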
Translation Freshness
Treat translations as maintained siblings, not one-off exports.
- Treat raw/*.zh.md as translation artifacts, not synthesis surfaces. If you want to summarize, interpret, or reorganize the material, do it in wiki/.
- If an English wiki page changes materially, update the .zh.md sibling in the same pass when feasible.
- If a Chinese wiki page changes materially, verify whether the English sibling should also change.
- If an English raw source changes or is replaced, update its raw/*.zh.md sibling in the same pass when feasible.
- If a visual raw asset changes materially, update the bilingual source pages that interpret it in the same pass when feasible.
- If a translation cannot be brought current immediately, mark the translated file or page with translation_status: stale in frontmatter until it is synced.
- Remove translation_status: stale once the translation is brought current.
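Marking and clearing the stale flag can be sketched with plain string handling. This is a minimal sketch, assuming frontmatter is a block delimited by --- lines at the very top of the file; the helper names are hypothetical, and a real vault may prefer a proper YAML parser.

```python
MARKER = "translation_status: stale"


def split_frontmatter(text):
    """Return (metadata_lines, body) for text with an optional --- frontmatter block."""
    lines = text.split("\n")
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return lines[1:i], "\n".join(lines[i + 1:])
    return [], text


def set_translation_stale(text, stale):
    """Add the stale marker when stale is True, remove it when False."""
    meta, body = split_frontmatter(text)
    meta = [line for line in meta if not line.startswith("translation_status:")]
    if stale:
        meta.append(MARKER)
    if not meta:
        return body  # no metadata left, so emit no frontmatter block
    return "---\n" + "\n".join(meta) + "\n---\n" + body
```

Calling set_translation_stale(text, False) after a sync restores the page to its unmarked form, which keeps the diff for a translation refresh small.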
Bootstrap
If the vault does not exist yet, initialize it with:
python3 .codex/skills/llm-wiki/scripts/bootstrap_vault.py --root .
This creates:
- AGENTS.md
- raw/
- wiki/ subdirectories
- index.md
- log.md
- starter overview pages
- a git repository if the vault root is not already one
For layout details, read:
- references/vault-layout.md
- references/page-schema.md
Git Workflow
Treat the vault root as a git repository.
Before modifying content:
- Check whether the vault root already has its own .git.
- If not, initialize it with git init <vault-root>.
After any material ingest, query write-back, or lint pass:
- Review the diff.
- Check whether origin has newer commits and sync local master from origin/master before proceeding when needed.
- Commit the vault changes unless the user explicitly asks you not to.
- Use an intentional commit message that explains why the knowledge base changed, not just what files changed.
- Before pushing, run the repository's local verification command or site build when one exists, and make that the final pre-push validation step.
- Push the resulting commit to both origin and gitlab.
If the repo policy requires a specific commit format, follow it. Treat origin as authoritative when the remotes diverge unless the user explicitly says otherwise.
Ingest Workflow
When the user asks to ingest a source:
- Resolve the vault root.
- Read AGENTS.md.
- Read index.md and the relevant existing wiki pages before editing.
- Read the source from the configured raw directory.
  - If the source is a screenshot or other image file, run .codex/skills/llm-wiki/scripts/extract_visual_sources.py first.
  - If the source is markdown with local screenshots or diagrams, OCR the local visuals when they materially affect the synthesis.
  - If the source is a PDF, use the pdf skill as the document-processing layer instead of treating the PDF like an opaque binary blob.
  - For PDF ingest, preserve the original .pdf file and create a markdown raw sibling next to it before doing wiki synthesis. The markdown raw file is the maintained text surface that future patrols should revise, not a replacement for the original PDF.
  - If the PDF workflow depends on missing tools such as pdfinfo, pdftotext, or pdftoppm, install the required dependencies first and then continue the ingest instead of stopping at a missing-tool error.
  - Choose the translation path based on document size and complexity. For shorter PDFs, prefer direct model translation from the source markdown or PDF content instead of doing a tool-generated first-pass translation plus a separate polishing pass.
  - Treat a PDF as short by default when it is roughly article-sized, for example around 15 pages or fewer and without heavy table/chart density. Treat larger or structurally noisy PDFs as long-form and use the more defensive tool-assisted path.
  - Extract text into markdown in reading order, repair obvious PDF-to-markdown formatting damage, and keep the markdown faithful to the source rather than turning it into a summary.
  - When layout, figures, tables, or charts materially affect understanding, render the relevant PDF pages to local image assets and link them from the markdown raw file so the visual evidence stays available during later wiki maintenance.
  - For short PDFs, once the English markdown raw file is in place, let the model translate directly into the .zh.md sibling and use one careful pass instead of forcing a separate mechanical translation-and-polish loop.
  - For long PDFs, create or refresh the .zh.md sibling as a faithful translation artifact using the tool-assisted extraction plus model cleanup path. If the translation cannot be completed in the same pass, mark it with translation_status: stale.
- Create or revise the durable wiki state directly in markdown:
  - a source page in wiki/sources/
  - any affected concept, entity, or topic pages
  - index.md
  - index.zh.md
  - log.md
- Commit the resulting vault change if files were materially updated.
- Preserve source-of-truth boundaries: never rewrite the substance of original source files in the raw directory.
Before editing, decide whether the source mainly:
- adds to an existing page
- contradicts an existing page
- suggests a new page should exist
- changes the current synthesis of a topic
After editing, make sure the source is reflected not only in wiki/sources/ but also in the most relevant long-lived pages. Also make sure the source page and the affected long-lived pages remain bilingual. If the ingest started from a PDF, make sure the source page links to the original PDF and to any generated markdown raw siblings for fast jumping. If a translated raw file exists or is created during the ingest, make sure the source page links to both raw variants for fast jumping. If any translated siblings were not updated during the same pass, explicitly mark them stale.
Default behavior for source pages:
- summarize what the source contributes
- record the strongest claims or observations
- record visual observations and OCR-backed excerpts when screenshots are part of the evidence
- when the source began as a PDF, record whether the markdown raw file is complete, whether visual assets were extracted, and whether the translation sibling is current
- note contradictions or uncertainty explicitly
- add links to impacted pages
The model should treat ingest as integration work, not as mere note generation. A source may require touching many existing pages.
PDF Ingest Pattern
Use this pattern whenever scheduled maintenance or manual ingest encounters a new raw PDF:
- Keep the original raw/.../source.pdf.
- Generate or refresh raw/.../source.md as the faithful markdown raw artifact.
- Decide whether the PDF is short-form or long-form.
- For short PDFs, let the model translate directly into raw/.../source.zh.md once the English markdown raw file is stable.
- For long PDFs, use the tool-assisted extraction path and then use the model to repair structure, terminology, and translation quality across the raw markdown siblings.
- Add any needed local visual assets for figures, charts, or scanned pages near that source family so later maintenance can still render them correctly.
- Create or update the bilingual wiki/sources/ page so it links the PDF, markdown raw, translation raw, and any materially important visual evidence.
- Update the relevant long-lived wiki pages, then run the repo's final local verification step before committing and pushing.
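The short-form versus long-form decision in this pattern can be written down as a small heuristic. The sketch below is illustrative: the function names and flags are hypothetical, the 15-page limit comes from the "around 15 pages or fewer" default in the ingest workflow, and it assumes the page count is read from the Pages: line that pdfinfo prints.

```python
import re


def pages_from_pdfinfo(output):
    """Pull the page count out of pdfinfo text output, or None if it is absent."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def choose_pdf_path(page_count, dense_tables_or_charts=False, noisy_layout=False,
                    short_limit=15):
    """Return "short" or "long" following the default heuristic (hypothetical helper)."""
    if page_count is None:
        return "long"  # unknown size: assume the more defensive path
    if page_count <= short_limit and not dense_tables_or_charts and not noisy_layout:
        return "short"
    return "long"
```

The two boolean flags stand in for editorial judgment; nothing in this sketch detects table density or layout noise automatically.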
Taxonomy Refactors
As the corpus grows, reorganize directory structure when the current folders stop being useful.
Typical signals:
- one folder has too many loosely related pages
- a topic has clearly split into multiple stable subdomains
- source pages or answer pages want domain-specific subfolders
- several pages clearly belong to the same stable family and would be easier to browse together
- the current folder has become hard to scan even though the new family boundary is already obvious
When refactoring taxonomy:
- Decide the new folder split.
- Prefer family-first grouping over uniform depth. Extract the highest-value stable subtree first instead of forcing every family into another directory layer.
- Keep English and Chinese sibling pages side-by-side after the move.
- Avoid one-file directories unless they clearly unlock a stable family boundary that is already represented elsewhere in the corpus.
- If a source family carries supporting visual assets, strongly consider co-locating those assets with the family when doing so makes markdown paths shorter and easier to maintain.
- Move pages instead of duplicating them.
- Rewrite impacted links in the same pass.
- Update index.md, index.zh.md, overview pages, and any affected navigation pages.
- Commit the refactor as one intentional change.
Use the helper when you need safe moves with automatic link rewriting:
python3 .codex/skills/llm-wiki/scripts/move_markdown_pages.py --repo . --move wiki/topics/foo.md:wiki/topics/sub/foo.md
For multiple moves, repeat --move.
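When a refactor moves an English page together with its .zh.md sibling, the repeated --move arguments can be assembled programmatically. This is a small sketch that only builds the argument list from the --repo and --move flags shown above; build_move_command is a hypothetical name and the example paths are placeholders.

```python
def build_move_command(moves, repo="."):
    """Build the helper invocation with one --move per (source, destination) pair."""
    cmd = [
        "python3",
        ".codex/skills/llm-wiki/scripts/move_markdown_pages.py",
        "--repo",
        repo,
    ]
    for src, dst in moves:
        cmd += ["--move", f"{src}:{dst}"]
    return cmd


# Placeholder paths: move a page and its Chinese sibling in the same pass.
example = build_move_command([
    ("wiki/topics/foo.md", "wiki/topics/sub/foo.md"),
    ("wiki/topics/foo.zh.md", "wiki/topics/sub/foo.zh.md"),
])
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the vault root.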
Raw Taxonomy Refactors
Treat raw/ as stable in substance, not frozen in path layout.
Typical signals:
- raw/ has become a flat dump with too many unrelated files
- one source family clearly deserves its own subdirectory
- translated raw siblings are hard to find beside the original source
- supporting visual assets clearly belong with one source family and would become easier to maintain if co-located
- the family boundary is already obvious enough that waiting for a file-count threshold would only add clutter
When refactoring raw taxonomy:
- Preserve source substance; only rewrite path-dependent metadata or local links needed because files moved.
- Move original raw files and .zh.md siblings together when both exist.
- Prefer pulling a stable source family into its own subtree over flattening everything into one broad provider directory.
- Do not deepen a directory just because it is possible; if only one English/Chinese pair would live there and no broader family boundary becomes clearer, leave it where it is.
- When a raw source depends on a stable local asset bundle such as screenshots or rendered PDF pages, prefer co-locating the asset subtree with that raw family if the relative links become simpler.
- Rewrite impacted links in repo docs, wiki pages, indexes, and affected raw markdown files in the same pass.
- Re-run lint so broken raw links or missing raw translations are caught immediately.
- Run the repository's local verification or build command before commit/push when the refactor affects rendered output.
- Commit the refactor as one intentional change.
Use the helper for safe raw moves with link repair:
python3 .codex/skills/llm-wiki/scripts/move_raw_sources.py --repo . --move raw/foo.md:raw/articles/foo.md
Directory moves are allowed too:
python3 .codex/skills/llm-wiki/scripts/move_raw_sources.py --repo . --move raw/clippings:raw/articles/clippings
Visual Source Extraction
Use OCR before integrating screenshot-heavy sources into the wiki.
Supported inputs:
- direct image files under raw/, such as .png, .jpg, .jpeg, .webp, .heic, and .tiff
- markdown raw sources that reference local image files
Use the helper:
python3 .codex/skills/llm-wiki/scripts/extract_visual_sources.py --repo . --path raw/screenshots/foo.png --lang chi_sim+eng
Or process every raw image:
python3 .codex/skills/llm-wiki/scripts/extract_visual_sources.py --repo . --all-raw-images
The helper:
- extracts OCR text with local tesseract
- defaults to a compact Chinese-first OCR bundle when chi_sim or chi_tra is installed locally
- supports Chinese OCR when local Tesseract language packs such as chi_sim or chi_tra are installed
chi_simorchi_traare installed - records basic image metadata such as format and dimensions
- supports batch extraction for screenshot-heavy vaults
- does not update the wiki by itself; the model still has to integrate the extracted evidence into source, topic, or answer pages
Query Workflow
When the user asks a knowledge question:
- Resolve the vault root.
- Read index.md first.
- Read only the pages needed to answer well.
- Answer with page-level citations.
- If the answer creates durable knowledge, file it back into wiki/answers/ or update the relevant long-lived page.
- Append a query entry to log.md when the answer materially changes the vault.
- Commit the resulting vault change if files were materially updated.
Do not answer from raw sources alone when the wiki already contains the relevant synthesis, unless the question specifically requires re-checking the raw evidence.
When a query produces something durable, prefer one of these outcomes:
- update an existing concept or topic page
- create a new page in wiki/answers/
- add cross-links that make the new synthesis discoverable later
- keep the English and Chinese variants aligned
Lint Workflow
Run the built-in lint first:
python3 .codex/skills/llm-wiki/scripts/lint_vault.py
The lint checks:
- missing required vault files
- broken markdown links
- broken Obsidian-style wikilinks
- absolute local filesystem links in repository markdown
- missing translation siblings
- orphan wiki pages
Then fix the highest-value issues:
- contradictions
- stale summaries
- missing cross-links
- pages that should exist but do not
The model should treat lint as editorial maintenance, not just link repair.
Good lint passes often also produce:
- a sharper overview page
- cleaner source-to-concept linking
- a short list of unanswered questions worth investigating next
- a commit that records the maintenance pass when the vault changed
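Two of the structural checks listed above, broken markdown links and absolute local filesystem links, can be approximated as shown below. This is a rough sketch and not the logic of lint_vault.py: the regular expression covers only inline [text](target) links, check_links is a hypothetical name, and Windows drive paths and wikilinks are out of scope.

```python
import re
from pathlib import Path
from urllib.parse import unquote

# Inline links only; the destination is either <angle-bracketed> or a run without spaces.
LINK_RE = re.compile(r"\[[^\]]*\]\((<[^>]+>|[^)\s]+)")


def check_links(page):
    """Return (broken, absolute) link targets found in one markdown page."""
    page = Path(page)
    broken, absolute = [], []
    for match in LINK_RE.finditer(page.read_text(encoding="utf-8")):
        target = match.group(1)
        if target.startswith("<"):
            target = target[1:-1]
        if target.startswith("file://"):
            absolute.append(target)
            continue
        if (target.startswith("#") or target.startswith("mailto:")
                or re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", target)):
            continue  # in-page anchors and external URLs are not checked here
        path_part = unquote(target.split("#", 1)[0])
        if path_part.startswith(("/", "~")):
            absolute.append(target)
        elif not (page.parent / path_part).exists():
            broken.append(target)
    return broken, absolute
```

A check like this only finds structural problems; contradictions, stale summaries, and missing pages still need the model to read the content.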
Index And Log
Treat index.md and log.md as operational primitives, not incidental files.
- index.md is the content map the model should read first
- log.md is the reverse-chronological operations record for ingest, durable query, and lint passes that changed the vault
Preferred log.md heading format:
## [YYYY-MM-DD HH:MM] ingest | Source Title
## [YYYY-MM-DD HH:MM] query | Question or answer topic
## [YYYY-MM-DD HH:MM] lint | Maintenance summary
Keep index entries short: one link plus one-line description. New log entries should be precise to the minute. Insert new log entries near the top of the entry list rather than appending them at the end. Do not treat log.md as a general changelog for scripts, config, or release-style notes. Scheduled patrols should update log.md only when they actually land a vault change.
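The heading format and the newest-first rule can be sketched as two small helpers. The function names are hypothetical, and the sketch assumes every existing entry starts with a line beginning with "## [" as in the format above.

```python
from datetime import datetime

LOG_KINDS = ("ingest", "query", "lint")


def format_log_heading(kind, title, when):
    """Build a log.md heading that is precise to the minute."""
    if kind not in LOG_KINDS:
        raise ValueError(f"unknown log kind: {kind}")
    return f"## [{when:%Y-%m-%d %H:%M}] {kind} | {title}"


def insert_log_entry(log_text, heading, body=""):
    """Insert a new entry before the first existing entry so the newest stays on top."""
    entry = heading + "\n" + (body.rstrip() + "\n" if body else "") + "\n"
    lines = log_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("## ["):
            return "".join(lines[:i]) + entry + "".join(lines[i:])
    # No entries yet: place the first one after whatever preamble exists.
    if log_text and not log_text.endswith("\n"):
        log_text += "\n"
    return log_text + entry
```

For example, format_log_heading("ingest", "Source Title", datetime.now()) yields a heading in the preferred shape, and insert_log_entry places it above older entries.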
Guardrails
- Treat raw source substance as immutable; only path-maintenance edits are allowed during raw refactors.
- Treat OCR output as supporting evidence, not as a replacement for editorial judgment when screenshots are noisy or partial.
- Prefer updating existing pages over creating near-duplicates.
- Keep links relative and repo-portable.
- When a local link target contains spaces, use markdown destinations wrapped in angle brackets.
- Do not write absolute local filesystem paths into repository markdown unless the user explicitly asks for that.
- Do not mix full English and Chinese versions into one long page; use sibling translation pages instead.
- Update index.md and log.md on every ingest and on durable query outputs.
- Scheduled or manual maintenance that affects rendered pages should finish with a local build or equivalent verification command before commit/push.
- Keep the schema short and specific; evolve it when maintenance patterns change.
- The wiki should accumulate knowledge over time instead of recomputing it from scratch on every question.
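The link guardrails above (relative, repo-portable, angle brackets around destinations with spaces) can be illustrated with a small helper. This is a sketch under the assumption that both paths are given relative to the repository root with forward slashes; relative_link is a hypothetical name.

```python
import posixpath


def relative_link(text, from_page, to_page):
    """Build a repo-portable markdown link between two repo-relative paths."""
    start = posixpath.dirname(from_page) or "."
    target = posixpath.relpath(to_page, start=start)
    # Destinations containing spaces are wrapped in angle brackets.
    dest = f"<{target}>" if " " in target else target
    return f"[{text}]({dest})"
```

For instance, linking from wiki/sources/foo.md to raw/articles/my file.pdf produces a ../../raw/... destination wrapped in angle brackets, with no absolute filesystem path involved.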