Built 26/04/17 09:08, commit f8ff6f9

Usage

中文 | English

Repository Layout

Core structure:

text
llm-wiki/
├── AGENTS.md
├── index.md
├── index.zh.md
├── log.md
├── raw/
└── wiki/

Meaning:

  • raw/: Raw source layer. It can contain markdown, screenshots, and local image assets. Source substance should stay stable, though path-level reorganization is allowed when links and path metadata are repaired in the same pass.
  • wiki/: Model-maintained knowledge layer, including sources/, topics/, concepts/, and answers/.
  • AGENTS.md: The schema that defines how the model maintains the knowledge base.
  • index.md / index.zh.md: Bilingual catalog entry points.
  • log.md: Maintenance history for ingest, query, and lint work.
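
As a quick sanity check of the layout above, a minimal Python sketch can report which top-level entries are missing from a checkout. The function name `missing_layout_entries` and the constant `EXPECTED_ENTRIES` are hypothetical and not part of the repo's scripts; only the entry names come from the tree shown earlier.

```python
from pathlib import Path

# Top-level entries from the layout above; a trailing "/" marks a directory.
EXPECTED_ENTRIES = ["AGENTS.md", "index.md", "index.zh.md", "log.md", "raw/", "wiki/"]


def missing_layout_entries(repo: Path) -> list[str]:
    """Return the expected top-level entries that are absent from repo.

    Hypothetical helper for illustration; the repo does not ship it.
    """
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = repo / entry.rstrip("/")
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

Calling `missing_layout_entries(Path("."))` from the repository root should return an empty list when the core structure is intact.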

Day-to-Day Use

Manual ingest

After placing a source in raw/, start a new Codex session and send:

text
$llm-wiki ingest raw/xxx.md
Read AGENTS.md and index.md first, then update wiki/sources/, the relevant concept/topic/entity pages, and finally refresh index.md, index.zh.md, and log.md.

If the source is a screenshot or image file, first extract OCR text:

bash
python3 .codex/skills/llm-wiki/scripts/extract_visual_sources.py --repo . --path raw/screenshots/foo.png --lang chi_sim+eng

Then ask Codex to ingest the screenshot as a source and fold the OCR-backed evidence into wiki/sources/ and the affected long-lived pages.

Manual query

text
$llm-wiki answer this question: ...
Start from index.md / index.zh.md and read only the necessary pages. If the answer is durable, write it back into wiki/answers/ or merge it into an existing page, then update log.md.

Manual lint

text
$llm-wiki run a full lint pass on this vault
Check for contradictions, stale conclusions, orphan pages, missing cross-links, and missing or stale translations, then commit if the pass makes material changes.

Raw taxonomy refactor

When raw/ becomes too flat, move files or folders without changing source substance:

  • Trigger when a stable source family is already clear enough that grouping it will make navigation better.
  • Prefer regrouping by source family over waiting for a directory-count threshold.
  • Avoid creating one-file directories unless they clarify a real long-lived family boundary.

bash
python3 .codex/skills/llm-wiki/scripts/move_raw_sources.py --repo . --move raw/foo.md:raw/articles/foo.md

Then run lint to catch any remaining broken links immediately:

bash
python3 .codex/skills/llm-wiki/scripts/lint_vault.py
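
To illustrate the kind of breakage a raw/ move can leave behind, here is a minimal sketch of a relative-link check. It is not the logic of `lint_vault.py`, which is not shown here; the names `LINK_RE` and `broken_links` are hypothetical, and the regex only handles simple inline links without spaces in the target.

```python
import re
from pathlib import Path

# Matches inline Markdown links and images: [text](target) or [text](<target>).
# Assumption: targets contain no spaces; reference-style links are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(\s*<?([^)>\s]+)>?[^)]*\)")
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_links(repo: Path) -> list[tuple[str, str]]:
    """Return (file, target) pairs whose relative link target does not exist."""
    problems = []
    for page in sorted(repo.rglob("*.md")):
        for target in LINK_RE.findall(page.read_text(encoding="utf-8")):
            # Skip external URLs (https:, mailto:, ...) and in-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            path_part = target.split("#", 1)[0]
            if not (page.parent / path_part).exists():
                problems.append((page.relative_to(repo).as_posix(), target))
    return problems
```

After moving raw/foo.md to raw/articles/foo.md, any page still linking to the old location would show up in the returned list with its stale target.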

Batch screenshot OCR

When the vault accumulates many screenshots, batch-extract OCR before an ingest or lint pass:

bash
python3 .codex/skills/llm-wiki/scripts/extract_visual_sources.py --repo . --all-raw-images

Check available OCR languages first when screenshots are multilingual:

bash
tesseract --list-langs

This repo expects Chinese OCR support to be available locally. On macOS with Homebrew, install the language packs with brew install tesseract-lang. Prefer --lang chi_sim+eng for Simplified Chinese screenshots and --lang chi_tra+eng for Traditional Chinese screenshots. If chi_sim or chi_tra is installed, the helper defaults to a compact Chinese-first OCR bundle even when --lang is omitted.
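
The language choice described above can be sketched as a small function over the text printed by `tesseract --list-langs`. This is only an illustration under stated assumptions: the exact bundle the helper picks is not documented here, so the fallback order below (chi_sim, then chi_tra, then eng) is a guess, and both function names are hypothetical.

```python
from typing import Optional


def installed_languages(list_langs_output: str) -> set[str]:
    """Parse the output of `tesseract --list-langs` into a set of language codes."""
    langs = set()
    for line in list_langs_output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.add(line)
    return langs


def pick_ocr_lang(list_langs_output: str, requested: Optional[str] = None) -> str:
    """Choose a --lang value; an explicit request always wins.

    Assumed default order (not taken from the helper): chi_sim, chi_tra, eng.
    """
    if requested:
        return requested
    langs = installed_languages(list_langs_output)
    for chinese in ("chi_sim", "chi_tra"):
        if chinese in langs:
            return f"{chinese}+eng" if "eng" in langs else chinese
    return "eng"
```

In practice the first argument would be the captured stdout of `tesseract --list-langs`, for example via `subprocess.run(..., capture_output=True, text=True)`.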

Bilingual Rules

The whole llm-wiki is bilingual.

  • English pages use canonical paths.
  • Chinese pages use sibling .zh.md files.
  • index.md and index.zh.md are the bilingual root catalogs.
  • All durable pages should exist in both languages, including wiki/sources/ and wiki/answers/.
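
The sibling rule above lends itself to a mechanical check. The following is a minimal sketch, assuming every English page `name.md` should have a `name.zh.md` next to it; `missing_translations` is a hypothetical name, and the real lint pass may apply exceptions this sketch does not know about.

```python
from pathlib import Path


def missing_translations(root: Path) -> list[str]:
    """List English .md pages under root that lack a sibling .zh.md file."""
    missing = []
    for page in sorted(root.rglob("*.md")):
        if page.name.endswith(".zh.md"):
            continue  # this file is itself a Chinese sibling
        sibling = page.with_name(page.name[: -len(".md")] + ".zh.md")
        if not sibling.exists():
            missing.append(page.relative_to(root).as_posix())
    return missing
```

Pointing it at `Path("wiki")` or `Path("raw")` rather than the repository root keeps files such as AGENTS.md and log.md out of the report.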

raw/ is bilingual too

Markdown raw files also use sibling translations:

  • source: raw/foo.md
  • Chinese translation: raw/foo.zh.md

The raw Chinese sibling must be a faithful translation of the original source. Do not compress it into a summary, rewrite it into cleaner notes, or mix translation with editorial takeaways. That synthesis belongs in wiki/.

Both should expose a quick language switch near the top:

md
[中文](<foo.zh.md>) | English

md
中文 | [English](<foo.md>)
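
The two switch lines follow directly from the file name, so they can be generated rather than typed by hand. A minimal sketch, with `language_switch` as a hypothetical helper name:

```python
def language_switch(filename: str) -> str:
    """Return the language-switch line for a page, given its file name.

    Expects a bare file name such as "foo.md" or "foo.zh.md", because the
    link targets a sibling in the same directory.
    """
    if not filename.endswith(".md"):
        raise ValueError(f"not a markdown file: {filename}")
    if filename.endswith(".zh.md"):
        english = filename[: -len(".zh.md")] + ".md"
        return f"中文 | [English](<{english}>)"
    chinese = filename[: -len(".md")] + ".zh.md"
    return f"[中文](<{chinese}>) | English"
```

The angle brackets around the link target are kept as in the examples above, which also keeps the link valid if a file name contains spaces.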

Each wiki/sources/*.md page should link:

  • the original raw file
  • the translated raw file

Standalone screenshots and other visual assets do not need duplicate .zh image siblings. Their bilingual interpretation should live in the source pages instead.

Translation Freshness

Translations are maintained siblings, not one-time exports.

  • Keep raw/*.zh.md loyal to the source text. If the current Chinese file behaves more like a summary than a translation, treat it as incorrect and bring it back in line with the original.
  • When a wiki page changes materially, update its .zh.md sibling in the same pass when feasible.
  • When a raw source changes materially, update raw/*.zh.md in the same pass when feasible.
  • If a translation cannot be synced immediately, mark it in frontmatter:
yaml
translation_status: stale
  • Remove the marker once the translation catches up.
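
To find translations that still carry the marker, a small scan of frontmatter is enough. The sketch below assumes frontmatter is a leading block delimited by `---` lines and reads it with plain string handling instead of a YAML parser; `is_marked_stale` and `stale_translations` are hypothetical names.

```python
from pathlib import Path


def is_marked_stale(text: str) -> bool:
    """True if the leading frontmatter block sets translation_status to stale."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; ignore anything in the body
        key, _, value = line.partition(":")
        if key.strip() == "translation_status" and value.strip().strip("'\"") == "stale":
            return True
    return False


def stale_translations(root: Path) -> list[str]:
    """List markdown files under root whose frontmatter carries the stale marker."""
    return [
        page.relative_to(root).as_posix()
        for page in sorted(root.rglob("*.md"))
        if is_marked_stale(page.read_text(encoding="utf-8"))
    ]
```

Running `stale_translations(Path("."))` before a lint pass gives a worklist of pages to re-sync; once a page catches up, removing the marker drops it from the list.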