Built 26/04/17 09:08, commit f8ff6f9

LLM Wiki Schema

This repository follows the LLM Wiki pattern: raw sources are immutable, the wiki is the maintained synthesis layer, and this file is the schema that tells the model how to behave as a disciplined maintainer rather than a generic chatbot.

This repository intentionally keeps machine-oriented configuration out of the vault itself. Global defaults, including the vault root, belong in ~/.llm-wiki/config.json; this file is for wiki maintenance behavior.
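
As a minimal sketch of reading that global default, the snippet below loads the vault root from the config file. The key name `vault_root` and the function name are assumptions made for illustration; the schema only says that the vault root lives in ~/.llm-wiki/config.json, not how the file is laid out.

```python
import json
from pathlib import Path

# Hypothetical key name: the schema does not specify the config layout.
VAULT_ROOT_KEY = "vault_root"

DEFAULT_CONFIG = Path.home() / ".llm-wiki" / "config.json"


def load_vault_root(config_path: Path = DEFAULT_CONFIG) -> Path:
    """Return the vault root recorded in the global config file."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Expand "~" so the stored value may be written relative to the home directory.
    return Path(config[VAULT_ROOT_KEY]).expanduser()
```

Keeping this lookup outside the vault means the wiki itself never has to carry machine-specific paths.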

Architecture

There are three layers:

  • raw/: source documents whose substance should remain stable even if paths are reorganized
  • wiki/: LLM-maintained markdown pages
  • AGENTS.md: the schema that defines structure and workflow

The vault root itself should be maintained as a git repository.

Two top-level files support the wiki:

  • index.md: the catalog of durable pages
  • log.md: the chronological record of ingests, queries, and lint passes

Core Idea

Do not treat this repository like plain RAG over a pile of files.

The goal is not to rediscover the same knowledge from raw sources every time a question is asked. The goal is to incrementally build and maintain a persistent wiki that compounds over time.

When a new source arrives, integrate it into the existing wiki. When a useful answer is produced, preserve it if it belongs in the knowledge base. When the wiki drifts, repair it.

Operating Rules

  • Do not rewrite the substantive content of files inside raw/.
  • Raw taxonomy refactors may update only path-dependent metadata or local links needed to keep moved raw files navigable.
  • Read index.md before broad ingest, query, or lint work.
  • Prefer updating an existing page over creating a near-duplicate.
  • Keep uncertainty, disagreement, and superseded claims visible.
  • Use relative markdown links.
  • When a local markdown link target contains spaces, wrap the destination in angle brackets so the link remains clickable.
  • Do not use absolute local filesystem paths in repository markdown; keep in-repo links relative.
  • Keep repository markdown renderable in the local VitePress site; if a syntax pattern breaks rendering or build output, repair it in the same pass.
  • In markdown prose, do not leave raw angle-bracket placeholders such as <name here> exposed unless they are intentional HTML; wrap them in code spans or escape them as entities.
  • Do not keep transient browser-only embeds such as blob: URLs in repository markdown; replace them with stable links, screenshots, or other render-safe representations that preserve the source meaning.
  • Update index.md whenever you add a durable page or materially change the catalog.
  • Preserve older log history, but insert new entries at the top of the entry list.
  • Treat index.md and log.md as part of the working memory of the vault.
  • If the vault root is not yet a git repository, initialize it.
  • After any material vault change, make an intentional git commit unless explicitly told not to.
  • As the corpus grows, reorganize wiki subdirectories when needed instead of letting one directory become a flat dump.
  • As the raw corpus grows, reorganize raw/ subdirectories when the folder stops being navigable.
  • When moving markdown pages to refine taxonomy, repair all impacted links in the same pass.
  • When moving raw files to refine taxonomy, repair all impacted links and raw-local navigation metadata in the same pass.
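
The link rules above (relative targets, angle brackets around destinations that contain spaces) can be sketched as a small helper. The function name and the convention of passing repo-relative POSIX paths are assumptions for illustration, not part of the schema.

```python
import posixpath


def relative_link(label: str, from_page: str, to_page: str) -> str:
    """Build a relative markdown link between two repo-relative POSIX paths."""
    start = posixpath.dirname(from_page) or "."
    target = posixpath.relpath(to_page, start=start)
    # Angle brackets keep destinations that contain spaces clickable.
    if " " in target:
        target = f"<{target}>"
    return f"[{label}]({target})"
```

For example, `relative_link("B", "wiki/topics/a.md", "wiki/concepts/b.md")` yields `[B](../concepts/b.md)`, and a target such as `raw/my file.md` linked from `index.md` is emitted as `[Doc](<raw/my file.md>)`.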

Language Conventions

This wiki is bilingual in English and Chinese.

  • English pages use the canonical unsuffixed path, for example wiki/topics/example.md.
  • Chinese pages use the sibling .zh.md path, for example wiki/topics/example.zh.md.
  • Markdown raw source translations use the same convention, for example raw/source.md and raw/source.zh.md.
  • raw/*.zh.md must be faithful translations of the source, not summary rewrites, abridgements, or editorialized notes.
  • Visual raw assets such as screenshots can be shared across languages without duplicate .zh image siblings.
  • Human-facing repo docs follow the same convention, for example README.md and README.zh.md, docs/usage.md and docs/usage.zh.md.
  • The root catalog uses both index.md and index.zh.md.
  • Every durable wiki page, including wiki/sources/ pages, should have both language variants.
  • Put a language switch link directly under the title on both variants.
  • Markdown raw source files should also have a language switch link near the top of the document.
  • When a translated raw file exists, source pages should expose both the original raw file and the translated raw file.
  • If direct markdown links to raw files prove brittle in the local VitePress site for a given source family, prefer stable code-style raw path display over repeatedly reintroducing fragile jump links.
  • When possible, Related Pages should link to pages in the same language.
  • When a page changes materially, update both language variants in the same maintenance pass.
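
The sibling-path convention above can be sketched as a pure path mapping. The function name is hypothetical; it only encodes the unsuffixed `.md` and `.zh.md` rule and does not check that either file exists.

```python
from pathlib import PurePosixPath

ZH_SUFFIX = ".zh.md"


def language_sibling(path: str) -> str:
    """Map an English markdown path to its .zh.md sibling, or the reverse."""
    p = PurePosixPath(path)
    if p.name.endswith(ZH_SUFFIX):
        # Chinese variant: strip ".zh.md" and restore the canonical ".md".
        return str(p.with_name(p.name[: -len(ZH_SUFFIX)] + ".md"))
    if p.suffix == ".md":
        # English variant: insert ".zh" before the extension.
        return str(p.with_name(p.stem + ZH_SUFFIX))
    raise ValueError(f"not a markdown path: {path}")
```

A helper like this is enough to derive the target of the language switch link on either variant.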

Translation Freshness

  • Treat translations as maintained siblings, not one-time exports.
  • Treat raw translations as translation artifacts, not synthesis surfaces; summaries, interpretations, and reorganized takeaways belong in wiki/, not in raw/*.zh.md.
  • If an English wiki page changes materially, update the .zh.md sibling in the same pass when feasible.
  • If an English raw source changes or is replaced, update its raw/*.zh.md sibling in the same pass when feasible.
  • If a screenshot or other visual raw asset changes materially, update the bilingual source pages that interpret it in the same pass when feasible.
  • If a translation cannot be updated immediately, mark the translated file with translation_status: stale in frontmatter.
  • Remove translation_status: stale once the translated sibling is current again.
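
One way to find translations that still carry the stale marker is a line-based scan of the frontmatter, sketched below. This is a simplification that assumes the key sits on a single top-level line; it is not a full YAML parser, and the function name is hypothetical.

```python
def is_marked_stale(markdown_text: str) -> bool:
    """Return True when the frontmatter contains translation_status: stale."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; ignore the page body
        key, sep, value = line.partition(":")
        if sep and key.strip() == "translation_status":
            return value.strip().strip("'\"") == "stale"
    return False
```

Only the frontmatter is inspected, so a page that merely mentions the marker in its body is not reported.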

Operations

Ingest

When processing a source from raw/:

  • read the source directly
  • if the source is a screenshot or other image file, run .codex/skills/llm-wiki/scripts/extract_visual_sources.py first and use the OCR output as supporting evidence
  • if a markdown raw source depends on local screenshots or diagrams, extract their OCR text when the visuals materially affect the synthesis
  • for Chinese or mixed-language screenshots, prefer installed Chinese OCR languages such as chi_sim+eng or chi_tra+eng
  • create or update its page in wiki/sources/
  • revise any affected concept, entity, or topic pages
  • note where new material strengthens, weakens, or contradicts existing claims
  • update index.md
  • insert an ingest entry near the top of log.md
  • commit the resulting change when the vault was updated

Ingest is integration work, not mere summarization. A single source may require changes across multiple wiki pages.

Before creating a new long-lived page, check whether an existing page should absorb the material instead.

Query

When answering questions:

  • start from index.md
  • read only the pages needed to answer well
  • answer from the wiki when the wiki already contains the synthesis
  • revisit raw sources only when the question requires deeper verification or the wiki appears incomplete
  • if the answer creates durable knowledge, file it back into wiki/answers/ or update the relevant long-lived page
  • insert a query entry near the top of log.md when the vault changes materially
  • commit the resulting change when the vault was updated

Useful answers should compound into the wiki instead of vanishing into chat history.

Lint

Periodically maintain the wiki:

  • look for contradictions between pages
  • look for stale claims that newer sources have superseded
  • look for orphan pages and missing cross-links
  • look for important concepts mentioned repeatedly but lacking their own page
  • look for gaps that suggest the next source to ingest or next question to ask
  • look for markdown rendering hazards, especially malformed links, unmatched HTML tags, transient blob: embeds, broken fences, or escaped backticks that expose raw angle-bracket placeholders
  • look for obvious taxonomy drift in raw/ and wiki/, especially flat top-level files that already belong to a clear source family or navigation cluster
  • treat routine cleanup requests as including a taxonomy check by default, not only hygiene work like ignores or build leftovers
  • when taxonomy drift is obvious, prefer landing one or two high-confidence regroupings before falling back to hygiene-only cleanup
  • repair render-breaking markdown with the smallest syntax-only edit that preserves source meaning, especially in raw/ where substance should remain stable
  • insert a lint entry near the top of log.md when maintenance work changes the vault materially
  • commit the resulting change when the vault was updated

Lint is an editorial pass over the whole knowledge base, not just a structural check.
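
Two of the rendering hazards listed above, transient blob: URLs and exposed angle-bracket placeholders, lend themselves to a mechanical first pass. The sketch below is a heuristic under stated assumptions: it skips fenced blocks and inline code spans, ignores angle-bracket link destinations, and treats any remaining angle-bracket text containing a space as a candidate. Intentional HTML tags with attributes will also match, so hits need human review. All names are hypothetical.

```python
import re

BLOB_URL = re.compile(r"blob:[^\s)\"'>]+")
# Angle-bracket text with a space, e.g. <name here>. The lookbehind skips
# link destinations written as (<path with spaces.md>).
PLACEHOLDER = re.compile(r"(?<!\()<[A-Za-z][^<>\n]* [^<>\n]*>")
CODE_SPAN = re.compile(r"`[^`\n]*`")


def find_render_hazards(markdown_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, kind, matched text) for each suspicious span."""
    hazards = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence line
            continue
        if in_fence:
            continue
        prose = CODE_SPAN.sub("", line)  # inline code is already render-safe
        for match in BLOB_URL.finditer(prose):
            hazards.append((number, "blob-url", match.group()))
        for match in PLACEHOLDER.finditer(prose):
            hazards.append((number, "placeholder", match.group()))
    return hazards
```

A scan like this covers only the structural part of lint; contradictions, stale claims, and taxonomy drift still need an editorial read.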

Taxonomy Refactors

When directory structure stops fitting the corpus:

  • split broad folders into narrower subdirectories
  • trigger when a stable family, topic slice, or navigation pattern is already obvious enough to deserve its own subtree
  • treat navigability and family clarity as the primary trigger, not raw file counts
  • prefer moving pages over cloning them
  • rewrite all affected links in the same pass
  • update index.md, index.zh.md, and any overview pages that expose the moved pages

Raw Taxonomy Refactors

When raw/ stops fitting the corpus:

  • split broad folders into narrower subdirectories
  • trigger as soon as a stable source family is clear enough that grouping it would make navigation better
  • treat “these files obviously belong together” as a stronger signal than any directory size number
  • during routine cleanup, inspect for one or two obvious raw-family regroupings before defaulting to superficial hygiene work
  • prefer narrow, high-confidence regroupings over broad uniform renesting of every file in sight
  • preserve source substance; only repair links or path metadata needed because files moved
  • move original raw files and their .zh.md translation siblings together when both exist
  • rewrite impacted links in wiki/, repo docs, indexes, and affected raw markdown files in the same pass
  • prefer using .codex/skills/llm-wiki/scripts/move_raw_sources.py for safe moves with link repair
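
The rule that a raw file and its .zh.md sibling move together can be sketched as a planning step that lists the moves without performing them. This does not reproduce move_raw_sources.py, whose interface is not shown here; the function name and arguments are assumptions for illustration, and link repair is left out.

```python
from pathlib import Path


def plan_raw_move(vault: Path, source: str, dest_dir: str) -> list[tuple[Path, Path]]:
    """List (old, new) pairs for a raw file and its .zh.md sibling, if present."""
    old = vault / source
    moves = [(old, vault / dest_dir / old.name)]
    # Only English markdown sources have a translation sibling to carry along.
    if old.suffix == ".md" and not old.name.endswith(".zh.md"):
        sibling = old.with_name(old.stem + ".zh.md")
        if sibling.exists():
            moves.append((sibling, vault / dest_dir / sibling.name))
    return moves
```

Producing the plan first makes it easy to review a regrouping before any file is touched.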

Index And Log Conventions

index.md should stay content-oriented:

  • group pages by category
  • keep each entry to a link plus a one-line description
  • make it easy to decide which pages to read next

log.md should stay reverse-chronological, with the newest entries first.

log.md is an operations record for vault work, not a repository changelog.

  • Record only ingest, query, and lint passes that materially changed the vault.
  • Do not use log.md for code-only, script-only, config-only, or release-note style entries when the durable wiki state did not change.
  • Scheduled maintenance should update log.md only when it actually lands a vault change; a no-op patrol should leave log.md untouched.

Preferred heading shape:

## [YYYY-MM-DD HH:MM] ingest | Source Title
## [YYYY-MM-DD HH:MM] query | Topic
## [YYYY-MM-DD HH:MM] lint | Maintenance summary

New log entries should be precise to the minute.
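
The heading shape and the newest-first rule can be sketched as two small helpers. The function names are hypothetical, and the sketch assumes every existing entry starts with a line beginning `## [`, as in the preferred heading shape.

```python
from datetime import datetime

KINDS = ("ingest", "query", "lint")


def log_heading(kind: str, title: str, when: datetime) -> str:
    """Format a log heading precise to the minute."""
    if kind not in KINDS:
        raise ValueError(f"unknown log kind: {kind}")
    return f"## [{when:%Y-%m-%d %H:%M}] {kind} | {title}"


def insert_log_entry(log_text: str, heading: str, body: str) -> str:
    """Insert a new entry above the first existing entry, keeping older history."""
    lines = log_text.splitlines()
    entry = [heading, "", body.strip(), ""]
    # The first "## [" heading marks the top of the entry list.
    for i, line in enumerate(lines):
        if line.startswith("## ["):
            return "\n".join(lines[:i] + entry + lines[i:]) + "\n"
    # No entries yet: place the entry after whatever preamble exists.
    if lines and lines[-1].strip():
        lines.append("")
    return "\n".join(lines + entry).rstrip("\n") + "\n"
```

Any preamble above the first entry stays in place, and older entries are never rewritten.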

Page Roles

  • wiki/sources/: source-oriented notes tied to raw files
  • wiki/concepts/: reusable ideas, frameworks, and abstractions
  • wiki/entities/: people, organizations, tools, products, and projects
  • wiki/topics/: broader synthesis spanning many pages
  • wiki/answers/: durable outputs created during query work

Style

  • Start pages with a short summary.
  • Prefer compact sections over long prose.
  • Use explicit sections when they help with maintenance: Summary, Key Claims, Visual Notes, OCR Excerpts, Related Pages, Sources, Open Questions.
  • Keep pages interlinked.
  • Let pages accumulate synthesis over time; do not rewrite them as isolated snapshots.
  • Keep English and Chinese sibling pages structurally aligned enough that switching languages stays fast and predictable.

Git History

  • Keep commits small and intentional.
  • Commit because the knowledge base changed in a meaningful way, not merely because a file timestamp moved.
  • If a commit convention applies in the surrounding environment, follow it.

Git Remotes

This repository uses one remote:

  • origin: GitHub, the source of truth

Rules:

  • Before starting material work, check whether origin has newer commits and sync local master from origin/master if needed.
  • Treat origin as authoritative when local master and origin/master diverge unless the user explicitly says otherwise.
  • Before pushing changes that affect the VitePress site, run a local pnpm run docs:build and make sure it succeeds.
  • Do not push a VitePress-affecting change to origin if the local site build is failing.
  • After any local commit, push to origin.
  • Do not leave committed local changes unpublished on origin unless a push fails.

Browser Automation

  • Never use Safari for browser automation or manual browser-directed tasks on this machine.
  • Prefer Google Chrome for browser-driven work.
  • On this machine, the installed Chrome app is /Applications/Google Chrome 2.app.
  • If a tool cannot find the default Chrome path, explicitly fall back to /Applications/Google Chrome 2.app before trying any other browser.
  • When a task says to use the already-open browser, prefer attaching to or scripting the existing Google Chrome session rather than opening Safari or switching browsers.

Git Identity

  • This repository uses a repository-local GitHub-facing identity by default.
  • The local git identity for this repository should stay csh2022 <1065521866@qq.com>.
  • Keep this identity scoped to this repository only; do not change global git user.name or user.email.
  • Use the repository-local identity for normal commits and pushes to origin, including commits that trigger Vercel deployments.
  • If the repository-local git identity drifts away from csh2022 <1065521866@qq.com>, restore it before making or pushing new commits.
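
Restoring a drifted identity can be sketched as computing the repository-local `git config` commands that are needed. The function name is hypothetical, and the current values are assumed to have been read beforehand, for example with `git config --local user.name` and `git config --local user.email`.

```python
from typing import Optional

EXPECTED_NAME = "csh2022"
EXPECTED_EMAIL = "1065521866@qq.com"


def identity_repair_commands(name: Optional[str], email: Optional[str]) -> list[list[str]]:
    """Return the git commands needed to restore the repository-local identity."""
    commands = []
    # --local scopes each change to this repository; global config is untouched.
    if name != EXPECTED_NAME:
        commands.append(["git", "config", "--local", "user.name", EXPECTED_NAME])
    if email != EXPECTED_EMAIL:
        commands.append(["git", "config", "--local", "user.email", EXPECTED_EMAIL])
    return commands
```

An empty list means the identity is already correct and nothing needs to change before committing.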