Built 26/04/17 09:08, commit f8ff6f9
Knowledge Operations
Summary
The operating loop for an LLM wiki is ingest, query, lint, and occasional taxonomy refactors. Each loop updates the durable markdown layer so both new sources and useful answers become part of the maintained knowledge base.
Ingest
- Read the raw source directly and keep its substantive content stable.
- When the repository uses the `llm-wiki` skill bundle, treat `resolve_vault.py`, `bootstrap_vault.py`, `lint_vault.py`, and the safe-move helpers as operational tools around the editorial loop, not as replacements for it.
- Keep `raw/*.zh.md` faithful to the source text; do not let the raw translation layer drift into summary or interpretation.
- If the source is a screenshot or image file, extract OCR text before synthesizing it into the wiki.
- If the source is markdown with local screenshots or diagrams, OCR the local visuals when they materially affect the synthesis.
- For Chinese or mixed-language screenshots, prefer Chinese-capable OCR settings such as `chi_sim+eng` or `chi_tra+eng`.
- Treat raw PDFs as ingest backlog rather than opaque attachments: keep the original PDF, create or refresh a markdown raw sibling, and only then synthesize the source into the wiki.
- Split PDF translation strategy by size and structure. Shorter PDFs can usually go straight from the maintained markdown raw into a direct `.zh.md` translation; longer or table-heavy PDFs should use a more tool-assisted extraction and cleanup path.
- When PDF layout, charts, or figures materially affect meaning, render the relevant pages into local visual assets and link them from the markdown raw file so later maintenance can still verify the source.
- Create or revise a source page in `wiki/sources/`.
- Update the long-lived pages that the source changes: concepts, entities, topics, or overview pages.
- Update `index.md` and `index.zh.md`, and insert a newest-first `ingest` entry near the top of `log.md`.
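The OCR step for Chinese or mixed-language screenshots can be sketched as below. This is a minimal sketch assuming the Tesseract CLI is installed with the `chi_sim` and `chi_tra` language packs; the `script` hint, `OCR_LANGS`, `ocr_command`, and `run_ocr` are hypothetical names, not part of the `llm-wiki` bundle.

```python
import shutil
import subprocess

# Hypothetical script hints mapped to standard Tesseract traineddata names.
OCR_LANGS = {
    "simplified": "chi_sim+eng",
    "traditional": "chi_tra+eng",
    "latin": "eng",
}


def ocr_command(image_path: str, script: str = "simplified") -> list[str]:
    """Build a tesseract CLI call that prints recognized text to stdout."""
    lang = OCR_LANGS.get(script, "eng")
    # Passing "stdout" as the output base makes tesseract print instead of writing a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path: str, script: str = "simplified") -> str:
    """Run OCR and return the raw text; the caller still has to interpret it."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    result = subprocess.run(
        ocr_command(image_path, script),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the command builder separate from the subprocess call makes the language choice easy to check without an OCR engine present.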
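Treating raw PDFs as ingest backlog can start from a simple check for PDFs that still lack a markdown raw sibling. The sketch assumes the naming convention `raw/<name>.pdf` next to `raw/<name>.md`; `pdf_ingest_backlog` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def pdf_ingest_backlog(paths: list[str]) -> list[str]:
    """Return raw PDFs that do not yet have a markdown raw sibling.

    Assumes the convention raw/<name>.pdf -> raw/<name>.md.
    """
    existing = set(paths)
    backlog = []
    for p in sorted(existing):
        path = PurePosixPath(p)
        if path.suffix.lower() != ".pdf":
            continue
        md_sibling = str(path.with_suffix(".md"))
        if md_sibling not in existing:
            backlog.append(p)
    return backlog


if __name__ == "__main__":
    files = [
        "raw/papers/harness.pdf",
        "raw/papers/harness.md",
        "raw/papers/harness.zh.md",
        "raw/papers/memory.pdf",
    ]
    print(pdf_ingest_backlog(files))  # ['raw/papers/memory.pdf']
```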
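The size-and-structure split for PDF translation could be encoded as a small decision rule. The thresholds below are invented placeholders, not values from the source, and the page and table counts are assumed to come from whatever extraction step the vault already runs.

```python
from dataclasses import dataclass


@dataclass
class PdfProfile:
    pages: int
    table_count: int


# Hypothetical cut-offs; tune them against the vault's own sources.
MAX_DIRECT_PAGES = 20
MAX_DIRECT_TABLES = 3


def translation_path(profile: PdfProfile) -> str:
    """Return 'direct' for short, prose-heavy PDFs and 'tool-assisted' otherwise."""
    if profile.pages <= MAX_DIRECT_PAGES and profile.table_count <= MAX_DIRECT_TABLES:
        return "direct"
    return "tool-assisted"
```

Either branch still ends in a `.zh.md` sibling; the rule only decides how much cleanup tooling sits in front of the translation.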
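The newest-first `log.md` update can be sketched as a pure text transform. The entry format `## [YYYY-MM-DD] ingest | <title>` is an assumption chosen so entries stay greppable, and `insert_log_entry` is a hypothetical helper.

```python
from datetime import date


def insert_log_entry(log_text: str, day: date, title: str, kind: str = "ingest") -> str:
    """Return log text with a new entry placed newest-first under the H1 title."""
    # Assumed entry format: "## [YYYY-MM-DD] ingest | <title>".
    entry = f"## [{day.isoformat()}] {kind} | {title}\n"
    if log_text and not log_text.endswith("\n"):
        log_text += "\n"
    lines = log_text.splitlines(keepends=True)
    insert_at = 0
    if lines and lines[0].startswith("# "):
        insert_at = 1
        # Skip blank lines after the title so the new entry lands above the previous newest one.
        while insert_at < len(lines) and not lines[insert_at].strip():
            insert_at += 1
    lines[insert_at:insert_at] = [entry, "\n"]
    return "".join(lines)
```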
Query
- Start from `index.md` or `index.zh.md` to discover the relevant maintained pages.
- Answer from the wiki first, revisiting raw material only when the synthesis is missing or needs verification.
- File durable answers back into `wiki/answers/` or merge them into existing long-lived pages.
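Starting a query from the index can be approximated by matching query words against the index's link titles. This is a deliberately crude sketch assuming the index is a list of `[Title](path)` links; `find_candidate_pages` is hypothetical, and word overlap works poorly on `index.zh.md`, where Chinese titles are not space-separated.

```python
import re

LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def find_candidate_pages(index_text: str, query: str) -> list[str]:
    """Return linked page paths whose link title shares at least one word with the query."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    hits: list[str] = []
    for title, target in LINK_RE.findall(index_text):
        title_terms = {t.lower() for t in re.findall(r"\w+", title)}
        if terms & title_terms and target not in hits:
            hits.append(target)
    return hits
```

The candidates are only a reading list; the answer still comes from the maintained pages themselves.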
Lint
- Check structure with the lint script, then do editorial maintenance.
- Persist automation state and split content patrols from taxonomy patrols so one scheduled loop does not erase the context or cadence discipline of the other.
- Treat routine cleanup as a content patrol first: before defaulting to hygiene-only work, check whether one or two narrow taxonomy moves are already obvious enough to improve navigation immediately.
- Look for contradictions, stale summaries, missing cross-links, orphan pages, and concepts that deserve canonical pages.
- Use lint passes to sharpen the wiki's structure, not just to repair links.
- Scheduled patrols should persist explicit run state so timeouts stay visible and the next patrol can continue from partial work instead of restarting blindly.
- When automation is enabled, split frequent content patrols from slower taxonomy-refactor patrols so ingest drift and directory restructuring do not compete for the same schedule.
- If a maintenance pass changes rendered or otherwise user-visible content, run the repo's local verification or site build as the final pre-push step.
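Orphan-page detection, one of the lint checks listed above, reduces to collecting every relative link target and reporting pages nobody points at. This is a hypothetical standalone sketch, not the behavior of `lint_vault.py`; `pages` maps vault-relative paths to markdown text, and index files are passed as `roots`.

```python
import posixpath
import re

# Captures the target of a markdown link, dropping any "#fragment".
LINK_RE = re.compile(r"\]\(([^)\s#]+)(?:#[^)\s]*)?\)")


def find_orphans(pages: dict[str, str], roots: set[str]) -> list[str]:
    """Return pages that no other page links to; root pages such as indexes are exempt."""
    linked: set[str] = set()
    for path, text in pages.items():
        base = posixpath.dirname(path)
        for target in LINK_RE.findall(text):
            if "://" in target:
                continue  # external URL, not a wiki page
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved != path:  # a self-link does not rescue a page
                linked.add(resolved)
    return sorted(p for p in pages if p not in linked and p not in roots)
```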
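Persisting explicit run state, so a timed-out patrol stays visible and the next one resumes, can be sketched with a JSON checkpoint file. The state schema (`done`, `status`), the file names, and `run_patrol` are assumptions for illustration.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable


def run_patrol(state_file: Path, items: list[str], handle: Callable[[str], None],
               budget_seconds: float) -> dict:
    """Process patrol items within a time budget, checkpointing after each item."""
    if state_file.exists():
        state = json.loads(state_file.read_text(encoding="utf-8"))
    else:
        state = {"done": [], "status": "new"}
    deadline = time.monotonic() + budget_seconds
    state["status"] = "running"  # written with each checkpoint, so a killed run still looks unfinished
    for item in items:
        if item in state["done"]:
            continue  # finished by an earlier run, so resume instead of restarting
        if time.monotonic() >= deadline:
            state["status"] = "timed_out"  # visible to humans and to the next patrol
            break
        handle(item)
        state["done"].append(item)
        state_file.write_text(json.dumps(state, indent=2), encoding="utf-8")
    else:
        state["status"] = "complete"
    state_file.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state_path = Path(tmp) / "content-patrol.json"
        todo = ["wiki/concepts/a.md", "wiki/topics/b.md"]
        first = run_patrol(state_path, todo, lambda page: None, budget_seconds=0)
        second = run_patrol(state_path, todo, lambda page: None, budget_seconds=60)
        print(first["status"], second["status"], second["done"])
        # timed_out complete ['wiki/concepts/a.md', 'wiki/topics/b.md']
```

Giving content patrols and taxonomy patrols separate state files (for example a hypothetical `taxonomy-patrol.json`) keeps one schedule from overwriting the other's progress.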
Raw Taxonomy Refactors
- Reorganize `raw/` when it becomes too flat or mixes unrelated source families.
- Reorganize as soon as a stable source family or navigation slice is already obvious enough to improve browsing.
- Treat family clarity as the primary trigger; file counts are at most weak background context.
- Preserve source substance and limit raw edits to moved-path link repair or path metadata updates.
- Move original raw files and their `.zh.md` siblings together, then repair downstream links from source pages, docs, and navigation pages.
- Re-run lint immediately after the refactor so broken raw links and missing raw translations surface right away.
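Moving a raw file together with its siblings and repairing downstream links can be planned as two pure functions. The sibling naming (`x.pdf`, `x.md`, `x.zh.md` side by side) and the helpers `plan_moves` and `rewrite_links` are assumptions; the actual file moves are left to `git mv` or whatever move tooling the vault already uses.

```python
import posixpath
import re

# Assumed sibling naming: raw/x.pdf keeps raw/x.md (markdown raw) and raw/x.zh.md (translation).
COMPANION_SUFFIXES = (".md", ".zh.md")
LINK_RE = re.compile(r"\]\(([^)\s#]+)(#[^)\s]*)?\)")


def plan_moves(src: str, dst_dir: str, existing: set[str]) -> dict[str, str]:
    """Map an original raw file and its existing siblings to their new paths.

    `src` must be the original file (e.g. raw/x.pdf or raw/x.md), not the .zh.md sibling.
    """
    stem, _ = posixpath.splitext(src)
    candidates = {src} | {stem + suffix for suffix in COMPANION_SUFFIXES}
    return {
        old: posixpath.join(dst_dir, posixpath.basename(old))
        for old in sorted(candidates)
        if old in existing
    }


def rewrite_links(text: str, page_path: str, moves: dict[str, str]) -> str:
    """Repoint relative markdown links in one page at the moved raw files."""
    base = posixpath.dirname(page_path) or "."

    def fix(match: re.Match) -> str:
        target, fragment = match.group(1), match.group(2) or ""
        resolved = posixpath.normpath(posixpath.join(base, target))
        if resolved not in moves:
            return match.group(0)  # unrelated link: leave untouched
        return f"]({posixpath.relpath(moves[resolved], base)}{fragment})"

    return LINK_RE.sub(fix, text)
```

Running lint right after applying the moves then catches any link this simple rewrite missed, such as reference-style links.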
Maintenance Heuristics
- Prefer revising canonical pages over creating near-duplicates.
- Keep uncertainty and superseded claims visible.
- Let simple markdown structure carry most of the system before adding custom tooling.
- Start splitting wiki sections when a topic family, source family, or navigation slice is already clear enough to deserve its own subtree.
- Treat raw-directory changes as editorial events: source paths, raw-local navigation links, source pages, and related index or log entries can all drift together.
- Treat OCR output as evidence to interpret, not as ground truth, especially for noisy screenshots or mixed-language UI captures, even when Chinese OCR support is installed.
- Periodic consolidation should convert relative dates to absolute ones, repair contradictions at the source, and keep top-level index files small enough to stay cheap to load.
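Converting relative dates to absolute ones during consolidation needs an anchor date, such as the date of the log entry that produced the text. This minimal sketch handles only a few English phrases; `absolutize_dates` is hypothetical, and Chinese phrasing would need its own patterns.

```python
import re
from datetime import date, timedelta

_WORDS = {"today": 0, "yesterday": -1, "tomorrow": 1}
_RELATIVE = re.compile(r"\b(today|yesterday|tomorrow|(\d+) days ago)\b", re.IGNORECASE)


def absolutize_dates(text: str, written_on: date) -> str:
    """Replace simple relative date phrases with ISO dates anchored to when the text was written."""
    def repl(match: re.Match) -> str:
        if match.group(2) is not None:
            offset = -int(match.group(2))  # "N days ago"
        else:
            offset = _WORDS[match.group(1).lower()]
        return (written_on + timedelta(days=offset)).isoformat()

    return _RELATIVE.sub(repl, text)
```

For example, `absolutize_dates("Checked yesterday; filed 3 days ago.", date(2026, 4, 17))` returns `"Checked 2026-04-16; filed 2026-04-14."`.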
Sources
- Karpathy LLM Wiki Gist
- Codex LLM Wiki Skill
- Claude Code Auto Dream Memory Consolidation
- Codex Non-Interactive Mode
- Anthropic Harness Design For Long-Running Application Development