Short answer: LLM Wiki wins when your knowledge base is under roughly 500K tokens, when you value human-readable output, and when you want compounding rather than transient answers. RAG wins when the corpus is too big for context, when you have multiple users, or when freshness beats consistency. Most personal and small-team setups fall on the LLM Wiki side of the line — which is why Karpathy's pattern caught on. If you are new to the whole conversation, the LLM Wiki primer covers the concept in plain English before you dive in here.
Below is the full decision framework, a side-by-side table, and the three hybrid patterns that combine the best of both worlds.
The core philosophical difference
RAG and LLM Wiki answer the same question in opposite ways. The question is: how should an LLM access knowledge that does not fit in its context window?
- RAG says: keep the raw knowledge in a vector database, retrieve relevant chunks at query time, and ask the LLM to generate an answer from the chunks.
- LLM Wiki says: compile the raw knowledge into a compact, structured wiki once, then put the wiki in context directly.
RAG is pull. LLM Wiki is push. RAG is transient — every query rebuilds the answer from chunks. LLM Wiki is persistent — every query reads a file the LLM wrote yesterday. This shapes everything downstream.
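The pull-vs-push distinction can be made concrete in a few lines. This is a toy sketch, not a real retrieval stack: `overlap` is a crude stand-in for embedding similarity, and `llm` is whatever completion function you actually call.

```python
def overlap(query: str, chunk: str) -> int:
    """Crude stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def rag_answer(query, chunks, llm):
    """Pull: rank chunks at query time, generate from the top-k."""
    ranked = sorted(chunks, key=lambda c: overlap(query, c), reverse=True)
    context = "\n".join(ranked[:3])  # top-3 retrieved chunks
    return llm(f"Answer from these chunks:\n{context}\n\nQ: {query}")

def wiki_answer(query, wiki_dir, llm):
    """Push: load the pre-compiled wiki into context directly."""
    wiki = "\n".join(p.read_text() for p in sorted(wiki_dir.glob("*.md")))
    return llm(f"Answer from this wiki:\n{wiki}\n\nQ: {query}")
```

Note that `rag_answer` does work per query (ranking, assembling context), while `wiki_answer` just reads files someone compiled earlier — that asymmetry is the whole transient-vs-persistent point.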
Side-by-side comparison
| Dimension | RAG | LLM Wiki |
|---|---|---|
| Knowledge form | Raw chunks in a vector DB | Structured markdown files |
| Freshness | New source is live after embedding (minutes, automatic) | New source is live after a re-compile you trigger (minutes to hours) |
| Token cost per query | Low (small chunks retrieved) | Medium (full wiki or section in context) |
| Compound learning | ❌ Every query re-derives | ✅ Every query reads compiled output |
| Human readability | ❌ Only the LLM reads the chunks | ✅ You can read, edit, diff the wiki |
| Infrastructure | Vector DB + embedding pipeline | Just a folder of files |
| Scale ceiling | Tens of millions of tokens | ~200-500K tokens per session |
| Multi-user | ✅ Built-in | ⚠️ Harder (file collisions, conflict resolution) |
| Debugging bad answers | Hard (chunk ranking issues) | Easy (read the wiki, fix the schema) |
| Long-tail query quality | Strong | Weaker (wiki may not cover edge topic) |
| Vendor lock-in | Some (embedding model) | None (plain markdown) |
When LLM Wiki wins
You should default to an LLM Wiki when any two of these are true:
- **Knowledge fits in 200-500K tokens.** Modern Claude and Gemini handle this comfortably. A disciplined wiki covering a research domain, a project, or a personal knowledge area usually stays well under 200K tokens for years.
- **You are the only user.** Personal knowledge bases are the sweet spot. No merge conflicts, no concurrent writes, no authentication. Just you and a folder.
- **You want to read the knowledge, not just query it.** Research notes, design docs, a personal reference work — anything where the output has human value beyond answering one question.
- **You care about consistency and contradictions.** An LLM Wiki can flag when new sources disagree with old pages. RAG typically does not; it just retrieves whichever chunks win the similarity lottery.
- **You want to audit the LLM's reasoning.** Every page in your wiki is a markdown file written by the LLM, versioned and timestamped in git. You can read it, check it, and correct it. RAG's reasoning lives inside the retrieval step and is harder to inspect.
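The first criterion is easy to check mechanically. Here is a minimal sketch using the common rough heuristic of ~4 characters per token; the folder layout and the 200K default budget are assumptions, not part of any tool.

```python
from pathlib import Path

def estimated_tokens(wiki_dir: Path) -> int:
    """Rough token estimate for all markdown files under wiki_dir."""
    chars = sum(len(p.read_text(encoding="utf-8"))
                for p in wiki_dir.rglob("*.md"))
    return chars // 4  # crude ~4 chars/token heuristic

def fits_in_context(wiki_dir: Path, budget: int = 200_000) -> bool:
    return estimated_tokens(wiki_dir) <= budget
```

For a real decision, run your provider's actual tokenizer; this heuristic only tells you which side of the line you are probably on.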
The clearest use cases are personal knowledge management, research notebooks, design document libraries, and small-team technical references. If that is you, start with our curated LLM Wiki templates — they ship with tested schemas for exactly these scenarios.
When RAG wins
RAG earns its complexity when any one of these is true:
- **The corpus is too big for context.** Millions of tokens of product docs, millions of Slack messages, a legal archive. No wiki compilation fits. RAG is the right answer.
- **You have many users asking many different things.** Customer-facing support bots, internal Q&A systems. RAG scales to concurrent queries in a way a single-author wiki does not.
- **Sources change continuously.** Live documentation, ticketing systems, CRM records. Re-compiling a wiki on every change is wasteful. RAG just re-embeds.
- **You need long-tail edge-case recall.** RAG retrieves from the full source, so rare terms and obscure details are still reachable. A wiki only includes what the schema told the LLM to write — anything outside the schema is gone.
- **You already have a vector database and a team that loves it.** Switching costs are real. If your RAG pipeline is working, the LLM Wiki pattern is an experiment, not a replacement.
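The two checklists boil down to a small decision function. The thresholds and return strings are this article's, not an industry standard; treat it as a sketch of the reasoning, not a policy engine.

```python
def choose(corpus_tokens: int, multi_user: bool, want_readable: bool) -> str:
    """Encode the article's decision logic as a function."""
    if corpus_tokens > 500_000:
        return "RAG or hybrid"       # too big for any context window
    if multi_user:
        return "RAG"                 # concurrent writes break a single wiki
    if want_readable:
        return "LLM Wiki"            # the output is a document you'll read
    return "Either works; start with LLM Wiki (simpler)"
```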
The decision flowchart
Use this in your head the next time someone asks "should we build RAG or a wiki?":
Is the corpus > 500K tokens?
├── Yes → RAG or hybrid
└── No → Do you need multi-user concurrent access?
    ├── Yes → RAG
    └── No → Do you want to read the output as a document?
        ├── Yes → LLM Wiki
        └── No → Either works; start with LLM Wiki (simpler)

Three hybrid patterns worth knowing
The best answer is often "both." Here are the three combinations we see most often in real projects:
1. Wiki on top, RAG underneath
Use an LLM Wiki as the structured "top layer" of your knowledge — concepts, entities, summaries, decisions. Use RAG for the bulk corpus underneath — raw PDFs, chat logs, emails. The wiki is what the LLM reads first; RAG is the fallback when the wiki says "not covered yet." This is the pattern most big teams end up with.
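A minimal sketch of this fallback logic. `wiki_lookup` and `rag_search` are placeholders for your own two layers, and the "not covered yet" marker is an assumed wiki convention, not a standard.

```python
NOT_COVERED = "not covered yet"

def answer(topic, wiki_lookup, rag_search):
    """Wiki-first: read the structured top layer, fall back to RAG."""
    page = wiki_lookup(topic)            # compiled wiki page, or None
    if page is None or NOT_COVERED in page.lower():
        return rag_search(topic)         # bulk-corpus fallback
    return page
```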
2. Wiki as RAG index
Invert the pattern: build a small LLM Wiki that summarizes a larger RAG corpus. The wiki acts as a navigation layer — the LLM reads the wiki first, decides which RAG documents are relevant, and retrieves only those. This gives you wiki's compounding quality on top of RAG's scale.
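One way to make the navigation layer machine-usable is a convention where each wiki page lists its source documents. The `Sources:` line format below is an assumption for illustration; the retrieval step would then fetch only the listed documents from the RAG store.

```python
def sources_for(wiki_page: str) -> list[str]:
    """Extract source-document paths from an assumed 'Sources:' line."""
    for line in wiki_page.splitlines():
        if line.startswith("Sources:"):
            return [s.strip() for s in line[len("Sources:"):].split(",")]
    return []  # page names no sources; fall back to full retrieval
```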
3. Scheduled wiki updates from RAG
Run a nightly job that pulls fresh RAG results into wiki pages. Your wiki is always within 24 hours of the raw data, but reads like a hand-maintained document. Good for competitive intelligence, market dashboards, or any domain where the data changes but the questions do not.
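The nightly job itself can be very small. `query_rag` and the page layout here are illustrative stand-ins; schedule the script with cron, a CI job, or whatever runner you already have.

```python
from datetime import date

def refresh_page(topic: str, query_rag) -> str:
    """Rebuild one wiki page from fresh RAG results, with a freshness stamp."""
    summary = query_rag(f"Summarize everything new about {topic}")
    return (f"# {topic}\n\n"
            f"_Last refreshed: {date.today().isoformat()}_\n\n"
            f"{summary}\n")
```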
The honest tradeoff
LLM Wiki is not strictly better than RAG. Anyone who tells you otherwise is selling something. It is, however, a better default for most personal and small-team use cases — because the complexity budget of "a folder and a schema" is much smaller than that of "a vector database, an embedding pipeline, and a reranker."
Start with an LLM Wiki. If you hit its limits — corpus too big, too many users, freshness mattering more than structure — add RAG at that point. Starting with RAG and later trying to extract structure from your vector database is the harder direction.
What to read next
- What is LLM Wiki? A plain-English primer — if the core concept still feels fuzzy
- LLM Wiki for Developers — the engineering track if you are running Claude Code or Cursor
- LLM Wiki in Obsidian: step-by-step setup — the fastest way to try the pattern
- LLM Wiki for research workflows — for readers managing papers and citations
- Curated LLM Wiki templates — our battle-tested schema starter kits
One more thing: we send an occasional email summarizing the best LLM Wiki content, schema patterns, and community discussions. No fixed schedule, no upsell. Subscribe below.