Short answer: LLM Wiki wins when your knowledge base is under roughly 500K tokens, when you value human-readable output, and when you want compounding rather than transient answers. RAG wins when the corpus is too big for context, when you have multiple users, or when freshness beats consistency. Most personal and small-team setups fall on the LLM Wiki side of the line — which is why Karpathy's pattern caught on. If you are new to the whole conversation, the LLM Wiki primer covers the concept in plain English before you dive in here.
Below is the full decision framework, a side-by-side table, and the three hybrid patterns that combine the best of both worlds.
The core philosophical difference
RAG and LLM Wiki answer the same question in opposite ways. The question is: how should an LLM access knowledge that does not fit in its context window?
- RAG says: keep the raw knowledge in a vector database, retrieve relevant chunks at query time, and ask the LLM to generate an answer from the chunks.
- LLM Wiki says: compile the raw knowledge into a compact, structured wiki once, then put the wiki in context directly.
RAG is pull. LLM Wiki is push. RAG is transient — every query rebuilds the answer from chunks. LLM Wiki is persistent — every query reads a file the LLM wrote yesterday. This shapes everything downstream.
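The pull-vs-push distinction can be made concrete in a few lines. This is a toy sketch, not a real retrieval stack: `overlap` is a crude stand-in for embedding similarity, and `llm` is whatever completion function you actually call.

```python
def overlap(query: str, chunk: str) -> int:
    """Crude stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def rag_answer(query, chunks, llm):
    """Pull: rank chunks at query time, generate from the top-k."""
    ranked = sorted(chunks, key=lambda c: overlap(query, c), reverse=True)
    context = "\n".join(ranked[:3])  # top-3 retrieved chunks
    return llm(f"Answer from these chunks:\n{context}\n\nQ: {query}")

def wiki_answer(query, wiki_dir, llm):
    """Push: load the pre-compiled wiki into context directly."""
    wiki = "\n".join(p.read_text() for p in sorted(wiki_dir.glob("*.md")))
    return llm(f"Answer from this wiki:\n{wiki}\n\nQ: {query}")
```

Note that `rag_answer` does work per query (ranking, assembling context), while `wiki_answer` just reads files someone compiled earlier — that asymmetry is the whole transient-vs-persistent point.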
Side-by-side comparison
| Dimension | RAG | LLM Wiki |
|---|---|---|
| Knowledge form | Raw chunks in a vector DB | Structured markdown files |
| Freshness | New source is live after embedding (minutes, automatic) | New source is live after a re-compile you trigger (minutes to hours) |
| Token cost per query | Low (small chunks retrieved) | Medium (full wiki or section in context) |
| Compound learning | ❌ Every query re-derives | ✅ Every query reads compiled output |
| Human readability | ❌ Only the LLM reads the chunks | ✅ You can read, edit, diff the wiki |
| Infrastructure | Vector DB + embedding pipeline | Just a folder of files |
| Scale ceiling | Tens of millions of tokens | ~200-500K tokens per session |
| Multi-user | ✅ Built-in | ⚠️ Harder (file collisions, conflict resolution) |
| Debugging bad answers | Hard (chunk ranking issues) | Easy (read the wiki, fix the schema) |
| Long-tail query quality | Strong | Weaker (wiki may not cover edge topic) |
| Vendor lock-in | Some (embedding model) | None (plain markdown) |
When LLM Wiki wins
You should default to an LLM Wiki when any two of these are true:
- **Knowledge fits in 200-500K tokens.** Modern Claude and Gemini handle this comfortably. A disciplined wiki covering a research domain, a project, or a personal knowledge area usually stays well under 200K tokens for years.
- **You are the only user.** Personal knowledge bases are the sweet spot. No merge conflicts, no concurrent writes, no authentication. Just you and a folder.
- **You want to read the knowledge, not just query it.** Research notes, design docs, a personal reference work — anything where the output has human value beyond answering one question.
- **You care about consistency and contradictions.** An LLM Wiki can flag when new sources disagree with old pages. RAG typically does not; it just retrieves whichever chunks win the similarity lottery.
- **You want to audit the LLM's reasoning.** Every page in your wiki is a markdown file written by the LLM, versioned and timestamped in git. You can read it, check it, and correct it. RAG's reasoning lives inside the retrieval step and is harder to inspect.
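The first criterion is easy to check mechanically. Here is a minimal sketch using the common rough heuristic of ~4 characters per token; the folder layout and the 200K default budget are assumptions, not part of any tool.

```python
from pathlib import Path

def estimated_tokens(wiki_dir: Path) -> int:
    """Rough token estimate for all markdown files under wiki_dir."""
    chars = sum(len(p.read_text(encoding="utf-8"))
                for p in wiki_dir.rglob("*.md"))
    return chars // 4  # crude ~4 chars/token heuristic

def fits_in_context(wiki_dir: Path, budget: int = 200_000) -> bool:
    return estimated_tokens(wiki_dir) <= budget
```

For a real decision, run your provider's actual tokenizer; this heuristic only tells you which side of the line you are probably on.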
The clearest use cases are personal knowledge management, research notebooks, design document libraries, and small-team technical references. If that is you, start with our curated LLM Wiki templates — they ship with tested schemas for exactly these scenarios.
When RAG wins
RAG earns its complexity when any one of these is true:
- **The corpus is too big for context.** Millions of tokens of product docs, millions of Slack messages, a legal archive. No wiki compilation fits. RAG is the right answer.
- **You have many users asking many different things.** Customer-facing support bots, internal Q&A systems. RAG scales to concurrent queries in a way a single-author wiki does not.
- **Sources change continuously.** Live documentation, ticketing systems, CRM records. Re-compiling a wiki on every change is wasteful. RAG just re-embeds.
- **You need long-tail edge-case recall.** RAG retrieves from the full source, so rare terms and obscure details are still reachable. A wiki only includes what the schema told the LLM to write — anything outside the schema is gone.
- **You already have a vector database and a team that loves it.** Switching costs are real. If your RAG pipeline is working, the LLM Wiki pattern is an experiment, not a replacement.
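The two checklists boil down to a small decision function. The thresholds and return strings are this article's, not an industry standard; treat it as a sketch of the reasoning, not a policy engine.

```python
def choose(corpus_tokens: int, multi_user: bool, want_readable: bool) -> str:
    """Encode the article's decision logic as a function."""
    if corpus_tokens > 500_000:
        return "RAG or hybrid"       # too big for any context window
    if multi_user:
        return "RAG"                 # concurrent writes break a single wiki
    if want_readable:
        return "LLM Wiki"            # the output is a document you'll read
    return "Either works; start with LLM Wiki (simpler)"
```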
The decision flowchart
Use this in your head the next time someone asks "should we build RAG or a wiki?":
Is the corpus > 500K tokens?
├── Yes → RAG or hybrid
└── No → Do you need multi-user concurrent access?
    ├── Yes → RAG
    └── No → Do you want to read the output as a document?
        ├── Yes → LLM Wiki
        └── No → Either works; start with LLM Wiki (simpler)

Three hybrid patterns worth knowing
The best answer is often "both." Here are the three combinations we see most often in real projects:
1. Wiki on top, RAG underneath
Use an LLM Wiki as the structured "top layer" of your knowledge — concepts, entities, summaries, decisions. Use RAG for the bulk corpus underneath — raw PDFs, chat logs, emails. The wiki is what the LLM reads first; RAG is the fallback when the wiki says "not covered yet." This is the pattern most big teams end up with.
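A minimal sketch of this fallback logic. `wiki_lookup` and `rag_search` are placeholders for your own two layers, and the "not covered yet" marker is an assumed wiki convention, not a standard.

```python
NOT_COVERED = "not covered yet"

def answer(topic, wiki_lookup, rag_search):
    """Wiki-first: read the structured top layer, fall back to RAG."""
    page = wiki_lookup(topic)            # compiled wiki page, or None
    if page is None or NOT_COVERED in page.lower():
        return rag_search(topic)         # bulk-corpus fallback
    return page
```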
2. Wiki as RAG index
Invert the pattern: build a small LLM Wiki that summarizes a larger RAG corpus. The wiki acts as a navigation layer — the LLM reads the wiki first, decides which RAG documents are relevant, and retrieves only those. This gives you wiki's compounding quality on top of RAG's scale.
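One way to make the navigation layer machine-usable is a convention where each wiki page lists its source documents. The `Sources:` line format below is an assumption for illustration; the retrieval step would then fetch only the listed documents from the RAG store.

```python
def sources_for(wiki_page: str) -> list[str]:
    """Extract source-document paths from an assumed 'Sources:' line."""
    for line in wiki_page.splitlines():
        if line.startswith("Sources:"):
            return [s.strip() for s in line[len("Sources:"):].split(",")]
    return []  # page names no sources; fall back to full retrieval
```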
3. Scheduled wiki updates from RAG
Run a nightly job that pulls fresh RAG results into wiki pages. Your wiki is always within 24 hours of the raw data, but reads like a hand-maintained document. Good for competitive intelligence, market dashboards, or any domain where the data changes but the questions do not.
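The nightly job itself can be very small. `query_rag` and the page layout here are illustrative stand-ins; schedule the script with cron, a CI job, or whatever runner you already have.

```python
from datetime import date

def refresh_page(topic: str, query_rag) -> str:
    """Rebuild one wiki page from fresh RAG results, with a freshness stamp."""
    summary = query_rag(f"Summarize everything new about {topic}")
    return (f"# {topic}\n\n"
            f"_Last refreshed: {date.today().isoformat()}_\n\n"
            f"{summary}\n")
```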
The honest tradeoff
LLM Wiki is not strictly better than RAG. Anyone who tells you otherwise is selling something. It is, however, a better default for most personal and small-team use cases — because the complexity budget of "a folder and a schema" is much smaller than that of "a vector database, an embedding pipeline, and a reranker."
Start with an LLM Wiki. If you hit its limits — corpus too big, too many users, freshness mattering more than structure — add RAG at that point. Starting with RAG and later trying to extract structure from your vector database is the harder direction.
What to read next
- What is LLM Wiki? A plain-English primer — if the core concept still feels fuzzy
- LLM Wiki for Developers — the engineering track if you are running Claude Code or Cursor
- LLM Wiki in Obsidian: step-by-step setup — the fastest way to try the pattern
- LLM Wiki for research workflows — for readers managing papers and citations
- Curated LLM Wiki templates — our battle-tested schema starter kits
One more thing: we send an occasional email summarizing the best LLM Wiki content, schema patterns, and community discussions. No fixed schedule, no upsell. Subscribe below.