The Practical Guide to RAG: Types, Techniques, and How (and When) to Use Each

Anthony Sandesh
Retrieval‑Augmented Generation (RAG) is the standard way to give large language models (LLMs) dependable, up‑to‑date knowledge without retraining them. In its original formulation, a generator (the LLM) is conditioned on passages retrieved from an external store (the “non‑parametric memory”) to answer knowledge‑intensive questions more accurately and transparently. (arXiv)
This guide gives you a field‑tested taxonomy of RAG types, the key techniques behind them, and a decision framework for when to use what—plus implementation recipes you can adapt today.

A quick mental model (taxonomy)

Think of RAG systems as varying along three axes:
  1. Retrieval strategy – sparse (BM25), dense (embeddings), hybrid, multi‑query, multi‑hop, dynamic/agentic.
  2. Knowledge structure – flat chunks, hierarchical trees (summaries), graphs (entities/relations), or structured stores (SQL, files, APIs), and multimodal corpora (text+images+tables).
  3. Orchestration – simple top‑k, re‑ranking, corrective/self‑reflective loops, routing (choose tools/indices), or full agentic plans.
With that lens, here’s the menu.

The RAG menu (10 patterns you can actually ship)

For each pattern: Best for • How it works • How to use • Watch‑outs

1) Baseline / Single‑shot RAG

  • Best for: Small corpora; straightforward QA; prototypes.
  • How it works: Split docs → embed → top‑k similarity search → stuff passages into a prompt.
  • How to use: Start with dense embeddings; set k=4–8; add source citations to prompt.
  • Watch‑outs: Recall drops with synonyms/out‑of‑vocabulary; brittle to chunking.
Tip: Pair with a sparse keyword index for exact matches (hybrid search). BM25 (sparse) complements dense semantic retrievers (DPR). (City, University of London)
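The split → embed → top‑k → prompt flow can be sketched end to end. This is a minimal, dependency‑free sketch: `embed` is a toy bag‑of‑words stand‑in for a real embedding model, and the corpus is illustrative.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; swap in a real embedding model in practice.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 4) -> list[str]:
    # Top-k similarity search over pre-split chunks.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # "Stuff" the retrieved passages into the prompt with numbered citations.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Answer using only the context. Cite sources like [1].\n"
            f"Context:\n{context}\n\nQuestion: {query}")

chunks = ["BM25 is a sparse ranking function.",
          "Dense retrievers embed text into vectors.",
          "RAG conditions a generator on retrieved passages."]
passages = retrieve("what is a sparse ranking function", chunks, k=2)
prompt = build_prompt("what is a sparse ranking function", passages)
```

In production the only pieces that change are `embed` (a real model) and the store behind `retrieve` (a vector index); the prompt-stuffing step stays this simple.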

2) Hybrid RAG (Sparse + Dense)

  • Best for: Real‑world search where users mix jargon, IDs, and paraphrases.
  • How it works: Run BM25 and vector search; fuse scores; optionally re‑rank.
  • How to use: Weighted score fusion (e.g., α·dense + (1−α)·sparse); add filters on metadata (dates, author, tags).
  • Watch‑outs: Tune fusion weight; ensure consistent normalization across scores. Evidence shows hybrid retrievers generalize better than either alone. (ACL Anthology)
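Weighted score fusion with min–max normalization might look like the sketch below; the α = 0.6 weight and the example scores are illustrative, not tuned values.

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    # Normalize scores to [0, 1] so sparse and dense scales are comparable.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_fuse(dense: dict[str, float], sparse: dict[str, float],
                alpha: float = 0.6) -> list[tuple[str, float]]:
    # Fused score = alpha * dense + (1 - alpha) * sparse, over the union of hits.
    d, s = minmax(dense), minmax(sparse)
    docs = set(d) | set(s)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative raw scores: cosine similarities (dense) vs BM25 scores (sparse).
dense = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
sparse = {"doc_b": 12.3, "doc_c": 9.1, "doc_d": 2.0}
ranking = hybrid_fuse(dense, sparse, alpha=0.6)
```

Note that `doc_b` wins here despite being top‑1 in neither list alone — exactly the case hybrid fusion exists for.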

3) Re‑ranked RAG (Two‑stage retrieval)

  • Best for: “Close but noisy” corpora; when top‑k contains the answer but not at top‑1.
  • How it works: Recall‑oriented first stage (hybrid) → cross‑encoder re‑ranker (MonoT5, BGE‑reranker) → pass top‑m passages to the LLM. (Hugging Face)
  • How to use: Keep first‑stage k high (20–200), m small (3–10). Re‑rankers are slower but far more precise.
  • Watch‑outs: Latency; batch and cache re‑ranking; cap passage lengths.
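The two-stage shape is easy to see in code. In this sketch both stages are deterministic stand-ins: `first_stage` uses token overlap in place of hybrid retrieval, and `cross_encoder_score` is a placeholder for a real cross-encoder such as a BGE-reranker.

```python
def first_stage(query: str, corpus: list[str], k: int = 50) -> list[str]:
    # Cheap, recall-oriented stage (stand-in for hybrid BM25 + dense retrieval).
    q = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def cross_encoder_score(query: str, passage: str) -> float:
    # Placeholder for a real cross-encoder that scores the pair jointly;
    # a deterministic proxy so the sketch runs end to end.
    q, p = query.lower().split(), passage.lower().split()
    return sum(p.count(t) for t in q) / (len(p) + 1)

def rerank(query: str, corpus: list[str], k: int = 50, m: int = 5) -> list[str]:
    # Stage 1: wide recall (top-k). Stage 2: precise re-rank, keep top-m.
    candidates = first_stage(query, corpus, k)
    return sorted(candidates,
                  key=lambda p: cross_encoder_score(query, p),
                  reverse=True)[:m]

corpus = [
    "The BGE reranker is a cross-encoder for passage re-ranking.",
    "Cross-encoders score a query-passage pair jointly.",
    "Bi-encoders embed query and passage separately.",
    "Unrelated note about office coffee supplies.",
]
top = rerank("cross-encoder passage re-ranking", corpus, k=4, m=2)
```

The k/m asymmetry is the whole point: the first stage is cheap enough to cast a wide net, the second is expensive enough that you only run it on the candidates.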

4) Query‑expansion RAG (Multi‑Query + HyDE)

  • Best for: Vague or short queries; long‑tail terminology.
  • How it works: Generate multiple paraphrases or a Hypothetical Document (HyDE); retrieve with those embeddings; merge & dedupe. HyDE is a strong zero‑shot boost. (arXiv)
  • How to use: 3–8 variants is usually enough; try MMR (Maximal Marginal Relevance) to keep diversity. (CMU School of Computer Science)
  • Watch‑outs: Guard against off‑topic expansions; audit with logs.
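The merge-and-dedupe step across query variants is usually done with reciprocal rank fusion (RRF); a minimal sketch, assuming the per-variant result lists have already been retrieved:

```python
def merge_dedupe(result_lists: list[list[str]], limit: int = 8) -> list[str]:
    # Reciprocal-rank fusion: each list votes 1/(60 + rank) for each doc;
    # duplicates across variants accumulate score, which also dedupes them.
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc in enumerate(results):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (60 + rank)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:limit]

# Two variant queries returned overlapping results; "b" appears in both.
merged = merge_dedupe([["a", "b"], ["b", "c"]], limit=8)
```

A document retrieved by several paraphrases ("b" above) outranks one retrieved by only a single variant, which is the behavior you want from expansion.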

5) Multi‑hop / Decomposition RAG

  • Best for: Questions that require stitching multiple facts (e.g., “What did A say about B’s effect on C?”).
  • How it works: Decompose into sub‑questions (e.g., step‑back prompting to abstract first principles), retrieve per sub‑question, then synthesize. (arXiv)
  • How to use: Add a planning step: “What facts do I need?”; cache intermediate lookups.
  • Watch‑outs: Can balloon retrieval calls; cap hops; stop when confidence is high.
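The plan → retrieve-per-sub-question → synthesize loop, with the hop cap and cache from the bullets above, can be sketched as follows. The `plan`, `retrieve`, and `synthesize` lambdas are stubs standing in for an LLM planner, a retriever, and a generator.

```python
MAX_HOPS = 3

def multi_hop_answer(question, plan, retrieve, synthesize, max_hops=MAX_HOPS):
    evidence, cache = [], {}
    for subq in plan(question)[:max_hops]:  # cap hops to bound retrieval cost
        if subq not in cache:               # cache intermediate lookups
            cache[subq] = retrieve(subq)
        evidence.extend(cache[subq])
    return synthesize(question, evidence)

# Stubs: a naive planner that splits on "and", a fake retriever, a fake generator.
plan = lambda q: [part.strip() for part in q.split(" and ")]
retrieve = lambda subq: [f"fact about: {subq}"]
synthesize = lambda q, ev: f"Answer to '{q}' using {len(ev)} facts."

answer = multi_hop_answer("who founded X and when did Y acquire it",
                          plan, retrieve, synthesize)
```

In a real system, `plan` is the expensive step (an LLM call), so you would also add an early-exit check: stop planning further hops once confidence is high.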

6) Hierarchical / Summary‑Tree RAG (RAPTOR)

  • Best for: Very long documents; “zoom in/zoom out” reading across sections.
  • How it works: Build a tree by recursively summarizing and clustering chunks; at query time, retrieve both summaries and leaves, matching the needed granularity. This improves long‑context retrieval and multi‑step reasoning. (arXiv)
  • How to use: Precompute summaries offline; store node‑to‑leaf links; route queries first to higher‑level nodes, then descend.
  • Watch‑outs: Extra preprocessing; ensure summaries preserve citations to leaves.
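The offline tree-building step reduces to "group, summarize, repeat until one root". A minimal sketch, where `summarize` is a stub for the LLM summarization call RAPTOR runs per cluster (RAPTOR also clusters by embedding similarity rather than the fixed adjacent groups used here):

```python
def build_summary_tree(chunks, summarize, fanout=2):
    # levels[0] = leaf chunks; each higher level summarizes groups of the
    # level below, until a single root summary remains.
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        groups = [prev[i:i + fanout] for i in range(0, len(prev), fanout)]
        levels.append([summarize(group) for group in groups])
    return levels

# Stub summarizer; in practice an LLM summarizes each cluster offline.
summarize = lambda group: "summary(" + " + ".join(group) + ")"
tree = build_summary_tree(["c1", "c2", "c3", "c4"], summarize)
```

At query time you retrieve against all levels at once: summaries match broad "zoom out" questions, leaves match specific ones, and node-to-leaf links let you descend.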

7) Graph RAG (Knowledge‑graph‑aware)

  • Best for: Corpora rich in entities/relations (policies, research, case law, product docs).
  • How it works: Extract a knowledge graph (entities/edges) + community summaries; retrieve subgraphs and community notes in addition to raw text. Microsoft’s GraphRAG popularized this pattern. (Microsoft GitHub)
  • How to use: Use entity linking during ingestion; index both graph neighborhoods and original passages; answer with graph‑aware citations.
  • Watch‑outs: IE quality determines performance; invest in extraction prompts/rules.
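The "graph neighborhood" retrieval step is a bounded breadth-first expansion from the entities linked in the query. A sketch over an adjacency-set graph (the entity names are illustrative):

```python
def neighborhood(graph: dict[str, set[str]], seeds: set[str],
                 hops: int = 1) -> set[str]:
    # Return all entities within `hops` edges of the seed entities
    # (including the seeds themselves).
    frontier, seen = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier
                    for n in graph.get(node, set())} - seen
        seen |= frontier
    return seen

# Toy extracted graph: entities as nodes, relations as undirected edges.
graph = {
    "Policy A": {"Org X"},
    "Org X": {"Policy A", "Obligation 7"},
    "Obligation 7": {"Org X"},
}
```

The returned node set is then mapped back to the original passages (and community summaries) indexed for those entities, so answers can cite both the graph path and the source text.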

8) Agentic / Corrective RAG

  • Best for: Messy or shifting data; when retrieval sometimes fails or needs web fallback.
  • How it works: The LLM plans, retrieves, critiques its own answer, and loops. Variants:
    • Self‑RAG: retrieve‑generate‑critique with reflection tokens to improve grounding. (arXiv)
    • CRAG (Corrective RAG): an evaluator scores retrieval quality and switches actions (e.g., web search, query rewrite) if evidence is weak. (arXiv)
  • How to use: Add a “retrieval quality” gate before final generation; log decisions.
  • Watch‑outs: Cost/latency; set guardrails to avoid infinite loops.
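The CRAG-style gate-and-loop control flow looks like this. Everything below the function is a stub (a retriever whose first attempt fails, a binary coverage scorer, a trivial rewriter) so the corrective path is exercised deterministically.

```python
def corrective_answer(query, retrieve, coverage, rewrite, web_search, generate,
                      threshold=0.6, max_loops=2):
    passages = retrieve(query)
    for _ in range(max_loops):  # hard cap: guardrail against infinite loops
        if coverage(query, passages) >= threshold:
            break                                        # evidence is good enough
        query = rewrite(query)                           # corrective action 1
        passages = retrieve(query) or web_search(query)  # corrective action 2
    return generate(query, passages)

# Stubs: the first retrieval is weak; the rewritten query succeeds.
def retrieve(q): return ["good passage"] if "rewritten" in q else []
coverage = lambda q, ps: 1.0 if ps else 0.0
rewrite = lambda q: q + " (rewritten)"
web_search = lambda q: ["web passage"]
generate = lambda q, ps: f"grounded on {len(ps)} passages"

result = corrective_answer("original query", retrieve, coverage, rewrite,
                           web_search, generate)
```

In practice `coverage` is itself an LLM call (the evaluator in CRAG), which is why logging its decisions, as the bullets suggest, matters: it is the component most likely to mis-fire.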

9) Adaptive / Routed RAG

  • Best for: Mixed workloads (some queries easy; some very hard).
  • How it works: Classify query complexity; route to no‑retrieval, single‑shot, or iterative pipelines; or pick between retrievers. Research shows gains from adapting to question difficulty. (arXiv)
  • How to use: Start with a simple router: if the model can answer confidently from parametric knowledge → skip retrieval; else choose a heavier path.
  • Watch‑outs: Mis‑routing harms UX; train the router on your traffic.
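A first-cut router can be a handful of heuristics before you train anything. The rules below (query length, conjunctions, multiple questions) are illustrative placeholders for a classifier trained on your own traffic:

```python
def route(query: str, confident_without_retrieval, max_simple_len: int = 80) -> str:
    # Heuristic router; replace the rules with a small trained classifier later.
    if confident_without_retrieval(query):
        return "no_retrieval"      # answer from parametric knowledge
    if len(query) > max_simple_len or " and " in query or query.count("?") > 1:
        return "iterative"         # multi-hop / agentic pipeline
    return "single_shot"           # baseline RAG

# Stub confidence check; in practice, ask the model or use logprobs.
confident = lambda q: q.lower().startswith("what is 2+2")
```

The payoff is cost: trivially answerable queries skip retrieval entirely, and only genuinely hard ones pay for the iterative pipeline.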

10) Multimodal RAG

  • Best for: Documents with text + tables + images + screenshots or slides.
  • How it works: Ingest multimodal features; retrieve text and visual evidence; ground answers on both. Surveys in 2025 map the design space and evaluation. (arXiv)
  • How to use: Use OCR/structure extraction (for tables, charts); keep image thumbnails/regions to cite visually.
  • Watch‑outs: Storage and feature drift; evaluation is trickier than text‑only.
Related: Generation architectures like FiD (Fusion‑in‑Decoder) aggregate many passages effectively, and retrieval‑enhanced pretraining such as REALM and RETRO show benefits of integrating retrieval deeper into the model stack. (arXiv)

Cross‑cutting techniques that move the needle

Chunking (how you split matters)

  • Start with recursive, structure‑aware splitting to preserve semantics (headings → paragraphs → sentences). (LangChain Docs)
  • Keep chunk sizes consistent with embedding model context; add overlap (10–20%) to preserve cross‑boundary facts.
  • Use parent‑document or sentence‑window retrieval to pull context around a matched snippet when precision matters. (LangChain)
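A recursive, structure-aware splitter in miniature: try the coarsest separator first (paragraphs), fall back to finer ones, and recurse on anything still too long. This sketch omits the overlap step for brevity; the separator order and `max_len` are the knobs you tune.

```python
def recursive_split(text: str, max_len: int = 800,
                    separators: tuple = ("\n\n", "\n", ". ")) -> list[str]:
    # Structure-aware: paragraph breaks first, then lines, then sentences.
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            chunks, buf = [], ""
            for part in parts:
                candidate = buf + sep + part if buf else part
                if len(candidate) <= max_len:
                    buf = candidate        # pack parts until the budget is hit
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part
            if buf:
                chunks.append(buf)
            # Recurse (with finer separators) on any piece still too long.
            return [c for chunk in chunks
                    for c in recursive_split(chunk, max_len, separators)]
    # No separator left: hard-split by length as a last resort.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

text = "Intro paragraph one.\n\nSecond paragraph here.\n\nThird short one."
chunks = recursive_split(text, max_len=30)
```

Because paragraph boundaries are tried first, chunks tend to be semantically whole; the hard length split only fires on pathological inputs like minified text.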

Retrieval tweaks

  • Hybrid retrieval: combine BM25 and dense; normalize and fuse. (ACL Anthology)
  • Re‑ranking: add a cross‑encoder (MonoT5/BGE‑reranker) after first‑stage recall to improve precision. (Hugging Face)
  • MMR for diversity: avoid 10 near‑duplicates all saying the same thing. (CMU School of Computer Science)
  • Query expansion: Multi‑query or HyDE when queries are short/ambiguous. (arXiv)
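MMR from the list above is short enough to implement directly: greedily pick the document that best trades off relevance against similarity to what is already selected. Note the λ convention here weights relevance (so the λ = 0.2–0.4 defaults suggested later lean toward diversity); the scores and pairwise similarities below are illustrative.

```python
def mmr_select(rel: dict[str, float], sim, m: int = 5, lam: float = 0.3) -> list[str]:
    # rel: {doc: relevance to the query}; sim(d1, d2): pairwise similarity.
    # Greedy MMR: argmax  lam * rel(d) - (1 - lam) * max_{s in selected} sim(d, s)
    selected: list[str] = []
    pool = set(rel)
    while pool and len(selected) < m:
        best = max(pool, key=lambda d: lam * rel[d] - (1 - lam) *
                   max((sim(d, s) for s in selected), default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected

# "a" and "b" are near-duplicates (sim 0.95); "c" says something different.
rel = {"a": 0.9, "b": 0.85, "c": 0.5}
pairwise = {frozenset(("a", "b")): 0.95,
            frozenset(("a", "c")): 0.1,
            frozenset(("b", "c")): 0.1}
sim = lambda x, y: pairwise[frozenset((x, y))]
picked = mmr_select(rel, sim, m=2, lam=0.3)
```

Even though "b" is more relevant than "c", MMR skips it because it duplicates "a" — exactly the "10 near-duplicates" failure the bullet describes.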

Orchestration

  • Decomposition: plan sub‑questions; step‑back prompting helps form abstract plans. (arXiv)
  • Corrective gates: reject/redo retrieval if low confidence (CRAG). (arXiv)
  • Routing: choose path by difficulty/intent (Adaptive‑RAG). (arXiv)

Prompting for generation

  • Always ask for citations and quote spans.
  • Use section‑aware prompts (“Answer using only the provided context. If insufficient, say so. Cite [source:page].”).

Evaluation (don’t skip this)

  • Measure retrieval precision/recall and generation faithfulness/answer relevancy. Libraries like RAGAS (faithfulness, context precision/recall), TruLens (RAG triad), and DeepEval (contextual precision/recall/relevancy) are good starting points. (Ragas)
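Before adopting a full evaluation library, the two retrieval-side metrics are worth computing by hand to build intuition. A minimal sketch (set-based matching here; libraries like RAGAS use LLM judgments instead of exact membership):

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    # Of what we retrieved, how much was actually relevant?
    return sum(1 for p in retrieved if p in relevant) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    # Of the passages needed for the gold answer, how many did we retrieve?
    return sum(1 for p in relevant if p in retrieved) / len(relevant)

retrieved = ["p1", "p2", "p3", "p4"]   # what the pipeline returned
relevant = {"p1", "p2", "p5"}          # gold passages for this question
```

Low recall points at the retriever (raise k, expand queries); low precision points at ranking (add a re-ranker, lower m), which is why the two are tracked separately.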

How to choose (decision playbook)

Start here: what are your queries like and how big/messy is your corpus?
  • Small corpus, direct QA: Baseline RAG → add hybrid and a re‑ranker if precision is poor. (Hugging Face)
  • Mixed keywords & paraphrases (IDs, part numbers): Hybrid + MMR + re‑ranker. (CMU School of Computer Science)
  • Short/vague questions: Multi‑query/HyDE + re‑ranking. (arXiv)
  • Long PDFs / “read across chapters”: RAPTOR (summary tree) or parent‑document retrieval. (arXiv)
  • Entity‑dense domains (policies, case law, research): Graph RAG. (Microsoft GitHub)
  • Unreliable retrieval / rapidly changing data: Corrective/Agentic RAG with web fallback. (arXiv)
  • Traffic ranges from trivial to complex: Adaptive/Routed RAG—sometimes even no retrieval. (arXiv)
  • Screenshots, tables, diagrams: Multimodal RAG. (arXiv)

Implementation recipes (copy & adapt)

Recipe A — Company wiki Q&A (robust baseline)

  1. Ingest: Clean markdown/PDFs. Split with recursive splitter (chunk ≈ 700–1,000 chars, 15% overlap). (LangChain Docs)
  2. Index: Hybrid (BM25 + vector). Keep doc metadata (title, section, URL). (City, University of London)
  3. Retrieve:
      • Stage 1: hybrid top‑50, MMR on. (CMU School of Computer Science)
      • Stage 2: re‑rank with a cross‑encoder (e.g., BGE‑reranker) → top‑6. (Hugging Face)
  4. Generate: Prompt with explicit instructions: “Use only the context. Cite like [Title §Section]. If unsure, say you don’t know.”
  5. Evaluate: Track RAGAS faithfulness + context precision/recall; inspect failures. (Ragas)

Recipe B — Long manuals / compliance docs

  1. Build a RAPTOR index offline (summary tree). (arXiv)
  2. At query time: retrieve high‑level nodes → descend to relevant leaves → return a mix of summaries & passages with anchors to pages.
  3. Add a light re‑rank step over the final candidate set.

Recipe C — Policy navigator (entity‑dense)

  1. Run entity extraction (people, orgs, dates, obligations) + relation extraction.
  2. Store graph (nodes/edges) alongside the raw text; build community summaries (per policy area).
  3. Retrieval = (a) neighborhood subgraph, (b) community summary, (c) original passages; answer with both graph paths and citations. (Microsoft GitHub)

Recipe D — “Messy data” assistant (agentic/corrective)

  1. Baseline retrieve → compute a retrieval quality score; if low, rewrite query or expand (HyDE); if still low, search the web (with domain filters). (arXiv)
  2. Generate → self‑critique (ask “Which claims lack support?”) → if ungrounded, loop once with new retrieval. (arXiv)

Prompts & parameters that work

  • Retrieval gating (CRAG‑style)
    • “Given the passages, rate coverage 0–1. If <0.6, rewrite the query or suggest a follow‑up. Return an action in {answer, rewrite, search_web} and a confidence.” (arXiv)
  • Answering with provenance
    • “Answer using only the excerpts. Quote exact spans when possible. After each claim, add [Source Title §Section]. If evidence is missing, say so.”
  • Multi‑query prompt (HyDE/expansion)
    • “Generate 5 diverse ways a domain expert might ask this, plus a short hypothetical abstract answering it.” (arXiv)
Default knobs to start with
  • k (first stage) = 50; m (after re‑rank) = 6
  • MMR λ = 0.2–0.4; chunk 700–1,000 chars; overlap 10–20%
  • Hybrid fusion α (dense weight) = 0.5–0.7; adjust per evaluation

Common failure modes & fast fixes

  • Low recall (answers not retrieved): increase k, add multi‑query/HyDE, use hybrid search, fix chunking boundaries. (arXiv)
  • High recall, low precision (lots of noise): add re‑ranker; reduce m; enable MMR. (Hugging Face)
  • Hallucinations: require quotes/citations; add faithfulness checks; use corrective loops. (Ragas)
  • Long‑doc brittleness: switch to RAPTOR or parent‑document retrieval. (arXiv)
  • Entity/link reasoning: add Graph RAG layer for relationships. (Microsoft GitHub)

Measuring progress (what “good” looks like)

Track two layers:
  1. Retriever: contextual precision/recall (+ diversity) and coverage of gold answers. Tools: DeepEval (contextual precision/recall/relevancy), RAGAS (context metrics). (DeepEval)
  2. Generator: faithfulness (groundedness) and answer relevancy; run spot human evals on critical paths. Tools: RAGAS, TruLens (RAG triad: relevance, groundedness, coherence). (Ragas)
Ship with dashboards that show: retrieved sources, re‑rank scores, citations used, and a fail‑open path (“I don’t know—need more sources”).

Quick reference: “When to use what”

| Situation | Choose | Key extras |
|---|---|---|
| Small corpus, straightforward Qs | Baseline RAG | Hybrid toggle, citations |
| Mix of IDs + paraphrases | Hybrid RAG | BM25 + dense + MMR + cross‑encoder |
| Vague/short queries | Multi‑query/HyDE | 3–8 variants, merge and dedupe |
| Needs long‑range context | RAPTOR / Parent‑doc | Summary tree + leaf passages |
| Entity‑dense domain | Graph RAG | Graph neighborhoods + community notes |
| Retrieval unreliable | Corrective/Agentic | Retrieval quality gate + web fallback |
| Mixed easy/hard traffic | Adaptive/Routed | Classify complexity; pick pipeline |
| Images/tables/screenshots | Multimodal RAG | OCR + table parsers; visual cites |
| Precision pain after recall | Re‑ranked RAG | Cross‑encoder (MonoT5/BGE‑reranker) |

Further reading (select, foundational)

  • RAG (original): Lewis et al., NeurIPS 2020. (arXiv)
  • Sparse vs dense basics: BM25 review; DPR dense retrieval. (City, University of London)
  • Re‑ranking: MonoT5; BGE‑reranker docs. (Hugging Face)
  • Query expansion: HyDE. (arXiv)
  • Multi‑step planning: Step‑Back Prompting. (arXiv)
  • Hierarchical retrieval: RAPTOR. (arXiv)
  • Graph RAG: Microsoft GraphRAG posts and docs. (Microsoft)
  • Corrective/agentic: Self‑RAG; CRAG. (arXiv)
  • Evaluation: RAGAS; TruLens; DeepEval. (Ragas)

Final checklist before you go live

  • Choose the lightest pattern that meets accuracy (add complexity only for measured gains).
  • Log retrieval quality (coverage, diversity), re‑rank scores, and citations used.
  • Add a fail‑open path (“insufficient evidence”) instead of guessing.
  • Automate freshness (TTL, re‑embed diffs) and safety (PII filters in ingestion).
  • Keep an evaluation suite (gold Q&A and adversarial cases) and run it on every change.