Retrieval-augmented generation (RAG) combines large language models (LLMs) with external knowledge retrieval to produce more accurate and grounded outputs. In theory, RAG should reduce hallucinations and improve factuality. In practice, however, many RAG systems still return incomplete answers, retrieve irrelevant documents, or state wrong information with confidence.
This article identifies 23 common pitfalls across the RAG lifecycle and offers practical fixes for each.
Data & Indexing
1. Poor Chunking of Content
Breaking source documents into inappropriate chunks can lose context: chunks that are too small strip away surrounding meaning, while chunks that are too large bury the relevant passage in noise. Use overlapping chunks or vary chunk sizes by content type, and keep chunks comfortably within the LLM's context window, around 50–75% of the maximum tokens.
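A minimal chunking sketch along these lines, using a whitespace split as a stand-in tokenizer (swap in your model's tokenizer for accurate counts):

```python
# Fixed-size chunking with overlap. The whitespace split is a placeholder
# tokenizer; real pipelines should count tokens with the embedding/LLM tokenizer.
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into chunks of roughly chunk_size tokens, sharing overlap tokens."""
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
    return chunks
```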
2. Outdated or Incomplete Knowledge Base
If the knowledge corpus lacks information to answer a query, the model cannot provide a grounded response. Maintain a comprehensive and current knowledge base by regularly updating content.
3. Using an Unsuitable Embedding Model
A general-purpose embedding model can miss domain-specific terminology. Choose and, where needed, fine-tune an embedding model that aligns with your domain, such as BioBERT for biomedical texts or LegalBERT for legal documents.
4. Not Updating Embeddings and Index
A "stale" index occurs when your documents or embedding model change, but the vector index hasn't been refreshed. Re-index regularly and monitor embedding drift.
5. Low-Quality or Noisy Data in the Corpus
Clean and organize your knowledge base by removing duplicates, errors, and unrelated text before indexing.
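A minimal cleaning pass might normalize whitespace, drop tiny fragments, and remove exact duplicates by hash; near-duplicate detection (e.g., MinHash) is a common next step:

```python
import hashlib

def clean_corpus(docs: list[str], min_chars: int = 50) -> list[str]:
    seen, cleaned = set(), []
    for doc in docs:
        text = " ".join(doc.split())              # normalize whitespace
        if len(text) < min_chars:                 # drop near-empty fragments
            continue
        digest = hashlib.sha256(text.lower().encode()).hexdigest()
        if digest in seen:                        # drop exact duplicates
            continue
        seen.add(digest)
        cleaned.append(text)
    return cleaned
```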
6. Ignoring Metadata and Contextual Signals
Incorporate metadata into search and ranking. Implement a hybrid strategy: first filter by metadata, then rank by semantic similarity.
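A sketch of the filter-then-rank idea, assuming each stored document carries a vector and a metadata dict (the field names are illustrative, not a specific vector store's API):

```python
import numpy as np

def filter_then_rank(query_vec: np.ndarray, docs: list[dict], filters: dict, top_k: int = 5) -> list[dict]:
    """docs: dicts with 'vector' (np.ndarray) and 'metadata' (dict) keys."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in filters.items())
    ]
    return sorted(
        candidates,
        key=lambda d: float(np.dot(query_vec, d["vector"])),
        reverse=True,
    )[:top_k]

# e.g. filter_then_rank(q_vec, docs, {"department": "legal", "year": 2024})
```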
Retrieval & Ranking
7. Over-Reliance on a Single Retrieval Method
Use a hybrid approach that combines semantic vector search with lexical keyword search (such as BM25) for better coverage.
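Reciprocal rank fusion (RRF) is one simple way to merge the two rankings; a sketch where each retriever returns an ordered list of document IDs:

```python
def reciprocal_rank_fusion(vector_hits: list[str], keyword_hits: list[str], k: int = 60) -> list[str]:
    """Merge two rankings; documents ranked highly in either list score well."""
    scores: dict[str, float] = {}
    for ranking in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```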
8. Retrieving Too Many or Too Few Documents
Tune top-K retrieval by testing several values of K. Start with a moderate K (around 10), then narrow the results to the 3–5 most relevant documents before generation.
9. Suboptimal Document Ranking
Tighten ranking with a multi-stage approach: generate candidates quickly with a fast first-stage retriever, then apply a cross-encoder reranker for deeper relevance scoring.
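A sketch of the second stage using the sentence-transformers CrossEncoder class; the model name is one commonly used public reranker, not a requirement:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_n: int = 5) -> list[str]:
    """Score (query, passage) pairs jointly and keep the top_n passages."""
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_n]]
```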
10. Context Window Overflow and Truncation
Retrieved text that exceeds the context window is silently truncated. Mitigate this by consolidating and filtering: use concise, coherent chunks, pre-filter passages by relevance, and pack only what fits within a token budget.
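A sketch of budget-aware packing: take chunks in relevance order and stop before the budget is exceeded, so nothing is cut mid-chunk (whitespace split stands in for a real tokenizer):

```python
def pack_context(ranked_chunks: list[str], max_tokens: int = 3000) -> str:
    """Greedily add chunks, highest relevance first, until the token budget is reached."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        n_tokens = len(chunk.split())   # replace with a real tokenizer count
        if used + n_tokens > max_tokens:
            break
        selected.append(chunk)
        used += n_tokens
    return "\n\n".join(selected)
```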
11. Noisy or Conflicting Retrieved Context
Filter and refine the context before generation: drop outdated or contradictory documents when newer, authoritative ones address the query.
Prompt & Query
12. Poor Prompt Design and Instructions
Spell out the expected behavior and output format in the prompt, and separate the retrieved context from the question with explicit delimiters.
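A minimal template along these lines (the delimiter style is illustrative):

```python
PROMPT_TEMPLATE = """You are a helpful assistant. Answer using ONLY the context below.
If the context does not contain the answer, say "I don't know."

### CONTEXT
{context}

### QUESTION
{question}

### ANSWER
"""

def build_prompt(context: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)
```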
13. Not Handling Ambiguous or Broad Queries
Add query understanding and refinement before retrieval. Rewrite or expand vague prompts into specific questions.
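A sketch of a rewrite step, where `call_llm` is a placeholder for whatever completion client the system already uses:

```python
REWRITE_PROMPT = (
    "Rewrite the following question so it is specific and self-contained, "
    "expanding vague references. Return only the rewritten question.\n\n"
    "Question: {query}"
)

def rewrite_query(query: str, call_llm) -> str:
    """Turn a vague user query into a retrieval-friendly question."""
    return call_llm(REWRITE_PROMPT.format(query=query)).strip()
```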
14. Failing to Address Multi-Part Questions
Break the problem into parts: decompose the query into specific subquestions, retrieve evidence for each, and then merge the results.
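A decomposition sketch with placeholder `call_llm` and `retrieve` callables (assumed to return text and a list of text chunks, respectively):

```python
def answer_multipart(query: str, call_llm, retrieve) -> str:
    """Decompose a multi-part query, retrieve per subquestion, then answer over merged evidence."""
    subquestions = call_llm(
        "Split this question into independent subquestions, one per line:\n" + query
    ).splitlines()
    evidence: list[str] = []
    for sub in filter(None, (s.strip() for s in subquestions)):
        evidence.extend(retrieve(sub, top_k=3))
    context = "\n\n".join(dict.fromkeys(evidence))   # de-duplicate, keep order
    return call_llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```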
System & Scale
15. High Latency in Retrieval or Generation
Tune performance end-to-end. Use approximate nearest neighbor indexes, such as HNSW or FAISS IVF.
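A minimal FAISS HNSW example, assuming 768-dimensional embeddings and random vectors as stand-ins:

```python
import faiss
import numpy as np

dim = 768
index = faiss.IndexHNSWFlat(dim, 32)     # 32 graph neighbors per node
index.hnsw.efSearch = 64                 # higher = better recall, slower queries

vectors = np.random.rand(10_000, dim).astype("float32")   # stand-in embeddings
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 10)  # approximate top-10 neighbors
```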
16. Poor Scalability Planning
Design for scale early: use distributed, sharded vector stores and select ANN index structures suited to the corpus size and update rate.
Safety, Evaluation & Trust
17. No Fallback for Unanswerable Queries
Handle these cases explicitly. Detect low-confidence retrieval and return a clear "I don't know" instead of generating a guess.
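A sketch of that gate, with placeholder `retrieve` (returning (chunk, score) pairs) and `generate` callables; the threshold value is an assumption to tune:

```python
FALLBACK = "I don't know based on the available documents."

def answer_or_abstain(query: str, retrieve, generate, min_score: float = 0.3) -> str:
    """Abstain when the best retrieval score is below a confidence threshold."""
    hits = retrieve(query)
    if not hits or max(score for _, score in hits) < min_score:
        return FALLBACK
    context = "\n\n".join(chunk for chunk, _ in hits)
    return generate(query, context)
```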
18. Inadequate Evaluation and Monitoring
Build a full evaluation stack. Measure retrieval with Recall@K and MRR on queries where relevant documents are known.
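Both metrics are straightforward to compute once you have labeled queries; a sketch where `results` maps each query to its ranked document IDs and `relevant` maps it to the set of known-relevant IDs:

```python
def recall_at_k(results: dict, relevant: dict, k: int = 5) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(bool(set(results[q][:k]) & relevant[q]) for q in results)
    return hits / len(results)

def mean_reciprocal_rank(results: dict, relevant: dict) -> float:
    """Average of 1/rank of the first relevant document per query."""
    total = 0.0
    for q, ranked in results.items():
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(results)
```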
19. Persisting LLM Hallucinations
Tighten grounding and add verification. Tell the model to use only retrieved sources and require fine-grained citations.
20. Lack of Source Attribution for Answers
Make attribution a first-class feature. Instruct the model to cite the document used for each claim.
21. Model Fabricating References or Citations
Constrain citation to real, retrieved documents. Assign each retrieved item an ID and instruct the model to cite only from that list.
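A sketch of ID-constrained citation plus a post-hoc check that rejects citations outside the retrieved list:

```python
import re

def build_cited_context(passages: list[str]) -> str:
    """Number each retrieved passage so the model can cite it as [1], [2], ..."""
    return "\n\n".join(f"[{i + 1}] {passage}" for i, passage in enumerate(passages))

def invalid_citations(answer: str, num_passages: int) -> list[int]:
    """Return cited IDs that do not correspond to any retrieved passage."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(c for c in cited if not 1 <= c <= num_passages)
```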
22. Bias and Fairness Issues in Retrieval or Generation
Address bias across data, retrieval, and generation. Curate a diverse corpus and document coverage gaps.
23. Neglecting Data Privacy and Security
Mitigate this with strict governance. Enforce permission-aware retrieval, apply least-privilege access, and redact personal identifiers.
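Permission-aware retrieval can be as simple as filtering hits against the requesting user's access groups before they reach the prompt; the metadata field name below is an assumption:

```python
def authorized_hits(hits: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only results whose 'allowed_groups' metadata intersects the user's groups."""
    return [
        hit for hit in hits
        if user_groups & set(hit["metadata"].get("allowed_groups", []))
    ]
```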