Retrieval-augmented generation (RAG) combines large language models (LLMs) with external knowledge retrieval to produce more accurate and grounded outputs. In theory, RAG should reduce hallucinations and improve factuality. In practice, however, many RAG systems still return incomplete answers, retrieve irrelevant documents, or state wrong information with confidence.
This article identifies 23 common pitfalls across the RAG lifecycle and offers practical fixes for each.
Data & Indexing
1. Poor Chunking of Content
Breaking source documents into inappropriate chunks can lose context: chunks that are too small strip away surrounding meaning, while chunks that are too large bury the relevant passage in noise. Use overlapping chunks or vary chunk sizes by content type, and keep chunks comfortably within the LLM's context window, around 50–75% of the maximum tokens.
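A minimal chunking sketch along these lines, using a whitespace split as a stand-in tokenizer (swap in your model's tokenizer for accurate counts):

```python
# Fixed-size chunking with overlap. The whitespace split is a placeholder
# tokenizer; real pipelines should count tokens with the embedding/LLM tokenizer.
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into chunks of roughly chunk_size tokens, sharing overlap tokens."""
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
    return chunks
```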
2. Outdated or Incomplete Knowledge Base
If the knowledge corpus lacks information to answer a query, the model cannot provide a grounded response. Maintain a comprehensive and current knowledge base by regularly updating content.
3. Using an Unsuitable Embedding Model
A general-purpose embedding model can miss domain-specific terminology. Choose and, where needed, fine-tune an embedding model that aligns with your domain, such as BioBERT for biomedical texts or LegalBERT for legal documents.
4. Not Updating Embeddings and Index
A "stale" index occurs when your documents or embedding model change, but the vector index hasn't been refreshed. Re-index regularly and monitor embedding drift.
5. Low-Quality or Noisy Data in the Corpus
Clean and organize your knowledge base by removing duplicates, errors, and unrelated text before indexing.
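A minimal cleaning pass might normalize whitespace, drop tiny fragments, and remove exact duplicates by hash; near-duplicate detection (e.g., MinHash) is a common next step:

```python
import hashlib

def clean_corpus(docs: list[str], min_chars: int = 50) -> list[str]:
    seen, cleaned = set(), []
    for doc in docs:
        text = " ".join(doc.split())              # normalize whitespace
        if len(text) < min_chars:                 # drop near-empty fragments
            continue
        digest = hashlib.sha256(text.lower().encode()).hexdigest()
        if digest in seen:                        # drop exact duplicates
            continue
        seen.add(digest)
        cleaned.append(text)
    return cleaned
```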
6. Ignoring Metadata and Contextual Signals
Incorporate metadata into search and ranking. Implement a hybrid strategy: first filter by metadata, then rank by semantic similarity.
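A sketch of the filter-then-rank idea, assuming each stored document carries a vector and a metadata dict (the field names are illustrative, not a specific vector store's API):

```python
import numpy as np

def filter_then_rank(query_vec: np.ndarray, docs: list[dict], filters: dict, top_k: int = 5) -> list[dict]:
    """docs: dicts with 'vector' (np.ndarray) and 'metadata' (dict) keys."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in filters.items())
    ]
    return sorted(
        candidates,
        key=lambda d: float(np.dot(query_vec, d["vector"])),
        reverse=True,
    )[:top_k]

# e.g. filter_then_rank(q_vec, docs, {"department": "legal", "year": 2024})
```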
Retrieval & Ranking
7. Over-Reliance on a Single Retrieval Method
Use a hybrid approach that combines semantic vector search with lexical keyword search (such as BM25) for better coverage.
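Reciprocal rank fusion (RRF) is one simple way to merge the two rankings; a sketch where each retriever returns an ordered list of document IDs:

```python
def reciprocal_rank_fusion(vector_hits: list[str], keyword_hits: list[str], k: int = 60) -> list[str]:
    """Merge two rankings; documents ranked highly in either list score well."""
    scores: dict[str, float] = {}
    for ranking in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```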
8. Retrieving Too Many or Too Few Documents
Tune top-K retrieval by testing several values of K. Start with a moderate K (around 10), then narrow the results to the 3–5 most relevant documents before generation.
9. Suboptimal Document Ranking
Tighten ranking with a multi-stage approach: generate candidates quickly with a fast first-stage retriever, then apply a cross-encoder reranker for deeper relevance scoring.
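A sketch of the second stage using the sentence-transformers CrossEncoder class; the model name is one commonly used public reranker, not a requirement:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_n: int = 5) -> list[str]:
    """Score (query, passage) pairs jointly and keep the top_n passages."""
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_n]]
```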
10. Context Window Overflow and Truncation
Retrieved text that exceeds the context window is silently truncated. Mitigate this by consolidating and filtering: use concise, coherent chunks, pre-filter passages by relevance, and pack only what fits within a token budget.
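A sketch of budget-aware packing: take chunks in relevance order and stop before the budget is exceeded, so nothing is cut mid-chunk (whitespace split stands in for a real tokenizer):

```python
def pack_context(ranked_chunks: list[str], max_tokens: int = 3000) -> str:
    """Greedily add chunks, highest relevance first, until the token budget is reached."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        n_tokens = len(chunk.split())   # replace with a real tokenizer count
        if used + n_tokens > max_tokens:
            break
        selected.append(chunk)
        used += n_tokens
    return "\n\n".join(selected)
```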
11. Noisy or Conflicting Retrieved Context
Filter and refine the context before generation: drop outdated or contradictory documents when newer, authoritative ones address the query.
Prompt & Query
12. Poor Prompt Design and Instructions
Spell out the expected behavior and output format in the prompt, and separate the retrieved context from the question with explicit delimiters.
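A minimal template along these lines (the delimiter style is illustrative):

```python
PROMPT_TEMPLATE = """You are a helpful assistant. Answer using ONLY the context below.
If the context does not contain the answer, say "I don't know."

### CONTEXT
{context}

### QUESTION
{question}

### ANSWER
"""

def build_prompt(context: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)
```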
13. Not Handling Ambiguous or Broad Queries
Add query understanding and refinement before retrieval. Rewrite or expand vague prompts into specific questions.
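A sketch of a rewrite step, where `call_llm` is a placeholder for whatever completion client the system already uses:

```python
REWRITE_PROMPT = (
    "Rewrite the following question so it is specific and self-contained, "
    "expanding vague references. Return only the rewritten question.\n\n"
    "Question: {query}"
)

def rewrite_query(query: str, call_llm) -> str:
    """Turn a vague user query into a retrieval-friendly question."""
    return call_llm(REWRITE_PROMPT.format(query=query)).strip()
```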
14. Failing to Address Multi-Part Questions
Break the problem into parts: decompose the query into specific subquestions, retrieve evidence for each, and then merge the results.
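A decomposition sketch with placeholder `call_llm` and `retrieve` callables (assumed to return text and a list of text chunks, respectively):

```python
def answer_multipart(query: str, call_llm, retrieve) -> str:
    """Decompose a multi-part query, retrieve per subquestion, then answer over merged evidence."""
    subquestions = call_llm(
        "Split this question into independent subquestions, one per line:\n" + query
    ).splitlines()
    evidence: list[str] = []
    for sub in filter(None, (s.strip() for s in subquestions)):
        evidence.extend(retrieve(sub, top_k=3))
    context = "\n\n".join(dict.fromkeys(evidence))   # de-duplicate, keep order
    return call_llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```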
System & Scale
15. High Latency in Retrieval or Generation
Tune performance end-to-end. Use approximate nearest neighbor indexes, such as HNSW or FAISS IVF.
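A minimal FAISS HNSW example, assuming 768-dimensional embeddings and random vectors as stand-ins:

```python
import faiss
import numpy as np

dim = 768
index = faiss.IndexHNSWFlat(dim, 32)     # 32 graph neighbors per node
index.hnsw.efSearch = 64                 # higher = better recall, slower queries

vectors = np.random.rand(10_000, dim).astype("float32")   # stand-in embeddings
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 10)  # approximate top-10 neighbors
```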
16. Poor Scalability Planning
Design for scale early: use distributed, sharded vector stores and select ANN index structures suited to the corpus size and update rate.
Safety, Evaluation & Trust
17. No Fallback for Unanswerable Queries
Handle these cases explicitly. Detect low-confidence retrieval and return a clear "I don't know" instead of generating a guess.
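A sketch of that gate, with placeholder `retrieve` (returning (chunk, score) pairs) and `generate` callables; the threshold value is an assumption to tune:

```python
FALLBACK = "I don't know based on the available documents."

def answer_or_abstain(query: str, retrieve, generate, min_score: float = 0.3) -> str:
    """Abstain when the best retrieval score is below a confidence threshold."""
    hits = retrieve(query)
    if not hits or max(score for _, score in hits) < min_score:
        return FALLBACK
    context = "\n\n".join(chunk for chunk, _ in hits)
    return generate(query, context)
```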
18. Inadequate Evaluation and Monitoring
Build a full evaluation stack. Measure retrieval with Recall@K and MRR on queries where relevant documents are known.
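Both metrics are straightforward to compute once you have labeled queries; a sketch where `results` maps each query to its ranked document IDs and `relevant` maps it to the set of known-relevant IDs:

```python
def recall_at_k(results: dict, relevant: dict, k: int = 5) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(bool(set(results[q][:k]) & relevant[q]) for q in results)
    return hits / len(results)

def mean_reciprocal_rank(results: dict, relevant: dict) -> float:
    """Average of 1/rank of the first relevant document per query."""
    total = 0.0
    for q, ranked in results.items():
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant[q]:
                total += 1.0 / rank
                break
    return total / len(results)
```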
19. Persisting LLM Hallucinations
Tighten grounding and add verification. Tell the model to use only retrieved sources and require fine-grained citations.
20. Lack of Source Attribution for Answers
Make attribution a first-class feature. Instruct the model to cite the document used for each claim.
21. Model Fabricating References or Citations
Constrain citation to real, retrieved documents. Assign each retrieved item an ID and instruct the model to cite only from that list.
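A sketch of ID-constrained citation plus a post-hoc check that rejects citations outside the retrieved list:

```python
import re

def build_cited_context(passages: list[str]) -> str:
    """Number each retrieved passage so the model can cite it as [1], [2], ..."""
    return "\n\n".join(f"[{i + 1}] {passage}" for i, passage in enumerate(passages))

def invalid_citations(answer: str, num_passages: int) -> list[int]:
    """Return cited IDs that do not correspond to any retrieved passage."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(c for c in cited if not 1 <= c <= num_passages)
```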
22. Bias and Fairness Issues in Retrieval or Generation
Address bias across data, retrieval, and generation. Curate a diverse corpus and document coverage gaps.
23. Neglecting Data Privacy and Security
Mitigate this with strict governance. Enforce permission-aware retrieval, apply least-privilege access, and redact personal identifiers.
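Permission-aware retrieval can be as simple as filtering hits against the requesting user's access groups before they reach the prompt; the metadata field name below is an assumption:

```python
def authorized_hits(hits: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only results whose 'allowed_groups' metadata intersects the user's groups."""
    return [
        hit for hit in hits
        if user_groups & set(hit["metadata"].get("allowed_groups", []))
    ]
```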