GraphRAG and Structured Knowledge Retrieval
Vector search is excellent for finding semantically similar chunks. It is weaker when the question depends on relationships across many entities.
GraphRAG adds a knowledge graph layer so the system can retrieve by entities, relationships, communities, and paths.
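As a rough illustration of retrieving by paths, the sketch below runs a bounded breadth-first search over a tiny in-memory graph. The graph contents, the `type:name` node naming, and the `find_path` helper are all hypothetical; a real system would query whatever graph store it uses.

```python
from collections import deque

# Hypothetical adjacency list: node -> list of (relation, neighbor).
GRAPH = {
    "team:payments": [("owns", "system:billing")],
    "system:billing": [("depends_on", "system:ledger")],
    "system:ledger": [("depends_on", "vendor:acme-db")],
}

def find_path(graph, start, goal, max_hops=4):
    """Return the shortest list of (source, relation, target) edges, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # hop limit keeps traversal from pulling in distant context
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None

path = find_path(GRAPH, "team:payments", "vendor:acme-db")
```

Each edge in the returned path is an explicit fact that can be cited, which is something a similarity score alone does not provide. The `max_hops` limit is one simple way to bound how much context a traversal returns.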
When GraphRAG helps
| Question type | Vector RAG | GraphRAG |
|---|---|---|
| "Find similar passages" | strong | okay |
| "Summarize this document" | strong | okay |
| "How are these teams connected?" | weak | strong |
| "What changed across acquisitions?" | weak | strong |
| "Which risks depend on the same vendor?" | weak | strong |
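To make the last row concrete, here is a minimal sketch of the shared-vendor question answered as a lookup over explicit edges. The edge list, the entity names, and the `shared_vendor_risks` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical edges: (source, relation, target).
EDGES = [
    ("risk:payment-outage", "depends_on", "vendor:acme-cloud"),
    ("risk:data-loss", "depends_on", "vendor:acme-cloud"),
    ("risk:sms-delay", "depends_on", "vendor:textco"),
]

def shared_vendor_risks(edges):
    """Group risks by vendor and keep vendors that two or more risks depend on."""
    by_vendor = defaultdict(set)
    for source, relation, target in edges:
        is_risk_to_vendor = source.startswith("risk:") and target.startswith("vendor:")
        if relation == "depends_on" and is_risk_to_vendor:
            by_vendor[target].add(source)
    return {vendor: sorted(risks) for vendor, risks in by_vendor.items() if len(risks) > 1}

result = shared_vendor_risks(EDGES)
```

The two risks may never appear in the same passage, so chunk similarity has little to match on; the shared `depends_on` target is what connects them.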
The pipeline
documents
-> chunking
-> entity extraction
-> relationship extraction
-> graph construction
-> community summaries
-> graph + vector retrieval
-> grounded answer
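The first stages of this pipeline can be sketched end to end with toy components. The assumptions here are heavy: sentence splitting stands in for chunking, a single regular expression stands in for model-based entity and relationship extraction, and community summaries and retrieval are left out. All function names are hypothetical.

```python
import re
from collections import defaultdict

def chunk(document: str) -> list[str]:
    """Split into sentence-sized chunks (a stand-in for real chunking)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

# Toy pattern standing in for model-based relationship extraction.
DEPENDS_ON = re.compile(r"(\w+) depends on (\w+)")

def extract_relationships(chunk_id: int, text: str) -> list[dict]:
    """Return one record per match, tagged with the chunk it came from."""
    return [
        {
            "source": m.group(1),
            "relation": "depends_on",
            "target": m.group(2),
            "chunk_id": chunk_id,
        }
        for m in DEPENDS_ON.finditer(text)
    ]

def build_graph(chunks: list[str]) -> dict[str, list[dict]]:
    """Graph construction: index extracted relationships by source entity."""
    graph = defaultdict(list)
    for chunk_id, text in enumerate(chunks):
        for rel in extract_relationships(chunk_id, text):
            graph[rel["source"]].append(rel)
    return dict(graph)

doc = "Billing depends on Ledger. Ledger depends on Postgres. The office is in Oslo."
chunks = chunk(doc)
graph = build_graph(chunks)
```

Keeping `chunk_id` on every relationship is what later lets the retrieval step return the original chunk alongside the graph fact.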
Key design choices
- Entity schema: people, products, teams, policies, systems, incidents
- Relationship schema: owns, depends_on, reports_to, caused_by, replaced_by
- Graph storage: graph database, relational tables, or document store
- Retrieval strategy: graph traversal, vector search, or hybrid
- Summaries: community summaries for broad questions
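One way to pin down the entity and relationship schemas is with enums and frozen dataclasses, so that labels outside the schema are rejected at load time. This is a sketch using the types listed above; the class and function names are assumptions, not a standard GraphRAG interface.

```python
from dataclasses import dataclass
from enum import Enum

class EntityType(Enum):
    PERSON = "person"
    PRODUCT = "product"
    TEAM = "team"
    POLICY = "policy"
    SYSTEM = "system"
    INCIDENT = "incident"

class RelationType(Enum):
    OWNS = "owns"
    DEPENDS_ON = "depends_on"
    REPORTS_TO = "reports_to"
    CAUSED_BY = "caused_by"
    REPLACED_BY = "replaced_by"

@dataclass(frozen=True)
class Entity:
    name: str
    type: EntityType

@dataclass(frozen=True)
class Relationship:
    source: Entity
    relation: RelationType
    target: Entity

def parse_relation(label: str) -> RelationType:
    """Reject labels outside the schema instead of silently adding edge types."""
    return RelationType(label)  # raises ValueError for unknown labels

billing = Entity("Billing", EntityType.SYSTEM)
ledger = Entity("Ledger", EntityType.SYSTEM)
edge = Relationship(billing, parse_relation("depends_on"), ledger)
```

A closed schema trades flexibility for consistency: an extractor that emits a label such as `relies_on` fails loudly here rather than creating a near-duplicate edge type.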
GraphRAG failure modes
- the same entity is extracted under inconsistent names or types
- relationships are hallucinated during graph construction
- graph is stale relative to the source corpus
- traversal retrieves too much irrelevant context
- answer cites graph summaries without source documents
Safer GraphRAG pattern
- Extract entities with a schema.
- Store source-span provenance for every relationship.
- Use deterministic IDs where possible, so re-extracting the same entity maps to the same node.
- Retrieve both graph facts and original source chunks.
- Require final answers to cite source documents.
- Rebuild or incrementally update the graph on a schedule.
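Two of these steps, deterministic IDs and source-span provenance, can be sketched with the standard library. The normalization rule, the 16-character ID length, and the `SourceSpan`, `Edge`, and `make_edge` names are illustrative choices, not a prescribed format.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

def entity_id(entity_type: str, name: str) -> str:
    """Deterministic ID: the same normalized type and name always give the same ID."""
    normalized_name = " ".join(name.lower().split())
    key = f"{entity_type.strip().lower()}|{normalized_name}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

@dataclass(frozen=True)
class SourceSpan:
    doc_id: str
    start: int  # character offsets into the source document
    end: int

@dataclass(frozen=True)
class Edge:
    source_id: str
    relation: str
    target_id: str
    provenance: SourceSpan

def make_edge(source_id: str, relation: str, target_id: str,
              provenance: Optional[SourceSpan]) -> Edge:
    """Refuse to create an edge that has no supporting source span."""
    if provenance is None or provenance.end <= provenance.start:
        raise ValueError("every relationship needs a non-empty source span")
    return Edge(source_id, relation, target_id, provenance)

text = "Billing depends on Ledger."
edge = make_edge(
    entity_id("system", "Billing"),
    "depends_on",
    entity_id("system", "Ledger"),
    SourceSpan("doc-1", 0, len(text)),
)
```

Because the ID depends only on normalized type and name, a rebuild or incremental update produces the same node for "Billing" and " billing ", which avoids duplicate nodes across runs. Name normalization alone does not resolve aliases; that still needs an entity-resolution step.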
GraphRAG does not replace evaluation. You still need retrieval tests, answer-grounding tests, and drift checks when the source corpus changes.
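A retrieval test and a basic grounding check can be as small as the sketch below. The chunk IDs and both helper names are hypothetical, and recall at k is only one possible retrieval metric; drift checks are not shown.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def ungrounded_citations(cited: list[str], retrieved: list[str]) -> list[str]:
    """Citations in the answer that were never part of the retrieved context."""
    retrieved_set = set(retrieved)
    return [c for c in cited if c not in retrieved_set]

retrieved = ["c1", "c7", "c3"]
score = recall_at_k(retrieved, relevant={"c1", "c3"}, k=2)
missing = ungrounded_citations(["c1", "c9"], retrieved)
```

Running checks like these over a fixed question set before and after a graph rebuild gives a simple signal for whether a corpus change degraded retrieval.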
Knowledge check
Q1: Why does GraphRAG help with multi-hop questions?
It can traverse explicit entity relationships instead of relying only on semantic similarity.
Q2: What should every graph edge keep?
Provenance back to the source text or system record that supports it.