Knowledge Graphs vs. RAG for AI Agents: When to Use Which

Engineering


By Mac Anderson

  • Knowledge Graph
  • RAG
  • AI Agents
  • Retrieval
  • Ontology
  • Vector Search

"Is RAG enough?" is the question most agent teams hit somewhere between week six and month four. It's the right question, and the honest answer is neither "yes" nor "no" — it's "that depends on what your agent actually needs to retrieve." This guide lays out the tradeoffs, shows where each approach breaks, and explains the hybrid patterns that most production agent teams end up running.

The framing matters. Knowledge graphs and RAG are not competing religions. A knowledge graph is a data model for typed entities and their relationships. RAG is a retrieval pattern — embed chunks, find nearest neighbors, feed the chunks to the model. You can do RAG against a knowledge graph (retrieve typed nodes by semantic similarity), and you can do graph traversal on top of RAG-returned chunks (extract entities, walk the graph around them). The interesting question is not "which one," it's which retrieval shape matches the query shape your agent is asked to answer.

What RAG is good at

RAG — in the plain sense of "embed a document corpus, search by vector similarity, pass top-k to the model" — is the correct answer for one type of retrieval: open-ended semantic recall over unstructured text.

When the user asks "Find me notes from meetings that touched on the Q3 migration" and there is no structured entity named "the Q3 migration," RAG is the right tool. Embeddings capture the gist of a chunk; nearest-neighbor search finds other chunks that feel similar. The agent reads them and composes an answer.

RAG works particularly well when:

  • The source data is unstructured (meeting notes, support tickets, emails, documentation)
  • Queries are phrased as topics or themes, not entities
  • "Close enough" retrieval is acceptable — the model can rerank, filter, and compose
  • The corpus is mostly append-only and facts don't contradict each other over time

There's a reason every agent demo starts with RAG: it gets you 70% of the way to useful behavior in an afternoon, the infrastructure (a vector store and an embedding model) is cheap and well-understood, and failure modes are gentle — a bad retrieval returns less relevant chunks, not wrong chunks.

Where RAG breaks

RAG breaks when the query requires structure the embedding does not encode.

Multi-hop questions. "Which of Sarah's direct reports worked on projects that the payments team was also involved in?" This is a three-hop traversal: Sarah → manages → people → worked_on → projects → involves → payments_team. No amount of embedding similarity can reconstruct that chain. The best a vector store can do is return chunks that mention several of those nouns together — which is not the same as answering the question.
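The traversal above can be sketched over a toy in-memory graph. The names, edge types, and `hop` helper below are hypothetical illustrations, not a production API:

```python
# A minimal sketch of the three-hop traversal over a toy adjacency list.
# All entities and edges here are made up for illustration.
edges = {
    ("sarah", "manages"): ["amir", "bea"],
    ("amir", "worked_on"): ["proj_checkout"],
    ("bea", "worked_on"): ["proj_ledger"],
    ("proj_checkout", "involves"): ["payments_team"],
    ("proj_ledger", "involves"): ["growth_team"],
}

def hop(nodes, relation):
    """Follow one typed edge from every node in the frontier."""
    return {n for node in nodes for n in edges.get((node, relation), [])}

reports = hop({"sarah"}, "manages")          # hop 1: Sarah -> direct reports
answer = {
    r for r in reports                       # hop 2: report -> projects
    if any("payments_team" in edges.get((p, "involves"), [])
           for p in hop({r}, "worked_on"))   # hop 3: project -> teams
}
print(answer)  # {'amir'}
```

The point is that each hop is a typed edge lookup, not a similarity ranking — the chain either exists or it doesn't.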

Constrained lookups. "Show me every open deal above $50k that hasn't had a touchpoint in 14 days." This is a filter with a WHERE clause, not a semantic query. Vector similarity doesn't filter; it ranks. You can stuff the filter into the prompt and hope the model does it right, but you've just turned a deterministic operation into a probabilistic one.
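The same lookup as a deterministic filter, sketched over hypothetical deal records:

```python
# The constrained lookup is a WHERE clause, not a ranking.
# Deal records here are illustrative, not a real schema.
from datetime import date, timedelta

deals = [
    {"id": 1, "status": "open", "value": 80_000, "last_touch": date(2026, 1, 2)},
    {"id": 2, "status": "open", "value": 30_000, "last_touch": date(2026, 1, 2)},
    {"id": 3, "status": "won",  "value": 90_000, "last_touch": date(2026, 1, 20)},
]

today = date(2026, 1, 25)
stale = [
    d for d in deals
    if d["status"] == "open"                            # open deals only
    and d["value"] > 50_000                             # above $50k
    and (today - d["last_touch"]) > timedelta(days=14)  # no touchpoint in 14 days
]
print([d["id"] for d in stale])  # [1]
```

Every record either satisfies the predicate or it doesn't — there is no "close enough," which is exactly what vector ranking cannot give you.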

Entity resolution failures. If "Sarah Chen," "Sarah C.," and "schen@acme.com" produce three separate chunks with three separate embeddings, RAG treats them as three separate things. The agent's mental model of Sarah is scattered across top-k results, and the answer degrades unpredictably as the corpus grows.
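One common fix is an explicit alias table resolved at write time, so every surface form maps to one canonical entity. This sketch, with hypothetical names and a made-up `resolve` helper, shows the idea:

```python
# Entity resolution as an explicit alias table -- names are hypothetical.
aliases = {
    "sarah chen": "person:sarah-chen",
    "sarah c.": "person:sarah-chen",
    "schen@acme.com": "person:sarah-chen",
}

def resolve(mention: str):
    """Map a surface form to a canonical entity id, or None if unresolved."""
    return aliases.get(mention.strip().lower())

# All three surface forms collapse to one node the agent can reason about.
print(resolve("Sarah Chen"))  # person:sarah-chen
```

Once every observation hangs off `person:sarah-chen`, retrieval about Sarah is complete by construction instead of scattered across top-k.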

Temporal correctness. "Who managed the infrastructure team in Q1 2026?" The answer depends on when you're asking. RAG surfaces the most-embedding-similar chunk, which typically means the most recent one — you ask about Q1 2026 and get the person who manages the team today. Metadata filtering on vector indexes helps a little, but it gets fragile fast.
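A time-scoped fact model makes this answerable: each edge carries a validity interval, and the query binds to a point in time. A sketch, with illustrative names and dates:

```python
# Time-scoped facts: each "manages" edge carries a validity interval.
# People, teams, and dates are illustrative.
from datetime import date

facts = [
    {"person": "priya", "team": "infra", "from": date(2025, 6, 1), "to": date(2026, 3, 31)},
    {"person": "dan",   "team": "infra", "from": date(2026, 4, 1), "to": None},  # current
]

def manager_as_of(team, when):
    """Return whoever managed `team` on date `when`, or None."""
    for f in facts:
        if (f["team"] == team and f["from"] <= when
                and (f["to"] is None or when <= f["to"])):
            return f["person"]
    return None

print(manager_as_of("infra", date(2026, 2, 15)))  # priya, not today's manager
```

The same question asked about today returns `dan` — the fact that changed over time stays queryable at every point in its history.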

Provenance. When an auditor or CIO asks "why does the agent believe X?", RAG answers with "this chunk was similar to that query at the moment of retrieval." That's not provenance. A knowledge graph can answer with "this fact came from that source, extracted by this pipeline on this date, with this confidence" — and that difference matters for enterprise agents.
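What "a fact with provenance" can look like as a record — the field names here are illustrative, not a fixed schema:

```python
# A provenance-carrying fact record. Fields are illustrative.
fact = {
    "subject": "person:sarah-chen",
    "predicate": "works_at",
    "object": "org:acme",
    "source": "doc:hr-roster-2026-01",        # where the fact came from
    "extracted_at": "2026-01-12T09:30:00Z",   # when it was extracted
    "pipeline": "extractor-v3",               # which pipeline produced it
    "confidence": 0.97,
}

def explain(fact):
    """Render the audit answer a graph can give and a vector store cannot."""
    return (f"{fact['subject']} -{fact['predicate']}-> {fact['object']} "
            f"(from {fact['source']} at {fact['extracted_at']}, "
            f"confidence {fact['confidence']})")

print(explain(fact))
```

Because provenance is stored on the fact itself, "why does the agent believe X?" is a field lookup, not a reconstruction.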

None of this means RAG is wrong. It means RAG is the right tool for one shape of retrieval question, and the shapes above are not that shape.

What knowledge graphs are good at

A typed knowledge graph — entities with types, relationships with types, properties on both — is the correct answer for structural retrieval:

  • Who is connected to what, and how?
  • What's two hops from X along this kind of edge?
  • Which entities match this filter AND have this relationship to that entity?
  • What was true as of this point in time?
  • Where did this fact come from?

Typed graphs give you precise, deterministic, and auditable answers to those questions. Traversal latency is predictable (sub-100ms at millions of entities on a properly indexed Neo4j instance). Results are explainable because you can trace the exact path the query walked.

They're particularly valuable when:

  • Entities have identity the agent references repeatedly (people, companies, documents, services, accounts)
  • Relationships carry meaning the agent must reason about (works_at, depends_on, mentions, owns)
  • Facts change over time and the agent must answer time-scoped questions
  • Workspace isolation is not optional (multi-tenant agents)
  • Auditability is a real requirement, not a nice-to-have

Where knowledge graphs break

Graphs break when the query has no structural handle and the question is fundamentally about meaning.

"What themes came up across last week's customer calls?" is not a graph query. Even if every call is an entity and every topic is a tagged relationship, "themes" is a semantic concept that emerges from the aggregate text. A graph can narrow the search space (filter to last week's call entities), but the actual theme extraction still happens on embeddings or the model's text comprehension.

Similarly, graphs struggle with:

  • Queries phrased in terms the extraction pipeline didn't capture ("find the email where they mentioned pricing anxiety")
  • Fuzzy matching where no alias has been resolved yet ("maybe it was someone at FinCo, or was it Acme?")
  • Exploratory retrieval where the user doesn't know what to ask for
  • Long-tail entities that appear once and weren't important enough to resolve

A graph-only agent hits a wall on exactly the queries RAG is good at. Which is why most production agents end up running both.

Hybrid patterns that work in production

The three hybrid patterns below show up repeatedly in real agent systems. They are not mutually exclusive — most mature systems run all three as different retrieval paths inside the same query planner.

Pattern 1: Graph-filtered vector search

Narrow structurally first, then rank semantically. This is the most common hybrid pattern and the one with the highest precision lift.

1. Resolve the entity:   match Sarah in the graph
2. Collect observations: grab all observations tied to her
3. Vector-rank:          order those observations by similarity to the query
4. Return top-k

The structural step shrinks the search space from workspace-wide to Sarah-relevant. The vector step then surfaces the most relevant memories within that narrowed set. Both precision and latency improve versus a flat vector search over the whole corpus.

In a typed graph with a vector index on observation nodes, this composes cleanly into a single query — you filter by relationship, then sort by vector similarity, all on the same backend.
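The pattern can be sketched in a few lines. The toy 3-d embeddings and `graph_filtered_search` helper below are illustrative, not a real API:

```python
# Pattern 1 sketch: filter observations by entity (structural step),
# then rank the survivors by cosine similarity (semantic step).
import math

observations = [
    {"entity": "person:sarah", "text": "...", "vec": [0.9, 0.1, 0.0]},
    {"entity": "person:sarah", "text": "...", "vec": [0.1, 0.9, 0.0]},
    {"entity": "person:dan",   "text": "...", "vec": [0.9, 0.1, 0.0]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def graph_filtered_search(entity_id, query_vec, k=2):
    # Structural step: shrink the candidate set to one entity's observations.
    candidates = [o for o in observations if o["entity"] == entity_id]
    # Semantic step: rank only within that narrowed set.
    return sorted(candidates, key=lambda o: cosine(o["vec"], query_vec),
                  reverse=True)[:k]

top = graph_filtered_search("person:sarah", [1.0, 0.0, 0.0])
```

Note that Dan's observation never enters the ranking, no matter how similar its embedding is — that is the precision lift.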

Pattern 2: Vector-seeded graph traversal

Start semantically, then expand structurally. Useful when the user's question doesn't name an entity.

1. Vector search:       top-k observations by similarity to the query
2. Find the entities:   which entity does each observation attach to?
3. Expand 1-2 hops:     collect neighbors of those entities
4. Return with context: the seed observation plus the neighborhood around it

This is how you get from "explain what we know about the Q3 migration" (no entity named) to a coherent answer that pulls in the project, the people who worked on it, the related decisions, and the timeline. RAG finds the door; the graph gives you the building.
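A minimal sketch of the four steps, with illustrative data and a hypothetical `vector_seeded_traversal` helper:

```python
# Pattern 2 sketch: seed with vector search, then expand the graph around
# the entities the seed observations attach to. Data is illustrative.
import math

observations = [
    {"id": "o1", "entity": "proj:q3-migration", "vec": [1.0, 0.0]},
    {"id": "o2", "entity": "team:payments",     "vec": [0.0, 1.0]},
]
neighbors = {
    "proj:q3-migration": ["person:sarah", "decision:cutover"],
    "team:payments": ["service:payments-api"],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_seeded_traversal(query_vec, k=1):
    # 1. Vector search: rank observations by similarity to the query.
    ranked = sorted(observations, key=lambda o: cosine(o["vec"], query_vec),
                    reverse=True)
    results = []
    for obs in ranked[:k]:
        entity = obs["entity"]                 # 2. find the attached entity
        hood = neighbors.get(entity, [])       # 3. expand one hop
        results.append({"seed": obs["id"],     # 4. seed plus neighborhood
                        "entity": entity, "neighborhood": hood})
    return results

out = vector_seeded_traversal([0.9, 0.1])
```

A real system would expand typed edges on the graph backend rather than a dict, but the control flow is the same: semantics to find the entry point, structure to assemble the context.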

Pattern 3: Parallel structural + semantic with merge

Run a structured query and a semantic query independently, then merge the results. For questions that have both a structured and a semantic interpretation.

Structured query:   all observations on "service" entities
                    with a "depends_on" relationship to "payments-api"
Semantic query:     top-10 observations similar to "payments service
                    deployment issues"
Merge:              union, deduplicate by observation id,
                    rank by (structural_relevance * 0.6 + semantic_score * 0.4)

The merge weights are tunable per domain. Structured-heavy domains (org charts, infrastructure maps) weight toward structural; unstructured-heavy domains (meeting notes, conversations) weight toward semantic.

The pattern works because the two retrieval paths fail differently: when the structural query returns nothing (the entity isn't in the graph yet), the semantic query still has a shot at relevance. When the semantic query drifts off-topic, the structural query anchors the answer to the right part of the domain.
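The merge step itself is small. This sketch applies the 0.6/0.4 weighting from above to two hypothetical result sets:

```python
# Pattern 3 sketch: union the two result sets by id, then score with
# tunable weights. Result ids and scores are illustrative.
structural = {"o1": 1.0, "o2": 1.0}   # ids matched by the structured query
semantic = {"o2": 0.9, "o3": 0.7}     # ids with vector similarity scores

def merge(structural, semantic, w_struct=0.6, w_sem=0.4):
    ids = set(structural) | set(semantic)   # union, dedupe by id
    scored = {
        i: w_struct * structural.get(i, 0.0) + w_sem * semantic.get(i, 0.0)
        for i in ids
    }
    return sorted(scored, key=scored.get, reverse=True)

print(merge(structural, semantic))  # ['o2', 'o1', 'o3']
```

`o2` ranks first because it matched both paths — results that survive both retrieval shapes are the strongest signal the merge produces.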

A decision framework

If you're picking between RAG and a typed knowledge graph for the first version of an agent, the honest sequence is:

  1. Start with RAG if the agent's corpus is mostly unstructured text and the first use case is "ask questions about the content."
  2. Upgrade to a graph when you can point to a specific class of queries your RAG is getting wrong — multi-hop, temporal, provenance, or structural.
  3. Go hybrid when you're past the prototype and the business case depends on both open-ended recall and auditable, structural answers — which, for any agent serving enterprise workloads, is almost always.

The honest test: can you write down three queries your agent fails on today, and can you explain which retrieval shape would fix them? If yes, you know what to build next. If no, you need better eval before you need better infrastructure.

What typed graphs give you that matters for enterprise

For consumer or single-user agents, RAG alone is often enough. For agents sold into enterprises — which is why most teams end up weighing these tradeoffs in the first place — typed graphs carry four properties that the business requires:

  • Determinism. The same query against the same graph state returns the same result. Not "approximately the same." The same. This matters for reproducibility, regression testing, and incident response.
  • Auditability. Every fact has provenance. When a CIO asks where the agent learned something, the answer is a source node, an extraction timestamp, and a confidence score. RAG answers the same question with "vector similarity at retrieval time," which does not satisfy compliance review.
  • Workspace isolation. Multi-tenant agents need hard guarantees that workspace A's data cannot appear in workspace B's retrieval. Graph traversal with a WHERE workspace_id = $ws clause composes cleanly. Cross-tenant contamination in a shared vector index is a common production incident and a very hard one to prove you've eliminated.
  • Schema control. A typed graph lets the agent reason about the domain in the domain's terms. A flat vector store treats everything as "text similar to other text," which works until the domain expert starts asking domain questions.

If your buyer is a developer shipping an agent, RAG covers the first 70% and a graph covers the 30% that blocks them from shipping to their own enterprise customers. If your buyer is the CIO of that enterprise, the graph's properties are the product — RAG alone does not pass procurement.

The short version

  • RAG is the right answer for open-ended semantic recall over unstructured text, when "close enough" retrieval is acceptable.
  • A typed knowledge graph is the right answer for multi-hop, time-bounded, constrained, or audited retrieval over entities the agent references repeatedly.
  • Most production agents need both — a hybrid retrieval layer composed at query-plan time, not an either/or religious commitment.
  • Start simple, upgrade by failure mode. Let the queries your agent is getting wrong tell you what infrastructure to build next.

Oxagen is the ontology layer for AI agents — a typed, Neo4j-backed knowledge graph with hybrid vector + structural retrieval built in, so teams building agents can skip the six months of graph infrastructure work and get to the part that differentiates their product. Read the docs for API access, or book a demo to see hybrid retrieval in production.

FAQ

Is a knowledge graph always better than RAG?

No. RAG is the right answer for open-ended semantic recall over unstructured text — themes across meeting notes, "find me something like this," exploratory search. A knowledge graph is the right answer for structural queries: multi-hop traversal, time-bounded lookups, workspace-isolated retrieval, and auditable provenance. Most production agents run both, composed into hybrid retrieval patterns at query-plan time.

When should an agent team upgrade from RAG-only to a knowledge graph?

Upgrade when you can name three queries your agent is getting wrong and explain that a graph would fix them — typically multi-hop ("who is connected to X through Y?"), temporal ("what was true in March?"), constrained ("show me every X where Y and Z"), or audit-driven ("where did this fact come from?"). If you can't name the failure modes, upgrade your eval before you upgrade your infrastructure.

Can a knowledge graph replace RAG entirely?

No — not for agents handling open-ended semantic questions. Graphs struggle with fuzzy matching, theme extraction, and queries phrased in terms the extraction pipeline didn't capture. The typed graph narrows the search space; vector retrieval covers the semantic surface. A graph-only agent hits a wall on exactly the queries RAG handles best.

What is hybrid retrieval?

Hybrid retrieval composes structural (graph) and semantic (vector) retrieval in the same query plan. Three common patterns: (1) graph-filtered vector search — narrow by entity, rank by similarity; (2) vector-seeded graph traversal — find relevant observations semantically, expand their neighborhood structurally; (3) parallel structural and semantic queries with a merged, weighted result set. Most mature agent systems run all three as different retrieval paths.

Does Oxagen support both vector and graph retrieval?

Yes. Oxagen's retrieval layer exposes both vector similarity search and typed graph traversal over the same workspace-scoped data, composable into the hybrid patterns described above. Embeddings live on observation nodes alongside typed entities and relationships, so a single query can filter structurally and rank semantically.

Further reading