Engineering
Memory Architectures for AI Agents: Vector, Graph, Hybrid
By Oxagen Team
- AI Agents
- Agent Memory
- Knowledge Graph
- Vector Database
- Architecture
An AI agent's memory is not one thing. It is four things — facts, episodes, skills, and preferences — running on the same infrastructure, usually badly.
Most deployed agents in 2026 ship memory as a flat vector index over chat history. This works for a demo and breaks for a product. The reason is not that vector retrieval is bad; it is that vector retrieval is one primitive, and serious agent memory needs three or four.
This piece compares the three architectures that show up in production — pure vector, pure graph, and hybrid — on recall quality, latency, operational cost, and the specific failure modes each hits. The goal is to help you decide which one to build first and when to upgrade.
What agents actually need memory for
Before the architecture question, a clarifying one: what is memory actually storing?
Facts. Statements that are true about entities the user cares about — people, companies, documents, deployments. "Sarah is the VP of Engineering." "The staging database is running PostgreSQL 15." Facts have a source, a time, and a truth value that may change.
Episodes. Traces of prior interactions. "Last Tuesday the user asked about Q2 revenue and I returned the wrong quarter." Episodes are timestamped and rarely mutate.
Skills. Procedures the agent has learned to execute. Voyager-style skill libraries, agent-written tools, chain-of-thought templates that worked. Skills are code-shaped, not fact-shaped.
Preferences. Standing instructions from the user. "Always return tables, not prose." "Never email Sarah after 6pm." Preferences cross sessions and need to be surfaced proactively.
Each has different retrieval requirements. Facts need entity resolution and temporal correctness. Episodes need recency and recall. Skills need exact match and versioning. Preferences need proactive injection, not retrieval on request.
A memory architecture that treats all four as "embed and retrieve" will underperform on all four. The differences between vector, graph, and hybrid come down to which of these requirements each can serve natively.
Vector-only memory
Store everything as an embedding with metadata. Retrieve by cosine similarity against the current query.
Where it works:
- Episodic recall at small scale. Under a few thousand memories per workspace, top-k semantic retrieval surfaces relevant prior conversations reliably.
- Unstructured notes. When memory is paragraphs of text without clear entities, vector retrieval is the only reasonable option.
- Rapid prototyping. Getting "my agent remembers things" from zero to working is one pgvector index and a few lines of code.
Where it breaks:
- Entity resolution. "Sarah," "Sarah Chen," and "schen@" embed to similar vectors but not identical ones. Vector retrieval returns near-duplicates instead of resolving them. At a few thousand entities, the same person can have a dozen memory fragments.
- Multi-hop queries. "Which of the people I met last quarter are at companies we've since contracted with?" cannot be answered by cosine similarity. It requires two hops: meetings → people → employers → contracts. Vector stores have no concept of hop.
- Temporal correctness. An embedding of "Sarah is a senior engineer" from 2024 and "Sarah is VP of Engineering" from 2026 both come back on a query about Sarah's role. The agent picks one by vibes.
- Scale. Past roughly 50,000 memories per workspace, top-k retrieval quality degrades measurably. You can push this with reranking and metadata filters, but you are rebuilding graph operations on a vector index.
Typical latency: 20–80ms for top-k on pgvector with HNSW. Operational cost: low — one database, one index.
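The vector-only loop is simple enough to sketch in full. This toy version replaces the embedding model and index with word counts and brute-force cosine similarity — everything here is illustrative, but the shape (embed, score, take top-k) is the whole architecture:

```python
# Minimal sketch of vector-only memory: embed everything, rank by
# cosine similarity. embed() is a toy word-count stand-in for a real
# embedding model; a production system would use pgvector + HNSW.
import math


def embed(text: str) -> dict[str, float]:
    counts: dict[str, float] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0.0) + 1.0
    return counts


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: str, memories: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]


memories = [
    "Sarah asked about the Q2 revenue report",
    "The staging database runs PostgreSQL 15",
    "User prefers tables over prose",
]
print(top_k("revenue numbers for Q2", memories, k=1))  # most similar memory first
```

Every failure mode listed above is visible here: nothing resolves "Sarah" across mentions, nothing follows a relationship, and nothing knows which memory is still true.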
Graph-based memory
Store entities, relationships, observations, and time as a typed graph. Retrieve by Cypher (or equivalent) traversal.
Where it works:
- Entity resolution. Merge "Sarah" / "Sarah Chen" / "schen@" into a single node with alias properties. Every observation attached to that node is now one memory, not four.
- Multi-hop queries. "Meetings including the VP of Engineering and anyone from security last quarter" is a native traversal — two or three hops, milliseconds on a properly indexed graph.
- Temporal correctness. Edges carry `valid_from` and `valid_to`. Queries constrain on time. The agent can answer "who was VP of Engineering in March?" and not just "who is VP of Engineering?"
- Provenance. Every observation edge links to its source document, making every fact auditable and explainable.
Where it breaks:
- Unstructured text. Graphs demand a schema. Dumping raw conversation transcripts into a graph without extraction is worse than a vector index — you lose semantic recall and gain nothing.
- Fuzzy semantic retrieval. "Find memories that feel like this" is not a graph operation. Graphs do exact and structural recall, not semantic.
- Schema evolution. Changing a node type in production is harder than changing an embedding column. Get the schema right early or plan for migrations.
Typical latency: 5–50ms for a three-hop traversal on Neo4j with properly indexed labels. Operational cost: medium — a graph database is a second piece of infrastructure with its own backup, scaling, and operational model.
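The multi-hop query from earlier — people met last quarter whose employers we've since contracted with — is just repeated adjacency lookups. A toy sketch with an in-memory adjacency list; entity IDs and relation names are invented for illustration:

```python
# Toy two-hop traversal: meetings -> people -> employers -> contracts.
# Entity IDs and relation types are illustrative, not a real schema.
graph = {
    ("meeting:q3-sync", "ATTENDED_BY"): ["person:sarah", "person:omar"],
    ("person:sarah", "WORKS_AT"): ["company:acme"],
    ("person:omar", "WORKS_AT"): ["company:globex"],
    ("company:acme", "HAS_CONTRACT"): ["contract:2026-001"],
}


def hop(nodes: list[str], relation: str) -> list[str]:
    """Follow one typed relation from a set of nodes."""
    out: list[str] = []
    for n in nodes:
        out.extend(graph.get((n, relation), []))
    return out


people = hop(["meeting:q3-sync"], "ATTENDED_BY")       # meetings -> people
employers = {p: hop([p], "WORKS_AT") for p in people}  # people -> companies
contracted = [p for p, cs in employers.items()
              if any(hop([c], "HAS_CONTRACT") for c in cs)]
print(contracted)  # only people whose employer has a contract
```

A graph database does exactly this, but with indexes, a query planner, and milliseconds of latency over millions of edges — the point of the sketch is that "hop" is a first-class operation, which a vector index simply does not have.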
Hybrid memory (vector + graph)
Entities and relationships in a graph. Observations stored on the graph as nodes, with vector embeddings as a property. Retrieval goes through the graph for structural queries and through the vector index for semantic ones; the two can be composed.
The composition patterns that matter:
- Graph-filtered vector search. Narrow to entities matching a structural constraint, then rank by semantic similarity within that set. "What did Sarah say about migrations?" becomes: find Sarah (one node lookup), retrieve observations attached to her (one hop), rank by similarity to "migrations" (vector search over a small set). Faster and more accurate than pure vector.
- Vector-seeded graph traversal. Start from the top-k semantically similar observations, then traverse to connected entities. "What else do we know about this topic?" — similar to query, then expand.
- Structural reranking. Pull a large set semantically, rerank by graph-derived features: entity centrality, recency, provenance quality.
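The first pattern, graph-filtered vector search, can be sketched end to end. The entity IDs, observation texts, and the toy word-count embedding are all illustrative; the structure — narrow structurally first, rank semantically second — is the point:

```python
# Sketch of graph-filtered vector search: structural filter first,
# semantic ranking second. embed/cosine are toy word-count stand-ins
# for a real embedding model; entity IDs are illustrative.
import math

observations = [
    {"entity": "person:sarah", "text": "Sarah proposed a phased schema migration"},
    {"entity": "person:sarah", "text": "Sarah approved the Q2 budget"},
    {"entity": "person:omar",  "text": "Omar ran the database migration runbook"},
]


def embed(text: str) -> dict[str, float]:
    counts: dict[str, float] = {}
    for w in text.lower().split():
        counts[w] = counts.get(w, 0.0) + 1.0
    return counts


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def graph_filtered_search(entity: str, query: str, k: int = 1) -> list[str]:
    # Step 1: structural filter (one hop: entity -> its observations).
    candidates = [o for o in observations if o["entity"] == entity]
    # Step 2: semantic rank within the small filtered set only.
    q = embed(query)
    candidates.sort(key=lambda o: cosine(q, embed(o["text"])), reverse=True)
    return [o["text"] for o in candidates[:k]]


print(graph_filtered_search("person:sarah", "schema migration plan"))
```

Note that Omar's observation also mentions a migration, but it never enters the candidate set — the graph filter removed it before similarity was computed. That is why this pattern is both faster and more accurate than ranking the whole corpus.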
Where it works:
- Everything the two independently do well, plus the combinations above.
- Production agents that have moved past the "vector index as memory" phase almost all end up here.
Where it breaks:
- Complexity. Two data stores, two query patterns, and composition logic between them. Worth it at scale, overkill for a prototype.
- Consistency. Writes must update both stores atomically or accept eventual consistency. Most teams pick eventual and hit edge cases.
Typical latency: 30–100ms for a composed query. Operational cost: highest — graph infrastructure plus vector infrastructure, plus the glue layer.
Comparison at a glance
| Dimension | Vector | Graph | Hybrid |
|---|---|---|---|
| Entity resolution | Poor | Native | Native |
| Multi-hop queries | Not supported | Native | Native |
| Temporal correctness | Manual (metadata) | Native | Native |
| Semantic recall | Native | Not supported | Native |
| Provenance | Metadata only | Native | Native |
| Latency (typical) | 20–80ms | 5–50ms | 30–100ms |
| Operational cost | Low | Medium | Higher |
| When to use | Prototype, unstructured notes | Production, entity-heavy | Production, at scale |
How to choose
Three questions decide the architecture:
- Do your memories have entities? If the agent is operating on a domain with clear entities — people, companies, documents, resources — you need entity resolution, which means a graph.
- Do you need multi-hop queries? If anything the agent must answer requires traversing relationships ("find the person who introduced me to the founder of X"), vector alone will not cut it.
- How much unstructured text do you have? If memories are paragraphs of conversation, you need semantic recall, which means vectors must be part of the picture.
The honest staging most teams should follow:
- Start with pure vector. Ship a working agent with pgvector in a week.
- Add a graph when vector-only plateaus. This happens in production, not in benchmarks — when users notice the agent has a dozen fragmented memories of the same person, or cannot answer "who works with Sarah."
- Go hybrid when both matter. At production scale, almost every serious agent memory ends up here.
The reason Oxagen is built on a typed, Neo4j-backed knowledge graph with pgvector as a secondary index is that step 2 is where every in-house implementation stalls. Teams underestimate the data modeling, then live with the plateau. The ontology layer — entities, relationships, observations, time, provenance — is the prerequisite for an agent whose memory keeps improving rather than keeps growing.
FAQ
What is AI agent memory?
AI agent memory is the persistent store of facts, episodes, skills, and preferences an agent uses to perform better across sessions. Unlike a chat context window, it survives between interactions and can be queried on demand.
When should I use a vector database for agent memory?
Use a vector database when memories are unstructured text, entities are not central to your domain, and you are under a few thousand memories per workspace. Pure vector memory is the right starting point for most prototypes.
When do I need a knowledge graph for agent memory?
Use a knowledge graph when your domain has clear entities and relationships, when you need multi-hop queries, or when temporal correctness and provenance matter — both are requirements for any enterprise deployment.
Is hybrid memory worth the complexity?
Hybrid memory is worth the complexity at production scale or when you need both structural queries and semantic recall. It is overkill for prototypes — start simpler and upgrade when vector-only plateaus.
How does agent memory differ from RAG?
RAG is retrieval over a static corpus, typically documents. Agent memory is retrieval over state the agent itself has written. The mechanisms overlap, but memory is mutable, timestamped, and agent-authored — which is why the same architecture question gets harder answers.
Further reading
- Self-Improving AI Agents: A Technical Overview — where memory fits in the broader agent architecture
- Knowledge Graphs for Agent Memory: Design Patterns — how to model entities, relationships, and time for production
- How to Evaluate Self-Improving AI Agents — measuring whether your memory architecture is actually working
Oxagen is the ontology layer for AI agents — a typed, workspace-scoped knowledge graph with hybrid vector+graph retrieval, MCP-native access, and deterministic multi-hop traversal built in. Read the docs to get an API key, or book a demo to see production agent memory in practice.