Knowledge Graphs for Agent Memory: Design Patterns

Engineering


By Oxagen Team

  • Knowledge Graph
  • Agent Memory
  • AI Agents
  • Ontology
  • Neo4j
  • Architecture

Vector retrieval gives an agent the ability to say "I have seen something like this before." A typed knowledge graph gives it the ability to say "I know what this is, how it relates to what I already know, and when I learned it."

The distinction matters in production. An agent operating on a few hundred memories can get by with cosine similarity and metadata filters. An agent operating on a workspace with thousands of entities — people, documents, meetings, deployments, decisions — needs structure. Specifically, it needs typed nodes, directional edges, temporal validity, and provenance. That is a knowledge graph.

This guide covers the schema patterns, entity resolution strategies, traversal techniques, and hybrid query compositions that work in production agent memory. It is opinionated toward Neo4j because that is what Oxagen runs on, but the patterns apply to any typed graph store. It is also honest about where a graph is overkill.

When you need a graph (and when you do not)

A graph earns its operational cost when three or more of these are true:

  1. Entities are central to your domain. People, companies, repositories, services, documents — things with identity that the agent references repeatedly. If memory is mostly unstructured notes with no clear subjects, a graph adds schema overhead without payoff.

  2. Multi-hop questions show up in retrieval. "Who introduced me to the CTO of the company that acquired our client?" requires three hops: introduction → person → employer → acquisition → client. Vector stores cannot traverse. If every retrieval question is a single-entity lookup, a graph is overkill.

  3. Facts change over time. "Sarah is VP of Engineering" was true in March and false in September. If the agent must answer questions scoped to a point in time — and any enterprise agent must — temporal edges on the graph handle this natively. Metadata filters on a vector index handle it with increasing fragility.

  4. Provenance is a requirement, not a nice-to-have. "Why does the agent believe Sarah is VP of Engineering?" should resolve to a specific document, meeting note, or ingestion event. Graphs store provenance as first-class relationships. Vector metadata stores it as a string field that nobody queries until an audit.

  5. Workspace isolation must be deterministic. Multi-tenant agents need absolute guarantees that Workspace A's memory never leaks into Workspace B's retrieval. Graph-scoped queries (WHERE workspace_id = $ws) compose cleanly with traversals. Cross-workspace leakage in a shared vector index is a common production incident.

If none of these are true — the agent is a simple chatbot with a few dozen memories and one user — use pgvector and move on. A graph is infrastructure, and infrastructure has a maintenance cost.

Node and edge schema patterns

The schema that follows has been refined through production use. It models four primitives: entities, observations, relationships, and sources. Everything else — time, confidence, workspace scope — is a property on one of these four.

Entities

An entity is a thing with identity: a person, a company, a document, a service, a concept.

CREATE (e:Entity {
  id: randomUUID(),
  workspace_id: $workspace_id,
  entity_type: "person",
  name: "Sarah Chen",
  aliases: ["Sarah", "schen@acme.com", "S. Chen"],
  created_at: datetime(),
  updated_at: datetime(),
  confidence: 0.95
})

Key decisions:

  • entity_type is a property, not a label. Neo4j labels are indexed by default, so (:Person) is tempting. In practice, agent-discovered entity types change often — an extraction pipeline may decide "Project" is a type today and not tomorrow. Keeping entity type as a property with a controlled vocabulary avoids schema-level migration on every type change. Reserve labels for structural categories: Entity, Observation, Source.
  • aliases is a list property. Entity resolution (covered below) merges candidates by adding aliases to the surviving node. Every query that looks up an entity by name should match against name and aliases.
  • confidence is a float. Extraction pipelines are not perfect. A confidence score on the entity itself lets downstream traversals filter by certainty. An entity extracted from a subject line at 0.6 confidence is different from one extracted from a structured form at 0.99.
  • workspace_id is on every node. This is the workspace-scoping mechanism. Every Cypher query includes WHERE n.workspace_id = $ws or uses a composite index on (workspace_id, entity_type).
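The alias-matching rule in the second bullet is easy to get wrong in application code. A minimal Python sketch, assuming entity rows are fetched as plain dicts (matches_entity is an illustrative helper, not part of any driver API):

```python
def matches_entity(mention: str, entity: dict) -> bool:
    """Check whether a raw mention refers to an entity record.

    Compares the mention case-insensitively against both the
    canonical name and the aliases list, mirroring the Cypher
    pattern `e.name = $m OR $m IN e.aliases`.

    `entity` is a plain dict with "name" and "aliases" keys --
    a stand-in for a fetched Entity node, not a driver type.
    """
    m = mention.strip().lower()
    candidates = [entity["name"], *entity.get("aliases", [])]
    return any(m == c.strip().lower() for c in candidates)
```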

Observations

An observation is a fact the agent has extracted or been told, attached to an entity. "Sarah is VP of Engineering." "The staging database runs PostgreSQL 15." "The user prefers tables over prose."

CREATE (o:Observation {
  id: randomUUID(),
  workspace_id: $workspace_id,
  content: "Sarah Chen is VP of Engineering at Acme Corp",
  embedding: $vector,
  observed_at: datetime("2026-03-15T10:30:00Z"),
  valid_from: datetime("2026-01-01T00:00:00Z"),
  valid_to: null,
  confidence: 0.92,
  observation_type: "fact"
})

Observations carry both observed_at (when the agent learned this) and valid_from / valid_to (when the fact was true). The distinction matters: an agent might learn in April that Sarah became VP in January. observed_at is April; valid_from is January.
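The distinction is mechanical enough to encode directly. A small Python sketch of the validity check, assuming observations are plain dicts carrying the schema's temporal fields (is_valid_at is a hypothetical helper):

```python
from datetime import datetime, timezone

def is_valid_at(obs: dict, at: datetime) -> bool:
    """Return True if an observation's fact was true at `at`.

    Uses the bitemporal fields from the schema: valid_from and
    valid_to bound when the fact held. observed_at (when the
    agent learned it) is deliberately ignored here -- it answers
    a different question.

    A valid_to of None means "still valid".
    """
    if obs["valid_from"] > at:
        return False
    return obs["valid_to"] is None or obs["valid_to"] >= at
```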

The embedding property on the observation node is what makes hybrid retrieval possible. Semantic search runs against embeddings; structural traversal runs against the graph topology. Both live on the same data.

observation_type distinguishes facts ("Sarah is VP"), episodes ("yesterday's meeting covered the Q3 roadmap"), and preferences ("the user wants bullet points"). Different types get different retrieval strategies.

Relationships

A relationship is a typed, directional edge between two entities.

MATCH (a:Entity {id: $from_id}), (b:Entity {id: $to_id})
CREATE (a)-[r:RELATED_TO {
  relationship_type: "works_at",
  workspace_id: $workspace_id,
  valid_from: datetime("2026-01-01T00:00:00Z"),
  valid_to: null,
  confidence: 0.90,
  source_id: $source_id,
  created_at: datetime()
}]->(b)

Key decisions:

  • One relationship type (RELATED_TO) with a relationship_type property, not N named relationship types. Same reasoning as entity types: agent-extracted relationship types are a moving target. A single edge type with a typed property is easier to query generically (WHERE r.relationship_type IN [...]) and avoids schema proliferation.
  • Temporal edges. valid_from and valid_to on every relationship edge. "Sarah works at Acme" is true from January 2026 onward; "Sarah works at Initech" was true from 2022 to 2025. Time-bounded traversal is a WHERE clause, not a separate data model.
  • source_id on every edge. Points to a Source node. This is provenance — every relationship is traceable to the document, API response, or user statement that produced it.

Sources

A source is the origin of one or more observations or relationships.

CREATE (s:Source {
  id: randomUUID(),
  workspace_id: $workspace_id,
  source_type: "gmail",
  source_ref: "msg:18e4f2a3b1c2d4e5",
  ingested_at: datetime(),
  title: "Re: Q3 Roadmap Review"
})

Sources are referenced by observations and relationships via source_id. When the agent claims "Sarah is VP of Engineering," the provenance chain is: Observation → Source → original email. For audit, explainability, and debugging, this chain must be intact.

Entity resolution and deduplication

Entity resolution is the hardest part of graph-based agent memory. It is also the part that delivers the most value. An agent with 500 resolved entities has a better model of the workspace than one with 2,000 unresolved fragments.

The problem

Every ingestion — an email, a document, a conversation — produces entity mentions. "Sarah" in one email, "Sarah Chen" in another, "schen@acme.com" in a third. Without resolution, these are three nodes in the graph, each with its own observations. The agent's knowledge about Sarah is scattered.

Resolution strategies

String-based candidate generation. Compare normalized names (lowercased, stripped of honorifics and suffixes) using edit distance or token overlap. Fast, catches obvious matches, misses anything requiring context.

def candidate_score(a: str, b: str) -> float:
    """Compute merge-candidate score between two entity names.

    Uses normalized Jaccard similarity over name tokens.

    Args:
        a: first entity name.
        b: second entity name.

    Returns:
        Float between 0.0 and 1.0. Above 0.7 is a merge
        candidate.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    intersection = tokens_a & tokens_b
    union = tokens_a | tokens_b
    return len(intersection) / len(union)

Embedding-based candidate generation. Embed the entity name plus its top-3 observations. Retrieve nearest neighbors. Catches semantic matches ("VP of Eng" and "Head of Engineering") that string methods miss. Requires a threshold — too low and unrelated entities merge; too high and duplicates persist.
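As a rough sketch of this candidate step, assuming embeddings are already computed elsewhere (the embedding_candidates helper and the 0.85 threshold are illustrative, not tuned values):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def embedding_candidates(query_vec, entities, threshold=0.85):
    """Return (entity_id, score) pairs above the merge threshold.

    `entities` is an iterable of (entity_id, embedding) pairs --
    embeddings of the entity name plus its top observations, as
    described above. Results are sorted best-first.
    """
    scored = ((eid, cosine(query_vec, vec)) for eid, vec in entities)
    return sorted(
        ((eid, s) for eid, s in scored if s >= threshold),
        key=lambda p: p[1],
        reverse=True,
    )
```

In production the nearest-neighbor step runs against a vector index, not a linear scan; the scan here just makes the thresholding logic concrete.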

Context-window verification. For candidate pairs above the threshold, send both entities with their observations to an LLM and ask: "Are these the same entity?" This is the most accurate step and the most expensive. Reserve it for candidates in the 0.7–0.9 similarity range where automated methods are uncertain.
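The tiering logic can be sketched as a small router. The 0.7 and 0.9 thresholds follow the text; route_candidate is a hypothetical helper:

```python
def route_candidate(score: float,
                    auto_merge: float = 0.9,
                    review: float = 0.7) -> str:
    """Decide how to handle a merge candidate by similarity score.

    High-confidence pairs merge automatically, the uncertain
    0.7-0.9 band goes to LLM verification, and everything below
    is dropped. Thresholds are illustrative defaults.
    """
    if score >= auto_merge:
        return "merge"
    if score >= review:
        return "llm_verify"
    return "reject"
```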

The merge operation

When two entity nodes are confirmed as the same entity:

  1. Pick the surviving node (typically the one with more observations or higher confidence).
  2. Move all observations from the duplicate to the survivor. Re-point OBSERVED_ON edges.
  3. Re-point all relationship edges from/to the duplicate to the survivor. Check for duplicate relationships and merge those too.
  4. Add the duplicate's name and aliases to the survivor's aliases list.
  5. Soft-delete the duplicate. Do not hard-delete — the merge must be reversible for debugging.
// Move observations from duplicate to survivor
MATCH (dup:Entity {id: $dup_id})<-[:OBSERVED_ON]-(o:Observation)
MATCH (surv:Entity {id: $surv_id})
CREATE (o)-[:OBSERVED_ON]->(surv)
WITH o, dup
MATCH (o)-[old:OBSERVED_ON]->(dup)
DELETE old

// Re-point outgoing relationships (run again with the arrow reversed for incoming edges)
MATCH (dup:Entity {id: $dup_id})-[r:RELATED_TO]->(target)
MATCH (surv:Entity {id: $surv_id})
CREATE (surv)-[:RELATED_TO {
  relationship_type: r.relationship_type,
  workspace_id: r.workspace_id,
  valid_from: r.valid_from,
  valid_to: r.valid_to,
  confidence: r.confidence,
  source_id: r.source_id,
  created_at: r.created_at
}]->(target)
DELETE r

// Add aliases and soft-delete
MATCH (dup:Entity {id: $dup_id}), (surv:Entity {id: $surv_id})
SET surv.aliases = surv.aliases + dup.aliases + [dup.name]
SET surv.updated_at = datetime()
SET dup.is_deleted = true
SET dup.deleted_at = datetime()
SET dup.merged_into_id = $surv_id

When to run resolution

  • On ingestion. Every new entity mention gets checked against existing entities before creating a new node. This is the primary deduplication point.
  • Batch sweep. A scheduled job that runs candidate generation across all entities in a workspace and queues merge candidates for verification. Catches cross-source duplicates that ingestion-time resolution misses.
  • On query failure. When a traversal returns fragmented results (the same person appears as three nodes in a path), flag the fragments as merge candidates.
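The ingestion-time path can be sketched end to end in a few lines, reusing the token-overlap scoring from candidate_score above. Entities are plain dicts standing in for graph nodes, and resolve_or_create is an illustrative helper:

```python
import uuid

def resolve_or_create(mention: str, existing: list[dict],
                      threshold: float = 0.7) -> dict:
    """Ingestion-time entity resolution: reuse or create a node.

    Scores the mention against each existing entity's name and
    aliases with token-overlap Jaccard and reuses the best match
    above `threshold`; otherwise creates a fresh entity record.
    """
    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        if not ta or not tb:
            return 0.0
        return len(ta & tb) / len(ta | tb)

    best, best_score = None, 0.0
    for ent in existing:
        score = max(jaccard(mention, ent["name"]),
                    *(jaccard(mention, al) for al in ent["aliases"]),
                    0.0)
        if score > best_score:
            best, best_score = ent, score

    if best is not None and best_score >= threshold:
        # Record the new surface form as an alias on the survivor
        if mention != best["name"] and mention not in best["aliases"]:
            best["aliases"].append(mention)
        return best
    return {"id": str(uuid.uuid4()), "name": mention, "aliases": []}
```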

Traversal patterns

The schema above supports four traversal patterns that show up repeatedly in production agent memory.

Multi-hop traversal

The core graph operation. "Which of Sarah's reports have worked on projects involving the payments team?"

MATCH (sarah:Entity {name: "Sarah Chen", workspace_id: $ws})
      -[:RELATED_TO {relationship_type: "manages"}]->
      (report:Entity)
      -[:RELATED_TO {relationship_type: "works_on"}]->
      (project:Entity)
      -[:RELATED_TO {relationship_type: "involves"}]->
      (team:Entity {name: "Payments Team"})
// coalesce: live nodes never have is_deleted set, and a bare
// NOT n.is_deleted evaluates to null, silently excluding them
WHERE coalesce(sarah.is_deleted, false) = false
  AND coalesce(report.is_deleted, false) = false
RETURN DISTINCT report.name, project.name

Three hops. Sub-50ms on a properly indexed Neo4j instance. Not expressible as a vector query at any latency.

Time-bounded traversal

The same query, but scoped to a time window. "Who managed the infrastructure team in Q1 2026?"

MATCH (person:Entity)-[r:RELATED_TO {
  relationship_type: "manages",
  workspace_id: $ws
}]->(team:Entity {name: "Infrastructure"})
WHERE r.valid_from <= datetime("2026-03-31T23:59:59Z")
  AND (r.valid_to IS NULL
       OR r.valid_to >= datetime("2026-01-01T00:00:00Z"))
  AND coalesce(person.is_deleted, false) = false
RETURN person.name, r.valid_from, r.valid_to

Temporal edges are the mechanism. Without them, the agent returns the current manager and cannot distinguish between "who manages now" and "who managed then."

Confidence-weighted traversal

Filter or rank by confidence to avoid surfacing low-certainty extractions.

MATCH path = (start:Entity {id: $start_id})
      -[:RELATED_TO*1..3]->
      (target:Entity)
WHERE ALL(r IN relationships(path) WHERE r.confidence >= 0.75)
  AND ALL(n IN nodes(path)
          WHERE n.workspace_id = $ws
            AND coalesce(n.is_deleted, false) = false)
RETURN target, reduce(
  conf = 1.0,
  r IN relationships(path) | conf * r.confidence
) AS path_confidence
ORDER BY path_confidence DESC
LIMIT 10

Path confidence is the product of edge confidences along the path. A three-hop path through 0.9-confidence edges has a path confidence of 0.73. A three-hop path through a 0.5-confidence edge drops to 0.41. This lets the agent rank answers by how much it trusts the path that produced them.
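The Cypher reduce() above is just a product. For reference, the same computation in Python:

```python
from functools import reduce

def path_confidence(edge_confidences: list[float]) -> float:
    """Product of edge confidences along a path.

    Each hop multiplies in its edge confidence, so longer or
    shakier paths score lower. An empty path scores 1.0.
    """
    return reduce(lambda acc, c: acc * c, edge_confidences, 1.0)
```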

Neighborhood expansion

Start from one entity and expand outward. "What do we know about this person?"

MATCH (e:Entity {id: $entity_id, workspace_id: $ws})
OPTIONAL MATCH (e)-[r:RELATED_TO]-(neighbor:Entity)
  WHERE coalesce(neighbor.is_deleted, false) = false
OPTIONAL MATCH (e)<-[:OBSERVED_ON]-(o:Observation)
  WHERE o.observation_type = "fact"
RETURN e, collect(DISTINCT {
  neighbor: neighbor.name,
  type: r.relationship_type,
  direction: CASE
    WHEN startNode(r) = e THEN "outgoing"
    ELSE "incoming"
  END
}) AS relationships,
collect(DISTINCT {
  content: o.content,
  observed_at: o.observed_at,
  confidence: o.confidence
}) AS observations

This is the "entity profile" query. It powers the "what does the agent know about X?" experience and is the starting point for most interactive debugging.

Integrating a graph with vector retrieval

A graph alone does not cover semantic recall. "Find memories related to the Q3 migration" is a vector query — the user did not name an entity or a relationship type. Hybrid retrieval composes vector and graph operations in the same query path.

Pattern 1: Graph-filtered vector search

Narrow structurally, then rank semantically. The most common hybrid pattern.

1. Resolve the entity:     MATCH (e:Entity) WHERE "Sarah" IN e.aliases
2. Collect observations:   MATCH (e)<-[:OBSERVED_ON]-(o:Observation)
3. Vector-rank:            ORDER BY cosine_similarity(o.embedding, $query_vec) DESC
4. Return top-k:           LIMIT 10

Steps 1 and 2 use the graph. Step 3 uses the vector index. The search space for the vector operation is the observations attached to Sarah — potentially dozens, not the entire workspace's thousands. Precision and latency both improve.

In Neo4j, this composes as a single Cypher query. One caveat: db.index.vector.queryNodes always searches the entire index, so the entity-scoped ranking is done by scoring Sarah's observations directly with the vector.similarity.cosine function (available since Neo4j 5.13):

MATCH (e:Entity {workspace_id: $ws})<-[:OBSERVED_ON]-(o:Observation)
WHERE e.name = "Sarah Chen" OR "Sarah Chen" IN e.aliases
RETURN o.content,
       vector.similarity.cosine(o.embedding, $query_vector) AS score
ORDER BY score DESC
LIMIT 5

Pattern 2: Vector-seeded graph traversal

Start semantically, then expand structurally. For exploratory queries where the user does not name an entity.

1. Vector search:       Top-10 observations by similarity to $query
2. Find entities:       MATCH (o)-[:OBSERVED_ON]->(e:Entity)
3. Expand neighbors:    MATCH (e)-[:RELATED_TO*1..2]-(neighbor)
4. Collect context:     Aggregate observations on neighbors
5. Return ranked:       Score by relevance + structural proximity

The vector search produces the seed. The graph traversal produces the context around the seed. The agent gets not just "here is a relevant memory" but "here is a relevant memory, and here is everything related to it."
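Step 3 is an ordinary breadth-limited expansion. A pure-Python stand-in for the 1..2-hop traversal, using an in-memory edge list rather than a live graph (expand_from_seeds is an illustrative helper):

```python
def expand_from_seeds(seed_entity_ids: set[str],
                      edges: list[tuple[str, str]],
                      max_hops: int = 2) -> set[str]:
    """Expand a seed set outward through undirected edges.

    `edges` is a list of (from_id, to_id) pairs; the result is
    every entity within `max_hops` of a seed, seeds included --
    the same shape as the [:RELATED_TO*1..2] pattern in step 3.
    """
    frontier = set(seed_entity_ids)
    seen = set(seed_entity_ids)
    for _ in range(max_hops):
        nxt = set()
        for a, b in edges:
            if a in frontier and b not in seen:
                nxt.add(b)
            if b in frontier and a not in seen:
                nxt.add(a)
        seen |= nxt
        frontier = nxt
        if not frontier:
            break
    return seen
```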

Pattern 3: Parallel query with merge

Run a structural query and a semantic query independently, then merge results.

Structural: "All observations on entities of type 'service'
             with a 'depends_on' relationship to 'payments-api'"

Semantic:   "Top-10 observations similar to 'payments service
             deployment issues'"

Merge:      Union, deduplicate by observation ID, rank by
            (structural_relevance * 0.6 + semantic_score * 0.4)

This pattern works when the query has both a structural and a semantic interpretation. The merge weights are tunable per domain — structured-heavy domains (infrastructure, org charts) bias toward structural; unstructured-heavy domains (meeting notes, conversations) bias toward semantic.
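The merge step is small enough to show in full. A Python sketch, assuming both sides return observation-ID-to-score maps normalized to [0, 1] (merge_results and the 0.6/0.4 defaults are illustrative):

```python
def merge_results(structural: dict[str, float],
                  semantic: dict[str, float],
                  w_structural: float = 0.6,
                  w_semantic: float = 0.4) -> list[tuple[str, float]]:
    """Union two result sets and rank by a weighted score.

    An observation ID missing from one side contributes 0 for
    that component; the union naturally deduplicates by ID.
    Weights should be tuned per domain, as described above.
    """
    ids = structural.keys() | semantic.keys()
    ranked = [
        (oid,
         w_structural * structural.get(oid, 0.0)
         + w_semantic * semantic.get(oid, 0.0))
        for oid in ids
    ]
    ranked.sort(key=lambda p: p[1], reverse=True)
    return ranked
```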

When a graph is overkill

Three situations where the added infrastructure is not justified:

  1. Single-user agents with short memory. An agent that serves one user with fewer than a few hundred memories does not need entity resolution, multi-hop traversal, or temporal edges. pgvector and a good embedding model will cover it.

  2. Agents that do not accumulate state. A stateless tool-calling agent that retrieves from a static corpus on every request is doing RAG, not memory. Graphs model accumulated state. If the agent does not accumulate, there is nothing for the graph to model.

  3. Prototypes. The first version of anything should use the simplest infrastructure that validates the idea. Start with vector, observe where it breaks, and upgrade to a graph when the failure modes in the first section of this article start showing up in your logs.

The honest test: if you cannot point to a specific multi-hop query, entity resolution failure, or temporal correctness bug in your current system, you probably do not need a graph yet. When you can, you do.

Schema evolution in production

Graphs are not append-only. Entity types change, relationship types are added, observation schemas expand. A few patterns that keep this manageable:

Additive-only for properties. New properties on existing node types are nullable by default. Old queries that do not reference the new property still work. Subtractive changes (removing a property) are rare and should be treated as migrations.

Controlled vocabulary for types. entity_type and relationship_type values come from a workspace-scoped configuration, not from free-text extraction. The extraction pipeline maps raw text to the vocabulary; if it cannot, it falls back to a generic type. This prevents type explosion (hundreds of near-synonym relationship types that fragment traversal).

Versioned observations. When a fact changes, do not update the observation in place. Create a new observation with the current observed_at and set valid_to on the old one. Both remain in the graph. The agent can answer "what is true now?" and "what was true in March?" with the same traversal, different time filters.

Putting it together

The schema in this article — entities with aliases and confidence, observations with embeddings and temporal validity, relationships with types and provenance, sources for auditability — is the data model Oxagen uses in production. It powers workspace-scoped, MCP-native agent memory with hybrid vector+graph retrieval.

The patterns are not Oxagen-specific. Any team running Neo4j (or another typed graph store) can implement the same schema. What Oxagen provides is the extraction pipeline, entity resolution, hybrid query layer, and workspace isolation on top of this schema — so teams building agents can skip the six months of graph infrastructure work and get to the part that differentiates their product.

Oxagen is the ontology layer for AI agents — a typed, Neo4j-backed knowledge graph with entity resolution, temporal edges, and hybrid retrieval built in. Read the docs to get an API key, or book a demo to see production agent memory in practice.