7 Mistakes Developers Make Building Ontologies for AI Agents

Engineering


By Mac Anderson

  • Ontology
  • AI Agents
  • Knowledge Graph
  • Schema Design
  • Mistakes
  • Best Practices

Agent teams tend to discover ontology mistakes in the same order. The prototype works, the first real user asks a harder question, and suddenly the graph looks wrong — fragmented entities, timestamps that mean the wrong thing, types that are everywhere and nowhere. The mistakes below are not failures of imagination; they're failures that emerge because the cost of a design choice doesn't show up until there are enough nodes in the graph for the shape of the schema to matter.

Each mistake here is paired with the symptom an engineer will actually observe in production and the fix. If any of them describe behavior you've been debugging for longer than a week, that's the one to address next.

1. Treating the ontology as a dumping ground

Symptom: Retrieval is slow, answers are noisy, and the agent surfaces "related" things that are only related in the sense of existing in the same workspace.

What happened: Every ingested email, meeting note, and tool call became its own node with a generic Observation type, linked to any entity it mentioned with a generic mentions edge. The graph is dense, the cardinality is huge, and the ranker can't distinguish the meaningful signal from the noise.

The fix is selective extraction. Not everything that passes through the agent deserves a node. An ontology is a decision about what the agent should reason about, not a log of what it has seen. Keep a separate audit log for raw inputs; let the graph carry the distilled typed facts and the relationships that matter to the domain. A good test: if two different extractions of the same input produce different graph state, your pipeline is treating the graph like a log.
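As a sketch of that test, two runs over the same input should yield identical typed facts, and raw chatter should yield none. The `extract_facts` helper and its one-rule filter here are hypothetical stand-ins for a real extraction pipeline:

```python
def extract_facts(text: str) -> frozenset:
    """Toy selective extractor: keep distilled typed assertions, drop raw mentions."""
    facts = set()
    for line in text.splitlines():
        if " is " in line:  # only distilled role assertions earn a node
            subject, _, role = line.partition(" is ")
            facts.add(("person", subject.strip(), "has_role", role.strip().rstrip(".")))
    return frozenset(facts)

doc = "Sarah is VP of Engineering.\nLunch was at noon."
# Same input, same graph state -- otherwise the pipeline is logging, not distilling.
assert extract_facts(doc) == extract_facts(doc)
assert ("person", "Sarah", "has_role", "VP of Engineering") in extract_facts(doc)
```

The raw lines still belong somewhere, just not in the graph; an append-only audit log keeps them for replay and debugging.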

2. Over-typing early

Symptom: Every schema migration is painful. The extraction pipeline produces types like SoftwareEngineer, SeniorSoftwareEngineer, StaffSoftwareEngineer, and PrincipalEngineer — all of them used once or twice — and nobody can agree which queries should traverse which.

Rigid type hierarchies feel like discipline. In agent ontologies, they are premature abstractions. Agent-discovered types are a moving target: the extraction pipeline may classify the same person as Customer, ContractHolder, or Account depending on which tool produced the observation. Locking those distinctions into labels at the graph layer means every reinterpretation is a schema migration.

The fix is shallow type hierarchies with a controlled vocabulary. Use a single structural label like Entity and carry the fine-grained type as a property (entity_type: "person"). Keep the vocabulary short (typically a dozen types for the core domain), workspace-scoped, and configurable. New type distinctions earn their way in by being asked for in queries, not by being guessed at during schema design.
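A minimal sketch of the pattern, with a hypothetical `make_entity` helper and an invented ten-type vocabulary (a real system would scope the vocabulary per workspace and make it configurable):

```python
# One structural label (Entity); the fine-grained type is a property,
# validated against a short controlled vocabulary.
CORE_TYPES = {"person", "organization", "project", "document", "event",
              "tool", "account", "location", "product", "task"}

def make_entity(name: str, entity_type: str) -> dict:
    if entity_type not in CORE_TYPES:
        # New distinctions earn their way in via queries, not schema guesses.
        raise ValueError(f"unknown entity_type: {entity_type!r}")
    return {"label": "Entity", "name": name, "entity_type": entity_type}

node = make_entity("Sarah", "person")
assert node["label"] == "Entity"  # structural label stays flat
```

Reclassifying an entity is now a property update, not a schema migration.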

3. Shipping without workspace scoping

Symptom: A test tenant sees a result from a real tenant's data in a demo. The discovery jumps to the top of the incident channel, and you cannot explain the isolation guarantee to the buyer who was watching.

Workspace scoping is not a future-proofing concern — it's a table-stakes requirement for any multi-tenant agent. Graphs make this surprisingly easy to get wrong because the pattern that works for a single-user prototype (a big shared graph, no tenant column) silently also works for the first multi-user demo. It only breaks once the data gets interesting.

The fix is workspace-scoped from day one. Every node and every edge carries a workspace_id property. Every query filters on it. Indexes include it. Your query-builder layer refuses to compose a query that doesn't include the scope. Do not rely on application-layer guards alone — put the constraint in the database (RLS on the auth plane, filter-on-workspace guards on the ontology plane) so a forgotten WHERE clause cannot leak data across tenants.
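A sketch of the query-builder guard, assuming a hypothetical `build_match` helper; the Cypher text is illustrative, and this is the application-layer half only, with the database-layer constraint alongside it:

```python
def build_match(workspace_id, entity_type):
    """Refuse to compose any query that lacks a workspace scope."""
    if not workspace_id:
        raise ValueError("refusing to compose a query without a workspace scope")
    query = ("MATCH (e:Entity {workspace_id: $ws, entity_type: $type}) "
             "RETURN e")
    return query, {"ws": workspace_id, "type": entity_type}

query, params = build_match("ws-acme", "person")
assert "$ws" in query  # the workspace filter cannot be omitted
```

The point of centralizing this in one builder is that a forgotten filter becomes a raised exception in development, not a cross-tenant leak in production.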

For deeper discussion of the data model: Knowledge Graphs for Agent Memory: Design Patterns covers workspace scoping alongside entity resolution and temporal edges.

4. No provenance on nodes or edges

Symptom: A user asks "why does the agent believe Sarah is VP of Engineering?" and the team scrolls through extraction logs, tries three SQL queries, and eventually gives up and says "it came from somewhere."

This is the single most common mistake, and it only hurts once. Ontologies without provenance are not auditable. They are also not debuggable when the extraction pipeline drifts — a false fact enters the graph, propagates through a week of queries, and there is no path back to the email or document that introduced it.

The fix is first-class provenance as a graph relationship. Every observation node points to a Source node. Every relationship edge carries a source_id property. Extraction timestamp, pipeline version, and confidence score live on the extracted fact itself. When a fact is updated, don't overwrite in place — version it, so the historical record survives.
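A sketch of append-only provenance in plain Python; the `record_fact` helper, field names, and pipeline tag are assumptions, not a real API:

```python
import itertools
import time

_version = itertools.count(1)

def record_fact(graph: list, claim: str, source_id: str, confidence: float) -> dict:
    """Append-only: an updated fact gets a new version, never an overwrite."""
    fact = {
        "claim": claim,
        "source_id": source_id,        # edge back to the Source node
        "confidence": confidence,
        "pipeline_version": "v1",      # which extractor produced this
        "extracted_at": time.time(),
        "version": next(_version),
    }
    graph.append(fact)
    return fact

g = []
record_fact(g, "Sarah has_role VP of Engineering", "email:123", 0.92)
record_fact(g, "Sarah has_role CTO", "email:456", 0.88)  # supersedes, doesn't overwrite
assert len(g) == 2  # history survives; an auditor can walk back to email:123
```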

This is also what makes agent answers explainable in the sense CIOs care about: the agent can cite the source of any claim, and an auditor can walk from the answer back to the document that produced it.

5. Mixing canonical and derived data without a boundary

Symptom: The agent runs a query against the graph, the extraction pipeline runs a rebuild, and the query starts returning a different answer. Nobody can say whether the previous answer was correct or the current one is.

Production ontologies accumulate two kinds of state: canonical facts the agent ingests and derived facts the agent computes (aggregates, inferences, embeddings, importance scores). When both live in the same node type without a clear boundary, a pipeline rebuild produces indistinguishable "before and after" states, and query stability degrades.

The fix is an explicit boundary between canonical and derived. Canonical data is append-only — observations, sources, raw relationships. Derived data — computed properties, importance scores, merged entity clusters — lives in a separate layer that can be rebuilt from canonical at any time. Label them differently. Query-plan accordingly. When a user is surprised by an answer, you can answer which layer produced it in under a minute.
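A sketch of the boundary, with the derived layer written as a pure function of canonical so it can be dropped and rebuilt at any time (the names and the importance-score formula are illustrative):

```python
# Canonical: append-only observed facts.
canonical = [
    {"entity": "Sarah", "mentions": 3},
    {"entity": "Acme", "mentions": 1},
]

def rebuild_derived(canonical_facts):
    """Derived layer (importance scores here) is recomputed from canonical only."""
    total = sum(f["mentions"] for f in canonical_facts)
    return {f["entity"]: f["mentions"] / total for f in canonical_facts}

derived = rebuild_derived(canonical)
assert rebuild_derived(canonical) == derived  # rebuild is idempotent
```

Because the derived layer depends on nothing but canonical, a pipeline rebuild has a well-defined "before" and "after", and a surprising answer can be attributed to one layer or the other.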

6. Building your own graph store

Symptom: Six months into the project, 40% of engineering time is going to graph infrastructure (migrations, backups, query planner tuning, entity-resolution throughput). The actual agent — the product — is still at the prototype stage.

There are three reasons teams build their own graph infrastructure. One is "we have a specific requirement no store supports." Another is "we thought it would be cheap." The third is "we've always built our own data layers." Only the first one turns out to be true in practice, and when it is true, it is almost always one specific requirement (usually around workspace-scoped isolation, deployment flexibility, or audit-trail semantics) that a two-week engagement with an existing graph vendor could address.

The fix is buy the infrastructure, build the agent. Use Neo4j, Neptune, or a managed ontology platform for the graph layer. Spend your engineering budget on the extraction pipeline, the retrieval composition, and the agent's reasoning — the parts that are specific to your product. If a graph-store limitation does block you later, you will know exactly which one and can address it surgically, rather than re-implementing the whole stack up front.

For how this decision shows up in practice: Static vs. Self-Improving Agents: Production Tradeoffs covers when the operational cost of graph infrastructure is justified.

7. No semantic layer — only structural, or only vector

Symptom: The agent is great at structural queries but hopeless at fuzzy semantic ones. Or it's great at semantic recall but falls apart on multi-hop questions. Users find the seams quickly because the seams are in the query types they naturally ask.

A pure graph agent hits a wall on open-ended retrieval. A pure vector agent hits a wall on multi-hop and auditability. Teams that commit to one camp early often discover the other shortcoming only after the buyer stops returning calls.

The fix is hybrid retrieval from the start. Embeddings live on observation nodes alongside typed relationships. A single query path can filter structurally (workspace, entity, time window) and rank semantically (by similarity). The hybrid is the product; neither pure retrieval mode is.
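A toy in-memory sketch of that single query path, filtering structurally before ranking semantically; `hybrid_query` and the node shape are assumptions, and a production version would push both steps into the graph store:

```python
import math

def hybrid_query(nodes, workspace_id, query_vec, top_k=2):
    """Filter structurally (workspace), then rank semantically (cosine similarity)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scoped = [n for n in nodes if n["workspace_id"] == workspace_id]  # structural
    scoped.sort(key=lambda n: cosine(n["embedding"], query_vec),      # semantic
                reverse=True)
    return scoped[:top_k]
```

The structural filter guarantees isolation and narrows the candidate set; the semantic ranking handles the fuzzy recall that pure traversal cannot.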

For the three hybrid patterns that work in production — graph-filtered vector search, vector-seeded traversal, and parallel queries with merged ranking — see Knowledge Graphs vs. RAG for AI Agents: When to Use Which.

A closing test

Reading this list, you might find yourself nodding at two or three of them. That's normal; most agent teams hit at least three of these before the first real deployment. The question worth asking is not "have I made these mistakes?" — it's "can I tell, from looking at my graph today, which of these I've already made?"

If you can't tell, the next week's work is eval: build concrete retrieval test cases for multi-hop, temporal, provenance, and workspace-isolation queries. The mistakes become visible the moment you run them.
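One of those eval cases sketched concretely, with a stand-in `run_query` retrieval stub (everything here is hypothetical scaffolding; the point is the assertion, not the stub):

```python
def run_query(graph, workspace_id, question):
    """Stand-in for the real retrieval layer; scopes results to one workspace."""
    return [f for f in graph if f["workspace_id"] == workspace_id]

graph = [
    {"workspace_id": "ws-a", "claim": "Sarah reports to Dana"},
    {"workspace_id": "ws-b", "claim": "secret"},
]

# Workspace-isolation case: nothing from ws-b may appear in ws-a's answer.
results = run_query(graph, "ws-a", "who does Sarah report to?")
assert results, "retrieval returned nothing"
assert all(r["workspace_id"] == "ws-a" for r in results)
```

Multi-hop, temporal, and provenance cases follow the same shape: a fixed graph fixture, a question, and an assertion about what must (or must not) come back.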

Oxagen ships a typed, workspace-scoped, Neo4j-backed ontology with provenance and hybrid retrieval built in — so agent teams can skip the six months of graph infrastructure mistakes and ship against the parts that differentiate their product. Read the docs to get API access, or book a demo to see the patterns above running in production.

FAQ

What's the single most impactful ontology mistake to fix first?

Missing provenance. Every other mistake on this list is debuggable if you can trace facts back to their sources. Ontologies without provenance compound problems across every other failure mode because you can't diagnose what went wrong. If you only have time to fix one thing, make provenance a first-class graph relationship: observations point to source nodes, relationships carry source_id properties.

Is over-typing always bad?

Not always — rigid types are appropriate when the domain is genuinely stable (legal entities, medical codes, financial instruments). For agent ontologies extracting from unstructured sources, type vocabularies evolve faster than schema migrations can accommodate. Start shallow with a controlled vocabulary (8-12 types for the core domain), and let queries drive type refinement.

How do I know when I need workspace scoping?

The moment you have two users with data that should not cross. This is earlier than most teams think — it applies to personal assistants with multiple users, B2B products with multiple tenants, and internal tools where team-level isolation matters. Retrofitting workspace scoping after data accumulates is painful; adding it on day one costs almost nothing.

Can I add provenance retroactively to an existing ontology?

Partially. New facts from the date of the change forward can carry full provenance. Historical facts need a backfill strategy — usually a "provenance_source: legacy_migration" sentinel source for everything that predates the instrumentation. The backfill is worth it; the sentinel makes it clear which facts have first-class provenance versus which should be treated with caution during audits.

Should every agent team build their own graph infrastructure?

Almost never. Graph infrastructure is a real engineering investment (query planning, migrations, backups, entity resolution throughput, workspace isolation guarantees) and an ongoing operational cost. Unless your company is a graph-database vendor, the engineering budget is better spent on the extraction pipeline, retrieval composition, and agent reasoning — the parts that differentiate your product.
