Blog
Engineering notes, benchmarks, and migration write-ups from the Oxagen team.
Engineering
AI Agent Benchmarks: What Actually Matters
What major agent benchmarks (SWE-bench, GAIA, AgentBench, τ-bench) actually measure — and why a high score does not predict production fit. Includes a benchmark red-flag checklist and a six-property spec for the in-domain benchmark you almost certainly need to build.
AI Agents · Agent Evaluation · Benchmarks
Engineering
7 Mistakes Teams Make Evaluating AI Agents
Seven evaluation mistakes that ship agents that pass evals but fail in production — endpoint-only scoring, static golden sets, uncalibrated LLM judges, ignored tool-call failures, and missing cost metrics. Each one paired with the fix.
AI Agents · Agent Evaluation · Evaluations
Engineering
LLM Evals vs Agent Evals: Key Differences
LLM evals score single prompt-completion exchanges; agent evals must grade trajectories — tool choice, recovery, termination, and cost — not just final answers.
AI Agents · LLM Evaluation · Agent Evaluation
Engineering
7 Mistakes Developers Make Building Ontologies for AI Agents
Seven concrete failure modes that show up when teams build typed knowledge graphs for AI agents — each one observable in agent behavior, each one fixable if you catch it before the corpus scales.
Ontology · AI Agents · Knowledge Graph
Engineering
How to Design a Typed Schema for Agent Memory
A step-by-step guide to designing the typed schema behind an AI agent's memory — with a worked example, the decisions that matter, and the anti-patterns that silently bite in production.
Ontology · Agent Memory · Schema Design
Engineering
MCP-Native Ontology: Connecting AI Agents to Structured Data
A hands-on tutorial for plugging a typed, workspace-scoped knowledge graph into Cursor, Claude Code, VS Code, Windsurf, and Codex over the Model Context Protocol — with one-line installers per client.
MCP · Model Context Protocol · Ontology
Engineering
Knowledge Graphs vs. RAG for AI Agents: When to Use Which
Vector RAG answers semantic similarity. A typed knowledge graph answers structural questions. Most production agents need both — here's the decision framework and where each one breaks.
Knowledge Graph · RAG · AI Agents
Engineering
What Is an Ontology for AI Agents?
The definitive guide to ontologies for AI agents — what they are, how they differ from flat vector retrieval, when agents need one, and what a production ontology looks like in practice.
Ontology · AI Agents · Knowledge Graph
Culture
Working at Oxagen: the builder’s mindset
Why we hire for slope over pedigree, how “any person can be the right person for any job” works in practice, and the benefits package that matches the intensity of building ontology infrastructure for agents.
Careers · Startups · Culture
Engineering
Static vs Self-Improving Agents: Production Tradeoffs
A decision framework for choosing between static and self-improving agents in production — when the operational overhead of self-improvement is justified and when a well-tuned static agent wins.
AI Agents · Architecture · Production
Engineering
5 Failure Modes in Self-Improving AI Agents
The five failure modes that appear most frequently in production self-improving agents — reflection collapse, memory poisoning, entity fragmentation, eval blindness, and skill entropy — and how to detect each one early.
AI Agents · Debugging · Self-Improvement
Engineering
How to Evaluate Self-Improving AI Agents
Designing an eval harness for self-improving agents — what metrics to track, how to detect silent drift, and the minimum viable eval suite that tells you if the agent is actually getting better.
AI Agents · Evaluation · Benchmarks
Engineering
Deploying Self-Improving Agents: Production Checklist
A production checklist for deploying self-improving agents — memory infrastructure requirements, observability gates, security controls, cost management, and the operational model for a running system.
AI Agents · Production · Deployment
Engineering
The Definitive Guide to Vibe Coding Platforms (2025)
An in-depth comparison of Claude Code, Cursor, v0.dev, Lovable, and Bolt.new — ranked and rated across every dimension that actually affects your workflow.
AI · Developer Tools · Vibe Coding
Engineering
Frameworks for Self-Improving Agents: A Comparison
LangGraph, AutoGen, CrewAI, and Haystack compared on memory abstractions, reflection support, MCP compatibility, and production readiness for self-improving agents.
AI Agents · LangGraph · AutoGen
Engineering
Build a Self-Improving AI Agent in Python: Walkthrough
A step-by-step walkthrough for building a self-improving AI agent in Python with LangGraph, a typed memory store, a reflection loop with a real verifier, and a nightly benchmark harness.
Python · AI Agents · Tutorial
Engineering
Knowledge Graphs for Agent Memory: Design Patterns
Concrete schema patterns, entity resolution strategies, and traversal techniques for modeling agent memory as a typed knowledge graph — with the tradeoffs that decide when a graph is worth it.
Knowledge Graph · Agent Memory · AI Agents
Engineering
Memory Architectures for AI Agents: Vector, Graph, Hybrid
Vector, graph, and hybrid memory architectures for AI agents compared on recall, latency, and operational cost — with the failure modes each one hits in production.
AI Agents · Agent Memory · Knowledge Graph
Engineering
Reflection in AI Agents: How Self-Critique Actually Works
Reflexion, Self-Refine, and actor-critic architectures explained with benchmark data on where reflection improves agent performance and where it silently regresses it.
AI Agents · Agent Reflection · Self-Improvement
Engineering
Self-Improving AI Agents: A Technical Overview
The four mechanisms behind self-improving agents, which of them are production-ready in 2026, and why memory is the bottleneck almost every implementation ignores.
AI Agents · Ontology · Agent Memory