Memory that performs
We benchmarked Memori against leading memory frameworks on LoCoMo. Memori wins on accuracy while keeping token costs low.
Our key metrics
Memori gives you production-grade accuracy at a fraction of the cost, coming within 6 points of full-context accuracy.
81.95%
Outperforms Zep, LangMem, and Mem0.
95%
Fewer tokens than the full-context approach.
What is LoCoMo?
The Long Conversation Memory (LoCoMo) benchmark was built specifically to test an agent's ability to track, retain, and synthesize information across multi-session chat histories.
Single-hop
Direct recall of a specific fact from a single point in the conversation.
Multi-hop
Connecting facts across multiple sessions to answer compound questions.
Temporal
Tracking how a user's situation has evolved across sessions over time.
Open-domain
Broad questions that require synthesizing scattered context across many sessions.
Overall accuracy scores
We ran the same benchmark across four systems with the same judge. Here's how they ranked.
Built for production efficiency
Token cost compounds fast in long-running agents. Memori keeps context lean by design, using 1,294 tokens per query versus 26,000 for full context.
Accuracy without the token costs
Memori reaches near full-context accuracy while keeping token usage lean and predictable.
A fraction of the cost
Memori uses 1,294 tokens per query vs. 26,000 for full-context. At scale, that's a 20x reduction in inference costs, roughly $0.001 per call on GPT-4.1-mini. For long-running agents, that difference compounds fast.
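The arithmetic behind the 20x figure can be checked directly. The per-token price below is an assumption for illustration, not an official GPT-4.1-mini quote:

```python
# Sketch of the cost math above. Token counts come from the benchmark;
# the per-token price is an assumed illustrative input rate in USD.
MEMORI_TOKENS = 1_294          # tokens per query with Memori
FULL_CONTEXT_TOKENS = 26_000   # tokens per query with full context
PRICE_PER_TOKEN = 0.40 / 1_000_000  # assumed price per input token

reduction = FULL_CONTEXT_TOKENS / MEMORI_TOKENS
memori_cost = MEMORI_TOKENS * PRICE_PER_TOKEN
full_cost = FULL_CONTEXT_TOKENS * PRICE_PER_TOKEN

print(f"Token reduction: {reduction:.1f}x")          # ~20x
print(f"Memori: ${memori_cost:.4f}/call, full context: ${full_cost:.4f}/call")
```

At these assumed rates the saving per call looks small, but multiplied across thousands of queries per day it dominates an agent's inference bill.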
Minimal context footprint
Larger context windows don't just cost more; they increase the risk of "lost in the middle" hallucinations. Memori keeps context to 5% of the full-context baseline, keeping responses reliable as conversations grow.
Dual-layered memory
Semantic triples capture exact facts for precise recall while conversation summaries provide the narrative flow. Each triple links back to the summary it came from, so granular facts are never divorced from their broader context.
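The two layers and the link between them can be sketched as simple data structures. The class and field names here are illustrative assumptions, not Memori's actual API:

```python
# Minimal sketch of the dual-layer memory described above:
# triples for exact facts, summaries for narrative flow, with each
# triple carrying a back-reference to its source summary.
from dataclasses import dataclass


@dataclass
class SessionSummary:
    summary_id: str
    text: str  # narrative summary of one conversation session


@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    summary_id: str  # link back to the summary this fact came from


summary = SessionSummary("s1", "Alice discussed moving to Berlin for a new job.")
fact = Triple("Alice", "moved_to", "Berlin", summary_id=summary.summary_id)

# Granular recall returns the fact; the link recovers its broader context.
assert fact.summary_id == summary.summary_id
```

The back-reference is the key design point: retrieval can stay fact-level for precision, then hop to the summary when the answer needs narrative context.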
Facts, not noise
Memori feeds the LLM exact facts with no surrounding noise, effectively isolating high-signal knowledge. This precision drives an 81.95% overall score, within 6 points of feeding the model the full conversation history.
How Memori stacks up
We compared the factual accuracy and reasoning capabilities of Memori configurations against state-of-the-art baselines and a full-context ceiling.
How we got there
Each question was answered using GPT-4.1-mini, conditioned on facts and summaries retrieved from Memori. We used an LLM-as-a-Judge methodology to assess each answer across four dimensions: factual accuracy, relevance, completeness, and contextual appropriateness.
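An LLM-as-a-Judge setup over those four dimensions can be sketched as a rubric prompt plus a verdict parser. The prompt wording and 1-5 scale below are assumptions, not the exact judge used in the benchmark:

```python
# Hedged sketch of an LLM-as-a-Judge rubric over the four dimensions
# named above. The judge model call itself is omitted; this shows only
# the prompt construction and strict verdict parsing.
import json

DIMENSIONS = [
    "factual accuracy",
    "relevance",
    "completeness",
    "contextual appropriateness",
]


def build_judge_prompt(question: str, reference: str, answer: str) -> str:
    rubric = "\n".join(f"- {d}" for d in DIMENSIONS)
    return (
        "Score the candidate answer from 1-5 on each dimension:\n"
        f"{rubric}\n\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {answer}\n"
        'Reply as JSON, e.g. {"factual accuracy": 5, ...}'
    )


def parse_verdict(raw: str) -> dict:
    scores = json.loads(raw)
    # Fail loudly if the judge omitted a dimension.
    return {d: scores[d] for d in DIMENSIONS}
```

Parsing strictly against the dimension list keeps malformed judge outputs from silently inflating scores.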
The architecture behind the score
Step 1
Session input
New messages from the current conversation session are continuously fed into the Advanced Augmentation engine.
Step 2
Summary loop
The system maintains an evolving summary of the conversation, feeding it back into the engine as context each time new messages are processed.
Step 3
Memory extraction
New facts are extracted and stored as memories with the updated summary in the memory database.
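The three steps above can be sketched as a single processing loop. Every function name here is hypothetical; the two inner helpers stand in for LLM calls:

```python
# Illustrative sketch of the pipeline above: session messages flow in
# (Step 1), an evolving summary is updated and fed back (Step 2), and
# facts tied to that summary land in the memory store (Step 3).
def update_summary(summary: str, messages: list[str]) -> str:
    # Placeholder for an LLM call that folds new messages into the summary.
    joined = " ".join(messages)
    return f"{summary} {joined}" if summary else joined


def extract_facts(messages: list[str], summary: str) -> list[dict]:
    # Placeholder for LLM-based extraction; each fact keeps its summary.
    return [{"fact": m, "summary": summary} for m in messages]


def process_session(store: list[dict], summary: str, messages: list[str]) -> str:
    summary = update_summary(summary, messages)      # Step 2: summary loop
    store.extend(extract_facts(messages, summary))   # Step 3: memory extraction
    return summary  # carried into the next batch


memory_db: list[dict] = []
running_summary = ""
running_summary = process_session(
    memory_db, running_summary, ["Alice moved to Berlin."]  # Step 1: session input
)
```

The returned summary is the feedback edge of the loop: each batch is processed against the summary produced by the previous one.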
Don't just take our word for it. Run it yourself.
Conduct the same tests, use the same judge, and reproduce the results yourself.