Memory that performs
We benchmarked Memori against leading memory frameworks on LoCoMo. Memori wins on accuracy while keeping token costs low.
Our key metrics
Memori gives you production-grade accuracy at a fraction of the cost, coming within 6 points of full-context accuracy.
81.95%
Outperforms Zep, LangMem, and Mem0.
95%
Fewer tokens than the full-context approach.
What is LoCoMo?
The Long Conversation Memory (LoCoMo) benchmark was built specifically to test an agent's ability to track, retain, and synthesize information across multi-session chat histories.
Single-hop
Direct recall of a specific fact from a single point in the conversation.
Multi-hop
Connecting facts across multiple sessions to answer compound questions.
Temporal
Tracking how a user's situation has evolved across sessions over time.
Open-domain
Broad questions that require synthesizing scattered context across many sessions.
Overall accuracy scores
We ran the same benchmark across four systems with the same judge. Here's how they ranked.
Built for production efficiency
Token cost compounds fast in long-running agents. Memori keeps context lean by design, using 1,294 tokens per query versus 26,000 for full context.
Accuracy without the token costs
Memori reaches near full-context accuracy while keeping token usage lean and predictable.
A fraction of the cost
Memori uses 1,294 tokens per query vs. 26,000 for full-context. At scale, that's a 20x reduction in inference costs, roughly $0.001 per call on GPT-4.1-mini. For long-running agents, that difference compounds fast.
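The arithmetic behind the 20x figure can be checked directly. The per-token price below is an assumption for illustration, not an official GPT-4.1-mini quote:

```python
# Sketch of the cost math above. Token counts come from the benchmark;
# the per-token price is an assumed illustrative input rate in USD.
MEMORI_TOKENS = 1_294          # tokens per query with Memori
FULL_CONTEXT_TOKENS = 26_000   # tokens per query with full context
PRICE_PER_TOKEN = 0.40 / 1_000_000  # assumed price per input token

reduction = FULL_CONTEXT_TOKENS / MEMORI_TOKENS
memori_cost = MEMORI_TOKENS * PRICE_PER_TOKEN
full_cost = FULL_CONTEXT_TOKENS * PRICE_PER_TOKEN

print(f"Token reduction: {reduction:.1f}x")          # ~20x
print(f"Memori: ${memori_cost:.4f}/call, full context: ${full_cost:.4f}/call")
```

At these assumed rates the saving per call looks small, but multiplied across thousands of queries per day it dominates an agent's inference bill.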
Minimal context footprint
Larger context windows don't just cost more; they increase the risk of "lost in the middle" hallucinations. Memori keeps context to 5% of the full-context baseline, keeping responses reliable as conversations grow.
Dual-layered memory
Semantic triples capture exact facts for precise recall while conversation summaries provide the narrative flow. Each triple links back to the summary it came from, so granular facts are never divorced from their broader context.
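The two layers and the link between them can be sketched as simple data structures. The class and field names here are illustrative assumptions, not Memori's actual API:

```python
# Minimal sketch of the dual-layer memory described above:
# triples for exact facts, summaries for narrative flow, with each
# triple carrying a back-reference to its source summary.
from dataclasses import dataclass


@dataclass
class SessionSummary:
    summary_id: str
    text: str  # narrative summary of one conversation session


@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    summary_id: str  # link back to the summary this fact came from


summary = SessionSummary("s1", "Alice discussed moving to Berlin for a new job.")
fact = Triple("Alice", "moved_to", "Berlin", summary_id=summary.summary_id)

# Granular recall returns the fact; the link recovers its broader context.
assert fact.summary_id == summary.summary_id
```

The back-reference is the key design point: retrieval can stay fact-level for precision, then hop to the summary when the answer needs narrative context.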
Facts, not noise
Memori feeds the LLM exact facts with no surrounding noise, effectively isolating high-signal knowledge. This precision drives an 81.95% overall score, within 6 points of feeding the model the full conversation history.
How Memori stacks up
We compared the factual accuracy and reasoning capabilities of Memori configurations against state-of-the-art baselines and a full-context ceiling.
How we got there
Each question was answered using GPT-4.1-mini, conditioned on facts and summaries retrieved from Memori. We used an LLM-as-a-Judge methodology to assess each answer across four dimensions: factual accuracy, relevance, completeness, and contextual appropriateness.
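An LLM-as-a-Judge setup over those four dimensions can be sketched as a rubric prompt plus a verdict parser. The prompt wording and 1-5 scale below are assumptions, not the exact judge used in the benchmark:

```python
# Hedged sketch of an LLM-as-a-Judge rubric over the four dimensions
# named above. The judge model call itself is omitted; this shows only
# the prompt construction and strict verdict parsing.
import json

DIMENSIONS = [
    "factual accuracy",
    "relevance",
    "completeness",
    "contextual appropriateness",
]


def build_judge_prompt(question: str, reference: str, answer: str) -> str:
    rubric = "\n".join(f"- {d}" for d in DIMENSIONS)
    return (
        "Score the candidate answer from 1-5 on each dimension:\n"
        f"{rubric}\n\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {answer}\n"
        'Reply as JSON, e.g. {"factual accuracy": 5, ...}'
    )


def parse_verdict(raw: str) -> dict:
    scores = json.loads(raw)
    # Fail loudly if the judge omitted a dimension.
    return {d: scores[d] for d in DIMENSIONS}
```

Parsing strictly against the dimension list keeps malformed judge outputs from silently inflating scores.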
The architecture behind the score
Step 1
Session input
New messages from the current conversation session are continuously fed into the Advanced Augmentation engine.
Step 2
Summary loop
The system maintains an evolving summary of the conversation, feeding it back into the engine as context each time new messages are processed.
Step 3
Memory extraction
New facts are extracted and stored as memories with the updated summary in the memory database.
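The three steps above can be sketched as a single processing loop. Every function name here is hypothetical; the two inner helpers stand in for LLM calls:

```python
# Illustrative sketch of the pipeline above: session messages flow in
# (Step 1), an evolving summary is updated and fed back (Step 2), and
# facts tied to that summary land in the memory store (Step 3).
def update_summary(summary: str, messages: list[str]) -> str:
    # Placeholder for an LLM call that folds new messages into the summary.
    joined = " ".join(messages)
    return f"{summary} {joined}" if summary else joined


def extract_facts(messages: list[str], summary: str) -> list[dict]:
    # Placeholder for LLM-based extraction; each fact keeps its summary.
    return [{"fact": m, "summary": summary} for m in messages]


def process_session(store: list[dict], summary: str, messages: list[str]) -> str:
    summary = update_summary(summary, messages)      # Step 2: summary loop
    store.extend(extract_facts(messages, summary))   # Step 3: memory extraction
    return summary  # carried into the next batch


memory_db: list[dict] = []
running_summary = ""
running_summary = process_session(
    memory_db, running_summary, ["Alice moved to Berlin."]  # Step 1: session input
)
```

The returned summary is the feedback edge of the loop: each batch is processed against the summary produced by the previous one.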
Don't just take our word for it. Run it yourself.
Conduct the same tests, use the same judge, and reproduce the results yourself.