| Company | Mem0 |
|---|---|
| Industry | AI infrastructure and LLM memory |
| Use case | Universal memory layer for LLM applications |
| Stage | Series A |
| Website | mem0.ai |
| Reliability | Error Rate |
|---|---|
| 99.99% | Under 1% |
About Mem0
Mem0 is the universal, self-improving memory layer for LLM applications. It helps teams build personalized AI experiences by storing and retrieving long-term memory for agents, improving user experience and reducing repeated token spend.
At Series A scale, Mem0 processed hundreds of millions of logs daily. Traffic could reach hundreds of millions of tokens per minute, so monitoring had to be fast, reliability had to be consistent in production, and the team needed clear observability as usage grew.
The challenge
Mem0 was not running small prototypes. Its memory system powered real users and production workloads, where downtime was costly and failures had to be diagnosed quickly.
As traffic scaled to millions of requests across multiple LLM providers, the team faced three problems. First, they needed a reliable way to monitor memory retrieval quality and catch inconsistent results. Second, they needed user-level cost tracking and analysis to understand how performance changes affected spend. Third, they needed to trace sessions across environments so they could isolate and reduce errors across diverse models and deployments.
Why Respan
Mem0 needed reliability and visibility across multiple LLM providers, and Respan became their unified gateway.
Respan standardized requests and logs across OpenAI, Anthropic, and open-weight models. Full observability, including complete error payloads and latency traces, let engineers pinpoint failures quickly. With native SDK integration, logs from Mem0's memory layer flowed directly into Respan for end-to-end tracing.
Beyond the platform, the partnership delivered a 99.99% uptime SLA following infrastructure rewrites; rapid iteration on product requests such as thread identifier columns, filters, and stable API keys; and hands-on integration support for Mem0's production workflows.
How Mem0 uses Respan
Unified gateway for multi-provider LLMs
Mem0 relies on multiple LLM providers for its universal memory layer. Respan served as a unified gateway for all these calls, standardizing logs and metadata across environments and enforcing consistent routing behavior.
At Mem0's scale, providers can throttle requests or return intermittent failures. Respan made production behavior more predictable with rate-limit controls, automatic retries with backoff, and fallbacks across providers and models. When a request failed or hit a rate limit, Mem0 could recover without manual intervention while still capturing complete error payloads and routing details.
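As a concrete illustration, here is a minimal sketch of what a gateway call with fallbacks could look like from the application side, assuming an OpenAI-compatible endpoint. The base URL, fallback list, and retry parameters below are illustrative assumptions, not Respan's documented API.

```python
# Hypothetical sketch of a gateway call with provider fallbacks.
# The base URL and the extra_body fields are assumptions for
# illustration, not Respan's documented API.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/api/v1",  # assumed gateway endpoint
    api_key="RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Recall the user's last session."}],
    # Assumed gateway-level controls: ordered fallbacks tried when the
    # primary model is throttled or errors, with retry/backoff applied.
    extra_body={
        "fallback_models": ["claude-3-5-sonnet", "llama-3.1-70b"],
        "retries": {"max_attempts": 3, "backoff": "exponential"},
    },
)
print(response.choices[0].message.content)
```

Because the gateway owns retries and fallbacks in this pattern, application code stays a single call while routing details still land in the logs.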
The dashboard became a daily tool for engineers because they could see full error payloads, execution metadata, and thread-level traces in one place.
Deep observability for memory performance
Respan provided clear visibility into what was happening under the hood. Mem0 uses it to monitor latency, token usage, and cost across the different workloads in its memory pipeline.
| Pipeline step | What Mem0 monitors |
|---|---|
| Retrieval | Memory hit rate and relevance, retrieval latency, token usage and cost, and timeout or provider error rates |
| Synthesis | Output quality and consistency, context length and truncation, synthesis latency, and cost per response |
| Extraction | Fact precision and schema validity, deduplication rates, write latency, and failure rates on updates |
| Orchestration | End-to-end session latency, routing and fallback rates, retry behavior under rate limits, and cost distribution across providers/models |
This supported Mem0's goals of improving memory retrieval accuracy, running user-level cost analysis, and reducing cross-environment error rates.
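As a hedged sketch of how this kind of per-step monitoring can be wired up, the snippet below tags each log entry with its pipeline step so latency, token, and cost dashboards can be sliced by step. The endpoint path and field names are illustrative assumptions, not Respan's actual log schema.

```python
# Illustrative only: the endpoint path and field names are assumed.
import time
import requests

LOG_URL = "https://api.example.com/v1/logs"  # assumed flat logging endpoint

def log_step(step: str, model: str, latency_ms: float,
             prompt_tokens: int, completion_tokens: int) -> None:
    """Send one structured log entry tagged with its pipeline step."""
    requests.post(
        LOG_URL,
        headers={"Authorization": "Bearer RESPAN_API_KEY"},
        json={
            "model": model,
            "latency_ms": latency_ms,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            # Custom property used to slice dashboards by pipeline step.
            "metadata": {"pipeline_step": step},
        },
        timeout=5,
    )

start = time.time()
# ... run the retrieval step against the memory store ...
log_step("retrieval", "gpt-4o-mini", (time.time() - start) * 1000, 812, 64)
```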
Session-level grouping with event_id
Mem0 uses Respan's flat logging API endpoint to send high-volume logs without complex tracing instrumentation. To group calls, Mem0 attaches an event_id as a custom property, linking every LLM call that belongs to a single pipeline run.
With Respan's grouping views, including threads, traces, and custom identifiers, the team can pull up every call for a single event_id in one place, then filter and compare by environment, user, or experiment.
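A minimal sketch of that grouping pattern, assuming the same kind of flat logging endpoint; the URL and the custom-property field names are illustrative, not Respan's documented schema.

```python
# Illustrative sketch: one event_id links every call in a pipeline run.
# Endpoint path and property names are assumptions, not a documented API.
import uuid
import requests

event_id = str(uuid.uuid4())  # generated once per pipeline run

for step in ("retrieval", "synthesis", "extraction"):
    requests.post(
        "https://api.example.com/v1/logs",  # assumed flat logging endpoint
        headers={"Authorization": "Bearer RESPAN_API_KEY"},
        json={
            "model": "gpt-4o-mini",
            "status": "success",
            # Custom properties: the shared event_id groups all three
            # calls so they can be pulled up together in one view.
            "custom_properties": {"event_id": event_id, "step": step},
        },
        timeout=5,
    )
```

Filtering on event_id in the dashboard then surfaces the whole run, and adding user or experiment properties enables the comparisons described above.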
Clean separation of test and production environments
Mem0 keeps a clean separation between test and production traffic in Respan. That separation makes it safe to evaluate changes to retrieval and routing logic against real traffic patterns without polluting production analytics.
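One minimal way to enforce that separation, sketched under the assumption that environments are distinguished via per-environment API keys plus an environment tag on each log; both the key scheme and the tag name are assumptions for illustration.

```python
# Sketch only: per-environment keys and the "environment" tag are
# illustrative assumptions about how test/prod traffic is kept apart.
import os

ENV = os.getenv("MEM0_ENV", "test")  # "test" in CI and staging, "production" in prod

API_KEY = {
    "test": os.environ.get("RESPAN_TEST_KEY", ""),
    "production": os.environ.get("RESPAN_PROD_KEY", ""),
}[ENV]

def log_payload(**fields) -> dict:
    """Attach the environment tag so dashboards can filter traffic cleanly."""
    return {"environment": ENV, **fields}
```

With the tag on every log, retrieval and routing experiments run under the test key never skew production analytics.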
Results
With Respan powering observability and monitoring, Mem0 achieved production-grade reliability while continuing to scale its universal memory platform.
They reported 99.99% reliability with an error rate under 1% across hundreds of millions of daily logs. Engineers traced failures from user session to thread to provider in seconds, with fewer blind spots and less guesswork. Mem0 also benchmarked roughly 90% token-cost savings and a 91% latency reduction in its retrieval system, with Respan helping keep those gains consistent in production at scale (Mem0 Research 2024). Mem0's $23.9M Series A further underscored investor confidence in the platform's scalability and operational maturity. Finally, structured logs captured through Respan fed directly into Mem0's self-improving memory training pipeline, enabling continuous model improvement based on real production traffic.
Quote
"Respan has been key in helping us scale to hundreds of millions of requests with reliable observability into our LLM calls and failure rates. The team is incredibly responsive, and the founders, Andy and Raymond, have even supported us at 2 a.m., a true sign of their commitment."
Deshraj Yadav, Co-founder and CTO of Mem0
Future plans
Mem0 plans to expand its use of Respan for deeper visibility and evaluation. As it scales its memory systems, Mem0 will increasingly rely on Respan's tracing and dataset insights to evaluate model performance, monitor long-term memory accuracy, and maintain reliability across environments.