| Company | Mem0 |
|---|---|
| Industry | AI infrastructure and LLM memory |
| Use case | Universal memory layer for LLM applications |
| Stage | Series A |
| Website | mem0.ai |
| Reliability | Error Rate |
|---|---|
| 99.99% | Under 1% |
About Mem0
Mem0 is the universal, self-improving memory layer for LLM applications. It helps teams build personalized AI experiences by storing and retrieving long-term memory for agents, improving user experience and reducing repeated token spend.
At Series A scale, Mem0 processed hundreds of millions of logs daily. Traffic could reach hundreds of millions of tokens per minute, so monitoring had to be fast, reliability had to be consistent in production, and the team needed clear observability as usage grew.
The challenge
Mem0 was not running small prototypes. Its memory system powered real users and production workloads, where downtime was costly and failures had to be diagnosed quickly.
As traffic scaled to millions of requests across multiple LLM providers, the team faced three problems. First, they needed a reliable way to monitor memory retrieval quality and catch inconsistent results. Second, they needed user-level cost tracking and analysis to understand how performance changes affected spend. Third, they needed to trace sessions across environments so they could isolate and reduce errors across diverse models and deployments.
Why Respan
Mem0 needed reliability and visibility across multiple LLM providers, and Respan became their unified gateway.
Respan standardized requests and logs across OpenAI, Anthropic, and open-weight models. Full observability, including complete error payloads and latency traces, let engineers pinpoint failures quickly. With native SDK integration, logs from Mem0's memory layer flowed directly into Respan for end-to-end tracing.
Beyond the platform, the partnership delivered a 99.99% uptime SLA following infrastructure rewrites; rapid iteration on product requests such as thread identifier columns, filters, and stable API keys; and hands-on integration support for Mem0's production workflows.
How Mem0 uses Respan
Unified gateway for multi-provider LLMs
Mem0 relies on multiple LLM providers for its universal memory layer. Respan served as a unified gateway for all these calls, standardizing logs and metadata across environments and enforcing consistent routing behavior.
At Mem0's scale, providers can throttle requests or return intermittent failures. Respan made production behavior more predictable with rate-limit controls, automatic retries with backoff, and fallbacks across providers and models. When a request failed or hit a rate limit, Mem0 could recover without manual intervention while still capturing complete error payloads and routing details.
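As a concrete illustration, here is a minimal sketch of what a gateway call with fallbacks could look like from the application side, assuming an OpenAI-compatible endpoint. The base URL, fallback list, and retry parameters below are illustrative assumptions, not Respan's documented API.

```python
# Hypothetical sketch of a gateway call with provider fallbacks.
# The base URL and the extra_body fields are assumptions for
# illustration, not Respan's documented API.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/api/v1",  # assumed gateway endpoint
    api_key="RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Recall the user's last session."}],
    # Assumed gateway-level controls: ordered fallbacks tried when the
    # primary model is throttled or errors, with retry/backoff applied.
    extra_body={
        "fallback_models": ["claude-3-5-sonnet", "llama-3.1-70b"],
        "retries": {"max_attempts": 3, "backoff": "exponential"},
    },
)
print(response.choices[0].message.content)
```

Because the gateway owns retries and fallbacks in this pattern, application code stays a single call while routing details still land in the logs.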
The dashboard became a daily tool for engineers because they could see full error payloads, execution metadata, and thread-level traces in one place.
Deep observability for memory performance
Respan provided clear visibility into what was happening under the hood. Mem0 uses it to monitor latency, token usage, and cost across the different workloads in its memory pipeline.
| Pipeline step | What Mem0 monitors |
|---|---|
| Retrieval | Memory hit rate and relevance, retrieval latency, token usage and cost, and timeout or provider error rates |
| Synthesis | Output quality and consistency, context length and truncation, synthesis latency, and cost per response |
| Extraction | Fact precision and schema validity, deduplication rates, write latency, and failure rates on updates |
| Orchestration | End-to-end session latency, routing and fallback rates, retry behavior under rate limits, and cost distribution across providers/models |
This supported Mem0's goals of improving memory retrieval accuracy, running user-level cost analysis, and reducing cross-environment error rates.
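As a hedged sketch of how this kind of per-step monitoring can be wired up, the snippet below tags each log entry with its pipeline step so latency, token, and cost dashboards can be sliced by step. The endpoint path and field names are illustrative assumptions, not Respan's actual log schema.

```python
# Illustrative only: the endpoint path and field names are assumed.
import time
import requests

LOG_URL = "https://api.example.com/v1/logs"  # assumed flat logging endpoint

def log_step(step: str, model: str, latency_ms: float,
             prompt_tokens: int, completion_tokens: int) -> None:
    """Send one structured log entry tagged with its pipeline step."""
    requests.post(
        LOG_URL,
        headers={"Authorization": "Bearer RESPAN_API_KEY"},
        json={
            "model": model,
            "latency_ms": latency_ms,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            # Custom property used to slice dashboards by pipeline step.
            "metadata": {"pipeline_step": step},
        },
        timeout=5,
    )

start = time.time()
# ... run the retrieval step against the memory store ...
log_step("retrieval", "gpt-4o-mini", (time.time() - start) * 1000, 812, 64)
```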
Session-level grouping with event_id
Mem0 uses Respan's flat logging API endpoint to send high-volume logs without complex tracing instrumentation. To group calls, Mem0 attaches an event_id as a custom property, linking every LLM call that belongs to a single pipeline run.
With Respan's grouping views, including threads, traces, and custom identifiers, the team can pull up every call for a single event_id in one place, then filter and compare by environment, user, or experiment.
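A minimal sketch of that grouping pattern, assuming the same kind of flat logging endpoint; the URL and the custom-property field names are illustrative, not Respan's documented schema.

```python
# Illustrative sketch: one event_id links every call in a pipeline run.
# Endpoint path and property names are assumptions, not a documented API.
import uuid
import requests

event_id = str(uuid.uuid4())  # generated once per pipeline run

for step in ("retrieval", "synthesis", "extraction"):
    requests.post(
        "https://api.example.com/v1/logs",  # assumed flat logging endpoint
        headers={"Authorization": "Bearer RESPAN_API_KEY"},
        json={
            "model": "gpt-4o-mini",
            "status": "success",
            # Custom properties: the shared event_id groups all three
            # calls so they can be pulled up together in one view.
            "custom_properties": {"event_id": event_id, "step": step},
        },
        timeout=5,
    )
```

Filtering on event_id in the dashboard then surfaces the whole run, and adding user or experiment properties enables the comparisons described above.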
Clean separation of test and production environments
Mem0 keeps a clean separation between test and production traffic in Respan. That separation makes it safe to evaluate changes to retrieval and routing logic against real traffic patterns without polluting production analytics.
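One minimal way to enforce that separation, sketched under the assumption that environments are distinguished via per-environment API keys plus an environment tag on each log; both the key scheme and the tag name are assumptions for illustration.

```python
# Sketch only: per-environment keys and the "environment" tag are
# illustrative assumptions about how test/prod traffic is kept apart.
import os

ENV = os.getenv("MEM0_ENV", "test")  # "test" in CI and staging, "production" in prod

API_KEY = {
    "test": os.environ.get("RESPAN_TEST_KEY", ""),
    "production": os.environ.get("RESPAN_PROD_KEY", ""),
}[ENV]

def log_payload(**fields) -> dict:
    """Attach the environment tag so dashboards can filter traffic cleanly."""
    return {"environment": ENV, **fields}
```

With the tag on every log, retrieval and routing experiments run under the test key never skew production analytics.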
Results
With Respan powering observability and monitoring, Mem0 achieved production-grade reliability while continuing to scale its universal memory platform.
They reported 99.99% reliability with an error rate under 1% across hundreds of millions of daily logs. Engineers traced failures from user session to thread to provider in seconds, with fewer blind spots and less guesswork. Mem0 also benchmarked roughly 90% token-cost savings and a 91% latency reduction in its retrieval system, with Respan helping keep those gains consistent in production at scale (Mem0 Research 2024). Mem0's $23.9M Series A further underscored investor confidence in the platform's scalability and operational maturity. Finally, structured logs captured through Respan fed directly into Mem0's self-improving memory training pipeline, enabling continuous model improvement based on real production traffic.
Quote
"Respan has been key in helping us scale to hundreds of millions of requests with reliable observability into our LLM calls and failure rates. The team is incredibly responsive, and the founders, Andy and Raymond, have even supported us at 2 a.m., a true sign of their commitment."
Deshraj Yadav, Co-founder and CTO of Mem0
Future plans
Mem0 plans to expand its use of Respan for deeper visibility and evaluation. As it scales its memory systems, Mem0 will increasingly rely on Respan's tracing and dataset insights to evaluate model performance, monitor long-term memory accuracy, and maintain reliability across environments.