Compresr provides an API and open-source proxy for compressing LLM context at two levels: coarse-grained (selecting relevant chunks) and fine-grained (token-level compression within chunks). Part of YC W2026, it was founded by a team of four EPFL researchers: Ivan Zakazov (CEO, PhD dropout, published at EMNLP and NeurIPS), Oussama Gabouj (CTO, EMNLP 2025 paper on prompt compression), Berke Argin (CAIO, ex-UBS), and Kamel Charaf (COO, ex-Bell Labs).
The system claims up to 200x compression on aggressive RAG workloads without quality loss, with a default 50% token reduction. Their Context Gateway is an open-source Go proxy that sits between AI agents and LLM providers, compressing tool outputs and conversation history before tokens reach the model. It integrates with Claude Code, OpenClaw, and Codex.
On their SEC-filing benchmark (141 questions across 79 filings of up to 230K tokens each), Compresr compressed ~106K input tokens to ~10.5K while accuracy improved from 72.3% to 74.5% using GPT-5.2: a 76% cost reduction with better results. The team's peer-reviewed publications at NeurIPS and EMNLP on prompt compression give them some of the strongest academic credentials in the compression space.
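The benchmark's two headline figures measure different things: the token numbers imply roughly a 10x reduction of the input, while the quoted 76% figure is the claimed cost reduction, which is plausibly lower because output tokens are unaffected by input compression. A quick arithmetic check on the source's own numbers:

```go
package main

import "fmt"

func main() {
	// Approximate input tokens before/after compression, from the benchmark.
	const before, after = 106_000.0, 10_500.0

	ratio := before / after   // compression factor
	saved := 1 - after/before // fraction of input tokens removed

	fmt.Printf("compression: %.1fx, input tokens cut: %.1f%%\n", ratio, saved*100)
	// prints "compression: 10.1x, input tokens cut: 90.1%"
}
```

So a ~90% input-token cut coexisting with a 76% total cost cut is consistent once output-token spend is factored in.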
Pricing: Free trial available
Best for: Teams building RAG systems with long contexts
Compresr reduces LLM input costs through context compression, while Respan monitors output quality and performance; together they optimize both sides of the LLM call.
Last verified: March 27, 2026