Compresr provides an API and open-source proxy for compressing LLM context at two levels: coarse-grained (selecting relevant chunks) and fine-grained (token-level compression within chunks). Part of YC W2026, it was founded by a team of four EPFL researchers: Ivan Zakazov (CEO, PhD dropout, published at EMNLP and NeurIPS), Oussama Gabouj (CTO, EMNLP 2025 paper on prompt compression), Berke Argin (CAIO, ex-UBS), and Kamel Charaf (COO, ex-Bell Labs).
The system claims up to 200x compression on aggressive RAG workloads without quality loss, with a default 50% token reduction. Their Context Gateway is an open-source Go proxy that sits between AI agents and LLM providers, compressing tool outputs and conversation history before tokens reach the model. It integrates with Claude Code, OpenClaw, and Codex.
On their SEC-filing benchmark (141 questions across 79 filings of up to 230K tokens each), Compresr compressed ~106K input tokens to ~10.5K while accuracy improved from 72.3% to 74.5% using GPT-5.2: a 76% cost reduction with better results. The team's peer-reviewed publications at NeurIPS and EMNLP on prompt compression give them some of the strongest academic credentials in the compression space.
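The benchmark's two headline figures measure different things: the token numbers imply roughly a 10x reduction of the input, while the quoted 76% figure is the claimed cost reduction, which is plausibly lower because output tokens are unaffected by input compression. A quick arithmetic check on the source's own numbers:

```go
package main

import "fmt"

func main() {
	// Approximate input tokens before/after compression, from the benchmark.
	const before, after = 106_000.0, 10_500.0

	ratio := before / after   // compression factor
	saved := 1 - after/before // fraction of input tokens removed

	fmt.Printf("compression: %.1fx, input tokens cut: %.1f%%\n", ratio, saved*100)
	// prints "compression: 10.1x, input tokens cut: 90.1%"
}
```

So a ~90% input-token cut coexisting with a 76% total cost cut is consistent once output-token spend is factored in.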
Pricing: Free trial available
Best for: Teams building RAG systems with long contexts
Compresr reduces LLM input costs through context compression, while Respan monitors output quality and performance; together they optimize both sides of the LLM call.
Last verified: March 27, 2026