End-to-end traces for every LLM call, agent step, and conversation thread - so you can debug fast, cut costs, and catch regressions before users do.
100%
of requests captured - no sampling
<80ms
P99 ingestion latency
Real-time
search across all requests
2 lines
of code to instrument
Every team that ships LLM features without tracing eventually hits the same walls. Here's what breaks first.
✗ You can't see what your LLM is doing in production
Without structured logs, you have no record of what inputs were sent, what outputs came back, how long it took, or what it cost. Debugging is guesswork.
✗ Bad responses have no trail - just a complaint ticket
When a user reports a bad output, there's nothing to look at. No input, no context, no chain of calls. You can't reproduce, isolate, or fix what you can't see.
✗ You have no idea which features are driving LLM spend
Without per-request cost attribution, you can't tell if one user, one feature, or one model variant is responsible for your monthly bill spiking.
✗ Multi-step agent pipelines are a black box
Agents make dozens of sequential LLM calls. Without distributed tracing, you can't see where latency accumulates, where failures occur, or which step produced bad output.
✗ Regressions surface in user complaints, not dashboards
Without latency alerts and quality trend tracking, a model degradation or prompt regression goes unnoticed until users start churning.
Full tracing from the moment you add the SDK. No infrastructure to deploy, no custom schema to define.
The Respan SDK wraps your existing LLM client. Every call is intercepted, logged asynchronously, and available for search and analysis in under a second. For agent workloads, spans are linked via context propagation - the full execution tree is reconstructed automatically from the trace context attached to each call.
Instrument once
Wrap your OpenAI client with the Respan SDK or point your LangChain callback to Respan. Takes two lines.
Every call is logged
Inputs, outputs, latency, cost, tokens, and custom metadata are captured asynchronously - no impact on response time.
Spans are linked
For multi-step agents, span context is propagated automatically. Respan reconstructs the full trace tree.
Search and alert
Logs are searchable in real-time. Set dashboards for cost and latency trends, and configure threshold alerts.
import openai
from respan import Logger
logger = Logger(api_key="YOUR_RESPAN_KEY")
# Option 1: Wrap the OpenAI client directly
client = logger.wrap(openai.OpenAI(api_key="sk-..."))
# Now every call is logged automatically
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "..."}],
# Optionally tag this request
extra_body={
"respan_params": {
"customer_identifier": "user-123",
"metadata": {"feature": "document-qa"}
}
}
)
# Option 2: Auto-instrument LangChain
from respan.integrations.langchain import RespanCallbackHandler
callbacks = [RespanCallbackHandler(api_key="YOUR_RESPAN_KEY")]
# Pass callbacks= to any LangChain chain or agent
If a user reports a bad response: filter logs by user ID, find the exact request, and see the full input, output, and trace in seconds - not after a 30-minute log trawl.
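The filter-by-user workflow can be sketched in plain Python. The `LogRecord` shape and `find_requests` helper below are illustrative only - they mirror the `customer_identifier` tag from the SDK example above, not any real Respan query API.

```python
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    # Hypothetical record shape, mirroring the respan_params tags
    # shown in the SDK example (customer_identifier, metadata)
    customer_identifier: str
    model: str
    input_text: str
    output_text: str
    latency_ms: float
    cost_usd: float
    metadata: dict = field(default_factory=dict)

def find_requests(logs, user_id):
    """Return every logged request for one user."""
    return [r for r in logs if r.customer_identifier == user_id]

logs = [
    LogRecord("user-123", "gpt-4o", "Summarize doc A", "Summary...",
              820.0, 0.012, {"feature": "document-qa"}),
    LogRecord("user-456", "gpt-4o", "Summarize doc B", "Summary...",
              640.0, 0.009, {"feature": "document-qa"}),
]

# Each hit carries the full input, output, latency, and cost for inspection
bad = find_requests(logs, "user-123")
```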
If you need to know why your LLM bill doubled: break down spend by user, feature, model, and environment. Find the two users driving 40% of your costs.
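A minimal sketch of that breakdown, assuming logged records carry the `customer_identifier` and `metadata` tags shown in the SDK example; `spend_by` is a hypothetical helper, not a Respan API.

```python
from collections import defaultdict

def spend_by(records, key):
    """Aggregate cost_usd by an arbitrary tag (user, feature, model, ...)."""
    totals = defaultdict(float)
    for r in records:
        totals[key(r)] += r["cost_usd"]
    return dict(totals)

records = [
    {"customer_identifier": "user-1", "metadata": {"feature": "document-qa"}, "cost_usd": 4.0},
    {"customer_identifier": "user-1", "metadata": {"feature": "chat"}, "cost_usd": 1.0},
    {"customer_identifier": "user-2", "metadata": {"feature": "document-qa"}, "cost_usd": 5.0},
]

by_user = spend_by(records, lambda r: r["customer_identifier"])
by_feature = spend_by(records, lambda r: r["metadata"]["feature"])
# by_user → {"user-1": 5.0, "user-2": 5.0}
```

The same one-liner works for any dimension you tagged at request time, which is why per-request attribution matters more than a single monthly total.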
If you're running multi-step agents: view the full span tree to see where latency accumulates, which tool calls fail, and what data each step passed to the next.
If you're switching from GPT-4o to a cheaper model: compare latency, cost, and output quality side-by-side using real production traffic from before and after the change.
If you have latency or error rate commitments: set up alerts on P95 latency and error rate. Get notified before users notice a degradation.
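A rough sketch of the threshold check such an alert performs, assuming a sliding window of logged latencies; the nearest-rank percentile and the 500 ms threshold are illustrative choices, not Respan defaults.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile; p in (0, 100]."""
    ranked = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

def latency_alert(latencies_ms, p95_threshold_ms):
    """Return True when the window's P95 breaches the threshold."""
    return percentile(latencies_ms, 95) > p95_threshold_ms

window = [120, 140, 150, 160, 900]  # one slow outlier in a 5-request window
# P95 of this window is the 5th ranked value (900 ms), so the alert fires
alert = latency_alert(window, p95_threshold_ms=500)
```

Evaluating this continuously over a rolling window is what turns a silent degradation into a page before users notice.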
Model providers
Frameworks
Languages
General APM tools track latency and error rates, but they don't understand LLM semantics. They can't parse token counts, show cost-per-request, reconstruct an agent trace from span context, or let you search the content of a prompt. You'd have to build all of that as a custom layer on top. Respan ships it all, already wired up for LLM workloads, with zero custom schema work.