Real-time dashboards for cost, latency, token usage, and error rates - so you always know what your AI is doing, what it costs, and where it breaks.
Real-time dashboard updates
Per-user cost and usage breakdowns
Custom alerts on any metric
Zero-config metrics from day one
Teams running LLMs in production need answers to basic questions about cost, performance, and usage. Without metrics, those answers don't exist.
✗ You don't know what your AI features actually cost
Without per-request cost tracking, your LLM bill is a single number on a monthly invoice. You can't tell which feature, user, or model is driving the spend.
✗ Latency problems are invisible until users complain
Without P50/P95/P99 latency tracking, you have no idea if response times are degrading. You find out when users leave, not when the numbers move.
✗ Token usage is a mystery
Without token-level analytics, you can't optimize prompts for efficiency. You don't know which prompts consume the most tokens or which conversations run long.
✗ Error rates spike silently
Without error rate dashboards and alerting, a provider outage or a surge in rate-limit errors goes unnoticed. Errors accumulate while you're looking elsewhere.
✗ Trends are impossible to track without a baseline
Without historical data and trend charts, you can't answer 'is this getting better or worse?' - the most basic question for any production system.
Every metric is collected automatically from your traced requests. No extra instrumentation, no custom dashboards to build.
Metrics are computed automatically from every traced request in your Respan project. When you instrument your LLM client with the Respan SDK, cost, latency, token counts, and status codes are captured on every call. These are aggregated into time-series metrics and made available in dashboards, alerts, and the API - with no additional configuration.
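To make the aggregation step concrete, here is a minimal sketch of how per-request records could be rolled up into time-series buckets. It is not Respan's implementation: the `RequestRecord` fields, the 60-second bucket size, the nearest-rank percentile, and the "status code 400 or above counts as an error" rule are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical per-request record; field names are illustrative only.
    timestamp: float          # Unix time in seconds
    cost_usd: float
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    status_code: int


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 0..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def aggregate(records, bucket_seconds=60):
    """Group records into fixed-width time buckets and summarize each bucket."""
    buckets = defaultdict(list)
    for rec in records:
        start = int(rec.timestamp // bucket_seconds) * bucket_seconds
        buckets[start].append(rec)

    series = {}
    for start, rows in sorted(buckets.items()):
        latencies = [r.latency_ms for r in rows]
        # Assumption: any status code >= 400 counts as an error.
        errors = sum(1 for r in rows if r.status_code >= 400)
        series[start] = {
            "requests": len(rows),
            "cost_usd": sum(r.cost_usd for r in rows),
            "tokens": sum(r.prompt_tokens + r.completion_tokens for r in rows),
            "error_rate": errors / len(rows),
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
        }
    return series
```

Each key of the returned dict is a bucket start time, so the result can be plotted directly as one point per minute for each metric.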
Instrument your client
Add the Respan SDK to your LLM client. Every request is traced with cost, latency, tokens, and metadata automatically.
Metrics are computed
Respan aggregates per-request data into time-series metrics - cost, latency percentiles, token usage, and error rates.
Dashboards update live
Open the Respan dashboard to see real-time charts. Filter by model, user, feature, or environment.
Set alerts and export
Configure threshold alerts on any metric. Export data via API for custom reporting or BI integration.
import openai
from respan import Logger

logger = Logger(api_key="YOUR_RESPAN_KEY")

# Wrap your client - metrics are collected automatically
client = logger.wrap(openai.OpenAI(api_key="sk-..."))

# Tag requests with metadata for metric breakdowns
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "respan_params": {
            "customer_identifier": "user-123",
            "metadata": {
                "feature": "document-qa",
                "environment": "production",
                "team": "search"
            }
        }
    }
)

# Metrics are now available in the dashboard:
# - Cost per request, per user, per feature
# - Latency percentiles (P50, P95, P99)
# - Token usage (prompt + completion)
# - Error rates by provider and status code
# - Trends over time for all of the above

If your LLM bill is growing faster than your user base: break down cost by feature, model, and user. Identify which prompts are expensive and which users drive disproportionate spend.
If you need to maintain response time SLAs: track P95 latency by model and feature. Set alerts when latency exceeds your threshold so you can act before users notice.
If you're scaling an AI product: use token usage trends and request volume charts to forecast API costs and plan budget allocation across models and providers.
If you're evaluating a model switch: compare cost, latency, and error rates between models using the same production traffic. Make the decision with data, not guesses.
If stakeholders need visibility into AI operations: export dashboards and metrics summaries for weekly reviews. Show cost trends, usage growth, and quality improvements over time.
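Several of the use cases above come down to the same operation: group requests by one dimension (feature, model, or user), then compare cost, P95 latency, and error rate per group. The sketch below shows that operation on exported data. The record shape, the `model-a`/`model-b` names, the sample values, and the 1500 ms threshold are hypothetical and do not reflect a Respan export schema.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def breakdown(records, dimension, p95_threshold_ms=None):
    """Summarize cost, P95 latency, and error rate per value of `dimension`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[dimension]].append(rec)

    summary = {}
    for key, rows in groups.items():
        latency = p95([r["latency_ms"] for r in rows])
        summary[key] = {
            "requests": len(rows),
            "cost_usd": round(sum(r["cost_usd"] for r in rows), 6),
            "p95_ms": latency,
            "error_rate": sum(r["error"] for r in rows) / len(rows),
            # Flag groups whose P95 latency exceeds the SLA threshold, if one is given.
            "over_threshold": p95_threshold_ms is not None and latency > p95_threshold_ms,
        }
    return summary


# Hypothetical exported records; all values are made up for illustration.
records = [
    {"model": "model-a", "feature": "document-qa", "cost_usd": 0.012, "latency_ms": 820, "error": False},
    {"model": "model-a", "feature": "document-qa", "cost_usd": 0.015, "latency_ms": 1900, "error": False},
    {"model": "model-b", "feature": "document-qa", "cost_usd": 0.004, "latency_ms": 640, "error": True},
    {"model": "model-b", "feature": "summarize", "cost_usd": 0.003, "latency_ms": 700, "error": False},
]

by_model = breakdown(records, "model", p95_threshold_ms=1500)
by_feature = breakdown(records, "feature")
```

Passing `"model"` gives a side-by-side comparison for a model switch, while `"feature"` or a user identifier shows where spend concentrates. With only a handful of requests per group, a P95 figure is dominated by the slowest request, so comparisons like this are only meaningful on realistic traffic volumes.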
Provider dashboards show aggregate usage for their platform only. They can't break down cost by feature, compare metrics across providers, or correlate cost with your application's business dimensions. If you use multiple models or providers, you're stitching together data from different dashboards with different schemas. Respan gives you a single view across all providers, with the breakdowns that matter for your product.
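To illustrate the "different schemas" problem, the sketch below normalizes token usage from two providers into one record type before pricing it. The key names follow the usage objects returned by the OpenAI chat completions API (`prompt_tokens`, `completion_tokens`) and the Anthropic Messages API (`input_tokens`, `output_tokens`). The `Usage` class, the helper functions, the `model-a`/`model-b` names, and the prices are hypothetical, and this is not a description of how Respan does it internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    # One shared shape for token usage, regardless of provider.
    provider: str
    model: str
    input_tokens: int
    output_tokens: int


def from_openai_style(model, usage):
    """Map a usage dict that uses prompt_tokens / completion_tokens keys."""
    return Usage("openai", model, usage["prompt_tokens"], usage["completion_tokens"])


def from_anthropic_style(model, usage):
    """Map a usage dict that uses input_tokens / output_tokens keys."""
    return Usage("anthropic", model, usage["input_tokens"], usage["output_tokens"])


# Hypothetical prices in USD per million tokens: (input, output).
# Replace with the rates from your own provider contracts.
PRICES = {
    ("openai", "model-a"): (2.00, 8.00),
    ("anthropic", "model-b"): (3.00, 15.00),
}


def cost_usd(usage):
    """Price a normalized usage record with the table above."""
    in_price, out_price = PRICES[(usage.provider, usage.model)]
    return (usage.input_tokens * in_price + usage.output_tokens * out_price) / 1_000_000
```

Once every request is in the same shape, cost and token totals can be summed and compared across providers without per-provider special cases.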