What metrics are tracked automatically?

Cost (per request and aggregated), latency (time to first token and total), token usage (prompt and completion), error rates (by status code and provider), and request volume. All metrics can be broken down by model, user, feature, environment, or any custom metadata tag.

Do I need to add extra code beyond the SDK?

No. If you've instrumented your LLM client with the Respan SDK for tracing, all metrics are collected automatically. The same two lines of code that enable tracing also enable full metrics collection.

How quickly do metrics update?

Metrics update in real time. Dashboard charts refresh as new requests are traced. Alerts evaluate continuously and fire within seconds of a threshold being crossed.

Can I set alerts on specific dimensions?

Yes. You can set alerts on any metric filtered by any dimension - for example, 'alert me if P95 latency for the document-qa feature exceeds 3 seconds' or 'alert me if daily cost for user X exceeds $50'.
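To make the first example concrete, here is a minimal local sketch of what a dimension-filtered threshold rule evaluates. It is not Respan's alerting engine or API; the record fields (`feature`, `latency_s`) and the nearest-rank percentile method are assumptions chosen for illustration.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def p95_alert(requests, feature, threshold_s):
    """Return (fired, p95) for the requests tagged with the given feature."""
    latencies = [r["latency_s"] for r in requests if r["feature"] == feature]
    if not latencies:
        return False, None  # nothing to evaluate for this dimension
    p95 = percentile(latencies, 95)
    return p95 > threshold_s, p95

# Hypothetical sample: 20 document-qa requests, two of them slow,
# plus one request from another feature that the filter must ignore.
sample = [{"feature": "document-qa", "latency_s": 1.0} for _ in range(18)]
sample += [{"feature": "document-qa", "latency_s": 4.0} for _ in range(2)]
sample += [{"feature": "chat", "latency_s": 9.0}]

fired, p95 = p95_alert(sample, "document-qa", threshold_s=3.0)
```

The filter runs before the percentile is taken, which is why a slow request in another feature does not trip the rule.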

Can I export metrics to my BI tools?

Yes. The metrics API returns time-series data in JSON format. You can pull it into BigQuery, Snowflake, Looker, or any BI tool. Webhooks can push metric events in real time for streaming pipelines.
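Most warehouses and BI tools load flat rows more easily than nested JSON. The sketch below flattens a time-series payload into CSV. The payload shape (`metric`, `series`, `tags`, `points`) is hypothetical and stands in for whatever the metrics API actually returns, so check the API reference for the real field names before adapting it.

```python
import csv
import io
import json

# Hypothetical payload shape, for illustration only.
payload = json.loads("""
{
  "metric": "cost_usd",
  "series": [
    {"tags": {"feature": "document-qa"},
     "points": [["2024-01-01T00:00:00Z", 1.25], ["2024-01-01T01:00:00Z", 0.75]]},
    {"tags": {"feature": "chat"},
     "points": [["2024-01-01T00:00:00Z", 2.0]]}
  ]
}
""")

def to_rows(payload):
    """Flatten nested series into one row per (timestamp, tag set)."""
    rows = []
    for series in payload["series"]:
        for timestamp, value in series["points"]:
            rows.append({"metric": payload["metric"], "timestamp": timestamp,
                         "value": value, **series["tags"]})
    return rows

def to_csv(rows):
    """Serialize rows to CSV text with a stable, sorted column order."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = to_rows(payload)
csv_text = to_csv(rows)
```

Each tag becomes its own column, so the same loader works when you add more metadata dimensions.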

How far back does historical data go?

Metric retention depends on your plan. Free plans retain 30 days of metric history. Paid plans retain 90 days or more, with custom retention available on enterprise plans.

What's the difference between Metrics and Tracing?

Tracing captures the full detail of every request - inputs, outputs, spans, and metadata. Metrics aggregate that data into numbers: cost totals, latency percentiles, token counts, and error rates over time. Tracing is for debugging individual requests; metrics are for understanding trends and setting alerts.

LLM Metrics

Real-time dashboards for cost, latency, token usage, and error rates - so you always know what your AI is doing, what it costs, and where it breaks.


Trusted in production

Real-time dashboard updates

Per-user cost and usage breakdowns

Custom alerts on any metric

Zero config metrics from day one

What you can't measure right now

Teams running LLMs in production need answers to basic questions about cost, performance, and usage. Without metrics, those answers don't exist.

✗ You don't know what your AI features actually cost

Without per-request cost tracking, your LLM bill is a single number on a monthly invoice. You can't tell which feature, user, or model is driving the spend.

✗ Latency problems are invisible until users complain

Without P50/P95/P99 latency tracking, you have no idea if response times are degrading. You find out when users leave, not when the numbers move.

✗ Token usage is a mystery

Without token-level analytics, you can't optimize prompts for efficiency. You don't know which prompts consume the most tokens or which conversations run long.

✗ Error rates spike silently

Without error rate dashboards and alerting, a provider outage or a spike in rate-limit errors goes unnoticed. Errors accumulate while you're looking elsewhere.

✗ Trends are impossible to track without a baseline

Without historical data and trend charts, you can't answer 'is this getting better or worse?' - the most basic question for any production system.

What you get

Every metric is collected automatically from your traced requests. No extra instrumentation, no custom dashboards to build.

→ Track total and per-request cost across all models and providers in real time
→ Monitor P50, P95, and P99 latency with breakdowns by model, feature, and environment
→ Analyze token usage patterns - prompt tokens, completion tokens, and total per request
→ Track error rates by provider, model, status code, and error type
→ Break down all metrics by user, feature, model, environment, or any custom metadata tag
→ View trend charts over hours, days, weeks, or months to spot regressions and improvements
→ Set threshold alerts on any metric - get notified when cost spikes, latency regresses, or errors increase
→ Compare metrics across time periods to measure the impact of model changes or prompt updates
→ Export metrics via API for integration with existing BI tools and reporting systems
→ Build custom dashboard views with the metrics that matter most to your team

How it works

Metrics are computed automatically from every traced request in your Respan project. When you instrument your LLM client with the Respan SDK, cost, latency, token counts, and status codes are captured on every call. These are aggregated into time-series metrics and made available in dashboards, alerts, and the API - with no additional configuration.

Instrument your client

Add the Respan SDK to your LLM client. Every request is traced with cost, latency, tokens, and metadata automatically.

Metrics are computed

Respan aggregates per-request data into time-series metrics - cost, latency percentiles, token usage, and error rates.

Dashboards update live

Open the Respan dashboard to see real-time charts. Filter by model, user, feature, or environment.

Set alerts and export

Configure threshold alerts on any metric. Export data via API for custom reporting or BI integration.

import openai
from respan import Logger

logger = Logger(api_key="YOUR_RESPAN_KEY")

# Wrap your client - metrics are collected automatically
client = logger.wrap(openai.OpenAI(api_key="sk-..."))

# Tag requests with metadata for metric breakdowns
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "respan_params": {
            "customer_identifier": "user-123",
            "metadata": {
                "feature": "document-qa",
                "environment": "production",
                "team": "search"
            }
        }
    }
)

# Metrics are now available in the dashboard:
# - Cost per request, per user, per feature
# - Latency percentiles (P50, P95, P99)
# - Token usage (prompt + completion)
# - Error rates by provider and status code
# - Trends over time for all of the above
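
The aggregation step described above can be pictured as a group-by over traced requests. The following is a rough local sketch of that idea, not Respan's implementation: the record fields (`timestamp`, `feature`, `cost_usd`, `prompt_tokens`, `completion_tokens`, `status`) are assumed names, and treating any status of 400 or higher as an error is an assumption too.

```python
from collections import defaultdict
from datetime import datetime

def hourly_metrics(records):
    """Roll per-request records up into one bucket per (hour, feature)."""
    buckets = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "tokens": 0, "errors": 0})
    for r in records:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        hour = datetime.fromisoformat(r["timestamp"]).replace(minute=0, second=0, microsecond=0)
        bucket = buckets[(hour.isoformat(), r["feature"])]
        bucket["requests"] += 1
        bucket["cost_usd"] += r["cost_usd"]
        bucket["tokens"] += r["prompt_tokens"] + r["completion_tokens"]
        bucket["errors"] += 1 if r["status"] >= 400 else 0
    # Derive the error rate once per bucket, after all requests are counted.
    return {key: {**b, "error_rate": b["errors"] / b["requests"]}
            for key, b in sorted(buckets.items())}

# Hypothetical traced requests
records = [
    {"timestamp": "2024-01-01T10:05:00+00:00", "feature": "document-qa",
     "cost_usd": 0.5, "prompt_tokens": 100, "completion_tokens": 50, "status": 200},
    {"timestamp": "2024-01-01T10:40:00+00:00", "feature": "document-qa",
     "cost_usd": 0.25, "prompt_tokens": 200, "completion_tokens": 50, "status": 429},
    {"timestamp": "2024-01-01T11:10:00+00:00", "feature": "document-qa",
     "cost_usd": 0.25, "prompt_tokens": 80, "completion_tokens": 20, "status": 200},
]

metrics = hourly_metrics(records)
```

Keying each bucket on the metadata tag as well as the hour is what makes the per-feature breakdowns possible later.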

Who uses this and how

Cost optimization

If your LLM bill is growing faster than your user base: break down cost by feature, model, and user. Identify which prompts are expensive and which users drive disproportionate spend.
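
As a sketch of that breakdown, the snippet below ranks spend along one dimension and reports each entry's share of the total. It assumes you have already exported per-request records; the field names (`user`, `cost_usd`) are hypothetical.

```python
from collections import defaultdict

def cost_by(records, dimension):
    """Return (key, total_cost, share_of_total) tuples, highest spend first."""
    totals = defaultdict(float)
    for r in records:
        totals[r[dimension]] += r["cost_usd"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no spend recorded, so shares are undefined
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(key, cost, cost / grand_total) for key, cost in ranked]

# Hypothetical exported records
records = [
    {"user": "user-1", "cost_usd": 3.0},
    {"user": "user-2", "cost_usd": 1.5},
    {"user": "user-1", "cost_usd": 3.0},
    {"user": "user-3", "cost_usd": 0.5},
]

breakdown = cost_by(records, "user")
```

In this sample one user accounts for three quarters of the spend, the kind of skew that a single invoice total hides.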

Performance monitoring

If you need to maintain response time SLAs: track P95 latency by model and feature. Set alerts when latency exceeds your threshold so you can act before users notice.

Capacity planning

If you're scaling an AI product: use token usage trends and request volume charts to forecast API costs and plan budget allocation across models and providers.
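
One simple way to turn a usage trend into a budget figure is a straight-line projection. The sketch below fits a least-squares line to daily token totals and prices the projected tokens. The single blended price per 1,000 tokens is a simplifying assumption (providers usually price prompt and completion tokens separately), and the numbers are invented for illustration.

```python
def forecast_cost(daily_tokens, price_per_1k_tokens, days_ahead=30):
    """Project token usage with a least-squares line and return the estimated cost."""
    n = len(daily_tokens)
    if n < 2:
        raise ValueError("need at least two days of history")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_tokens) / n
    # Ordinary least-squares slope and intercept over day indexes 0..n-1.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_tokens))
             / sum((x - mean_x) ** 2 for x in range(n)))
    intercept = mean_y - slope * mean_x
    # Sum the fitted line over the forecast window, never going below zero.
    projected_tokens = sum(max(intercept + slope * day, 0)
                           for day in range(n, n + days_ahead))
    return projected_tokens / 1000 * price_per_1k_tokens

# Hypothetical history: usage grows by 10,000 tokens per day
history = [100_000, 110_000, 120_000, 130_000]
estimate = forecast_cost(history, price_per_1k_tokens=0.002, days_ahead=3)
```

A linear fit is only a starting point; seasonal traffic or a planned launch would call for a different model.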

Model comparison

If you're evaluating a model switch: compare cost, latency, and error rates between models using the same production traffic. Make the decision with data, not guesses.

Executive reporting

If stakeholders need visibility into AI operations: export dashboards and metrics summaries for weekly reviews. Show cost trends, usage growth, and reliability improvements over time.

Works with your stack

Data sources

All traced LLM requests
OpenAI
Anthropic
Google Gemini / Vertex AI
Groq
Mistral
Together AI
Any OpenAI-compatible endpoint

Export to

REST API (JSON)
Webhook (real-time)
CSV export
S3 / GCS
BigQuery
Snowflake
Any BI tool via API

Alert channels

Email
Slack
PagerDuty
Webhook
Respan dashboard
API polling

No metrics vs Respan Metrics

Concern | Without metrics | With Respan
Cost visibility | Monthly invoice, no breakdown | Per-request, per-user, per-feature cost tracking
Latency tracking | None, or manual instrumentation | P50/P95/P99 with model and feature breakdowns
Token usage | Estimated from billing | Exact prompt and completion tokens per request
Error monitoring | Application logs, if you check them | Real-time error rate dashboards with alerting
Trend analysis | Spreadsheets from monthly invoices | Time-series charts with hourly to monthly granularity
Alerting | None, or custom scripts | Threshold alerts on any metric, any dimension

Why not use provider dashboards?

Provider dashboards show aggregate usage for their platform only. They can't break down cost by feature, compare metrics across providers, or correlate cost with your application's business dimensions. If you use multiple models or providers, you're stitching together data from different dashboards with different schemas. Respan gives you a single view across all providers, with the breakdowns that matter for your product.

Built for AI agents.
Break less.
Ship more.


