What is LLM observability on Respan?

LLM observability is the practice of tracking requests, tokens, errors, latency, and cost across your LLM applications, then filtering logs and traces, saving views, and deploying monitors when metrics cross thresholds. On Respan, metrics come from the same instrumented traffic as tracing and evals.

What metrics does Respan track?

LLM usage metrics include total requests, token usage, errors, latency, and cost. Break down analytics by model, user, API key, and prompt for visibility into how your application runs in production.
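A breakdown by model, user, or API key is essentially a group-by over individual request logs. Here is a minimal sketch of that aggregation in plain Python. It assumes a hypothetical LogRecord shape; LogRecord and breakdown are illustrative names, not part of the Respan SDK or its log schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical log record shape; field names are illustrative, not Respan's schema.
@dataclass
class LogRecord:
    model: str
    user: str
    api_key: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    cost_usd: float
    error: bool = False

def breakdown(records, dimension):
    """Aggregate requests, tokens, errors, latency, and cost per value of `dimension`."""
    groups = defaultdict(list)
    for r in records:
        groups[getattr(r, dimension)].append(r)
    summary = {}
    for key, rows in groups.items():
        summary[key] = {
            "requests": len(rows),
            "tokens": sum(r.prompt_tokens + r.completion_tokens for r in rows),
            "errors": sum(r.error for r in rows),
            "avg_latency_ms": sum(r.latency_ms for r in rows) / len(rows),
            "cost_usd": round(sum(r.cost_usd for r in rows), 6),
        }
    return summary

logs = [
    LogRecord("gpt-4o-mini", "user_123", "key_a", 120, 40, 800.0, 0.0001),
    LogRecord("gpt-4o-mini", "user_456", "key_a", 200, 60, 1200.0, 0.0002),
    LogRecord("gpt-4o", "user_123", "key_b", 300, 100, 2400.0, 0.0025, error=True),
]

by_model = breakdown(logs, "model")
for model, stats in by_model.items():
    print(model, stats)
```

Swapping the dimension (for example, breakdown(logs, "user") or breakdown(logs, "api_key")) gives the other slices without changing the aggregation.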

What are views?

Views are saved filter configurations. Apply filters on Logs, Traces, Users, Prompts, or other surfaces (by status, model, user, timestamp, or custom properties), then save the setup as a view and reuse it with one click.
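Conceptually, a view is a named filter set you can reapply to fresh data. The sketch below models that idea in plain Python, assuming logs are dicts whose custom properties live under a "metadata" key. The View class and its methods are hypothetical; Respan stores views server-side and this is not its API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class View:
    """Hypothetical model of a saved view: a name plus the filters it reapplies."""
    name: str
    filters: dict = field(default_factory=dict)  # e.g. {"status": "error", "model": "gpt-4o"}
    since: Optional[datetime] = None              # optional timestamp lower bound

    def matches(self, log: dict) -> bool:
        if self.since is not None and log["timestamp"] < self.since:
            return False
        custom = log.get("metadata", {})  # custom properties
        for key, expected in self.filters.items():
            # Top-level fields (status, model) first, then custom properties.
            actual = log[key] if key in log else custom.get(key)
            if actual != expected:
                return False
        return True

    def apply(self, logs):
        return [log for log in logs if self.matches(log)]

logs = [
    {"status": "error", "model": "gpt-4o", "timestamp": datetime(2025, 1, 2),
     "metadata": {"environment": "production"}},
    {"status": "success", "model": "gpt-4o", "timestamp": datetime(2025, 1, 2),
     "metadata": {"environment": "production"}},
    {"status": "error", "model": "gpt-4o", "timestamp": datetime(2025, 1, 2),
     "metadata": {"environment": "staging"}},
]

# Save the filter set once...
prod_errors = View("production errors", {"status": "error", "environment": "production"})
# ...then reapply it to new logs in one call.
print(len(prod_errors.apply(logs)))  # 1
```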

How do monitors work?

A monitor watches a metric over time and sends a notification when it crosses a threshold. Configure a trigger (metric, source, threshold, time window), optional Where conditions to narrow by model or environment, and notifications to email, Slack, or a webhook. Deploy from Monitors, or create a monitor from a dashboard chart.
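The trigger logic can be sketched as: keep events inside the time window, keep those that satisfy every Where condition, compute the metric, and compare it with the threshold. This is an illustrative model only. The Monitor class, the event fields, and the "greater than" comparison are assumptions, not Respan's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable

def error_rate(events):
    """Fraction of events that errored."""
    return sum(e["error"] for e in events) / len(events)

@dataclass
class Monitor:
    """Hypothetical monitor: trigger (metric, threshold, window) plus Where conditions."""
    metric: Callable[[list], float]
    threshold: float
    window: timedelta
    where: dict = field(default_factory=dict)  # e.g. {"model": "gpt-4o"}

    def evaluate(self, events, now):
        """Return (metric value, breached?) over matching events inside the window."""
        start = now - self.window
        in_scope = [
            e for e in events
            if start <= e["timestamp"] <= now
            and all(e.get(k) == v for k, v in self.where.items())
        ]
        if not in_scope:
            return None, False  # no matching traffic, nothing to alert on
        value = self.metric(in_scope)
        return value, value > self.threshold  # "greater than" is the assumed condition

now = datetime(2025, 1, 1, 12, 0)
events = [
    {"timestamp": now - timedelta(minutes=m), "model": "gpt-4o",
     "environment": "production", "error": m < 3}  # 3 of 10 recent calls fail
    for m in range(10)
]
monitor = Monitor(error_rate, threshold=0.2, window=timedelta(minutes=10),
                  where={"model": "gpt-4o", "environment": "production"})

value, breached = monitor.evaluate(events, now)
if breached:
    # A real monitor would notify email, Slack, or a webhook here.
    print(f"ALERT: error rate {value:.0%} > {monitor.threshold:.0%}")
```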

How is this different from tracing?

Tracing captures span-level input, output, and hierarchy for debugging a single run. Metrics aggregate across requests for trends and alerts; views and monitors sit on the same Logs and Traces data. Use tracing to debug, metrics and monitors to know when something regresses at scale.
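The difference is one of shape: a trace is a tree you walk to debug a single run, while a metric collapses many runs into one number. The sketch below contrasts the two using a hypothetical Span class; its fields are illustrative and not Respan's trace schema.

```python
from dataclasses import dataclass, field

# Illustrative span model; field names are hypothetical, not Respan's trace schema.
@dataclass
class Span:
    name: str
    kind: str                # e.g. "workflow", "llm", or "tool"
    latency_ms: float
    cost_usd: float = 0.0
    children: list = field(default_factory=list)

def walk(span, depth=0):
    """Tracing view: yield (depth, span) for every step of one run, parent first."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)

def trace_cost(root):
    """Total cost of a single run, summed over its spans."""
    return sum(span.cost_usd for _, span in walk(root))

def avg_cost_per_run(traces):
    """Metrics view: one aggregate across many runs, the kind of number you chart and monitor."""
    return sum(trace_cost(t) for t in traces) / len(traces)

run_a = Span("agent", "workflow", 3000, children=[
    Span("plan", "llm", 900, 0.002),
    Span("search", "tool", 1500),
    Span("answer", "llm", 600, 0.001),
])
run_b = Span("agent", "workflow", 1200, children=[Span("answer", "llm", 1100, 0.001)])

# Debug one run: print its hierarchy.
for depth, span in walk(run_a):
    print("  " * depth + f"{span.name} ({span.kind}, {span.latency_ms:.0f} ms)")

# Watch the fleet: aggregate across runs.
print(f"avg cost per run: ${avg_cost_per_run([run_a, run_b]):.4f}")
```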

Can I alert on cost or latency?

Yes. Monitors support metrics such as error rate, cost, token usage, and latency, with Where conditions on model, project, environment, user, or other event attributes. Send test alerts before deploying, then manage versions from the monitors list.
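For latency, a percentile is often more useful than an average because a few slow calls can hide behind a healthy mean. The sketch below computes a nearest-rank p95 over events that match a Where condition and compares it with a threshold. The percentile choice, event fields, and function names are assumptions for illustration, not Respan's monitor definitions.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_check(events, where, p, threshold_ms):
    """Return (pXX latency, breached?) for events matching every Where condition."""
    scoped = [e["latency_ms"] for e in events
              if all(e.get(k) == v for k, v in where.items())]
    if not scoped:
        return None, False
    value = percentile(scoped, p)
    return value, value > threshold_ms

events = (
    [{"environment": "production", "model": "gpt-4o", "latency_ms": ms}
     for ms in range(100, 1100, 100)]       # 100, 200, ..., 1000 ms
    + [{"environment": "staging", "model": "gpt-4o", "latency_ms": 9000}]
)

# The Where condition keeps the slow staging call out of the production alert.
value, breached = latency_check(events, {"environment": "production"}, p=95, threshold_ms=800)
print(value, breached)  # 1000 True
```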

How do custom properties work with filters?

The metadata you attach to requests becomes custom properties on logs and traces. Those are the same keys you filter and group by in the UI when you build and save views on Logs or Traces.
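Grouping by a custom property is a group-by on a metadata key. The sketch below sums cost per value of a key such as "feature", assuming each log is a dict with a "metadata" field like the one sent in the SDK example on this page. The function name and log shape are hypothetical.

```python
from collections import defaultdict

def group_by_property(logs, key, value_field="cost_usd"):
    """Sum `value_field` per value of a metadata key; logs without the key land in "(unset)"."""
    totals = defaultdict(float)
    for log in logs:
        value = log.get("metadata", {}).get(key, "(unset)")
        totals[value] += log[value_field]
    return dict(totals)

logs = [
    {"cost_usd": 0.5, "metadata": {"feature": "support_bot", "environment": "production"}},
    {"cost_usd": 0.25, "metadata": {"feature": "support_bot", "environment": "staging"}},
    {"cost_usd": 1.0, "metadata": {"feature": "search", "environment": "production"}},
    {"cost_usd": 0.25},  # no metadata attached
]

print(group_by_property(logs, "feature"))
# {'support_bot': 0.75, 'search': 1.0, '(unset)': 0.25}
```

The "(unset)" bucket is a design choice: it makes requests that were never tagged visible instead of silently dropping them from the breakdown.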

How does observability connect to evals and the gateway?

Gateway requests, traces, and eval scores share customer_identifier and metadata. Filter production traffic on Logs or Traces, save views for recurring investigations, and monitor cost or error rate while evaluators catch quality regressions on the same spans.

AI Observability for Production LLM Apps

Metrics, saved views on logs and traces, and threshold monitors, on the same platform as tracing, evals, and the gateway.


Production observability on Respan

Metrics on the dashboard, saved views on filtered surfaces, and monitors when thresholds breach — on the same logs and traces you instrument for debugging.

Dashboards for LLM usage

See requests, tokens, errors, latency, and cost in one place, broken down by model, user, and API key.

See the full agent run

See LLM and tool steps in one trace tree on the same traffic, broken down by span and workflow.

Alert when metrics breach

Watch error rate, cost, latency, or tokens over time and alert Slack, email, or a webhook on breach.

Respan observability capabilities

Metrics, views, and monitors on one surface — tied to tracing, evals, and the gateway.

Usage metrics

Performance monitoring for response times and model behavior. Cost management to find expensive prompts. Quality and debugging tie back to full sessions on Logs and Traces.

Filter and save

Build filters on Logs, Traces, Users, or Prompts, then Save as view. When your investigation changes, update the view's filters with Save, or keep the original and create another view with Save as new.

Alert on thresholds

Trigger: When [metric] of [source] is [threshold condition] over [time window]. Narrow with Where conditions on model, project, environment, or user. Send a test alert before deploying.

What production observability looks like

Ingest instrumented traffic, aggregate metrics on the dashboard, then alert with views and monitors when something moves.

Ingest

Use the Respan SDK or OTLP so every request feeds metrics, logs, and traces on the same platform.

Aggregate

Track requests, tokens, errors, latency, and cost on the dashboard, sliced by model, user, and API key.

Alert

Save filters as views, then deploy monitors when error rate, cost, latency, or tokens cross a threshold.

What breaks production observability setups

Six gaps teams hit without metrics, saved views, and threshold monitors on the same logs and traces they debug.

Totals without breakdown.

A single monthly bill with no per-model or per-feature chart. Use dashboard breakdowns by model, user, API key, and prompt to see what drives spend.

Filters rebuilt every incident.

The same status, model, and custom-property filters get retyped for every outage. Save them as a view on Logs or Traces and apply it in one click.

Alerts only on HTTP errors.

LLM responses can succeed at the protocol level but regress on cost or latency. Monitor cost, tokens, and latency, not only error rate.

No shared view of prod traffic.

Engineers and PMs use different ad hoc filters. Named views (for example, "production errors" or "high-cost requests") align the team on one filter set.

Monitors too broad.

Org-wide thresholds page constantly. Add Where conditions on model, project, environment, or user so alerts match the slice you own.

Metrics disconnected from traces.

A chart spikes but you cannot open the failing runs. On Respan, the same filters on Logs and Traces back the metrics you chart and monitor.

How to set up observability

Instrument traffic, review dashboard metrics, save views on Logs and Traces, then deploy monitors with notifications.

Instrument traffic

Add the Respan SDK or OTLP to LLM clients and agents so logs and traces feed metrics.

Review the dashboard

Track requests, tokens, errors, latency, and cost by model, user, API key, and prompt.

Save views

Filter Logs or Traces, then Save as view for filters you reuse every week.

Deploy monitors

Set the trigger, Where conditions, and notifications, send a test alert, then deploy from Monitors.

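Example: initialize the Respan SDK, then tag each OpenAI request with customer_identifier and metadata.

import os
from openai import OpenAI
from respan import Respan

# Provide the Respan API key before initializing the SDK.
# In production, load it from a secret store instead of hard-coding it.
os.environ["RESPAN_API_KEY"] = "YOUR_RESPAN_API_KEY"
Respan()  # initialize Respan so LLM calls feed logs, traces, and metrics

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    # customer_identifier and metadata become properties you can filter
    # and group by on Logs and Traces.
    extra_body={
        "customer_identifier": "user_123",
        "metadata": {"feature": "support_bot", "environment": "production"},
    },
)
print(response.choices[0].message.content)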

Observability failure modes at production scale

Alerts, time windows, and cost visibility: what teams configure before monitors become noise or blind spots.

Alert without context

A monitor fires but on-call cannot find the runs. Pair each monitor with a saved view on Logs or Traces using the same Where conditions.
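One way to keep a monitor and its on-call view in sync is to define the Where conditions once and reuse them in both configs. The sketch below does that with plain dicts. The config shapes, names, and helper functions are hypothetical and only illustrate the pairing; they are not Respan's monitor or view format.

```python
# Hypothetical config shapes: one Where dict scopes both the monitor and the view.
SUPPORT_BOT_PROD = {"environment": "production", "feature": "support_bot"}

monitor = {
    "name": "support_bot prod error rate",
    "metric": "error_rate",
    "threshold": 0.05,
    "window_minutes": 15,
    "where": SUPPORT_BOT_PROD,
}
view = {
    "name": "support_bot prod errors",
    "filters": {**SUPPORT_BOT_PROD, "status": "error"},  # same scope, narrowed to failures
}

def view_matches_monitor(view, monitor):
    """True if every Where condition on the monitor is also a filter on the view."""
    return all(view["filters"].get(k) == v for k, v in monitor["where"].items())

def matches(event, filters):
    """Check top-level fields and metadata (custom properties) against the filters."""
    props = {**event.get("metadata", {}), **event}
    return all(props.get(k) == v for k, v in filters.items())

events = [
    {"status": "error", "metadata": {"environment": "production", "feature": "support_bot"}},
    {"status": "success", "metadata": {"environment": "production", "feature": "support_bot"}},
    {"status": "error", "metadata": {"environment": "staging", "feature": "support_bot"}},
]

print(view_matches_monitor(view, monitor))  # True
runs_to_open = [e for e in events if matches(e, view["filters"])]
print(len(runs_to_open))  # 1
```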

Wrong time window

A five-minute spike missed on a one-hour window. Match the monitor time window to how fast metrics move. Test with Send test alert.
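A short worked example shows why the window matters. The numbers below are illustrative, not measured: sixty one-minute buckets at a 1% error rate, except for a five-minute incident at 30%, checked against a hypothetical 10% threshold over three window lengths.

```python
# Sixty one-minute buckets of (errors, requests): a 1% baseline, then a
# five-minute incident at 30%. Numbers are illustrative.
buckets = [(1, 100)] * 55 + [(30, 100)] * 5
THRESHOLD = 0.10  # alert above a 10% error rate

def error_rate_over(buckets, window_minutes):
    """Error rate over the most recent `window_minutes` one-minute buckets."""
    recent = buckets[-window_minutes:]
    errors = sum(e for e, _ in recent)
    requests = sum(r for _, r in recent)
    return errors / requests

for window in (5, 15, 60):
    rate = error_rate_over(buckets, window)
    print(f"{window:>2}-minute window: {rate:.1%} -> {'ALERT' if rate > THRESHOLD else 'quiet'}")
```

The five- and fifteen-minute windows cross 10%, while the one-hour window averages the incident down to 205 errors in 6,000 requests (about 3.4%) and stays quiet.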

Cost blind spot

Error rate flat while token spend doubles. Add a cost or token monitor with breakdown by model and feature metadata.
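The blind spot is easy to reproduce with made-up traffic: keep the error count constant between two weeks, but let a new feature consume three times the tokens per request. In the sketch below, only a token monitor fires, and a breakdown by the "feature" metadata key points at the cause. Thresholds, field names, and the "summarizer" feature are hypothetical.

```python
from collections import Counter

# Illustrative traffic: same request count and error count both weeks,
# but a new "summarizer" feature uses three times the tokens per request.
last_week = [{"feature": "support_bot", "tokens": 1000, "error": i == 0} for i in range(100)]
this_week = (
    [{"feature": "support_bot", "tokens": 1000, "error": i == 0} for i in range(50)]
    + [{"feature": "summarizer", "tokens": 3000, "error": False} for _ in range(50)]
)

def error_rate(rows):
    return sum(r["error"] for r in rows) / len(rows)

def tokens_by_feature(rows):
    totals = Counter()
    for r in rows:
        totals[r["feature"]] += r["tokens"]
    return totals

ERROR_THRESHOLD = 0.05
TOKEN_THRESHOLD = 150_000  # hypothetical weekly token budget for this slice

for label, rows in (("last week", last_week), ("this week", this_week)):
    total_tokens = sum(tokens_by_feature(rows).values())
    print(label,
          f"error rate {error_rate(rows):.0%} (alert: {error_rate(rows) > ERROR_THRESHOLD}),",
          f"tokens {total_tokens} (alert: {total_tokens > TOKEN_THRESHOLD})")

# The breakdown shows which feature drives the new spend.
print(tokens_by_feature(this_week).most_common(1))
```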

Respan is committed to maintaining compliance with the most rigorous international safety and security standards.

ISO 27001

Respan is fully compliant with ISO 27001, the internationally recognized standard for information security management.

SOC 2

We meet SOC 2 requirements to ensure secure and compliant management of data across all our systems.

GDPR

With operations designed for global compliance, we operate under GDPR, the world's strictest standard for data privacy.

HIPAA

Respan is HIPAA compliant with a Business Associate Agreement available for healthcare organizations.

Works with your entire stack

Use Respan with your favorite frameworks and tools.



Beyond observability: gateway, evals, prompt ops

Observability is the foundation. Three adjacent disciplines build on top of it:

LLM gateway — a unified proxy across providers with fallback, caching, and budget controls.
LLM evals — offline and online evaluation pipelines that turn quality into a number you can ship against.
Prompt management and optimization — versioned prompts, experiments, and rollback without redeploys.

Engineering deep-dives

Domain-specific observability patterns and failure modes we have written about:

Anatomy of an LLM call. What to log on every request, from prompt hash to token cost.
LLM workflows and tracing. Span design for multi-step agents and tool chains.
Clinical AI hallucination. How regulated healthcare deployments detect and gate output.
Legal AI hallucination (2026). Citation grounding and review queues for legal LLM apps.
Choose the right AI stack. How observability slots into the broader build versus buy decision.

Related guides: LLM tracing · LLM evals · AI gateway

Built for AI agents.
Break less.
Ship more.
