Add the Docs MCP to your AI coding tool to get help building with Respan. No API key needed.

```json
{
  "mcpServers": {
    "respan-docs": {
      "url": "https://docs.respan.ai/mcp"
    }
  }
}
```
HuggingFace Transformers is the leading open-source library for running state-of-the-art machine learning models locally. It provides pipelines for text generation, summarization, translation, and more. Respan can auto-instrument all Transformers calls for tracing and observability.
HuggingFace Transformers runs models locally, so only Tracing setup is available. Gateway routing is not applicable for local models.
Setup
Install packages
```shell
pip install respan-ai opentelemetry-instrumentation-transformers transformers torch python-dotenv
```
Set environment variables
```shell
export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"
# Optional: for gated models on HuggingFace Hub
# export HF_TOKEN="YOUR_HUGGINGFACE_TOKEN"
```
No provider API key needed for public models — Transformers runs models locally.
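Since the quickstart calls `load_dotenv()`, these variables can also live in a `.env` file next to your script instead of being exported in the shell (values below are placeholders):

```
RESPAN_API_KEY=YOUR_RESPAN_API_KEY
# Optional: for gated models on HuggingFace Hub
# HF_TOKEN=YOUR_HUGGINGFACE_TOKEN
```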
Initialize and run
```python
from dotenv import load_dotenv

load_dotenv()

from transformers import pipeline
from respan import Respan

# Auto-discover and activate all installed instrumentors (Traceloop)
respan = Respan(is_auto_instrument=True)

# Create a text generation pipeline (runs locally)
generator = pipeline("text-generation", model="gpt2")

# Calls run locally, auto-traced by Respan
output = generator("Say hello in three languages:", max_new_tokens=100)
print(output[0]["generated_text"])

respan.flush()
```
View your trace
Open the Traces page to see your auto-instrumented LLM spans.
Configuration
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str \| None` | `None` | Falls back to `RESPAN_API_KEY` env var. |
| `base_url` | `str \| None` | `None` | Falls back to `RESPAN_BASE_URL` env var. |
| `is_auto_instrument` | `bool \| None` | `False` | Auto-discover and activate all installed instrumentors via OpenTelemetry entry points. |
| `customer_identifier` | `str \| None` | `None` | Default customer identifier for all spans. |
| `metadata` | `dict \| None` | `None` | Default metadata attached to all spans. |
| `environment` | `str \| None` | `None` | Environment tag (e.g. `"production"`). |
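These parameters can also be passed directly at initialization instead of via environment variables. A sketch (all values are placeholders; parameter names follow the table above):

```python
from respan import Respan

respan = Respan(
    api_key="YOUR_RESPAN_API_KEY",  # overrides the RESPAN_API_KEY env var
    is_auto_instrument=True,
    environment="production",
    metadata={"service": "local-inference"},
)
```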
Attributes
Attach customer identifiers, thread IDs, and metadata to spans.
In Respan()
Set defaults at initialization — these apply to all spans.
```python
from respan import Respan

respan = Respan(
    is_auto_instrument=True,
    customer_identifier="user_123",
    metadata={"service": "local-inference", "version": "1.0.0"},
)
```
With propagate_attributes
Override per-request using a context manager.
```python
from transformers import pipeline
from respan import Respan, workflow, propagate_attributes

respan = Respan(
    is_auto_instrument=True,
    metadata={"service": "local-inference", "version": "1.0.0"},
)

generator = pipeline("text-generation", model="gpt2")

@workflow(name="handle_request")
def handle_request(user_id: str, question: str):
    with propagate_attributes(
        customer_identifier=user_id,
        thread_identifier="conv_001",
        metadata={"plan": "pro"},
    ):
        output = generator(question, max_new_tokens=100)
        print(output[0]["generated_text"])
```
| Attribute | Type | Description |
| --- | --- | --- |
| `customer_identifier` | `str` | Identifies the end user in Respan analytics. |
| `thread_identifier` | `str` | Groups related messages into a conversation. |
| `metadata` | `dict` | Custom key-value pairs. Merged with default metadata. |
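The metadata merge can be pictured with plain dicts. This is only an illustration, assuming per-request keys take precedence over the defaults set in `Respan()` (verify against your Respan version if precedence matters to you):

```python
# Defaults set once in Respan(metadata=...)
default_metadata = {"service": "local-inference", "version": "1.0.0"}

# Per-request values from propagate_attributes(metadata=...)
request_metadata = {"plan": "pro"}

# Later dict wins on any conflicting keys
merged = {**default_metadata, **request_metadata}
print(merged)
```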
Decorators
Use @workflow and @task to create structured trace hierarchies.
```python
from transformers import pipeline
from respan import Respan, workflow, task

respan = Respan(is_auto_instrument=True)
generator = pipeline("text-generation", model="gpt2")

@task(name="generate_outline")
def outline(topic: str) -> str:
    output = generator(f"Create a brief outline about: {topic}", max_new_tokens=200)
    return output[0]["generated_text"]

@workflow(name="content_pipeline")
def run_pipeline(topic: str):
    plan = outline(topic)
    output = generator(f"Write content from this outline: {plan}", max_new_tokens=300)
    print(output[0]["generated_text"])

run_pipeline("Benefits of API gateways")
respan.flush()
```
Examples
Basic pipeline
```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
output = generator("Say hello in three languages:", max_new_tokens=100)
print(output[0]["generated_text"])
```