Google Gen AI

Use Google Gen AI SDK with Respan
  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page

Add the Docs MCP to your AI coding tool to get help building with Respan. No API key needed.

{
  "mcpServers": {
    "respan-docs": {
      "url": "https://docs.respan.ai/mcp"
    }
  }
}

What is Google Gen AI SDK?

Respan is compatible with the official Google Gen AI SDK, enabling you to use Google’s Gemini models through our gateway with full observability, monitoring, and advanced features.

This integration is for the Respan gateway.

Quickstart

Step 1: Install the SDK

Install the official Google Gen AI SDK for Python.

$ pip install google-genai
Step 2: Initialize the client

Initialize the client with your Respan API key and set the base URL to Respan’s endpoint.

from google import genai
import os

client = genai.Client(
    api_key=os.environ.get("RESPAN_API_KEY"),
    http_options={
        "base_url": "https://api.respan.ai/api/google/gemini",
    }
)

The base_url can be either https://api.respan.ai/api/google/gemini or https://endpoint.respan.ai/api/google/gemini.

Step 3: Make your first request

Now you can use the client to make requests to Google’s models.

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello, world!",
)

print(response.text)

Step 4: Switch models

To switch between different Google models, simply change the model parameter.

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me a joke.",
)

Step 5: Configure parameters

Use GenerateContentConfig to control model behavior with various parameters.

from google.genai import types

config = types.GenerateContentConfig(
    temperature=0.9,
    top_k=1,
    top_p=1,
    max_output_tokens=2048,
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the capital of France?",
    config=config,
)

Step 6: Advanced configuration

Here’s a comprehensive example showcasing various parameters, including system instructions, safety settings, and tools.

from google import genai
from google.genai import types
import os

client = genai.Client(
    api_key=os.environ.get("RESPAN_API_KEY"),
    http_options={
        "base_url": "https://api.respan.ai/api/google/gemini",
    }
)

# Example: Configure tools for grounding
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)

# Example: Comprehensive GenerateContentConfig showcasing various parameters
config = types.GenerateContentConfig(
    # System instruction to guide the model's behavior
    system_instruction="You are a helpful assistant that provides accurate, concise information about sports events.",

    # Sampling parameters
    temperature=0.7,  # Controls randomness (0.0-1.0). Lower = more focused, higher = more creative
    top_p=0.95,  # Nucleus sampling. Tokens with cumulative probability up to this value are considered
    top_k=40,  # Top-k sampling. Considers this many top tokens at each step

    # Output controls
    max_output_tokens=1024,  # Maximum number of tokens in the response
    stop_sequences=["\n\n\n"],  # Sequences that will stop generation

    # Tools and function calling
    tools=[grounding_tool],  # Enable Google Search grounding

    # Thinking configuration (for models that support it)
    thinking_config=types.ThinkingConfig(thinking_budget=0),  # Disables thinking mode

    # Response format options
    # response_mime_type="application/json",  # Uncomment for JSON output
    # response_schema=types.Schema(  # Uncomment to enforce structured output
    #     type=types.Type.OBJECT,
    #     properties={
    #         "winner": types.Schema(type=types.Type.STRING),
    #         "year": types.Schema(type=types.Type.INTEGER)
    #     }
    # ),

    # Safety settings
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ],

    # Diversity controls
    presence_penalty=0.0,  # Penalize tokens based on presence in text (-2.0 to 2.0)
    frequency_penalty=0.0,  # Penalize tokens based on frequency in text (-2.0 to 2.0)

    # Reproducibility
    # seed=42,  # Uncomment to make responses more deterministic

    # Logprobs (for token analysis)
    # response_logprobs=True,  # Uncomment to get log probabilities
    # logprobs=5,  # Number of top candidate tokens to return logprobs for
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won Euro 2024?",
    config=config,
)

print(response.text)

Configuration Parameters

The GenerateContentConfig supports a wide range of parameters to control model behavior:

System Instructions

  • system_instruction: Sets the role and behavior guidelines for the model. This helps maintain consistent personality and response style throughout the conversation.

Sampling Parameters

  • temperature (0.0-1.0): Controls randomness in responses. Lower values (0.0-0.3) make output more focused and deterministic, while higher values (0.7-1.0) increase creativity and variation.
  • top_p (0.0-1.0): Nucleus sampling parameter. The model considers tokens with cumulative probability up to this value. Lower values make responses more focused.
  • top_k: Limits the number of highest probability tokens considered at each step. Helps balance between creativity and coherence.

Output Controls

  • max_output_tokens: Maximum number of tokens in the generated response. Helps control response length and costs.
  • stop_sequences: Array of strings that will stop generation when encountered. Useful for controlling output format.
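For instance, to cap response length and cut generation at the first blank line, a config fragment along these lines should work (it assumes the `google-genai` package and the `types` import from the steps above; running it also requires the client and an API key):

```python
from google.genai import types

# Keep answers short and halt generation when a double newline appears
config = types.GenerateContentConfig(
    max_output_tokens=256,      # hard cap on tokens in the response
    stop_sequences=["\n\n"],    # generation stops as soon as this string is produced
)
```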

Tools and Grounding

  • tools: Array of tools the model can use, such as Google Search for grounding responses in real-time information.
  • google_search: Enables the model to search the web for up-to-date information before generating responses.

Thinking Configuration

  • thinking_config: Controls the model’s internal reasoning process for models that support thinking mode.
  • thinking_budget: Amount of tokens allocated for internal reasoning. Set to 0 to disable thinking mode.

Structured Output

  • response_mime_type: Specify the output format (e.g., “application/json” for JSON responses).
  • response_schema: Define the exact structure of JSON output using a schema. Ensures responses follow a specific format.
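As a concrete sketch, here is the schema that appears commented out in the advanced example, enabled so the model must return JSON matching it (assumes the `client` from the Quickstart; the response text will then be a JSON string you can parse):

```python
from google.genai import types

# Force JSON output conforming to a fixed object schema
config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "winner": types.Schema(type=types.Type.STRING),
            "year": types.Schema(type=types.Type.INTEGER),
        },
    ),
)
```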

Safety Settings

  • safety_settings: Array of safety configurations to filter harmful content across different categories:
    • HARM_CATEGORY_HATE_SPEECH: Hate speech and discriminatory content
    • HARM_CATEGORY_DANGEROUS_CONTENT: Dangerous or harmful instructions
    • HARM_CATEGORY_HARASSMENT: Harassment and bullying
    • HARM_CATEGORY_SEXUALLY_EXPLICIT: Sexually explicit content

Threshold options:

  • BLOCK_NONE: Don’t block any content
  • BLOCK_ONLY_HIGH: Block only high-severity content
  • BLOCK_MEDIUM_AND_ABOVE: Block medium and high-severity content
  • BLOCK_LOW_AND_ABOVE: Block low, medium, and high-severity content
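Categories and thresholds can be mixed per request. A sketch applying a strict threshold to one category and a permissive one to another (same `types` import as above):

```python
from google.genai import types

# Strict filtering for harassment, permissive for dangerous content
config = types.GenerateContentConfig(
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HARASSMENT,
            threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
        ),
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        ),
    ],
)
```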

Diversity Controls

  • presence_penalty (-2.0 to 2.0): Penalizes tokens based on whether they appear in the text. Positive values encourage the model to talk about new topics.
  • frequency_penalty (-2.0 to 2.0): Penalizes tokens based on their frequency in the text. Positive values reduce repetition.

Reproducibility

  • seed: Integer value for deterministic output. Using the same seed with identical inputs will produce similar outputs (not guaranteed to be exactly identical due to model updates).

Token Analysis

  • response_logprobs: When enabled, returns log probabilities for generated tokens. Useful for analyzing model confidence.
  • logprobs: Number of top candidate tokens to return log probabilities for at each position.
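These two options appear commented out in the advanced example; a minimal fragment enabling them looks like this (same `types` import as above). In recent SDK versions the resulting data is exposed on the response candidates, via a `logprobs_result` field:

```python
from google.genai import types

# Return log probabilities for the top 3 candidate tokens at each position
config = types.GenerateContentConfig(
    response_logprobs=True,  # attach logprobs to the response
    logprobs=3,              # how many top candidates to report per position
)
```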