# Text Generation

<Note>Make sure you add the **Bearer** prefix to your token.</Note>

You can paste the command below into your terminal to run your first API request. Make sure to replace `YOUR_RESPAN_API_KEY` with your actual Respan API key.

## Example Call

```bash cURL
curl -X POST "https://api.respan.ai/api/generate/" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {YOUR_RESPAN_API_KEY}" \
-d '{
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ],
  "models": ["gpt-3.5-turbo", "gpt-3.5-turbo-16k"],
  "stream": false,
  "max_tokens": 100
}'
```

```python Python
import requests

def demo_call(input,
              model="claude-2",
              token="YOUR_RESPAN_API_KEY",
              stream=False):
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {token}',
    }
    data = {
        'model': model,
        'messages': [{'role': 'user', 'content': input}],
        'stream': stream,
        'models': ["gpt-3.5-turbo", "gpt-3.5-turbo-16k"],
    }
    response = requests.post('https://api.respan.ai/api/generate/',
                             headers=headers, json=data, stream=stream)
    return response

messages = "Say 'Hello World'"
print(demo_call(messages).json())
```

```typescript TypeScript
// Call the endpoint with fetch
fetch('https://api.respan.ai/api/generate/', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_RESPAN_API_KEY'
  },
  body: JSON.stringify({
    models: ['gpt-3.5-turbo', 'gpt-3.5-turbo-16k'],
    messages: [{role: 'user', content: "Say 'Hello World'"}]
  })
})
  .then(response => response.json())
  .then(data => console.log(data));
```

## OpenAI Parameters

- `model` *string*: Specify which model to use. See the list of models [here](/integrations/overview/overview)
- `messages` *array* **required**: List of messages to send to the endpoint in the [OpenAI style](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages), each of them following this format:

```json
{
  "role": "user", // Available choices are user, system or assistant
  "content": "Hello?"
}
```

<strong>Image Processing</strong>

If you want to use the image processing feature, you need to use the following format to upload the image:

```json
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://as1.ftcdn.net/v2/jpg/01/34/53/74/1000_F_134537443_VendrqyXIWyHrZgxdIsfyKUost734JDP.jpg"
      }
    }
  ]
}
```

- `max_tokens` *number*: Maximum number of tokens to generate in the response
- `temperature` *number*: Controls randomness in the output in the range of 0-2; a higher temperature produces a more random response.
- `n` *number*: Specify how many completion choices to generate for each prompt. <strong>Caveat!</strong> While this can help improve generation quality by picking the optimal choice, it can also lead to more token usage.
- `stream` *boolean*: Whether to stream back partial progress token by token {/* Add this to the concept page */}
- `logprobs` *boolean*: Include the log probabilities of each token being selected.
- `echo` *boolean*: Echo back the prompt in addition to the completion {/* Add this to the concept page */}
- `stop` *array[string]*: Stop sequence {/* Add this to the concept page */}
- `presence_penalty` *number*: Specify how much to penalize new tokens based on whether they appear in the text so far. Increases the model's likelihood of talking about new topics {/* Add this to the concept page */}
- `frequency_penalty` *number*: Specify how much to penalize new tokens based on their existing frequency in the text so far. Decreases the model's likelihood of repeating the same line verbatim {/* Add this to the concept page */}
- `logit_bias` *dict*: Used to modify the probability of tokens appearing in the response
- `tools` *array[dict]*: A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide an array of functions the model may generate JSON inputs for. A full request sketch follows this list.

```json
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"]
        }
      },
      "required": ["location"]
    }
  }
}
```

- `tool_choice` *dict*: Manually pick the tool for the model to use. This will force the model to make a function call every time a function is passed in.

```json
{
  "type": "function",
  "function": {"name": "name_of_the_function"}
}
```
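As a concrete illustration of the `tools` and `tool_choice` parameters above, here is a minimal request sketch. The weather function, the prompt, and the `YOUR_RESPAN_API_KEY` placeholder are illustrative values, not part of the API contract; adapt them to your own setup.

```python Python
import requests

# Minimal sketch of a function-calling request (illustrative values only).
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_RESPAN_API_KEY",
}

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}

data = {
    "messages": [{"role": "user", "content": "What's the weather in Boston?"}],
    "model": "gpt-3.5-turbo",
    "tools": [weather_tool],
    # Forces the model to call the named function on every request.
    "tool_choice": {"type": "function", "function": {"name": "get_current_weather"}},
}

response = requests.post("https://api.respan.ai/api/generate/", headers=headers, json=data)
print(response.json())
```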
## Respan Parameters

- `request_breakdown` *boolean*: Adding this returns a summary of the request and response (tokens, cost, latency, etc.) in the response body. If streaming is on, the metrics will be streamed as the last chunk.

<strong>Regular Response</strong>

```json
{
  "id": "chatcmpl-7476cf3f-fcc9-4902-a548-a12489856d8a",
  //... main part of the response body ...
  "request_breakdown": {
    "prompt_tokens": 6,
    "completion_tokens": 9,
    "cost": 4.8e-5,
    "prompt_messages": [
      {
        "role": "user",
        "content": "How are you doing today?"
      }
    ],
    "completion_message": {
      "content": " I'm doing well, thanks for asking!",
      "role": "assistant"
    },
    "model": "claude-2",
    "cached": false,
    "timestamp": "2024-02-20T01:23:39.329729Z",
    "status_code": 200,
    "stream": false,
    "latency": 1.8415491580963135,
    "scores": {},
    "category": "Questions",
    "metadata": {},
    "routing_time": 0.18612787732854486,
    "full_request": {
      "messages": [
        {
          "role": "user",
          "content": "How are you doing today?"
        }
      ],
      "model": "claude-2",
      "logprobs": true
    },
    "sentiment_score": 0
  }
}
```

<strong>Streaming Response</strong>

```json
//... other chunks ...
// The following is the last chunk
{
  "id": "request_breakdown",
  "choices": [
    {
      "delta": {
        "content": null,
        "role": "assistant"
      },
      "finish_reason": "stop",
      "request_breakdown": {
        "prompt_tokens": 6,
        "completion_tokens": 9,
        "cost": 4.8e-5, // in USD
        "prompt_messages": [
          {
            "role": "user",
            "content": "How are you doing today?"
          }
        ],
        "completion_message": {
          "content": " I'm doing well, thanks for asking!",
          "role": "assistant"
        },
        "model": "claude-2",
        "cached": false,
        "timestamp": "2024-02-20T01:23:39.329729Z",
        "status_code": 200,
        "stream": false,
        "latency": 1.8415491580963135, // in seconds
        "scores": {},
        "category": "Questions",
        "metadata": {},
        "routing_time": 0.18612787732854486, // in seconds
        "full_request": {
          "messages": [
            {
              "role": "user",
              "content": "How are you doing today?"
            }
          ],
          "model": "claude-2",
          "logprobs": true
        },
        "sentiment_score": 0
      },
      "index": 0,
      "message": {
        "content": null,
        "role": "assistant"
      }
    }
  ],
  "created": 1706100589,
  "model": "extra_parameter",
  "object": "chat.completion.chunk",
  "system_fingerprint": null,
  "usage": {}
}
```

- `models` *array*: Specify the list of models for the router to choose between. If not specified, <strong>all models</strong> will be used. See the list of models [here](/integrations/overview/overview). If only one model is specified, it will be treated as if the `model` parameter were used and the router will not trigger.
- `fallback_models` *array*: Specify the list of backup models (ranked by priority) to respond in case of a failure in the primary model. See the list of models [here](/integrations/overview/overview)
- `customer_credentials` *object*: You can pass in a dictionary of your customer's credentials and deployment variables for [supported providers](/integrations/overview/overview) and use their credits when the router is calling models from those providers.

```json
{
  "openai": {
    "api_key": "sk-DEacrWTDndFYhdcLYKF6T3BlbkdfghuyTj4sYL2v1EDhg3iz5",
    "some_vars": "some_values"
  },
  "provider_id": {
    "some_provider_var_names": "some_provider_var_values"
  }
}
```

- `customer_identifier` *string*: Use this as a tag to identify the user associated with the API call.
- `metadata` *dict*: You can add any key-value pairs to this metadata field for your reference. Contact team@respan.ai if you need extra parameter support for your use case. Example:

```json
{
  "my_value_key": "my_value"
}
```

- `disable_log` *boolean*: When set to true, only the request and the [performance metrics](/documentation/features/monitoring/metrics) will be recorded; input and output messages will be omitted from the log.
- `exclude_models` *array*: The list of models to exclude from the router's selection. See the list of models [here](/integrations/overview/overview). This only excludes models from the router's selection: the `model` parameter takes precedence over this parameter, and `fallback_models` and the [safety net](/documentation/features/monitoring/notifications/subscribe-alerts) will still use the excluded models to catch failures.
- `exclude_providers` *array*: The list of providers to exclude from the router's selection. All models under the provider will be excluded. See the list of providers [here](/integrations/overview/overview). This only excludes models from the router's selection: the `model` parameter takes precedence over this parameter, and `fallback_models` and the [safety net](/documentation/features/monitoring/notifications/subscribe-alerts) will still use the excluded models to catch failures.
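Putting several of these parameters together, here is a minimal, non-streaming sketch of a routed request: the router picks between two models, falls back on failure, tags the call with a customer identifier and metadata, and the cost is read back from the returned `request_breakdown`. The model names, customer identifier, and key placeholder are illustrative assumptions, not required values.

```python Python
import requests

# Sketch of a routed request using Respan parameters (illustrative values only).
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_RESPAN_API_KEY",
}

data = {
    "messages": [{"role": "user", "content": "Summarize our meeting notes."}],
    # Let the router choose between these models, with a backup if the primary fails.
    "models": ["gpt-3.5-turbo", "gpt-3.5-turbo-16k"],
    "fallback_models": ["claude-2"],
    # Tag the call for per-customer tracking and attach free-form metadata.
    "customer_identifier": "customer_123",
    "metadata": {"my_value_key": "my_value"},
    # Ask for the metrics summary in the response body.
    "request_breakdown": True,
}

response = requests.post("https://api.respan.ai/api/generate/", headers=headers, json=data)
body = response.json()

# The breakdown is attached to the response body when request_breakdown is true.
breakdown = body.get("request_breakdown", {})
print("model used:", breakdown.get("model"))
print("cost (USD):", breakdown.get("cost"))
```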
## Deprecated Parameters

- `customer_api_keys` *object*: You can pass in a dictionary of your customer's API keys for specific models. If the router selects a model that is in the dictionary, it will attempt to use the customer's API key for calling the model before using your [integration API key](/documentation/admin/llm-provider-keys) or Respan's default API key.

```json
{
  "gpt-3.5-turbo": "your_customer_api_key",
  "gpt-4": "your_customer_api_key"
}
```
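In contrast to the deprecated `customer_api_keys`, the supported way to bill a call to your customer's own provider account is the `customer_credentials` parameter described above. A minimal sketch, assuming your customer has supplied an OpenAI key (the key placeholder and prompt are illustrative):

```python Python
import requests

# Sketch of passing a customer's own OpenAI credentials (placeholder values only).
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_RESPAN_API_KEY",
}

data = {
    "messages": [{"role": "user", "content": "Hello"}],
    "models": ["gpt-3.5-turbo"],
    # When the router calls an OpenAI model, it uses these credentials,
    # so the usage is billed to your customer's account instead of yours.
    "customer_credentials": {
        "openai": {
            "api_key": "YOUR_CUSTOMERS_OPENAI_API_KEY"
        }
    },
}

response = requests.post("https://api.respan.ai/api/generate/", headers=headers, json=data)
print(response.json())
```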

## Authentication

- `Authorization` *Bearer*: API key authentication. Get your API key from https://platform.respan.ai/platform/api-keys

## Request

This endpoint expects an object.

## Response

Successful response for Text Generation:

- `role` *string*
- `content` *list of objects*

## Errors

- `401`: Unauthorized Error