Load balancing

Distribute traffic across models and deployments with configurable weights.

For the complete list of all request parameters, see API reference.


Load balancing distributes your request load across multiple models or deployments. You assign a weight to each one based on its rate limit and your own preference.
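As a rough illustration of what weights mean, the sketch below assumes that traffic is split in proportion to each weight. The page does not describe the gateway's actual selection algorithm, so treat this as a mental model only; `expected_shares`, `pick_target`, and the deployment names are hypothetical.

```python
import random
from collections import Counter


def expected_shares(weights: dict[str, float]) -> dict[str, float]:
    """Return each target's expected fraction of traffic, assuming
    selection is proportional to weight (an assumption, not documented)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def pick_target(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one target at random with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Hypothetical deployment names and weights.
    weights = {"deployment-a": 3, "deployment-b": 1}
    print(expected_shares(weights))

    # Simulate many requests to see the split emerge empirically.
    rng = random.Random(0)
    counts = Counter(pick_target(weights, rng) for _ in range(10_000))
    print(counts)
```

Under this assumption, weights of 3 and 1 would send roughly three quarters of requests to the first target and one quarter to the second.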

Load balancing between models

1. Go to the Load balancing page

Go to the Load balancing page and click Create new load balancer.

2. Add models

Click Add model for each model you want to include, then specify its weight and add your own credentials.

3. Copy group ID to your codebase

After you have added the models, copy the group ID (the blue text) into your codebase and use it in your requests.

Note: The model parameter overrides load_balance_group, so leave model out of any request that should go through the group.
    {
      "messages": [
        {
          "role": "user",
          "content": "Hi, how are you?"
        }
      ],
      "load_balance_group": {
        "group_id": "THE_GROUP_ID"
      }
    }

4. Add load balancing group in code (optional)

You can also define the load balancing group directly in your codebase. The models field overrides the configuration of the load balancing group you set up in the UI.

Example code
    {
      "load_balance_group": {
        "group_id": "THE_GROUP_ID",
        "models": [
          {
            "model": "azure/gpt-35-turbo",
            "weight": 1
          },
          {
            "model": "azure/gpt-4",
            "credentials": {
              "api_base": "Your own Azure api_base",
              "api_version": "Your own Azure api_version",
              "api_key": "Your own Azure api_key"
            },
            "weight": 1
          }
        ]
      }
    }
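
The two request shapes from steps 3 and 4 can be assembled programmatically. The following is a minimal sketch, not an official client: `group_request` is a hypothetical helper, and the check that every inline entry carries `model` and `weight` is this sketch's own rule, inferred from the example above. It only builds the JSON body; sending it is left to whatever HTTP client and endpoint you already use.

```python
import json


def group_request(messages, group_id, models=None):
    """Build a request body that routes through a load balancing group.

    `models` is an optional list of dicts with "model", "weight", and
    optionally "credentials". Per the text above, it overrides the group
    configured in the UI.
    """
    group = {"group_id": group_id}
    if models is not None:
        for entry in models:
            # Sketch-level validation; the API's real rules may differ.
            if "model" not in entry or "weight" not in entry:
                raise ValueError("each entry needs 'model' and 'weight'")
        group["models"] = list(models)
    # Deliberately no top-level "model" key: the page warns that it would
    # take precedence over load_balance_group.
    return {"messages": list(messages), "load_balance_group": group}


if __name__ == "__main__":
    messages = [{"role": "user", "content": "Hi, how are you?"}]

    # Step 3: reference the group configured in the UI by its ID.
    by_id = group_request(messages, "THE_GROUP_ID")

    # Step 4: override the group's models inline.
    inline = group_request(
        messages,
        "THE_GROUP_ID",
        models=[
            {"model": "azure/gpt-35-turbo", "weight": 1},
            {"model": "azure/gpt-4", "weight": 1},
        ],
    )
    print(json.dumps(by_id, indent=2))
    print(json.dumps(inline, indent=2))
```

Keeping the body construction in one function makes it harder to add a `model` key by accident, which would bypass the group.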

Load balancing between deployments

A deployment corresponds to one credential. If you add one OpenAI API key, you have one deployment; if you add two OpenAI API keys, you have two deployments.

On the platform, you can add multiple deployments for the same provider and assign a load balancing weight to each one.

You can also load balance between deployments in your codebase using the customer_credentials field:

    {
      "customer_credentials": [
        {
          "credentials": {
            "openai": {
              "api_key": "YOUR_OPENAI_API_KEY"
            }
          },
          "weight": 1.0
        },
        {
          "credentials": {
            "openai": {
              "api_key": "YOUR_OPENAI_API_KEY"
            }
          },
          "weight": 1.0
        }
      ]
    }

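When the list of keys lives in configuration rather than in a hand-written JSON file, the same structure can be generated. This is a sketch under assumptions: `customer_credentials` here is a hypothetical helper named after the field, the placeholder keys are illustrative, and rejecting non-positive weights is this sketch's own rule rather than documented API behaviour.

```python
import json


def customer_credentials(provider, api_keys_with_weights):
    """Build a customer_credentials list: one entry (deployment) per API key."""
    entries = []
    for api_key, weight in api_keys_with_weights:
        if weight <= 0:
            # Sketch-level rule; the page does not say how the API
            # treats zero or negative weights.
            raise ValueError("weight must be positive")
        entries.append(
            {
                "credentials": {provider: {"api_key": api_key}},
                "weight": float(weight),
            }
        )
    return entries


if __name__ == "__main__":
    body = {
        "customer_credentials": customer_credentials(
            "openai",
            [("YOUR_FIRST_OPENAI_API_KEY", 2), ("YOUR_SECOND_OPENAI_API_KEY", 1)],
        )
    }
    print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also guarantees strictly valid JSON, which avoids hand-editing mistakes such as trailing commas.
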
Specify available models

You can restrict which models each deployment serves. For example, if you only want to use gpt-3.5-turbo with an OpenAI deployment, list it in that deployment's available_models field, or configure it on the platform.

Learn more about how to specify available models in the platform here.

    {
      "customer_credentials": [
        {
          "credentials": {
            "openai": {
              "api_key": "YOUR_OPENAI_API_KEY"
            }
          },
          "weight": 1.0,
          "available_models": ["gpt-3.5-turbo"],
          "exclude_models": ["gpt-4"]
        },
        {
          "credentials": {
            "openai": {
              "api_key": "YOUR_OPENAI_API_KEY"
            }
          },
          "weight": 1.0
        }
      ]
    }
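
To reason about which deployments could serve a given model, the sketch below filters entries locally. The semantics are assumed, not confirmed by this page: `exclude_models` is treated as a deny-list that always wins, `available_models` as an allow-list when present, and an entry with neither field as accepting any model. `eligible` is a hypothetical helper, and credentials are omitted for brevity.

```python
def eligible(entry, model):
    """Return True if this customer_credentials entry may serve `model`.

    Assumed semantics (not confirmed by the page): exclude_models always
    blocks a model; available_models, when present, is an allow-list;
    an entry with neither field accepts any model.
    """
    if model in entry.get("exclude_models", []):
        return False
    allowed = entry.get("available_models")
    return allowed is None or model in allowed


if __name__ == "__main__":
    # Mirrors the two entries in the example above, minus credentials.
    deployments = [
        {
            "weight": 1.0,
            "available_models": ["gpt-3.5-turbo"],
            "exclude_models": ["gpt-4"],
        },
        {"weight": 1.0},
    ]
    for model in ("gpt-3.5-turbo", "gpt-4"):
        matches = [i for i, d in enumerate(deployments) if eligible(d, model)]
        print(model, matches)
```

Under these assumptions, a gpt-3.5-turbo request could be balanced across both deployments, while a gpt-4 request could only go to the second one.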