For the complete list of all request parameters, see API reference.
Load balancing allows you to balance the request load across different deployments. You can specify weights for each deployment based on their rate limit and your preference.
Click Add model to add models and specify the weight for each model and add your own credentials.
A deployment basically means a credential. If you add an OpenAI API key, you have one deployment. If you add 2 OpenAI API keys, you have 2 deployments.
You can go to the platform and add multiple deployments for the same provider, specifying load balancing weights for each deployment.
You can also load balance between deployments in your codebase using the customer_credentials field:
You can specify the available models for load balancing. For example, if you only want to use gpt-3.5-turbo in an OpenAI deployment, specify it in the available_models field or do it in the platform.
Learn more about how to specify available models in the platform here.