Input Params
Common Params
LiteLLM accepts and translates the OpenAI Chat Completion params across all providers.
Usage
import litellm, os
# set env variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
## SET MAX TOKENS - via completion()
response = litellm.completion(
model="gpt-3.5-turbo",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
print(response)
Translated OpenAI params
Use this function to get an up-to-date list of supported OpenAI params for any model + provider.
from litellm import get_supported_openai_params
response = get_supported_openai_params(model="anthropic.claude-3", custom_llm_provider="bedrock")
print(response) # ["max_tokens", "tools", "tool_choice", "stream"]
This is a list of OpenAI params we translate across providers. Use litellm.get_supported_openai_params() for an up-to-date list of params for each model + provider.
Provider | temperature | max_tokens | top_p | stream | stream_options | stop | n | presence_penalty | frequency_penalty | functions | function_call | logit_bias | user | response_format | seed | tools | tool_choice | logprobs | top_logprobs | extra_headers |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||||||||
OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
Azure OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||
Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||||
Anyscale | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||||
Cohere | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||||||||||||
Huggingface | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||
Openrouter | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||
AI21 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||||||||||||
VertexAI | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||||
Bedrock | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (for anthropic) | |||||||||||||||
Sagemaker | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||
TogetherAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |||||||||||||||
AlephAlpha | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||
Palm | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||
NLP Cloud | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||||
Petals | ✅ | ✅ | ✅ | ✅ | |||||||||||||||||
Ollama | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||
Databricks | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||||||||
ClarifAI | ✅ | ✅ | ✅ | ✅ |
By default, LiteLLM raises an exception if an OpenAI param passed in isn't supported by the model/provider. To drop the param instead, set litellm.drop_params = True or pass completion(..., drop_params=True).
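The default raise-versus-drop behavior can be pictured as a filter over the request's params. This is a simplified sketch, not LiteLLM's implementation: filter_openai_params is a hypothetical helper, and the supported list is copied from the get_supported_openai_params() example output above.

```python
from typing import Any, Dict, List


def filter_openai_params(
    params: Dict[str, Any],
    supported: List[str],
    drop_params: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: keep supported params, reject or drop the rest."""
    unsupported = [name for name in params if name not in supported]
    if unsupported and not drop_params:
        # Mirrors the documented default: unsupported params raise an error.
        raise ValueError(f"Unsupported params: {unsupported}")
    # With drop_params=True, unsupported params are silently removed.
    return {k: v for k, v in params.items() if k in supported}


supported = ["max_tokens", "tools", "tool_choice", "stream"]
request = {"max_tokens": 10, "seed": 42}
print(filter_openai_params(request, supported, drop_params=True))
# {'max_tokens': 10}
```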
Input Params
def completion(
model: str,
messages: List = [],
# Optional OpenAI params
timeout: Optional[Union[float, int]] = None,
temperature: Optional[float] = None,
top_p: Optional[float] = None,
n: Optional[int] = None,
stream: Optional[bool] = None,
stream_options: Optional[dict] = None,
stop=None,
max_tokens: Optional[int] = None,
presence_penalty: Optional[float] = None,
frequency_penalty: Optional[float] = None,
logit_bias: Optional[dict] = None,
user: Optional[str] = None,
# openai v1.0+ new params
response_format: Optional[dict] = None,
seed: Optional[int] = None,
tools: Optional[List] = None,
tool_choice: Optional[Union[str, dict]] = None,
logprobs: Optional[bool] = None,
top_logprobs: Optional[int] = None,
deployment_id=None,
# soon to be deprecated params by OpenAI
functions: Optional[List] = None,
function_call: Optional[Union[str, dict]] = None,
# set api_base, api_version, api_key
base_url: Optional[str] = None,
api_version: Optional[str] = None,
api_key: Optional[str] = None,
model_list: Optional[list] = None, # pass in a list of api_base,keys, etc.
# Optional liteLLM function params
**kwargs,
) -> ModelResponse:
Required Fields
- model: string - ID of the model to use. Refer to the model endpoint compatibility table for details on which models work with the Chat API.
- messages: array - A list of messages comprising the conversation so far.
Properties of messages
Note - Each message in the array contains the following properties:
- role: string - The role of the message's author. Roles can be: system, user, assistant, or function.
- content: string or null - The contents of the message. It is required for all messages, but may be null for assistant messages with function calls.
- name: string (optional) - The name of the author of the message. It is required if the role is "function", in which case it should match the name of the function represented in the content. It can contain a-z, A-Z, 0-9, and underscores, with a maximum length of 64 characters.
- function_call: object (optional) - The name and arguments of a function that should be called, as generated by the model.
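The message rules above can be checked client-side before a request is sent. This is a minimal sketch, not part of LiteLLM: validate_message is a hypothetical helper, get_weather is a made-up function name, and the checks cover only the rules listed in this section.

```python
import re
from typing import Any, Dict

# Roles and name rules as listed in this section.
ALLOWED_ROLES = {"system", "user", "assistant", "function"}
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")  # a-z, A-Z, 0-9, _; max 64 chars


def validate_message(message: Dict[str, Any]) -> None:
    """Hypothetical check of one message; raises ValueError if a rule is broken."""
    role = message.get("role")
    if role not in ALLOWED_ROLES:
        raise ValueError(f"invalid role: {role!r}")
    if "content" not in message:
        raise ValueError("content is required (it may be None in some cases)")
    if message["content"] is None and not (
        role == "assistant" and "function_call" in message
    ):
        raise ValueError("content may be null only for assistant function calls")
    name = message.get("name")
    if role == "function" and name is None:
        raise ValueError("name is required when role is 'function'")
    if name is not None and not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")


conversation = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "function_call": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    },
    {"role": "function", "name": "get_weather", "content": '{"temp_c": 21}'},
]
for msg in conversation:
    validate_message(msg)  # passes silently; raises on the first invalid message
```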
Optional Fields
- temperature: number or null (optional) - The sampling temperature to use, between 0 and 2. Higher values like 0.8 produce more random outputs, while lower values like 0.2 make outputs more focused and deterministic.
- top_p: number or null (optional) - An alternative to sampling with temperature, known as nucleus sampling: the model considers only the tokens that make up the top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
- n: integer or null (optional) - The number of chat completion choices to generate for each input message.
- stream: boolean or null (optional) - If set to true, partial message deltas are sent. Tokens are sent as they become available, and the stream is terminated by a [DONE] message.
- stream_options: dict or null (optional) - Options for the streaming response. Only set this when you set stream: true.
  - include_usage: boolean (optional) - If set, an additional chunk is streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and its choices field is always an empty array. All other chunks also include a usage field, but with a null value.
- stop: string, array, or null (optional) - Up to 4 sequences where the API will stop generating further tokens.
- max_tokens: integer (optional) - The maximum number of tokens to generate in the chat completion.
- presence_penalty: number or null (optional) - Penalizes new tokens based on whether they already appear in the text so far.
- response_format: object (optional) - An object specifying the format that the model must output.
  - Setting it to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON.
  - Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
- seed: integer or null (optional) - This feature is in Beta. If specified, the system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed; refer to the system_fingerprint response parameter to monitor changes in the backend.
- tools: array (optional) - A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. Each tool has:
  - type: string - The type of the tool. Currently, only function is supported.
  - function: object - Required.
- tool_choice: string or object (optional) - Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function. none is the default when no functions are present; auto is the default if functions are present.
- frequency_penalty: number or null (optional) - Penalizes new tokens based on their frequency in the text so far.
- logit_bias: map (optional) - Modifies the probability of specific tokens appearing in the completion.
- user: string (optional) - A unique identifier representing your end-user. This can help OpenAI monitor and detect abuse.
- timeout: int (optional) - Timeout in seconds for completion requests (defaults to 600 seconds).
- logprobs: bool (optional) - Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token returned in the content of the message.
- top_logprobs: int (optional) - An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
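Consuming a stream with stream_options={"include_usage": True} means handling one chunk that carries usage but no choices. The sketch below runs over hand-written dict chunks shaped like the include_usage description above. This is an assumption for illustration: LiteLLM's stream presumably yields response objects rather than plain dicts, so real code would read the same fields as attributes. collect_stream is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def collect_stream(
    chunks: Iterable[Dict[str, Any]],
) -> Tuple[str, Optional[Dict[str, int]]]:
    """Join streamed content deltas and pick out the usage-only chunk."""
    text_parts = []
    usage = None
    for chunk in chunks:
        if not chunk["choices"]:
            # The extra include_usage chunk: empty choices, populated usage.
            usage = chunk.get("usage")
            continue
        delta = chunk["choices"][0].get("delta", {})
        # Deltas may omit content or set it to None; treat both as "".
        text_parts.append(delta.get("content") or "")
    return "".join(text_parts), usage


# Hand-written chunks; every chunk but the last carries usage=None.
fake_chunks = [
    {"choices": [{"delta": {"content": "Hello"}}], "usage": None},
    {"choices": [{"delta": {"content": " there"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14}},
]
text, usage = collect_stream(fake_chunks)
print(text)   # Hello there
print(usage)  # {'prompt_tokens': 12, 'completion_tokens': 2, 'total_tokens': 14}
```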
Deprecated Params
- functions: array - A list of functions that the model may use to generate JSON inputs. Each function should have the following properties:
  - name: string - The name of the function to be called. It should contain a-z, A-Z, 0-9, underscores and dashes, with a maximum length of 64 characters.
  - description: string (optional) - A description explaining what the function does. It helps the model to decide when and how to call the function.
  - parameters: object - The parameters that the function accepts, described as a JSON Schema object.
- function_call: string or object (optional) - Controls how the model responds to function calls.
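Since functions and function_call are deprecated in favor of tools and tool_choice, existing function definitions can be wrapped into the tools shape described under Optional Fields: each function becomes {"type": "function", "function": ...}, and forcing one function becomes {"type": "function", "function": {"name": ...}}. The helpers below are hypothetical, and the get_current_weather schema is a made-up example.

```python
from typing import Any, Dict, List, Union


def functions_to_tools(functions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap each legacy function definition in the tools format."""
    return [{"type": "function", "function": fn} for fn in functions]


def function_call_to_tool_choice(
    function_call: Union[str, Dict[str, str]],
) -> Union[str, Dict[str, Any]]:
    """Map a legacy function_call value onto a tool_choice value."""
    if isinstance(function_call, str):
        # "none" and "auto" keep the same meaning.
        return function_call
    # {"name": "my_function"} forces that specific function.
    return {"type": "function", "function": {"name": function_call["name"]}}


functions = [
    {
        "name": "get_current_weather",  # hypothetical example function
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]
tools = functions_to_tools(functions)
tool_choice = function_call_to_tool_choice({"name": "get_current_weather"})
print(tool_choice)
# {'type': 'function', 'function': {'name': 'get_current_weather'}}
```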
litellm-specific params
- api_base: string (optional) - The API endpoint you want to call the model with.
- api_version: string (optional) - (Azure-specific) The API version for the call.
- num_retries: int (optional) - The number of times to retry the API call if an APIError, TimeoutError, or ServiceUnavailableError occurs.
- context_window_fallback_dict: dict (optional) - A mapping of models to fall back to if the call fails due to a context window error.
- fallbacks: list (optional) - A list of model names + params to use in case the initial call fails.
- metadata: dict (optional) - Any additional data you want logged when the call is made (sent to logging integrations, e.g. promptlayer, and accessible via a custom callback function).
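num_retries and fallbacks combine into a simple control flow: retry the same model on transient errors, then move on to the next model in the list. The sketch below imitates that flow in plain Python with a stubbed call function. call_with_fallbacks, TransientError, and flaky_call are hypothetical names, and LiteLLM's own retry logic (backoff, which exceptions count as retryable) may differ.

```python
from typing import Callable, List


class TransientError(Exception):
    """Stand-in for APIError, TimeoutError, or ServiceUnavailableError."""


def call_with_fallbacks(
    call: Callable[[str], str],
    model: str,
    fallbacks: List[str],
    num_retries: int = 2,
) -> str:
    """Try each model (1 + num_retries attempts apiece) until one succeeds."""
    last_error: Exception = RuntimeError("no models to try")
    for candidate in [model, *fallbacks]:
        for _ in range(num_retries + 1):
            try:
                return call(candidate)
            except TransientError as err:
                last_error = err  # retry the same candidate, then fall back
    raise last_error


attempts = []


def flaky_call(model_name: str) -> str:
    """Stub: the primary model always fails, the fallback succeeds."""
    attempts.append(model_name)
    if model_name == "gpt-4":
        raise TransientError("service unavailable")
    return f"answer from {model_name}"


print(call_with_fallbacks(flaky_call, "gpt-4", ["gpt-3.5-turbo"], num_retries=2))
# answer from gpt-3.5-turbo
print(attempts)
# ['gpt-4', 'gpt-4', 'gpt-4', 'gpt-3.5-turbo']
```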
CUSTOM MODEL COST
- input_cost_per_token: float (optional) - The cost per input token for the completion call.
- output_cost_per_token: float (optional) - The cost per output token for the completion call.
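Assuming the cost of a call is simply the token counts multiplied by these custom per-token prices, the arithmetic looks like this. estimate_cost is a hypothetical helper, and the prices and token counts are made up, not real rates.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Cost = input tokens * input price + output tokens * output price."""
    return (
        prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token
    )


# Made-up prices: $0.000001 per input token, $0.000002 per output token.
cost = estimate_cost(1000, 500, 0.000001, 0.000002)
print(f"${cost:.6f}")  # $0.002000
```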
CUSTOM PROMPT TEMPLATE (See prompt formatting for more info)
- initial_prompt_value: string (optional) - Initial string applied at the start of the input messages.
- roles: dict (optional) - Dictionary specifying how to format the prompt based on the role + message passed in via messages.
- final_prompt_value: string (optional) - Final string applied at the end of the input messages.
- bos_token: string (optional) - Initial string applied at the start of a sequence.
- eos_token: string (optional) - Final string applied at the end of a sequence.
- hf_model_name: string (optional) - [Sagemaker Only] The corresponding Hugging Face name of the model, used to pull the right chat template for the model.
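One way to picture how these template options fit together is a renderer that wraps each message with role-specific strings and then adds the outer values. This is a hypothetical sketch, not LiteLLM's prompt formatter: build_prompt is made up, the pre_message/post_message keys inside roles are an assumption, and so is placing bos_token and eos_token around the whole prompt.

```python
from typing import Dict, List

# Hypothetical per-role wrappers, assuming "pre_message"/"post_message" keys.
ROLES = {
    "system": {"pre_message": "<<SYS>>\n", "post_message": "\n<</SYS>>\n"},
    "user": {"pre_message": "[INST] ", "post_message": " [/INST]"},
}


def build_prompt(
    messages: List[Dict[str, str]],
    roles: Dict[str, Dict[str, str]],
    initial_prompt_value: str = "",
    final_prompt_value: str = "",
    bos_token: str = "",
    eos_token: str = "",
) -> str:
    """Render chat messages into one prompt string (illustrative only)."""
    parts = [bos_token, initial_prompt_value]
    for message in messages:
        wrapper = roles.get(message["role"], {})
        parts.append(wrapper.get("pre_message", ""))
        parts.append(message["content"])
        parts.append(wrapper.get("post_message", ""))
    parts.extend([final_prompt_value, eos_token])
    return "".join(parts)


prompt = build_prompt(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
    ROLES,
    bos_token="<s>",
    eos_token="</s>",
)
print(prompt)
```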
Provider-specific Params
Providers might offer params not supported by OpenAI (e.g. top_k). You can pass those in 2 ways:
- via completion(): we'll pass the non-OpenAI param straight to the provider as part of the request body.
  - e.g. completion(model="claude-instant-1", top_k=3)
- via a provider-specific config variable (e.g. litellm.OpenAIConfig()).
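When the same setting is given both ways, a reasonable mental model is that the provider config supplies defaults and anything passed to completion() wins. The sketch below assumes that precedence (worth confirming against your LiteLLM version) and uses plain dicts. merge_provider_params is hypothetical, and config_defaults stands in for values set through a config class such as litellm.AnthropicConfig(max_tokens_to_sample=200).

```python
from typing import Any, Dict


def merge_provider_params(
    config_defaults: Dict[str, Any],
    call_params: Dict[str, Any],
) -> Dict[str, Any]:
    """Assumed precedence: per-call params override provider-config defaults."""
    merged = dict(config_defaults)  # copy so the defaults stay untouched
    merged.update(call_params)
    return merged


config_defaults = {"max_tokens_to_sample": 200, "top_k": 5}
call_params = {"top_k": 3}  # e.g. completion(model="claude-instant-1", top_k=3)
print(merge_provider_params(config_defaults, call_params))
# {'max_tokens_to_sample': 200, 'top_k': 3}
```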
Per-provider examples of both approaches follow, in this order: OpenAI, OpenAI Text Completion, Azure OpenAI, Anthropic, Huggingface, TogetherAI, Ollama, Replicate, Petals, Palm, AI21, Cohere.
# OpenAI
import litellm, os
# set env variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="gpt-3.5-turbo",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.OpenAIConfig(max_tokens=200)
response_2 = litellm.completion(
model="gpt-3.5-turbo",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# OpenAI Text Completion
import litellm, os
# set env variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="text-davinci-003",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.OpenAITextCompletionConfig(max_tokens=200)
response_2 = litellm.completion(
model="text-davinci-003",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Azure OpenAI
import litellm, os
# set env variables
os.environ["AZURE_API_KEY"] = "your-azure-api-key"
os.environ["AZURE_API_BASE"] = "your-azure-api-base"
os.environ["AZURE_API_TYPE"] = "azure" # [OPTIONAL]
os.environ["AZURE_API_VERSION"] = "2023-07-01-preview" # [OPTIONAL]
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="azure/chatgpt-v-2",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.AzureOpenAIConfig(max_tokens=10)
response_2 = litellm.completion(
model="azure/chatgpt-v-2",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Anthropic
import litellm, os
# set env variables
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="claude-instant-1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.AnthropicConfig(max_tokens_to_sample=200)
response_2 = litellm.completion(
model="claude-instant-1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Huggingface
import litellm, os
# set env variables
os.environ["HUGGINGFACE_API_KEY"] = "your-huggingface-key" #[OPTIONAL]
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_base="https://your-huggingface-api-endpoint",
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.HuggingfaceConfig(max_new_tokens=200)
response_2 = litellm.completion(
model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_base="https://your-huggingface-api-endpoint"
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# TogetherAI
import litellm, os
# set env variables
os.environ["TOGETHERAI_API_KEY"] = "your-togetherai-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="together_ai/togethercomputer/llama-2-70b-chat",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.TogetherAIConfig(max_tokens=200)
response_2 = litellm.completion(
model="together_ai/togethercomputer/llama-2-70b-chat",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Ollama
import litellm, os
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="ollama/llama2",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.OllamaConfig(num_predict=200)
response_2 = litellm.completion(
model="ollama/llama2",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Replicate
import litellm, os
# set env variables
os.environ["REPLICATE_API_KEY"] = "your-replicate-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.ReplicateConfig(max_new_tokens=200)
response_2 = litellm.completion(
model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Petals
import litellm
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="petals/petals-team/StableBeluga2",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_base="https://chat.petals.dev/api/v1/generate",
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.PetalsConfig(max_new_tokens=200)
response_2 = litellm.completion(
model="petals/petals-team/StableBeluga2",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_base="https://chat.petals.dev/api/v1/generate",
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Palm
import litellm, os
# set env variables
os.environ["PALM_API_KEY"] = "your-palm-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="palm/chat-bison",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.PalmConfig(max_output_tokens=200)
response_2 = litellm.completion(
model="palm/chat-bison",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# AI21
import litellm, os
# set env variables
os.environ["AI21_API_KEY"] = "your-ai21-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="j2-mid",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.AI21Config(maxTokens=200)
response_2 = litellm.completion(
model="j2-mid",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)
# Cohere
import litellm, os
# set env variables
os.environ["COHERE_API_KEY"] = "your-cohere-key"
## SET MAX TOKENS - via completion()
response_1 = litellm.completion(
model="command-nightly",
messages=[{ "content": "Hello, how are you?","role": "user"}],
max_tokens=10
)
response_1_text = response_1.choices[0].message.content
## SET MAX TOKENS - via config
litellm.CohereConfig(max_tokens=200)
response_2 = litellm.completion(
model="command-nightly",
messages=[{ "content": "Hello, how are you?","role": "user"}],
)
response_2_text = response_2.choices[0].message.content
## TEST OUTPUT
assert len(response_2_text) > len(response_1_text)