💰 Budgets, Rate Limits
Requirements:
Set Budgets
You can set budgets at 3 levels:
- For the proxy
- For an internal user
- For a customer (end-user)
- For a key
- For a key (model specific budgets)
- For Proxy
- For Team
- For Team Members
- For Customers
- For Key
- For Internal User (Global)
- For Key (model specific)
Apply a budget across all calls on the proxy
Step 1. Modify config.yaml
general_settings:
master_key: sk-1234
litellm_settings:
# other litellm settings
max_budget: 0 # (float) sets max budget as $0 USD
budget_duration: 30d # (str) frequency of reset - You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
Step 2. Start proxy
litellm /path/to/config.yaml
Step 3. Send test call
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Autherization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
],
}'
Add budgets to teams
curl --location 'http://localhost:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"team_alias": "my-new-team_4",
"members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
"rpm_limit": 99
}'
Sample Response
{
"team_alias": "my-new-team_4",
"team_id": "13e83b19-f851-43fe-8e93-f96e21033100",
"admins": [],
"members": [],
"members_with_roles": [
{
"role": "admin",
"user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"
}
],
"metadata": {},
"tpm_limit": null,
"rpm_limit": 99,
"max_budget": null,
"models": [],
"spend": 0.0,
"max_parallel_requests": null,
"budget_duration": null,
"budget_reset_at": null
}
Add budget duration to teams
budget_duration
: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"team_alias": "my-new-team_4",
"members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
"budget_duration": 10s,
}'
Use this when you want to budget a users spend within a Team
Step 1. Create User
Create a user with user_id=ishaan
curl --location 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "ishaan"
}'
Step 2. Add User to an existing Team - set max_budget_in_team
Set max_budget_in_team
when adding a User to a team. We use the same user_id
we set in Step 1
curl -X POST 'http://0.0.0.0:4000/team/member_add' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32", "max_budget_in_team": 0.000000000001, "member": {"role": "user", "user_id": "ishaan"}}'
Step 3. Create a Key for Team member from Step 1
Set user_id=ishaan
from step 1
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"user_id": "ishaan",
"team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32"
}'
Response from /key/generate
We use the key
from this response in Step 4
{"key":"sk-RV-l2BJEZ_LYNChSx2EueQ", "models":[],"spend":0.0,"max_budget":null,"user_id":"ishaan","team_id":"e8d1460f-846c-45d7-9b43-55f3cc52ac32","max_parallel_requests":null,"metadata":{},"tpm_limit":null,"rpm_limit":null,"budget_duration":null,"allowed_cache_controls":[],"soft_budget":null,"key_alias":null,"duration":null,"aliases":{},"config":{},"permissions":{},"model_max_budget":{},"key_name":null,"expires":null,"token_id":null}%
Step 4. Make /chat/completions requests for Team member
Use the key from step 3 for this request. After 2-3 requests expect to see The following error ExceededBudget: Crossed spend within team
curl --location 'http://localhost:4000/chat/completions' \
--header 'Authorization: Bearer sk-RV-l2BJEZ_LYNChSx2EueQ' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama3",
"messages": [
{
"role": "user",
"content": "tes4"
}
]
}'
Use this to budget user
passed to /chat/completions
, without needing to create a key for every user
Step 1. Modify config.yaml
Define litellm.max_end_user_budget
general_settings:
master_key: sk-1234
litellm_settings:
max_end_user_budget: 0.0001 # budget for 'user' passed to /chat/completions
- Make a /chat/completions call, pass 'user' - First call Works
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
--data ' {
"model": "azure-gpt-3.5",
"user": "ishaan3",
"messages": [
{
"role": "user",
"content": "what time is it"
}
]
}'
- Make a /chat/completions call, pass 'user' - Call Fails, since 'ishaan3' over budget
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
--data ' {
"model": "azure-gpt-3.5",
"user": "ishaan3",
"messages": [
{
"role": "user",
"content": "what time is it"
}
]
}'
Error
{"error":{"message":"Budget has been exceeded: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}%
Apply a budget on a key.
You can:
Expected Behaviour
- Costs Per key get auto-populated in
LiteLLM_VerificationToken
Table - After the key crosses it's
max_budget
, requests fail - If duration set, spend is reset at the end of the duration
By default the max_budget
is set to null
and is not checked for keys
Add budgets to keys
curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"team_id": "core-infra", # [OPTIONAL]
"max_budget": 10,
}'
Example Request to /chat/completions
when key has crossed budget
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <generated-key>' \
--data ' {
"model": "azure-gpt-3.5",
"user": "e09b4da8-ed80-4b05-ac93-e16d9eb56fca",
"messages": [
{
"role": "user",
"content": "respond in 50 lines"
}
],
}'
Expected Response from /chat/completions
when key has crossed budget
{
"detail":"Authentication Error, ExceededTokenBudget: Current spend for token: 7.2e-05; Max Budget for Token: 2e-07"
}
Add budget duration to keys
budget_duration
: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"team_id": "core-infra", # [OPTIONAL]
"max_budget": 10,
"budget_duration": 10s,
}'
Apply a budget across all calls an internal user (key owner) can make on the proxy.
For most use-cases, we recommend setting team-member budgets
LiteLLM exposes a /user/new
endpoint to create budgets for this.
You can:
By default the max_budget
is set to null
and is not checked for keys
Add budgets to users
curl --location 'http://localhost:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}'
Sample Response
{
"key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
"expires": "2023-12-22T09:53:13.861000Z",
"user_id": "krrish3@berri.ai",
"max_budget": 0.0
}
Add budget duration to users
budget_duration
: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"team_id": "core-infra", # [OPTIONAL]
"max_budget": 10,
"budget_duration": 10s,
}'
Create new keys for existing user
Now you can just call /key/generate
with that user_id (i.e. krrish3@berri.ai) and:
- Budget Check: krrish3@berri.ai's budget (i.e. $10) will be checked for this key
- Spend Tracking: spend for this key will update krrish3@berri.ai's spend as well
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish3@berri.ai"}'
Apply model specific budgets on a key.
Expected Behaviour
model_spend
gets auto-populated inLiteLLM_VerificationToken
Table- After the key crosses the budget set for the
model
inmodel_max_budget
, calls fail
By default the model_max_budget
is set to {}
and is not checked for keys
- LiteLLM will track the cost/budgets for the
model
passed to LLM endpoints (/chat/completions
,/embeddings
)
Add model specific budgets to keys
curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
model_max_budget={"gpt4": 0.5, "gpt-5": 0.01}
}'
Reset Budgets
Reset budgets across keys/internal users/teams/customers
budget_duration
: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
- Internal Users
- Keys
- Teams
curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"max_budget": 10,
"budget_duration": 10s, # 👈 KEY CHANGE
}'
curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"max_budget": 10,
"budget_duration": 10s, # 👈 KEY CHANGE
}'
curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
"max_budget": 10,
"budget_duration": 10s, # 👈 KEY CHANGE
}'
Set Rate Limits
You can set:
- tpm limits (tokens per minute)
- rpm limits (requests per minute)
- max parallel requests
- Per Internal User
- Per Key
- For customers
Use /user/new
, to persist rate limits across multiple keys.
curl --location 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"user_id": "krrish@berri.ai", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}'
Expected Response
{
"key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
"expires": "2024-01-19T01:21:12.816168",
"user_id": "krrish@berri.ai",
}
Use /key/generate
, if you want them for just that key.
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}'
Expected Response
{
"key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
"expires": "2024-01-18T20:48:44.297973",
"user_id": "78c2c8fc-c233-43b9-b0c3-eb931da27b84" // 👈 auto-generated
}
You can also create a budget id for a customer on the UI, under the 'Rate Limits' tab.
Use this to set rate limits for user
passed to /chat/completions
, without needing to create a key for every user
Step 1. Create Budget
Set a tpm_limit
on the budget (You can also pass rpm_limit
if needed)
curl --location 'http://0.0.0.0:4000/budget/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"budget_id" : "free-tier",
"tpm_limit": 5
}'
Step 2. Create Customer
with Budget
We use budget_id="free-tier"
from Step 1 when creating this new customers
curl --location 'http://0.0.0.0:4000/customer/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"user_id" : "palantir",
"budget_id": "free-tier"
}'
Step 3. Pass user_id
id in /chat/completions
requests
Pass the user_id
from Step 2 as user="palantir"
curl --location 'http://localhost:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama3",
"user": "palantir",
"messages": [
{
"role": "user",
"content": "gm"
}
]
}'
Grant Access to new model
Use model access groups to give users access to select models, and add new ones to it over time (e.g. mistral, llama-2, etc.).
Difference between doing this with /key/generate
vs. /user/new
? If you do it on /user/new
it'll persist across multiple keys generated for that user.
Step 1. Assign model, access group in config.yaml
model_list:
- model_name: text-embedding-ada-002
litellm_params:
model: azure/azure-embedding-model
api_base: "os.environ/AZURE_API_BASE"
api_key: "os.environ/AZURE_API_KEY"
api_version: "2023-07-01-preview"
model_info:
access_groups: ["beta-models"] # 👈 Model Access Group
Step 2. Create key with access group
curl --location 'http://localhost:4000/user/new' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], # 👈 Model Access Group
"max_budget": 0}'
Create new keys for existing internal user
Just include user_id in the /key/generate
request.
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish@berri.ai"}'