Merged
Changes from all commits (42 commits)
9a980f3
Add anthropic/claude-sonnet-4.5 to OpenRouter cost map
huangyafei Oct 13, 2025
6424480
add ecs to docs
mubashir1osmani Oct 12, 2025
4f33bc4
fix(ollama/chat): 'think' param handling
kowyo Sep 30, 2025
e30b716
fix: add 'think' parameter handling in ollama_chat.py
kowyo Sep 30, 2025
9557a37
fix: update 'think' parameter assignment to use provided value in tra…
kowyo Oct 4, 2025
6896c60
fix: only use think level for gpt-oss model
kowyo Oct 5, 2025
92bd238
fix comment
kowyo Oct 5, 2025
6df051e
docs: update benchmark results with improved infrastructure
AlexsanderHamir Oct 12, 2025
2e57d19
update benchmarks
AlexsanderHamir Oct 12, 2025
09662b5
direct cost calculation from openrouter
dhruvyad Oct 11, 2025
b57406e
add tests for openrouter cost tracking
dhruvyad Oct 12, 2025
cb29e33
docs: fix doc
krrishdholakia Oct 13, 2025
fcb85f8
perf(router): optimize timing functions in completion hot path
AlexsanderHamir Oct 13, 2025
b0d963c
docs(index.md): bump rc
krrishdholakia Oct 13, 2025
5bffa58
feat(ssl): add configurable ECDH curve for TLS performance
AlexsanderHamir Oct 13, 2025
65163c7
[Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
ishaan-jaff Oct 13, 2025
d094a33
add: unit test
AlexsanderHamir Oct 14, 2025
36c9066
perf(router): optimize model lookups with O(1) index maps and standar…
AlexsanderHamir Oct 15, 2025
c98a30a
perf(router): optimize string concatenation in hash generation
AlexsanderHamir Oct 15, 2025
97ed3d0
test(router): update error message assertion after string concat opti…
AlexsanderHamir Oct 15, 2025
294fb94
perf(router): use shallow copy instead of deepcopy for model aliases
AlexsanderHamir Oct 15, 2025
6e4da1d
perf(router): optimize model lookups with O(1) data structures
AlexsanderHamir Oct 15, 2025
1705190
Merge pull request #15576 from BerriAI/litellm_remove_deepcopy
AlexsanderHamir Oct 16, 2025
4e84937
Merge pull request #15575 from BerriAI/litellm_remove_constly_string_…
AlexsanderHamir Oct 16, 2025
56838e2
test: ensure model_names is a O(1) datastructure
AlexsanderHamir Oct 16, 2025
a17f3a7
Merge pull request #15578 from BerriAI/litellm_remove_list_lookup
AlexsanderHamir Oct 16, 2025
5e929da
test: add static analysis to prevent O(n) linear scans in router
AlexsanderHamir Oct 16, 2025
1846363
Merge branch 'litellm_october_alexsander_stanging' into litellm_route…
AlexsanderHamir Oct 16, 2025
d07e4f4
Merge pull request #15574 from BerriAI/litellm_router_index_change
AlexsanderHamir Oct 16, 2025
18c5ec8
Add missing env key
AlexsanderHamir Oct 16, 2025
8a32553
fix(proxy): change check_file_size_under_limit to accept Collection[str]
AlexsanderHamir Oct 16, 2025
2ef3a76
Merge pull request #15606 from BerriAI/litellm_fix_staging_branch
AlexsanderHamir Oct 16, 2025
254f50a
perf(router): optimize model lookups with O(1) index maps and standar…
AlexsanderHamir Oct 15, 2025
35de05d
test: add static analysis to prevent O(n) linear scans in router
AlexsanderHamir Oct 16, 2025
8c58d16
perf(router): use shallow copy instead of deepcopy for model aliases
AlexsanderHamir Oct 15, 2025
f842b2a
perf(router): optimize string concatenation in hash generation
AlexsanderHamir Oct 15, 2025
76b272d
test(router): update error message assertion after string concat opti…
AlexsanderHamir Oct 15, 2025
3b14866
perf(router): optimize model lookups with O(1) data structures
AlexsanderHamir Oct 15, 2025
e154ab6
test: ensure model_names is a O(1) datastructure
AlexsanderHamir Oct 16, 2025
220fa3f
Add missing env key
AlexsanderHamir Oct 16, 2025
aad0efd
fix(proxy): change check_file_size_under_limit to accept Collection[str]
AlexsanderHamir Oct 16, 2025
bdca4db
Merge branch 'litellm_october_alexsander_stanging' of https://github.…
AlexsanderHamir Oct 16, 2025
53 changes: 24 additions & 29 deletions docs/my-website/docs/benchmarks.md
@@ -16,48 +16,50 @@ model_list:
api_key: "test"
```

### 1 Instance LiteLLM Proxy
### 2 Instance LiteLLM Proxy

In these tests, the baseline latency characteristics are measured against a `fake-openai-endpoint`.

#### Performance Metrics

| Metric | Value |
|--------|-------|
| **Requests per Second (RPS)** | 475 |
| **End-to-End Latency P50 (ms)** | 100 |
| **LiteLLM Overhead P50 (ms)** | 3 |
| **LiteLLM Overhead P90 (ms)** | 17 |
| **LiteLLM Overhead P99 (ms)** | 31 |
| **Type** | **Name** | **Median (ms)** | **95%ile (ms)** | **99%ile (ms)** | **Average (ms)** | **Current RPS** |
| --- | --- | --- | --- | --- | --- | --- |
| POST | /chat/completions | 200 | 630 | 1200 | 262.46 | 1035.7 |
| Custom | LiteLLM Overhead Duration (ms) | 12 | 29 | 43 | 14.74 | 1035.7 |
| | Aggregated | 100 | 430 | 930 | 138.6 | 2071.4 |

<!-- <Image img={require('../img/1_instance_proxy.png')} /> -->

<!-- ## **Horizontal Scaling - 10K RPS**

<Image img={require('../img/instances_vs_rps.png')} /> -->

#### Key Findings
- Single instance: 475 RPS @ 100ms median latency
- LiteLLM adds 3ms P50 overhead, 17ms P90 overhead, 31ms P99 overhead
- 2 LiteLLM instances: 950 RPS @ 100ms latency
- 4 LiteLLM instances: 1900 RPS @ 100ms latency

### 2 Instances

**Adding 1 instance will double the RPS and maintain the `100ms-110ms` median latency.**
### 4 Instances

| Metric | Litellm Proxy (2 Instances) |
|--------|------------------------|
| Median Latency (ms) | 100 |
| RPS | 950 |
| **Type** | **Name** | **Median (ms)** | **95%ile (ms)** | **99%ile (ms)** | **Average (ms)** | **Current RPS** |
| --- | --- | --- | --- | --- | --- | --- |
| POST | /chat/completions | 100 | 150 | 240 | 111.73 | 1170 |
| Custom | LiteLLM Overhead Duration (ms) | 2 | 8 | 13 | 3.32 | 1170 |
| | Aggregated | 77 | 130 | 180 | 57.53 | 2340 |

#### Key Findings
- Doubling from 2 to 4 LiteLLM instances halves median latency: 200 ms → 100 ms.
- High-percentile latencies drop significantly: P95 630 ms → 150 ms, P99 1,200 ms → 240 ms.
- Setting workers equal to CPU count gives optimal performance (see the worker-count sketch below).
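
For instance, when launching the proxy directly, the worker count can be pinned to the machine's CPUs. A minimal sketch, assuming the `--num_workers` flag of the LiteLLM proxy CLI and the 4-CPU machines described below:

```bash
# Match worker processes to the 4 CPUs used in these benchmarks.
# Flag names assumed from the proxy CLI; verify with `litellm --help`.
litellm --config /path/to/config.yaml --port 4000 --num_workers 4
```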

## Machine Spec used for testing

Each machine deploying LiteLLM had the following specs:

- 2 CPU
- 4GB RAM
- 4 CPU
- 8GB RAM


## Locust Settings

- 1000 Users
- 500 user Ramp Up (see the locustfile sketch below)
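
A minimal locustfile sketch matching these settings; the request payload, model name, and API key below are illustrative assumptions, not the exact benchmark script:

```python
# locustfile.py - minimal sketch; model name and key are placeholders.
from locust import HttpUser, task, between


class ProxyUser(HttpUser):
    wait_time = between(0.5, 1.5)  # small per-user think time

    @task
    def chat_completion(self):
        # Each simulated user repeatedly posts a chat completion to the proxy.
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-openai-endpoint",
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```

Run it against the proxy with `locust -f locustfile.py --host http://localhost:4000 --users 1000 --spawn-rate 500`, assuming the ramp-up above maps to Locust's `--spawn-rate`.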

## How to measure LiteLLM Overhead

@@ -137,10 +139,3 @@ Using LangSmith has **no impact on latency, RPS compared to Basic Litellm Proxy**
|--------|------------------------|---------------------|
| RPS | 1133.2 | 1135 |
| Median Latency (ms) | 140 | 132 |



## Locust Settings

- 2500 Users
- 100 user Ramp Up
44 changes: 43 additions & 1 deletion docs/my-website/docs/guides/security_settings.md
@@ -117,10 +117,52 @@ litellm_settings:
```bash
export SSL_CERTIFICATE="/path/to/certificate.pem"
```

</TabItem>
</Tabs>

## 5. Configure ECDH Curve for SSL/TLS Performance

The `ssl_ecdh_curve` setting allows you to configure the Elliptic Curve Diffie-Hellman (ECDH) curve used for SSL/TLS key exchange. This is particularly useful for disabling Post-Quantum Cryptography (PQC) to improve performance in environments where PQC is not required.

**Use Case:** Some OpenSSL 3.x systems enable PQC by default, which can slow down TLS handshakes. Setting the ECDH curve to `X25519` disables PQC and can significantly improve connection performance.

<Tabs>
<TabItem value="sdk" label="SDK">

```python
import litellm
litellm.ssl_ecdh_curve = "X25519" # Disables PQC for better performance
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```yaml
litellm_settings:
ssl_ecdh_curve: "X25519"
```

</TabItem>
<TabItem value="env_var" label="Environment Variables">

```bash
export SSL_ECDH_CURVE="X25519"
```

</TabItem>
</Tabs>

## 5. Use HTTP_PROXY environment variable
**Common Valid Curves:**

- `X25519` - Modern, fast curve (recommended for disabling PQC)
- `prime256v1` - NIST P-256 curve
- `secp384r1` - NIST P-384 curve
- `secp521r1` - NIST P-521 curve

**Note:** If an invalid curve name is provided or if your Python/OpenSSL version doesn't support this feature, LiteLLM will log a warning and continue with default curves.
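
To check ahead of time whether your Python/OpenSSL build accepts a given curve name, a quick standalone sketch (independent of LiteLLM) is:

```python
import ssl

curve = "X25519"
ctx = ssl.create_default_context()
try:
    ctx.set_ecdh_curve(curve)  # raises ValueError for unknown curve names
    print(f"{curve} accepted by {ssl.OPENSSL_VERSION}")
except AttributeError:
    # Method unavailable in this Python/OpenSSL build
    print("set_ecdh_curve not available in this Python/OpenSSL build")
except ValueError as err:
    print(f"Unknown curve {curve!r}: {err}")
```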

## 6. Use HTTP_PROXY environment variable

Both httpx and aiohttp libraries use `urllib.request.getproxies` from environment variables. Before client initialization, you may set proxy (and optional SSL_CERT_FILE) by setting the environment variables:
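
For example, in bash (the proxy URL and certificate path below are placeholders):

```bash
export HTTP_PROXY="http://proxy.internal:8080"
export HTTPS_PROXY="http://proxy.internal:8080"
export SSL_CERT_FILE="/path/to/ca-bundle.pem"
```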

1 change: 1 addition & 0 deletions docs/my-website/docs/proxy/config_settings.md
@@ -752,6 +752,7 @@ router_settings:
| SPEND_LOGS_URL | URL for retrieving spend logs
| SPEND_LOG_CLEANUP_BATCH_SIZE | Number of logs deleted per batch during cleanup. Default is 1000
| SSL_CERTIFICATE | Path to the SSL certificate file
| SSL_ECDH_CURVE | ECDH curve for SSL/TLS key exchange (e.g., 'X25519' to disable PQC).
| SSL_SECURITY_LEVEL | [BETA] Security level for SSL/TLS connections. E.g. `DEFAULT@SECLEVEL=1`
| SSL_VERIFY | Flag to enable or disable SSL certificate verification
| SSL_CERT_FILE | Path to the SSL certificate file for custom CA bundle
24 changes: 24 additions & 0 deletions docs/my-website/docs/proxy/deploy.md
@@ -788,6 +788,30 @@ docker run --name litellm-proxy \
## Platform-specific Guide

<Tabs>
<TabItem value="AWS ECS" label="AWS ECS - Elastic Container Service">

### Terraform-based ECS Deployment

LiteLLM maintains a dedicated Terraform tutorial for deploying the proxy on ECS. Follow the step-by-step guide in the [litellm-ecs-deployment repository](https://github.com/BerriAI/litellm-ecs-deployment) to provision the required ECS services, task definitions, and supporting AWS resources.

1. Clone the tutorial repository to review the Terraform modules and variables.
```bash
git clone https://github.com/BerriAI/litellm-ecs-deployment.git
cd litellm-ecs-deployment
```

2. Initialize and validate the Terraform project before applying it to your chosen workspace/account.
```bash
terraform init
terraform plan
terraform apply
```

3. Once `terraform apply` completes, run `./build.sh` to build and push the image to the ECR repository and update the ECS cluster. Use the deployed service's endpoint (port `4000` by default) for API requests to your LiteLLM proxy, as in the smoke-test sketch below.
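
As a quick smoke test against the deployed proxy, something like the following can be used (the hostname comes from your load balancer / Terraform outputs; the key and model name are placeholders):

```bash
curl http://<alb-dns-name>:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```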


</TabItem>

<TabItem value="AWS EKS" label="AWS EKS - Kubernetes">

### Kubernetes (AWS EKS)
4 changes: 2 additions & 2 deletions docs/my-website/release_notes/v1.78.0-stable/index.md
@@ -40,15 +40,15 @@ import TabItem from '@theme/TabItem';
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.78.0.rc.1
ghcr.io/berriai/litellm:v1.78.0.rc.2
```

</TabItem>

<TabItem value="pip" label="Pip">

``` showLineNumbers title="pip install litellm"
pip install litellm==1.78.0.rc.1
pip install litellm==1.78.0.rc.2
```

</TabItem>
1 change: 1 addition & 0 deletions litellm/__init__.py
@@ -263,6 +263,7 @@
ssl_verify: Union[str, bool] = True
ssl_security_level: Optional[str] = None
ssl_certificate: Optional[str] = None
ssl_ecdh_curve: Optional[str] = None # Set to 'X25519' to disable PQC and improve performance
disable_streaming_logging: bool = False
disable_token_counter: bool = False
disable_add_transform_inline_image_block: bool = False
23 changes: 23 additions & 0 deletions litellm/llms/custom_httpx/http_handler.py
@@ -1,6 +1,7 @@
import asyncio
import os
import ssl
import sys
import time
from typing import TYPE_CHECKING, Any, Callable, Dict, List, Mapping, Optional, Union

@@ -114,6 +115,28 @@ def get_ssl_configuration(
# but falls back to widely compatible ones
custom_ssl_context.set_ciphers(DEFAULT_SSL_CIPHERS)

# Configure ECDH curve for key exchange (e.g., to disable PQC and improve performance)
# Set SSL_ECDH_CURVE env var or litellm.ssl_ecdh_curve to 'X25519' to disable PQC
# Common valid curves: X25519, prime256v1, secp384r1, secp521r1
ssl_ecdh_curve = os.getenv("SSL_ECDH_CURVE", litellm.ssl_ecdh_curve)
if ssl_ecdh_curve and isinstance(ssl_ecdh_curve, str):
try:
custom_ssl_context.set_ecdh_curve(ssl_ecdh_curve)
verbose_logger.debug(f"SSL ECDH curve set to: {ssl_ecdh_curve}")
except AttributeError:
verbose_logger.warning(
f"SSL ECDH curve configuration not supported. "
f"Python version: {sys.version.split()[0]}, OpenSSL version: {ssl.OPENSSL_VERSION}. "
f"Requested curve: {ssl_ecdh_curve}. Continuing with default curves."
)
except ValueError as e:
# Invalid curve name
verbose_logger.warning(
f"Invalid SSL ECDH curve name: '{ssl_ecdh_curve}'. {e}. "
f"Common valid curves: X25519, prime256v1, secp384r1, secp521r1. "
f"Continuing with default curves (including PQC)."
)

# Use our custom SSL context instead of the original ssl_verify value
return custom_ssl_context

10 changes: 8 additions & 2 deletions litellm/llms/ollama/chat/transformation.py
@@ -184,9 +184,12 @@ def map_openai_params(
):
if value.get("json_schema") and value["json_schema"].get("schema"):
optional_params["format"] = value["json_schema"]["schema"]
### FUNCTION CALLING LOGIC ###
if param == "reasoning_effort" and value is not None:
optional_params["think"] = True
if model.startswith("gpt-oss"):
optional_params["think"] = value
else:
optional_params["think"] = True
### FUNCTION CALLING LOGIC ###
if param == "tools":
## CHECK IF MODEL SUPPORTS TOOL CALLING ##
try:
@@ -281,6 +284,7 @@ def transform_request(
stream = optional_params.pop("stream", False)
format = optional_params.pop("format", None)
keep_alive = optional_params.pop("keep_alive", None)
think = optional_params.pop("think", None)
function_name = optional_params.pop("function_name", None)
litellm_params["function_name"] = function_name
tools = optional_params.pop("tools", None)
@@ -344,6 +348,8 @@
data["tools"] = tools
if keep_alive is not None:
data["keep_alive"] = keep_alive
if think is not None:
data["think"] = think

return data

8 changes: 7 additions & 1 deletion litellm/llms/ollama/completion/transformation.py
@@ -180,7 +180,10 @@ def map_openai_params(
elif param == "stop":
optional_params["stop"] = value
elif param == "reasoning_effort" and value is not None:
optional_params["think"] = True
if model.startswith("gpt-oss"):
optional_params["think"] = value
else:
optional_params["think"] = True
elif param == "response_format" and isinstance(value, dict):
if value["type"] == "json_object":
optional_params["format"] = "json"
@@ -412,6 +415,7 @@ def transform_request(
stream = optional_params.pop("stream", False)
format = optional_params.pop("format", None)
images = optional_params.pop("images", None)
think = optional_params.pop("think", None)
data = {
"model": model,
"prompt": ollama_prompt,
@@ -425,6 +429,8 @@
data["images"] = [
_convert_image(convert_to_ollama_image(image)) for image in images
]
if think is not None:
data["think"] = think

return data

3 changes: 3 additions & 0 deletions litellm/llms/ollama_chat.py
@@ -59,6 +59,7 @@ def get_ollama_response( # noqa: PLR0915
stream = optional_params.pop("stream", False)
format = optional_params.pop("format", None)
keep_alive = optional_params.pop("keep_alive", None)
think = optional_params.pop("think", None)
function_name = optional_params.pop("function_name", None)
tools = optional_params.pop("tools", None)

@@ -98,6 +99,8 @@ def get_ollama_response( # noqa: PLR0915
data["tools"] = tools
if keep_alive is not None:
data["keep_alive"] = keep_alive
if think is not None:
data["think"] = think
## LOGGING
logging_obj.pre_call(
input=None,
62 changes: 62 additions & 0 deletions litellm/llms/openrouter/chat/transformation.py
@@ -147,8 +147,70 @@ def transform_request(
model, messages, optional_params, litellm_params, headers
)
response.update(extra_body)

# ALWAYS add usage parameter to get cost data from OpenRouter
# This ensures cost tracking works for all OpenRouter models
if "usage" not in response:
response["usage"] = {"include": True}

return response

def transform_response(
self,
model: str,
raw_response: httpx.Response,
model_response: ModelResponse,
logging_obj: Any,
request_data: dict,
messages: List[AllMessageValues],
optional_params: dict,
litellm_params: dict,
encoding: Any,
api_key: Optional[str] = None,
json_mode: Optional[bool] = None,
) -> ModelResponse:
"""
Transform the response from OpenRouter API.

Extracts cost information from response headers if available.

Returns:
ModelResponse: The transformed response with cost information.
"""
# Call parent transform_response to get the standard ModelResponse
model_response = super().transform_response(
model=model,
raw_response=raw_response,
model_response=model_response,
logging_obj=logging_obj,
request_data=request_data,
messages=messages,
optional_params=optional_params,
litellm_params=litellm_params,
encoding=encoding,
api_key=api_key,
json_mode=json_mode,
)

# Extract cost from OpenRouter response body
# OpenRouter returns cost information in the usage object when usage.include=true
try:
response_json = raw_response.json()
if "usage" in response_json and response_json["usage"]:
response_cost = response_json["usage"].get("cost")
if response_cost is not None:
# Store cost in hidden params for the cost calculator to use
if not hasattr(model_response, "_hidden_params"):
model_response._hidden_params = {}
if "additional_headers" not in model_response._hidden_params:
model_response._hidden_params["additional_headers"] = {}
model_response._hidden_params["additional_headers"]["llm_provider-x-litellm-response-cost"] = float(response_cost)
except Exception:
# If we can't extract cost, continue without it - don't fail the response
pass

return model_response

def get_error_class(
self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
) -> BaseLLMException: