Merged
Changes from all commits (42 commits)
9a980f3
Add anthropic/claude-sonnet-4.5 to OpenRouter cost map
huangyafei Oct 13, 2025
6424480
add ecs to docs
mubashir1osmani Oct 12, 2025
4f33bc4
fix(ollama/chat): 'think' param handling
kowyo Sep 30, 2025
e30b716
fix: add 'think' parameter handling in ollama_chat.py
kowyo Sep 30, 2025
9557a37
fix: update 'think' parameter assignment to use provided value in tra…
kowyo Oct 4, 2025
6896c60
fix: only use think level for gpt-oss model
kowyo Oct 5, 2025
92bd238
fix comment
kowyo Oct 5, 2025
6df051e
docs: update benchmark results with improved infrastructure
AlexsanderHamir Oct 12, 2025
2e57d19
update benchmarks
AlexsanderHamir Oct 12, 2025
09662b5
direct cost calculation from openrouter
dhruvyad Oct 11, 2025
b57406e
add tests for openrouter cost tracking
dhruvyad Oct 12, 2025
cb29e33
docs: fix doc
krrishdholakia Oct 13, 2025
fcb85f8
perf(router): optimize timing functions in completion hot path
AlexsanderHamir Oct 13, 2025
b0d963c
docs(index.md): bump rc
krrishdholakia Oct 13, 2025
5bffa58
feat(ssl): add configurable ECDH curve for TLS performance
AlexsanderHamir Oct 13, 2025
65163c7
[Fix] GEMINI - CLI - add google_routes to llm_api_routes (#15500)
ishaan-jaff Oct 13, 2025
d094a33
add: unit test
AlexsanderHamir Oct 14, 2025
36c9066
perf(router): optimize model lookups with O(1) index maps and standar…
AlexsanderHamir Oct 15, 2025
c98a30a
perf(router): optimize string concatenation in hash generation
AlexsanderHamir Oct 15, 2025
97ed3d0
test(router): update error message assertion after string concat opti…
AlexsanderHamir Oct 15, 2025
294fb94
perf(router): use shallow copy instead of deepcopy for model aliases
AlexsanderHamir Oct 15, 2025
6e4da1d
perf(router): optimize model lookups with O(1) data structures
AlexsanderHamir Oct 15, 2025
1705190
Merge pull request #15576 from BerriAI/litellm_remove_deepcopy
AlexsanderHamir Oct 16, 2025
4e84937
Merge pull request #15575 from BerriAI/litellm_remove_constly_string_…
AlexsanderHamir Oct 16, 2025
56838e2
test: ensure model_names is a O(1) datastructure
AlexsanderHamir Oct 16, 2025
a17f3a7
Merge pull request #15578 from BerriAI/litellm_remove_list_lookup
AlexsanderHamir Oct 16, 2025
5e929da
test: add static analysis to prevent O(n) linear scans in router
AlexsanderHamir Oct 16, 2025
1846363
Merge branch 'litellm_october_alexsander_stanging' into litellm_route…
AlexsanderHamir Oct 16, 2025
d07e4f4
Merge pull request #15574 from BerriAI/litellm_router_index_change
AlexsanderHamir Oct 16, 2025
18c5ec8
Add missing env key
AlexsanderHamir Oct 16, 2025
8a32553
fix(proxy): change check_file_size_under_limit to accept Collection[str]
AlexsanderHamir Oct 16, 2025
2ef3a76
Merge pull request #15606 from BerriAI/litellm_fix_staging_branch
AlexsanderHamir Oct 16, 2025
254f50a
perf(router): optimize model lookups with O(1) index maps and standar…
AlexsanderHamir Oct 15, 2025
35de05d
test: add static analysis to prevent O(n) linear scans in router
AlexsanderHamir Oct 16, 2025
8c58d16
perf(router): use shallow copy instead of deepcopy for model aliases
AlexsanderHamir Oct 15, 2025
f842b2a
perf(router): optimize string concatenation in hash generation
AlexsanderHamir Oct 15, 2025
76b272d
test(router): update error message assertion after string concat opti…
AlexsanderHamir Oct 15, 2025
3b14866
perf(router): optimize model lookups with O(1) data structures
AlexsanderHamir Oct 15, 2025
e154ab6
test: ensure model_names is a O(1) datastructure
AlexsanderHamir Oct 16, 2025
220fa3f
Add missing env key
AlexsanderHamir Oct 16, 2025
aad0efd
fix(proxy): change check_file_size_under_limit to accept Collection[str]
AlexsanderHamir Oct 16, 2025
bdca4db
Merge branch 'litellm_october_alexsander_stanging' of https://github.…
AlexsanderHamir Oct 16, 2025
53 changes: 24 additions & 29 deletions docs/my-website/docs/benchmarks.md
@@ -16,48 +16,50 @@ model_list:
api_key: "test"
```

### 1 Instance LiteLLM Proxy
### 2 Instance LiteLLM Proxy

In these tests, the baseline latency characteristics are measured against a `fake-openai-endpoint`.

#### Performance Metrics

| Metric | Value |
|--------|-------|
| **Requests per Second (RPS)** | 475 |
| **End-to-End Latency P50 (ms)** | 100 |
| **LiteLLM Overhead P50 (ms)** | 3 |
| **LiteLLM Overhead P90 (ms)** | 17 |
| **LiteLLM Overhead P99 (ms)** | 31 |
| **Type** | **Name** | **Median (ms)** | **95%ile (ms)** | **99%ile (ms)** | **Average (ms)** | **Current RPS** |
| --- | --- | --- | --- | --- | --- | --- |
| POST | /chat/completions | 200 | 630 | 1200 | 262.46 | 1035.7 |
| Custom | LiteLLM Overhead Duration (ms) | 12 | 29 | 43 | 14.74 | 1035.7 |
| | Aggregated | 100 | 430 | 930 | 138.6 | 2071.4 |

<!-- <Image img={require('../img/1_instance_proxy.png')} /> -->

<!-- ## **Horizontal Scaling - 10K RPS**

<Image img={require('../img/instances_vs_rps.png')} /> -->

#### Key Findings
- Single instance: 475 RPS @ 100ms median latency
- LiteLLM adds 3ms P50 overhead, 17ms P90 overhead, 31ms P99 overhead
- 2 LiteLLM instances: 950 RPS @ 100ms latency
- 4 LiteLLM instances: 1900 RPS @ 100ms latency

### 2 Instances

**Adding 1 instance will double the RPS and maintain the `100ms-110ms` median latency.**
### 4 Instances

| Metric | Litellm Proxy (2 Instances) |
|--------|------------------------|
| Median Latency (ms) | 100 |
| RPS | 950 |
| **Type** | **Name** | **Median (ms)** | **95%ile (ms)** | **99%ile (ms)** | **Average (ms)** | **Current RPS** |
| --- | --- | --- | --- | --- | --- | --- |
| POST | /chat/completions | 100 | 150 | 240 | 111.73 | 1170 |
| Custom | LiteLLM Overhead Duration (ms) | 2 | 8 | 13 | 3.32 | 1170 |
| | Aggregated | 77 | 130 | 180 | 57.53 | 2340 |

#### Key Findings
- Doubling from 2 to 4 LiteLLM instances halves median latency: 200 ms → 100 ms.
- High-percentile latencies drop significantly: P95 630 ms → 150 ms, P99 1,200 ms → 240 ms.
- Setting workers equal to CPU count gives optimal performance (see the worker-count sketch below).
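
For instance, when launching the proxy directly, the worker count can be pinned to the machine's CPUs. A minimal sketch, assuming the `--num_workers` flag of the LiteLLM proxy CLI and the 4-CPU machines described below:

```bash
# Match worker processes to the 4 CPUs used in these benchmarks.
# Flag names assumed from the proxy CLI; verify with `litellm --help`.
litellm --config /path/to/config.yaml --port 4000 --num_workers 4
```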

## Machine Spec used for testing

Each machine deploying LiteLLM had the following specs:

- 2 CPU
- 4GB RAM
- 4 CPU
- 8GB RAM


## Locust Settings

- 1000 Users
- 500 user Ramp Up (see the locustfile sketch below)
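
A minimal locustfile sketch matching these settings; the request payload, model name, and API key below are illustrative assumptions, not the exact benchmark script:

```python
# locustfile.py - minimal sketch; model name and key are placeholders.
from locust import HttpUser, task, between


class ProxyUser(HttpUser):
    wait_time = between(0.5, 1.5)  # small per-user think time

    @task
    def chat_completion(self):
        # Each simulated user repeatedly posts a chat completion to the proxy.
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-openai-endpoint",
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```

Run it against the proxy with `locust -f locustfile.py --host http://localhost:4000 --users 1000 --spawn-rate 500`, assuming the ramp-up above maps to Locust's `--spawn-rate`.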

## How to measure LiteLLM Overhead

@@ -137,10 +139,3 @@ Using LangSmith has **no impact on latency, RPS compared to Basic Litellm Proxy**
|--------|------------------------|---------------------|
| RPS | 1133.2 | 1135 |
| Median Latency (ms) | 140 | 132 |



## Locust Settings

- 2500 Users
- 100 user Ramp Up
44 changes: 43 additions & 1 deletion docs/my-website/docs/guides/security_settings.md
@@ -117,10 +117,52 @@ litellm_settings:
```bash
export SSL_CERTIFICATE="/path/to/certificate.pem"
```

</TabItem>
</Tabs>

## 5. Configure ECDH Curve for SSL/TLS Performance

The `ssl_ecdh_curve` setting allows you to configure the Elliptic Curve Diffie-Hellman (ECDH) curve used for SSL/TLS key exchange. This is particularly useful for disabling Post-Quantum Cryptography (PQC) to improve performance in environments where PQC is not required.

**Use Case:** Some OpenSSL 3.x systems enable PQC by default, which can slow down TLS handshakes. Setting the ECDH curve to `X25519` disables PQC and can significantly improve connection performance.

<Tabs>
<TabItem value="sdk" label="SDK">

```python
import litellm
litellm.ssl_ecdh_curve = "X25519" # Disables PQC for better performance
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```yaml
litellm_settings:
ssl_ecdh_curve: "X25519"
```

</TabItem>
<TabItem value="env_var" label="Environment Variables">

```bash
export SSL_ECDH_CURVE="X25519"
```

</TabItem>
</Tabs>

## 5. Use HTTP_PROXY environment variable
**Common Valid Curves:**

- `X25519` - Modern, fast curve (recommended for disabling PQC)
- `prime256v1` - NIST P-256 curve
- `secp384r1` - NIST P-384 curve
- `secp521r1` - NIST P-521 curve

**Note:** If an invalid curve name is provided or if your Python/OpenSSL version doesn't support this feature, LiteLLM will log a warning and continue with default curves.
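
To check ahead of time whether your Python/OpenSSL build accepts a given curve name, a quick standalone sketch (independent of LiteLLM) is:

```python
import ssl

curve = "X25519"
ctx = ssl.create_default_context()
try:
    ctx.set_ecdh_curve(curve)  # raises ValueError for unknown curve names
    print(f"{curve} accepted by {ssl.OPENSSL_VERSION}")
except AttributeError:
    # Method unavailable in this Python/OpenSSL build
    print("set_ecdh_curve not available in this Python/OpenSSL build")
except ValueError as err:
    print(f"Unknown curve {curve!r}: {err}")
```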

## 6. Use HTTP_PROXY environment variable

Both httpx and aiohttp libraries use `urllib.request.getproxies` from environment variables. Before client initialization, you may set proxy (and optional SSL_CERT_FILE) by setting the environment variables:
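
For example, in bash (the proxy URL and certificate path below are placeholders):

```bash
export HTTP_PROXY="http://proxy.internal:8080"
export HTTPS_PROXY="http://proxy.internal:8080"
export SSL_CERT_FILE="/path/to/ca-bundle.pem"
```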

1 change: 1 addition & 0 deletions docs/my-website/docs/proxy/config_settings.md
@@ -752,6 +752,7 @@ router_settings:
| SPEND_LOGS_URL | URL for retrieving spend logs
| SPEND_LOG_CLEANUP_BATCH_SIZE | Number of logs deleted per batch during cleanup. Default is 1000
| SSL_CERTIFICATE | Path to the SSL certificate file
| SSL_ECDH_CURVE | ECDH curve for SSL/TLS key exchange (e.g., 'X25519' to disable PQC).
| SSL_SECURITY_LEVEL | [BETA] Security level for SSL/TLS connections. E.g. `DEFAULT@SECLEVEL=1`
| SSL_VERIFY | Flag to enable or disable SSL certificate verification
| SSL_CERT_FILE | Path to the SSL certificate file for custom CA bundle
24 changes: 24 additions & 0 deletions docs/my-website/docs/proxy/deploy.md
@@ -788,6 +788,30 @@ docker run --name litellm-proxy \
## Platform-specific Guide

<Tabs>
<TabItem value="AWS ECS" label="AWS ECS - Elastic Container Service">

### Terraform-based ECS Deployment

LiteLLM maintains a dedicated Terraform tutorial for deploying the proxy on ECS. Follow the step-by-step guide in the [litellm-ecs-deployment repository](https://github.com/BerriAI/litellm-ecs-deployment) to provision the required ECS services, task definitions, and supporting AWS resources.

1. Clone the tutorial repository to review the Terraform modules and variables.
```bash
git clone https://github.com/BerriAI/litellm-ecs-deployment.git
cd litellm-ecs-deployment
```

2. Initialize and validate the Terraform project before applying it to your chosen workspace/account.
```bash
terraform init
terraform plan
terraform apply
```

3. Once `terraform apply` completes, run `./build.sh` to build and push the image to the ECR repository and update the ECS cluster. Use the deployed service's endpoint (port `4000` by default) for API requests to your LiteLLM proxy, as in the smoke-test sketch below.
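
As a quick smoke test against the deployed proxy, something like the following can be used (the hostname comes from your load balancer / Terraform outputs; the key and model name are placeholders):

```bash
curl http://<alb-dns-name>:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```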


</TabItem>

<TabItem value="AWS EKS" label="AWS EKS - Kubernetes">

### Kubernetes (AWS EKS)
4 changes: 2 additions & 2 deletions docs/my-website/release_notes/v1.78.0-stable/index.md
@@ -40,15 +40,15 @@ import TabItem from '@theme/TabItem';
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.78.0.rc.1
ghcr.io/berriai/litellm:v1.78.0.rc.2
```

</TabItem>

<TabItem value="pip" label="Pip">

``` showLineNumbers title="pip install litellm"
pip install litellm==1.78.0.rc.1
pip install litellm==1.78.0.rc.2
```

</TabItem>
1 change: 1 addition & 0 deletions litellm/__init__.py
@@ -263,6 +263,7 @@
ssl_verify: Union[str, bool] = True
ssl_security_level: Optional[str] = None
ssl_certificate: Optional[str] = None
ssl_ecdh_curve: Optional[str] = None # Set to 'X25519' to disable PQC and improve performance
disable_streaming_logging: bool = False
disable_token_counter: bool = False
disable_add_transform_inline_image_block: bool = False
23 changes: 23 additions & 0 deletions litellm/llms/custom_httpx/http_handler.py
@@ -1,6 +1,7 @@
import asyncio
import os
import ssl
import sys
import time
from typing import TYPE_CHECKING, Any, Callable, Dict, List, Mapping, Optional, Union

@@ -114,6 +115,28 @@ def get_ssl_configuration(
# but falls back to widely compatible ones
custom_ssl_context.set_ciphers(DEFAULT_SSL_CIPHERS)

# Configure ECDH curve for key exchange (e.g., to disable PQC and improve performance)
# Set SSL_ECDH_CURVE env var or litellm.ssl_ecdh_curve to 'X25519' to disable PQC
# Common valid curves: X25519, prime256v1, secp384r1, secp521r1
ssl_ecdh_curve = os.getenv("SSL_ECDH_CURVE", litellm.ssl_ecdh_curve)
if ssl_ecdh_curve and isinstance(ssl_ecdh_curve, str):
try:
custom_ssl_context.set_ecdh_curve(ssl_ecdh_curve)
verbose_logger.debug(f"SSL ECDH curve set to: {ssl_ecdh_curve}")
except AttributeError:
verbose_logger.warning(
f"SSL ECDH curve configuration not supported. "
f"Python version: {sys.version.split()[0]}, OpenSSL version: {ssl.OPENSSL_VERSION}. "
f"Requested curve: {ssl_ecdh_curve}. Continuing with default curves."
)
except ValueError as e:
# Invalid curve name
verbose_logger.warning(
f"Invalid SSL ECDH curve name: '{ssl_ecdh_curve}'. {e}. "
f"Common valid curves: X25519, prime256v1, secp384r1, secp521r1. "
f"Continuing with default curves (including PQC)."
)

# Use our custom SSL context instead of the original ssl_verify value
return custom_ssl_context

10 changes: 8 additions & 2 deletions litellm/llms/ollama/chat/transformation.py
@@ -184,9 +184,12 @@ def map_openai_params(
):
if value.get("json_schema") and value["json_schema"].get("schema"):
optional_params["format"] = value["json_schema"]["schema"]
### FUNCTION CALLING LOGIC ###
if param == "reasoning_effort" and value is not None:
optional_params["think"] = True
if model.startswith("gpt-oss"):
optional_params["think"] = value
else:
optional_params["think"] = True
### FUNCTION CALLING LOGIC ###
if param == "tools":
## CHECK IF MODEL SUPPORTS TOOL CALLING ##
try:
@@ -281,6 +284,7 @@ def transform_request(
stream = optional_params.pop("stream", False)
format = optional_params.pop("format", None)
keep_alive = optional_params.pop("keep_alive", None)
think = optional_params.pop("think", None)
function_name = optional_params.pop("function_name", None)
litellm_params["function_name"] = function_name
tools = optional_params.pop("tools", None)
@@ -344,6 +348,8 @@
data["tools"] = tools
if keep_alive is not None:
data["keep_alive"] = keep_alive
if think is not None:
data["think"] = think

return data

8 changes: 7 additions & 1 deletion litellm/llms/ollama/completion/transformation.py
@@ -180,7 +180,10 @@ def map_openai_params(
elif param == "stop":
optional_params["stop"] = value
elif param == "reasoning_effort" and value is not None:
optional_params["think"] = True
if model.startswith("gpt-oss"):
optional_params["think"] = value
else:
optional_params["think"] = True
elif param == "response_format" and isinstance(value, dict):
if value["type"] == "json_object":
optional_params["format"] = "json"
@@ -412,6 +415,7 @@ def transform_request(
stream = optional_params.pop("stream", False)
format = optional_params.pop("format", None)
images = optional_params.pop("images", None)
think = optional_params.pop("think", None)
data = {
"model": model,
"prompt": ollama_prompt,
@@ -425,6 +429,8 @@
data["images"] = [
_convert_image(convert_to_ollama_image(image)) for image in images
]
if think is not None:
data["think"] = think

return data

3 changes: 3 additions & 0 deletions litellm/llms/ollama_chat.py
@@ -59,6 +59,7 @@ def get_ollama_response( # noqa: PLR0915
stream = optional_params.pop("stream", False)
format = optional_params.pop("format", None)
keep_alive = optional_params.pop("keep_alive", None)
think = optional_params.pop("think", None)
function_name = optional_params.pop("function_name", None)
tools = optional_params.pop("tools", None)

@@ -98,6 +99,8 @@ def get_ollama_response( # noqa: PLR0915
data["tools"] = tools
if keep_alive is not None:
data["keep_alive"] = keep_alive
if think is not None:
data["think"] = think
## LOGGING
logging_obj.pre_call(
input=None,
62 changes: 62 additions & 0 deletions litellm/llms/openrouter/chat/transformation.py
@@ -147,8 +147,70 @@ def transform_request(
model, messages, optional_params, litellm_params, headers
)
response.update(extra_body)

# ALWAYS add usage parameter to get cost data from OpenRouter
# This ensures cost tracking works for all OpenRouter models
if "usage" not in response:
response["usage"] = {"include": True}

return response

def transform_response(
self,
model: str,
raw_response: httpx.Response,
model_response: ModelResponse,
logging_obj: Any,
request_data: dict,
messages: List[AllMessageValues],
optional_params: dict,
litellm_params: dict,
encoding: Any,
api_key: Optional[str] = None,
json_mode: Optional[bool] = None,
) -> ModelResponse:
"""
Transform the response from OpenRouter API.

Extracts cost information from response headers if available.

Returns:
ModelResponse: The transformed response with cost information.
"""
# Call parent transform_response to get the standard ModelResponse
model_response = super().transform_response(
model=model,
raw_response=raw_response,
model_response=model_response,
logging_obj=logging_obj,
request_data=request_data,
messages=messages,
optional_params=optional_params,
litellm_params=litellm_params,
encoding=encoding,
api_key=api_key,
json_mode=json_mode,
)

# Extract cost from OpenRouter response body
# OpenRouter returns cost information in the usage object when usage.include=true
try:
response_json = raw_response.json()
if "usage" in response_json and response_json["usage"]:
response_cost = response_json["usage"].get("cost")
if response_cost is not None:
# Store cost in hidden params for the cost calculator to use
if not hasattr(model_response, "_hidden_params"):
model_response._hidden_params = {}
if "additional_headers" not in model_response._hidden_params:
model_response._hidden_params["additional_headers"] = {}
model_response._hidden_params["additional_headers"]["llm_provider-x-litellm-response-cost"] = float(response_cost)
except Exception:
# If we can't extract cost, continue without it - don't fail the response
pass

return model_response

def get_error_class(
self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
) -> BaseLLMException: