Labels: bug (Something isn't working)
Description
Describe the bug
I could previously use the following code with the InferenceClient and it worked (e.g. in this cookbook recipe for HF Inference Endpoints):
from huggingface_hub import InferenceClient
client = InferenceClient()
API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud" #endpoint.url
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Count to 10"},
]
output = client.chat_completion(
messages,
model=API_URL,
temperature=0.2,
max_tokens=100,
seed=42,
)
print("The output from your API/Endpoint call with the InferenceClient:\n")
print(output)

This code now results in this error:
(Additional observation: if the endpoint is scaled to zero, the call initially appears to work in that it triggers the endpoint to start up again, but once the endpoint is running, the error is thrown.)
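For reference, something like the following can be used to make sure the endpoint is fully up before the call; the same 422 is still raised once the endpoint reports "running". This is only a sketch: the endpoint name is a placeholder, and it assumes get_inference_endpoint / InferenceEndpoint.wait from huggingface_hub are available in this version. The full traceback of the failing call follows.

from huggingface_hub import get_inference_endpoint

# "my-endpoint-name" is a placeholder for the actual Inference Endpoint name
endpoint = get_inference_endpoint("my-endpoint-name")
endpoint.wait()                       # polls until the endpoint reports "running"
print(endpoint.status, endpoint.url)  # sanity check before calling chat_completion

client = endpoint.client              # InferenceClient pointed at endpoint.url
output = client.chat_completion(
    [{"role": "user", "content": "Count to 10"}],
    max_tokens=100,
)                                     # still raises the 422 shown below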
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py:90, in hf_raise_for_status(response, endpoint_name)
89 try:
---> 90 response.raise_for_status()
91 except HTTPError as e:
File ~/miniconda/lib/python3.9/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
1023 if http_error_msg:
-> 1024 raise HTTPError(http_error_msg, response=self)
HTTPError: 422 Client Error: Unprocessable Entity for url: https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud/
The above exception was the direct cause of the following exception:
HfHubHTTPError Traceback (most recent call last)
Cell In[34], line 12
5 API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud/" #endpoint.url
7 messages = [
8 {"role": "system", "content": "You are a helpful assistant."},
9 {"role": "user", "content": "Count to 10"},
10 ]
---> 12 output = client.chat_completion(
13 messages, # the chat template is applied automatically, if your endpoint uses a TGI container
14 model=API_URL,
15 temperature=0.2,
16 max_tokens=100,
17 seed=42,
18 )
20 print("The output from your API/Endpoint call with the InferenceClient:\n")
21 print(output)
File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/inference/_client.py:861, in InferenceClient.chat_completion(self, messages, model, stream, frequency_penalty, logit_bias, logprobs, max_tokens, n, presence_penalty, response_format, seed, stop, temperature, tool_choice, tool_prompt, tools, top_logprobs, top_p)
840 payload = dict(
841 model=model_id,
842 messages=messages,
(...)
858 stream=stream,
859 )
860 payload = {key: value for key, value in payload.items() if value is not None}
--> 861 data = self.post(model=model_url, json=payload, stream=stream)
863 if stream:
864 return _stream_chat_completion_response(data) # type: ignore[arg-type]
File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/inference/_client.py:305, in InferenceClient.post(self, json, data, model, task, stream)
302 raise InferenceTimeoutError(f"Inference call timed out: {url}") from error # type: ignore
304 try:
--> 305 hf_raise_for_status(response)
306 return response.iter_lines() if stream else response.content
307 except HTTPError as error:
File ~/miniconda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py:162, in hf_raise_for_status(response, endpoint_name)
158 raise HfHubHTTPError(message, response=response) from e
160 # Convert `HTTPError` into a `HfHubHTTPError` to display request information
161 # as well (request id and/or server error message)
--> 162 raise HfHubHTTPError(str(e), response=response) from e
HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: iwTQuL)

I still get correct outputs via HTTP requests, so it doesn't seem to be an issue with the endpoint or my token:
import requests
import huggingface_hub  # for get_token()
API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud" #endpoint.url
headers = {
"Accept" : "application/json",
"Authorization": f"Bearer {huggingface_hub.get_token()}",
"Content-Type": "application/json"
}
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()
output = query({
"inputs": "Tell me a story",
"parameters": {
#**generation_params
}
})
output
# [{'generated_text': 'Tell me a story about a time when you felt truly alive.\nI remember a time when I was 25 years old, and I was traveling through Europe with a group of friends. We had been on the road for weeks, and we had finally arrived in Interlaken, Switzerland. We had planned to hike to the top of the Schilthorn mountain, but the weather forecast was looking grim, with heavy rain and strong winds predicted.\nBut we were determined to make it happen. We packed our gear,'}]
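For completeness, the same chat-style payload can also be sent directly over HTTP to check whether the endpoint accepts it at all. This is a sketch, assuming the endpoint runs a TGI container recent enough to expose the OpenAI-compatible /v1/chat/completions route; the "model": "tgi" value is just a placeholder since the endpoint serves a single model.

import requests
import huggingface_hub

API_URL = "https://rm83lzlukiu5eyak.us-east-1.aws.endpoints.huggingface.cloud"  # endpoint.url
headers = {
    "Accept": "application/json",
    "Authorization": f"Bearer {huggingface_hub.get_token()}",
    "Content-Type": "application/json",
}

# Same messages as above, but POSTed to the chat completions route instead of the bare endpoint URL
response = requests.post(
    f"{API_URL}/v1/chat/completions",
    headers=headers,
    json={
        "model": "tgi",  # placeholder; not used for routing on a single-model endpoint
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count to 10"},
        ],
        "temperature": 0.2,
        "max_tokens": 100,
    },
)
print(response.status_code)
print(response.json())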
Reproduction
No response
Logs
No response
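(If low-level logs would help, they could be captured with something like the sketch below, which only uses the standard library to dump the raw HTTP traffic from requests/urllib3, i.e. the exact URL and headers the InferenceClient hits.)

import logging
import http.client as http_client

# Make http.client print the raw request/response lines sent by requests
http_client.HTTPConnection.debuglevel = 1

# And surface urllib3's own debug messages
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)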
System info
{'huggingface_hub version': '0.25.0.dev0',
'Platform': 'Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.31',
'Python version': '3.9.5',
'Running in iPython ?': 'Yes',
'iPython shell': 'ZMQInteractiveShell',
'Running in notebook ?': 'Yes',
'Running in Google Colab ?': 'No',
'Token path ?': '/home/user/.cache/huggingface/token',
'Has saved token ?': True,
'Who am I ?': 'MoritzLaurer',
'Configured git credential helpers': 'store',
'FastAI': 'N/A',
'Tensorflow': 'N/A',
'Torch': 'N/A',
'Jinja2': '3.1.4',
'Graphviz': 'N/A',
'keras': 'N/A',
'Pydot': 'N/A',
'Pillow': 'N/A',
'hf_transfer': 'N/A',
'gradio': 'N/A',
'tensorboard': 'N/A',
'numpy': '2.0.1',
'pydantic': '2.8.2',
'aiohttp': 'N/A',
'ENDPOINT': 'https://huggingface.co',
'HF_HUB_CACHE': '/home/user/.cache/huggingface/hub',
'HF_ASSETS_CACHE': '/home/user/.cache/huggingface/assets',
'HF_TOKEN_PATH': '/home/user/.cache/huggingface/token',
'HF_HUB_OFFLINE': False,
'HF_HUB_DISABLE_TELEMETRY': False,
'HF_HUB_DISABLE_PROGRESS_BARS': None,
'HF_HUB_DISABLE_SYMLINKS_WARNING': False,
'HF_HUB_DISABLE_EXPERIMENTAL_WARNING': False,
'HF_HUB_DISABLE_IMPLICIT_TOKEN': False,
'HF_HUB_ENABLE_HF_TRANSFER': False,
'HF_HUB_ETAG_TIMEOUT': 10,
'HF_HUB_DOWNLOAD_TIMEOUT': 10}