
Conversation

@hanouticelina (Contributor)

No description provided.

@hanouticelina hanouticelina requested a review from Wauplin April 30, 2025 16:06
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin (Contributor) left a comment

Looks good 👍
Let's not forget a test or two to make sure this is definitively fixed (maybe one with chat_completion and one with text_to_image? Hopefully those cover the two main use cases: local TGI and text-to-image on InferenceEndpoint).
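
For illustration, a rough sketch of the two use cases mentioned above (both URLs are hypothetical, and a compatible server must actually be running for these calls to succeed):

from huggingface_hub import InferenceClient

# Use case 1: chat completion against a local TGI server (hypothetical URL).
tgi_client = InferenceClient(model="http://localhost:8080")
response = tgi_client.chat_completion(messages=[{"role": "user", "content": "Hi!"}])
print(response.choices[0].message.content)

# Use case 2: text-to-image against a dedicated Inference Endpoint (hypothetical URL).
endpoint_client = InferenceClient(model="https://my-endpoint.endpoints.huggingface.cloud")
image = endpoint_client.text_to_image("An astronaut riding a horse")
image.save("astronaut.png")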

@hanouticelina hanouticelina requested a review from Wauplin May 6, 2025 08:37
@hanouticelina hanouticelina marked this pull request as ready for review May 6, 2025 08:38
@Wauplin (Contributor) left a comment

Thanks! Approving, though I still have a question about always adding the /v1/chat/completions path. OK to merge "as is" since it's aligned with what we do / what we want to do.

Comment on lines +1096 to +1102
pytest.param(
"https://my-custom-endpoint.com/custom_path",
"model",
"https://my-custom-endpoint.com/custom_path/v1/chat/completions",
"dummy",
id="client_model_is_url",
),
@Wauplin (Contributor)

Are we sure this is the expected behavior? In the past I made a distinction between InferenceClient(model="https://...").chat_completion and InferenceClient(base_url="https://...").chat_completion, but I haven't tested it in a very long time. My reasoning is that if we always add /v1/chat/completions to the URL, we can't provide a URL that doesn't support it (what if "https://..." is already the URL of the endpoint handling chat_completion?).

But OK to remove this distinction if it helps with clarity / aligns better with the JS client.
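
For context, a minimal sketch of the two construction styles under discussion, with a hypothetical URL; with this PR, both resolve to the same chat-completions route, matching the test case above:

from huggingface_hub import InferenceClient

# Style 1: the URL passed as `model`.
client_a = InferenceClient(model="https://my-custom-endpoint.com/custom_path")

# Style 2: the URL passed as `base_url`, OpenAI-client style.
client_b = InferenceClient(base_url="https://my-custom-endpoint.com/custom_path")

# With this PR, chat_completion on either client targets:
# https://my-custom-endpoint.com/custom_path/v1/chat/completions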

@hanouticelina (Contributor, Author)

I might be mistaken, but I'm not sure we had this distinction in the pre-provider implementation of InferenceClient:

def _resolve_chat_completion_url(self, model: Optional[str] = None) -> str:
But IMO, if you pass a URL (either via model or base_url), calling chat_completion hits the /chat/completions route by default (same behavior as the OpenAI client). I agree that we then can't customize this route, which is a bit annoying, but I'm not sure it's worth supporting.
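
Here's an illustrative sketch of the default-route behavior described above; this is an assumed stand-in for demonstration, not the library's actual implementation:

def resolve_chat_completion_url(url: str) -> str:
    # Mirror the OpenAI client: append the default chat-completions route
    # to whatever base URL the caller provided.
    base = url.rstrip("/")
    if base.endswith("/chat/completions"):
        return base  # caller already points at the route
    if base.endswith("/v1"):
        return f"{base}/chat/completions"
    return f"{base}/v1/chat/completions"

# Matches the expected URL from the test case above:
assert (
    resolve_chat_completion_url("https://my-custom-endpoint.com/custom_path")
    == "https://my-custom-endpoint.com/custom_path/v1/chat/completions"
)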

@Wauplin (Contributor)

Thanks for checking back. I took a look at #2540, and always adding (/v1)/chat/completions does indeed seem the way to go (more consistent). Looks like we merged both behaviors even before the "pre-provider" implementation ^^

@Wauplin (Contributor) left a comment

Based on #3041 (comment), all good to merge! 🎉

@hanouticelina (Contributor, Author)

Let's merge! The failing tests are unrelated (need to check where they come from and fix them in another PR).

@hanouticelina hanouticelina merged commit 1e40c62 into main May 6, 2025
21 of 25 checks passed
@hanouticelina hanouticelina deleted the fix-endpoints branch May 6, 2025 10:41