
Can't use vLLM endpoint #285

@brandonbiggs

Description


Hi. I'm trying to use AI Toolkit in VS Code, and I can't figure out how to use my local models. I'm running vLLM locally rather than Ollama. My config is pretty straightforward.

version: v0.1
providers:
  - name: Custom
    models:
      - name: gpt-oss-20b
        chat_completion_url: https://example.com/v1
        headers:
          Authorization: '{"Authorization: Bearer example_token"}'
        api_model_name: gpt-oss-20b

When I call this in the playground, I see the following error:

errorObject = {"innerError":{"status":404,"headers":{},"requestID":null}} 
2025-10-14 09:19:34.973 [error] Unable to call the gpt-oss-20b inference endpoint due to 404.  Please check if the input or configuration is correct. 404 status code (no body) 

The model works and is accessible; I just can't figure out how to hook it up to AI Toolkit. Any ideas would be helpful. Thanks.
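
A 404 from an OpenAI-compatible server such as vLLM usually points to a wrong request path rather than an unreachable server. vLLM serves chat completions at /v1/chat/completions, so one thing worth trying is pointing chat_completion_url at that full route and passing the Authorization header as a plain string. A minimal sketch of that variant, assuming the toolkit sends requests to the configured URL as-is (example.com and example_token are the placeholders from the original config):

version: v0.1
providers:
  - name: Custom
    models:
      - name: gpt-oss-20b
        # Full chat-completions route, not just the /v1 base
        # (assumption: the toolkit does not append the path itself).
        chat_completion_url: https://example.com/v1/chat/completions
        headers:
          # Header value as a plain string rather than a JSON-wrapped blob.
          Authorization: 'Bearer example_token'
        api_model_name: gpt-oss-20b

If the full path still returns 404, it may be worth confirming which route the server actually exposes, since some deployments sit behind a reverse proxy that adds its own prefix.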

Metadata

Labels: needs attention (the issue needs contributor's attention)
