
bug: issue getting started with OpenLLM #1186

@iamobservable

Description


Describe the bug

I found OpenLLM recently while researching abstractions for local model execution, and the repo stood out. I may have initially misunderstood the options for using the library. After reading the README, being able to run an open source model via Docker was enticing. However, after reading the repository description I am left wondering whether it is possible to use the library without some form of BentoML.

Questions:

1. Did I misunderstand the instructions, or do I have a misunderstanding of the tool❓
2. Does OpenLLM require some form of BentoML cloud❓

When following the Getting Started section of the documentation, the execution does not produce the output shown there. Details about the execution can be found in the "To reproduce" section of this issue.

To reproduce

Steps followed:

  1. Clean ~/.openllm configuration folder
  2. Create new folder for project
  3. Initialize folder with uv with python 3.12.8
  4. Add openllm to project
  5. Execute openllm hello
  6. Output describes updating default and nightly, but does not seem to continue as expected

Steps


1. Clean ~/.openllm configuration folder

rm -rf ~/.openllm

2. Create new folder for project

# ~/local-gpt
mkdir openllm-api
cd openllm-api

3. Initialize folder with uv with python 3.12.8

# ~/local-gpt/openllm-api
uv init -p 3.12

Initialized project openllm-api

4. Add openllm to project

# ~/local-gpt/openllm-api
uv add openllm

Using CPython 3.12.8
Creating virtual environment at: .venv
Resolved 98 packages in 1.36s
Prepared 94 packages in 16.38s
Installed 94 packages in 244ms

  • a2wsgi==1.10.8
  • aiohappyeyeballs==2.6.1
  • aiohttp==3.11.18
  • aiosignal==1.3.2
  • aiosqlite==0.21.0
  • annotated-types==0.7.0
  • anyio==4.9.0
  • appdirs==1.4.4
  • asgiref==3.8.1
  • attrs==25.3.0
  • bentoml==1.4.8
  • cattrs==23.1.2
  • certifi==2025.4.26
  • charset-normalizer==3.4.2
  • click==8.1.8
  • click-option-group==0.5.7
  • cloudpickle==3.1.1
  • deprecated==1.2.18
  • distro==1.9.0
  • dulwich==0.22.8
  • filelock==3.18.0
  • frozenlist==1.6.0
  • fs==2.4.16
  • fsspec==2025.3.2
  • h11==0.16.0
  • hf-xet==1.1.0
  • httpcore==1.0.9
  • httpx==0.28.1
  • httpx-ws==0.7.2
  • huggingface-hub==0.30.2
  • idna==3.10
  • importlib-metadata==8.6.1
  • jinja2==3.1.6
  • jiter==0.9.0
  • kantoku==0.18.3
  • markdown-it-py==3.0.0
  • markupsafe==3.0.2
  • mdurl==0.1.2
  • multidict==6.4.3
  • numpy==2.2.5
  • nvidia-ml-py==12.570.86
  • openai==1.73.0
  • openllm==0.6.30
  • opentelemetry-api==1.32.1
  • opentelemetry-instrumentation==0.53b1
  • opentelemetry-instrumentation-aiohttp-client==0.53b1
  • opentelemetry-instrumentation-asgi==0.53b1
  • opentelemetry-sdk==1.32.1
  • opentelemetry-semantic-conventions==0.53b1
  • opentelemetry-util-http==0.53b1
  • packaging==25.0
  • pathspec==0.12.1
  • pip-requirements-parser==32.0.1
  • prometheus-client==0.21.1
  • prompt-toolkit==3.0.51
  • propcache==0.3.1
  • psutil==7.0.0
  • pyaml==25.1.0
  • pydantic==2.11.4
  • pydantic-core==2.33.2
  • pygments==2.19.1
  • pyparsing==3.2.3
  • python-dateutil==2.9.0.post0
  • python-dotenv==1.1.0
  • python-json-logger==3.3.0
  • python-multipart==0.0.20
  • pyyaml==6.0.2
  • pyzmq==26.4.0
  • questionary==2.1.0
  • requests==2.32.3
  • rich==14.0.0
  • schema==0.7.7
  • setuptools==80.3.0
  • shellingham==1.5.4
  • simple-di==0.1.5
  • six==1.17.0
  • sniffio==1.3.1
  • starlette==0.46.2
  • tabulate==0.9.0
  • tomli-w==1.2.0
  • tornado==6.4.2
  • tqdm==4.67.1
  • typer==0.15.3
  • typing-extensions==4.13.2
  • typing-inspection==0.4.0
  • urllib3==2.4.0
  • uv==0.7.2
  • uvicorn==0.34.2
  • watchfiles==1.0.5
  • wcwidth==0.2.13
  • wrapt==1.17.2
  • wsproto==1.2.0
  • yarl==1.20.0
  • zipp==3.21.0

5. Execute openllm hello

# ~/local-gpt/openllm-api
uv run openllm hello

updating repo default
updating repo nightly
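
For reference, this is how I expected to exercise a model once the hello flow completes and something is served (e.g. via the README's `openllm serve` quickstart). This is only a sketch: the port 3000, the endpoint path, and the model tag are assumptions on my part that I could not verify, because the command stalls at the output above.

# Hypothetical follow-up once a model is served; port, path, and model tag are assumptions
curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Hello"}]}'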

Logs

Output provided above in the "To reproduce" section

Environment

Using a local environment, not cloud. All model execution is currently done directly through ollama, but I am interested in llama.cpp as well. Does OpenLLM require BentoML?

Platform: ollama
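
For comparison, my current workflow queries ollama directly through its OpenAI-compatible endpoint, roughly like this (a sketch; 11434 is ollama's default port, and the model tag is whatever I have pulled locally):

# Current ollama-based flow I was hoping to abstract with OpenLLM
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'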

System information (Optional)

Environment


platform: Windows 11 / WSL2 / Docker Compose (all execution done via WSL2)
memory: 128GB RAM
gpu: RTX 3080
Shell: bash
Package Manager: uv
Python Version: CPython 3.12.8
