This guide walks you through running a Llama Stack server locally using Ollama and Podman.
Install the following tools: Podman (or Docker), Python 3 with pip, and Ollama.
Verify the tools are installed:

```shell
podman --version
python3 --version
pip --version
ollama --version
```

Note: pip might become available only after you set up your venv (see below). Docker works as well as Podman.
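The same tool checks can be scripted as a quick preflight. This is a minimal sketch (the `missing_tools` helper is hypothetical, not part of any of these projects); it only looks for the executables on PATH:

```python
# Hypothetical preflight helper: report which required CLIs are missing.
# `docker` can substitute for `podman`; adjust the tuple to your setup.
import shutil

def missing_tools(tools=("podman", "python3", "ollama")):
    """Return the subset of tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    missing = missing_tools()
    print("All tools found." if not missing else f"Missing: {missing}")
```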
Use the following command to launch the Ollama model:

```shell
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```

Note: The `--keepalive 60m` flag keeps the model in memory for 60 minutes.
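If you want to confirm from code that Ollama is serving before moving on, a small sketch like the following can query Ollama's `/api/tags` endpoint, which lists locally available models. The default port 11434 and the `list_ollama_models` helper name are assumptions here:

```python
# Sketch: ask the local Ollama daemon which models it has available.
# Assumes Ollama's default port 11434; returns None if unreachable.
import json
import urllib.error
import urllib.request

def list_ollama_models(base_url="http://localhost:11434"):
    """Return model names from Ollama's /api/tags endpoint, or None on failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return None

if __name__ == "__main__":
    models = list_ollama_models()
    if models is None:
        print("Ollama is not reachable on localhost:11434")
    else:
        print("Ollama models:", models)
```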
Launch a new terminal window and set the necessary environment variables:

```shell
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321
```

Pull the container image:

```shell
podman pull docker.io/llamastack/distribution-ollama
```

Create a local directory for persistent data:

```shell
mkdir -p ~/.llama
```

Run the container:
```shell
podman run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.containers.internal:11434 \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT
```

Optional: Use a custom network:
```shell
podman network create llama-net
podman run --privileged --network llama-net -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT
```

Verify the container is running:

```shell
podman ps
```

Switch over to the frontend directory:
```shell
cd frontend
```

Create a Python venv using your Python 3.x executable:

```shell
python3.11 -m venv .venv
```

Activate the virtual environment and install the dependencies:

```shell
source .venv/bin/activate   # macOS/Linux
# Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Check the installation:

```shell
pip show llama-stack-client
```

Output:

```
Name: llama_stack_client
Version: 0.2.5
Summary: The official Python library for the llama-stack-client API
Home-page: https://github.com/meta-llama/llama-stack-client-python
Author:
Author-email: Llama Stack Client <dev-feedback@llama-stack-client.com>
License:
Location: /Users/bsutter/my-projects/redhat/RAG-Blueprint/frontend/.venv/lib/python3.11/site-packages
Requires: anyio, click, distro, httpx, pandas, prompt-toolkit, pyaml, pydantic, rich, sniffio, termcolor, tqdm, typing-extensions
Required-by: llama_stack
```
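The same check can be done from Python without shelling out to pip. This sketch reads the installed distribution's version via the standard library's `importlib.metadata` and returns `None` if the package is absent:

```python
# Sketch: look up the installed llama_stack_client version programmatically.
# Returns None instead of raising if the package is not installed.
from importlib import metadata

def client_version():
    try:
        return metadata.version("llama_stack_client")
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    print("llama_stack_client version:", client_version())
```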
Point the client to the local Llama Stack server:

```shell
llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT
```

Press Enter at the prompt, since no API key is needed for Ollama or this container-based Llama Stack server:

```
> Enter the API key (leave empty if no key is needed):
```

List the models:

```shell
llama-stack-client models list
```

Output:
```
Available Models
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ embedding │ all-MiniLM-L6-v2 │ all-minilm:latest │ {'embedding_dimension': 384.0} │ ollama │
├───────────────────┼────────────────────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────────────┼───────────────────┤
│ llm │ meta-llama/Llama-3.2-3B-Instruct │ llama3.2:3b-instruct-fp16 │ │ ollama │
└───────────────────┴────────────────────────────────────────────────────┴─────────────────────────────────────────┴─────────────────────────────────────────────────┴───────────────────┘
Total models: 2
```
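Independent of the CLI, you can confirm the server itself is answering over HTTP. This sketch polls a health endpoint; the `/v1/health` path, the port 8321 default (from `LLAMA_STACK_PORT`), and the retry parameters are assumptions to adapt to your deployment:

```python
# Sketch: poll the Llama Stack server until it answers, or give up.
# Assumes a /v1/health endpoint; adjust HEALTH_PATH for your build.
import time
import urllib.error
import urllib.request

HEALTH_PATH = "/v1/health"

def wait_for_server(base_url="http://localhost:8321", retries=5, delay=2.0):
    """Return True once the server responds, False after exhausting retries."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(base_url + HEALTH_PATH, timeout=3):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(delay)
    return False

if __name__ == "__main__":
    print("Server up:", wait_for_server())
```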
Check if Podman is running:

```shell
podman ps
```

Activate your virtual environment:

```shell
source .venv/bin/activate
```

Reinstall the client if needed:

```shell
pip uninstall llama-stack-client
pip install llama-stack-client
```

Test the client in Python:

```shell
python -c "from llama_stack_client import LlamaStackClient; print(LlamaStackClient)"
```

To launch the Streamlit UI, change into the distribution's UI directory and run the app:

```shell
cd llama_stack/distribution/ui/
streamlit run app.py
```