Release v0.3.0 · llamastack/llama-stack

Highlights

Stable OpenAI-Compatible APIs
Llama Stack now separates APIs into three levels: stable (/v1/), experimental (/v1alpha/ and /v1beta/), and deprecated (flagged with deprecated = True).
extra_body/metadata support for APIs that offer functionality beyond the OpenAI implementation
Documentation overhaul: Migration to Docusaurus, modern formatting, and improved API docs
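The API leveling scheme in the highlights can be illustrated with a small classifier. This is a minimal sketch, assuming a route's level is determined only by its URL prefix plus a deprecated flag; the api_level helper is hypothetical and is not part of the Llama Stack codebase.

```python
# Hypothetical helper (not part of Llama Stack): classify a route by the
# v0.3.0 leveling scheme, assuming only the URL prefix and a deprecated
# flag matter.

def api_level(path: str, deprecated: bool = False) -> str:
    """Return 'deprecated', 'experimental', or 'stable' for a route path."""
    if deprecated:
        # A route flagged deprecated = True is deprecated regardless of prefix.
        return "deprecated"
    # /v1alpha/ and /v1beta/ are experimental; note they do not match "/v1/".
    if path.startswith(("/v1alpha/", "/v1beta/")):
        return "experimental"
    if path.startswith("/v1/"):
        return "stable"
    raise ValueError(f"unrecognized API version prefix: {path!r}")
```

For example, /v1/chat/completions classifies as stable, while /v1alpha/agents (leveled as v1alpha in #3610) classifies as experimental.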

What's Changed

feat(internal): add image_url download feature to OpenAIMixin by @mattf in #3516
chore(api): remove batch inference by @mattf in #3261
chore(apis): unpublish deprecated /v1/inference apis by @mattf in #3297
chore: recordings for fireworks (inference + openai) by @mattf in #3573
chore: remove extra logging by @ehhuang in #3574
chore: MANIFEST maintenance by @leseb in #3454
feat: Add items and title to ToolParameter/ToolParamDefinition by @TamiTakamiya in #3003
feat(ci): use @next branch from llama-stack-client by @ashwinb in #3576
chore(ui-deps): bump shiki from 1.29.2 to 3.13.0 in /llama_stack/ui by @dependabot[bot] in #3585
chore(ui-deps): bump tw-animate-css from 1.2.9 to 1.4.0 in /llama_stack/ui by @dependabot[bot] in #3583
chore(github-deps): bump actions/cache from 4.2.4 to 4.3.0 by @dependabot[bot] in #3577
chore: skip nvidia datastore tests when nvidia datastore is not enabled by @mattf in #3590
chore: introduce write queue for response_store by @ehhuang in #3497
revert: feat(ci): use @next branch from llama-stack-client by @ashwinb in #3593
fix: adding mime type of application/json support by @wukaixingxp in #3452
chore(api): remove deprecated embeddings impls by @mattf in #3301
feat(api): level inference/rerank and remove experimental by @cdoern in #3565
chore: skip safety tests when shield not available by @mattf in #3592
feat: update eval runner to use openai endpoints by @mattf in #3588
docs: update image paths by @reluctantfuturist in #3599
fix: remove inference.completion from docs by @mattf in #3589
fix: Remove deprecated user param in OpenAIResponseObject by @slekkala1 in #3596
fix: ensure usage is requested if telemetry is enabled by @mhdawson in #3571
feat(openai_movement): Change URL structures to kill /openai/v1 (part 1) by @ashwinb in #3587
feat(files): fix expires_after API shape by @ashwinb in #3604
feat(openai_movement)!: Change URL structures to kill /openai/v1 (part 2) by @ashwinb in #3605
fix: mcp tool with array type should include items by @ehhuang in #3602
feat: add llamastack + CrewAI integration example notebook by @wukaixingxp in #3275
chore: unpublish /inference/chat-completion by @mattf in #3609
feat: use /v1/chat/completions for safety model inference by @mattf in #3591
feat(api): level /agents as v1alpha by @cdoern in #3610
feat(api): Add Vector Store File batches api stub by @slekkala1 in #3615
fix(expires_after): make sure multipart/form-data is properly parsed by @ashwinb in #3612
docs: frontpage update by @reluctantfuturist in #3620
docs: update safety notebook by @reluctantfuturist in #3617
feat: add support for require_approval argument when creating response by @grs in #3608
fix: don't pass default response format in Responses by @ehhuang in #3614
fix(logging): disable console telemetry sink by default by @ashwinb in #3623
fix: Ensure that tool calls with no arguments get handled correctly by @jaideepr97 in #3560
chore: use openai_chat_completion for llm as a judge scoring by @mattf in #3635
chore: remove /v1/inference/completion and implementations by @mattf in #3622
feat(api): implement v1beta leveling, and additional alpha by @cdoern in #3594
feat(conformance): skip test if breaking change is ack by @cdoern in #3619
fix: log level by @ehhuang in #3637
docs: update API conformance test by @reluctantfuturist in #3631
docs: api separation by @reluctantfuturist in #3630
docs: adding supplementary markdown content to API specs by @reluctantfuturist in #3632
chore: add provider-data-api-key support to openaimixin by @mattf in #3639
chore: Remove debug logging from telemetry adapter by @ehhuang in #3643
docs: fix broken links by @reluctantfuturist in #3647
docs: add favicon and mobile styling by @reluctantfuturist in #3650
docs: fix more broken links by @reluctantfuturist in #3649
docs: Fix Dell distro documentation code snippets by @ConnorHack in #3640
refactor(agents): migrate to OpenAI chat completions API by @aakankshaduggal in #3323
fix: re-enable conformance skipping ability by @cdoern in #3651
chore!: add double routes for v1/openai/v1 by @leseb in #3636
docs: Update docs navbar config by @kelbrown20 in #3653
docs: API spec generation for Stainless by @reluctantfuturist in #3655
chore: fix agents tests for non-ollama providers, provide max_tokens by @mattf in #3657
chore: fix/add logging categories by @ehhuang in #3658
chore: fix precommit by @ehhuang in #3663
feat(tools)!: substantial clean up of "Tool" related datatypes by @ashwinb in #3627
fix: responses <> chat completion input conversion by @ehhuang in #3645
chore: OpenAIMixin implements ModelsProtocolPrivate by @mattf in #3662
feat: auto-detect Console width by @rhdedgar in #3327
feat: implement keyword and hybrid search for Weaviate provider by @ChristianZaccaria in #3264
fix(docs): Correct indentation in documented example for access_policy by @anastasds in #3652
chore: remove deprecated inference.chat_completion implementations by @mattf in #3654
docs!: adjust external provider docs by @cdoern in #3484
feat: Add OpenAI Conversations API by @franciscojavierarceo in #3429
chore: use remoteinferenceproviderconfig for remote inference providers by @mattf in #3668
docs: update OG image by @reluctantfuturist in #3669
feat: add comment-triggered pre-commit bot for PRs by @ashwinb in #3672
feat(api): add extra_body parameter support with shields example by @ashwinb in #3670
chore: Add weaviate client to unit group in pyproject.toml and uv.lock by @franciscojavierarceo in #3675
chore: update CODEOWNERS by @reluctantfuturist in #3613
chore(tests): normalize recording IDs and timestamps to reduce git diff noise by @ashwinb in #3676
chore: fix setup_telemetry script by @ehhuang in #3680
docs: Update links in README for quick start and documentation by @seyeong-han in #3678
feat(tests): implement test isolation for inference recordings by @ashwinb in #3681
chore: inference=remote::llama-openai-compat does not support /v1/completion by @mattf in #3683
chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui by @dependabot[bot] in #3694
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui by @dependabot[bot] in #3693
chore(python-deps): bump requests from 2.32.4 to 2.32.5 by @dependabot[bot] in #3691
chore(github-deps): bump astral-sh/setup-uv from 6.7.0 to 6.8.0 by @dependabot[bot] in #3686
chore(github-deps): bump actions/github-script from 7.0.1 to 8.0.0 by @dependabot[bot] in #3685
chore(python-deps): bump pandas from 2.3.1 to 2.3.3 by @dependabot[bot] in #3689
chore: use uvicorn to start llama stack server everywhere by @ehhuang in #3625
feat: allow for multiple external provider specs by @cdoern in #3341
chore: give OpenAIMixin subclasses a chance to list models without leaking _model_cache details by @mattf in #3682
chore: turn OpenAIMixin into a pydantic.BaseModel by @mattf in #3671
chore: remove vLLM inference adapter's custom list_models by @mattf in #3703
chore: disable openai_embeddings on inference=remote::llama-openai-compat by @mattf in #3704
chore: remove together inference adapter's custom check_model_availability by @mattf in #3702
docs: API docstrings cleanup for better documentation rendering by @reluctantfuturist in #3661
chore: logger category fix by @ehhuang in #3706
chore: fix closing error by @ehhuang in #3709
feat(api): Add vector store file batches api by @slekkala1 in #3642
chore: remove dead code by @ehhuang in #3713
feat: enable Runpod inference adapter by @justinwlin in #3707
fix: update pyproject.toml dependencies for vector processing by @skamenan7 in #3555
fix: refresh log should be debug by @cdoern in #3720
feat: add refresh_models support to inference adapters (default: false) by @mattf in #3719
fix: make telemetry optional for agents by @cdoern in #3705
feat: Enabling Annotations in Responses by @franciscojavierarceo in #3698
fix: improve model availability checks: Allows use of unavailable models on startup by @akram in #3717
chore: fix flaky unit test and add proper shutdown for file batches by @slekkala1 in #3725
fix(scripts): select container runtime for telemetry by @iamemilio in #3727
fix: fix nvidia provider by @wukaixingxp in #3716
chore: Revert "fix: fix nvidia provider (#3716)" by @ehhuang in #3730
chore: remove dead code by @slekkala1 in #3729
chore!: remove --env from llama stack run by @ehhuang in #3711
chore: require valid logging category by @ehhuang in #3712
fix: Raising an error message to the user when registering an existing provider. by @omaryashraf5 in #3624
chore(github-deps): bump actions/stale from 10.0.0 to 10.1.0 by @dependabot[bot] in #3684
fix: Update watsonx.ai provider to use LiteLLM mixin and list all models by @jwm4 in #3674
fix(responses): fix regression in support for mcp tool require_approval argument by @grs in #3731
fix(tests): ensure test isolation in server mode by @ashwinb in #3737
chore: Removing Weaviate, PGVector, and Milvus from unit tests by @franciscojavierarceo in #3742
feat(tests): add --collect-only option to integration test script by @ashwinb in #3745
chore: print integration tests command by @ehhuang in #3747
chore: revert "fix: Raising an error message to the user when registering an existing provider." by @leseb in #3750
fix: add traces for tool calls and mcp tool listing by @grs in #3722
feat(tests): make inference_recorder into api_recorder (include tool_invoke) by @ashwinb in #3403
fix(tests): remove chroma and qdrant from vector io unit tests by @ashwinb in #3759
fix(testing): improve api_recorder error messages for missing recordings by @ashwinb in #3760
chore!: remove model mgmt from CLI for Hugging Face CLI by @leseb in #3700
feat: make object registration idempotent by @mattf in #3752
fix(inference): propagate 401/403 errors from remote providers by @ashwinb in #3762
fix: update dangling references to llama download command by @ashwinb in #3763
feat(responses): add usage types to inference and responses APIs by @ashwinb in #3764
fix: allow skipping model availability check for vLLM by @akram in #3739
feat(responses)!: add in_progress, failed, content part events by @ashwinb in #3765
fix: update normalize to search all recordings dirs by @derekhiggins in #3767
feat: use SecretStr for inference provider auth credentials by @mattf in #3724
feat: reuse previous mcp tool listings where possible by @grs in #3710
fix(mypy): fix wrong attribute access by @ashwinb in #3770
fix(ci): remove responses from CI for now by @ashwinb in #3773
feat: Add support for Conversations in Responses API by @franciscojavierarceo in #3743
feat(responses): implement usage tracking in streaming responses by @ashwinb in #3771
feat: Add /v1/embeddings endpoint to batches API by @varshaprasad96 in #3384
fix(auth): allow unauthenticated access to health and version endpoints by @derekhiggins in #3736
chore: refactor (chat)completions endpoints to use shared params struct by @ehhuang in #3761
feat(api)!: BREAKING CHANGE: support passing extra_body through to providers by @ehhuang in #3777
chore!: BREAKING CHANGE removing VectorDB APIs by @franciscojavierarceo in #3774
chore(github-deps): bump astral-sh/setup-uv from 6.8.0 to 7.0.0 by @dependabot[bot] in #3782
chore(python-deps): bump psycopg2-binary from 2.9.10 to 2.9.11 by @dependabot[bot] in #3785
chore(python-deps): bump fire from 0.7.0 to 0.7.1 by @dependabot[bot] in #3787
feat(responses)!: add reasoning and annotation added events by @ashwinb in #3793
chore(python-deps): bump ollama from 0.5.1 to 0.6.0 by @dependabot[bot] in #3786
chore(python-deps): bump blobfile from 3.0.0 to 3.1.0 by @dependabot[bot] in #3784
chore(python-deps): bump black from 25.1.0 to 25.9.0 by @dependabot[bot] in #3783
chore(ui-deps): bump @types/react from 19.2.0 to 19.2.2 in /llama_stack/ui by @dependabot[bot] in #3790
chore(ui-deps): bump @types/react-dom from 19.2.0 to 19.2.1 in /llama_stack/ui by @dependabot[bot] in #3789
chore(ui-deps): bump eslint from 9.26.0 to 9.37.0 in /llama_stack/ui by @dependabot[bot] in #3791
chore(ui-deps): bump framer-motion from 12.23.12 to 12.23.24 in /llama_stack/ui by @dependabot[bot] in #3792
chore(ui-deps): bump lucide-react from 0.542.0 to 0.545.0 in /llama_stack/ui by @dependabot[bot] in #3788
chore!: Safety api refactoring to use OpenAIMessageParam by @slekkala1 in #3796
feat(api)!: support extra_body to embeddings and vector_stores APIs by @ashwinb in #3794
feat: Allow :memory: for kvstore by @raghotham in #3696
fix: record job checking wrong directory by @derekhiggins in #3799
chore: Auto-detect Provider ID when only 1 Vector Store Provider avai… by @franciscojavierarceo in #3802
fix: replace python-jose with PyJWT for JWT handling by @leseb in #3756
fix: Fixed WatsonX remote inference provider by @are-ces in #3801
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack by @r3v5 in #3183
docs: Update CONTRIBUTING: py 3.12 and pre-commit==4.3.0 by @jwm4 in #3807
chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs by @ehhuang in #3740
refactor: use extra_body to pass in input_type params for asymmetric embedding models for NVIDIA Inference Provider by @jiayin-nvidia in #3804
feat: Enable setting a default embedding model in the stack by @franciscojavierarceo in #3803
chore: Support embedding params from metadata for Vector Store by @slekkala1 in #3811
feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys by @jperezdealgaba in #3813
feat(responses)!: improve responses + conversations implementations by @ashwinb in #3810
fix(vector-io): handle missing document_id in insert_chunks by @skamenan7 in #3521
chore: mark recordings as generated files by @ehhuang in #3816
fix(responses): fix subtle bugs in non-function tool calling by @ashwinb in #3817
chore!: BREAKING CHANGE: remove sqlite from telemetry config by @ehhuang in #3808
fix(responses): use conversation items when no stored messages exist by @ashwinb in #3819
feat: Add responses and safety impl extra_body by @slekkala1 in #3781
fix(responses): fixes, re-record tests by @ashwinb in #3820
fix(models)!: always prefix models with provider_id when registering by @ashwinb in #3822
chore: remove test_cases/openai/responses.json by @derekhiggins in #3823
chore: update agent call by @leseb in #3824
fix(tests): reduce some test noise by @ashwinb in #3825
fix(openai_mixin): no yelling for model listing if API keys are not provided by @ashwinb in #3826
docs: Document known limitations of Responses by @jwm4 in #3776
fix: test id not being set in headers by @slekkala1 in #3827
chore!: remove telemetry API usage by @cdoern in #3815
chore: distrogen enables telemetry by default by @ehhuang in #3828
fix(telemetry): remove dependency on old telemetry config by @ehhuang in #3830
feat(ci): add support for docker:distro in tests by @ashwinb in #3832
fix(perf): make batches tests finish 30x faster by @ashwinb in #3834
feat(ci): enable docker based server tests by @ashwinb in #3833
chore: update API leveling docs with deprecation flag by @reluctantfuturist in #3837
docs: update docstrings for better formatting by @reluctantfuturist in #3838
test(telemetry): Telemetry Tests by @iamemilio in #3805
refactor(build): rework CLI commands and build process (1/2) by @cdoern in #2974
chore: add telemetry setup to install.sh by @ehhuang in #3821
chore(ui-deps): bump eslint-config-next from 15.5.2 to 15.5.6 in /llama_stack/ui by @dependabot[bot] in #3849
chore(ui-deps): bump jest-environment-jsdom from 30.1.2 to 30.2.0 in /llama_stack/ui by @dependabot[bot] in #3852
chore(ui-deps): bump jest and @types/jest in /llama_stack/ui by @dependabot[bot] in #3853
docs: Documentation update for NVIDIA Inference Provider by @jiayin-nvidia in #3840
docs: fix sidebar of Detailed Tutorial by @cdoern in #3856
chore: use dockerfile for building containers by @ehhuang in #3839
chore: update doc by @ehhuang in #3857
chore: disable telemetry if otel endpoint isn't set by @ehhuang in #3859
chore(python-deps): bump ruff from 0.9.10 to 0.14.1 by @dependabot[bot] in #3846
chore(python-deps): bump sqlalchemy from 2.0.41 to 2.0.44 by @dependabot[bot] in #3848
fix: nested claims mapping in OAuth2 token validation by @derekhiggins in #3814
feat: Add instructions parameter in response object by @s-akhtar-baig in #3741
feat(stores)!: use backend storage references instead of configs by @ashwinb in #3697
chore: Updating how default embedding model is set in stack by @franciscojavierarceo in #3818
feat(stainless): add stainless source of truth config by @ashwinb in #3860
chore(yaml)!: move registered resources to a sub-key by @ashwinb in #3861
chore: install client first by @ehhuang in #3862
chore(github-deps): bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #3841
chore(github-deps): bump astral-sh/setup-uv from 7.0.0 to 7.1.0 by @dependabot[bot] in #3842
chore(github-deps): bump actions/setup-node from 5.0.0 to 6.0.0 by @dependabot[bot] in #3843
chore: remove dead code by @ehhuang in #3863
chore(python-deps): bump weaviate-client from 4.16.9 to 4.17.0 by @dependabot[bot] in #3844
chore(python-deps): bump fastapi from 0.116.1 to 0.119.0 by @dependabot[bot] in #3845
chore(ui-deps): bump @tailwindcss/postcss from 4.1.6 to 4.1.14 in /llama_stack/ui by @dependabot[bot] in #3850
chore(ui-deps): bump @types/node from 24.3.0 to 24.8.1 in /llama_stack/ui by @dependabot[bot] in #3851
chore: skip shutdown if otel_endpoint is not set by @ehhuang in #3865
chore: fix main by @ehhuang in #3868
chore: migrate stack build by @ehhuang in #3867
chore: add beta group to stainless by @cdoern in #3866
chore: remove build.py by @ehhuang in #3869
chore(cleanup)!: kill vector_db references as far as possible by @ashwinb in #3864
fix(ci): improve workflow logging and bot notifications by @ashwinb in #3872
chore(cleanup)!: remove tool_runtime.rag_tool by @ashwinb in #3871
fix(ci): dump server/container logs when tests fail by @ashwinb in #3873
fix(logging): move module-level initialization to explicit setup calls by @ashwinb in #3874
chore: update getting_started by @ehhuang in #3875
revert: "chore(cleanup)!: remove tool_runtime.rag_tool" by @ashwinb in #3877
chore: update quick_start by @ehhuang in #3878
fix: fix segfault in load model by @slekkala1 in #3879
docs: fix the building distro file by @reluctantfuturist in #3880
fix: remove consistency checks by @slekkala1 in #3881

New Contributors

@TamiTakamiya made their first contribution in #3003
@jaideepr97 made their first contribution in #3560
@anastasds made their first contribution in #3652
@seyeong-han made their first contribution in #3678
@justinwlin made their first contribution in #3707
@iamemilio made their first contribution in #3727
@jperezdealgaba made their first contribution in #3813
@s-akhtar-baig made their first contribution in #3741

Full Changelog: v0.2.23...v0.3.0
