Highlights
- Stable OpenAI-Compatible APIs
- Llama Stack now separates APIs into stable (/v1/), experimental (/v1alpha/ and /v1beta/), and deprecated (marked with deprecated = True).
- extra_body/metadata support for APIs that offer functionality beyond the OpenAI implementation
- Documentation overhaul: Migration to Docusaurus, modern formatting, and improved API docs
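
The API leveling described above can be illustrated with a small helper that infers a route's stability level from its URL prefix. This is a minimal sketch, not part of Llama Stack: the function name `api_level` and the example paths other than /v1/chat/completions are assumptions made for illustration. Deprecation is signalled by a flag rather than by the URL, so it cannot be inferred from the path alone.

```python
# Hypothetical helper: classify a route by the prefixes named in the highlights.
STABLE_PREFIX = "/v1/"
EXPERIMENTAL_PREFIXES = ("/v1alpha/", "/v1beta/")


def api_level(path: str) -> str:
    """Return 'experimental', 'stable', or 'unknown' for a URL path."""
    # str.startswith accepts a tuple, so both experimental prefixes are checked at once.
    if path.startswith(EXPERIMENTAL_PREFIXES):
        return "experimental"
    if path.startswith(STABLE_PREFIX):
        return "stable"
    return "unknown"


if __name__ == "__main__":
    # The last three paths are illustrative assumptions, not confirmed routes.
    for p in ("/v1/chat/completions", "/v1alpha/agents", "/v1beta/example", "/health"):
        print(p, "->", api_level(p))
```

A client could use a check like this to warn when it depends on routes that may still change between releases.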
What's Changed
- feat(internal): add image_url download feature to OpenAIMixin by @mattf in #3516
- chore(api): remove batch inference by @mattf in #3261
- chore(apis): unpublish deprecated /v1/inference apis by @mattf in #3297
- chore: recordings for fireworks (inference + openai) by @mattf in #3573
- chore: remove extra logging by @ehhuang in #3574
- chore: MANIFEST maintenance by @leseb in #3454
- feat: Add items and title to ToolParameter/ToolParamDefinition by @TamiTakamiya in #3003
- feat(ci): use @next branch from llama-stack-client by @ashwinb in #3576
- chore(ui-deps): bump shiki from 1.29.2 to 3.13.0 in /llama_stack/ui by @dependabot[bot] in #3585
- chore(ui-deps): bump tw-animate-css from 1.2.9 to 1.4.0 in /llama_stack/ui by @dependabot[bot] in #3583
- chore(github-deps): bump actions/cache from 4.2.4 to 4.3.0 by @dependabot[bot] in #3577
- chore: skip nvidia datastore tests when nvidia datastore is not enabled by @mattf in #3590
- chore: introduce write queue for response_store by @ehhuang in #3497
- revert: feat(ci): use @next branch from llama-stack-client by @ashwinb in #3593
- fix: adding mime type of application/json support by @wukaixingxp in #3452
- chore(api): remove deprecated embeddings impls by @mattf in #3301
- feat(api): level inference/rerank and remove experimental by @cdoern in #3565
- chore: skip safety tests when shield not available by @mattf in #3592
- feat: update eval runner to use openai endpoints by @mattf in #3588
- docs: update image paths by @reluctantfuturist in #3599
- fix: remove inference.completion from docs by @mattf in #3589
- fix: Remove deprecated user param in OpenAIResponseObject by @slekkala1 in #3596
- fix: ensure usage is requested if telemetry is enabled by @mhdawson in #3571
- feat(openai_movement): Change URL structures to kill /openai/v1 (part 1) by @ashwinb in #3587
- feat(files): fix expires_after API shape by @ashwinb in #3604
- feat(openai_movement)!: Change URL structures to kill /openai/v1 (part 2) by @ashwinb in #3605
- fix: mcp tool with array type should include items by @ehhuang in #3602
- feat: add llamastack + CrewAI integration example notebook by @wukaixingxp in #3275
- chore: unpublish /inference/chat-completion by @mattf in #3609
- feat: use /v1/chat/completions for safety model inference by @mattf in #3591
- feat(api): level /agents as v1alpha by @cdoern in #3610
- feat(api): Add Vector Store File batches api stub by @slekkala1 in #3615
- fix(expires_after): make sure multipart/form-data is properly parsed by @ashwinb in #3612
- docs: frontpage update by @reluctantfuturist in #3620
- docs: update safety notebook by @reluctantfuturist in #3617
- feat: add support for require_approval argument when creating response by @grs in #3608
- fix: don't pass default response format in Responses by @ehhuang in #3614
- fix(logging): disable console telemetry sink by default by @ashwinb in #3623
- fix: Ensure that tool calls with no arguments get handled correctly by @jaideepr97 in #3560
- chore: use openai_chat_completion for llm as a judge scoring by @mattf in #3635
- chore: remove /v1/inference/completion and implementations by @mattf in #3622
- feat(api): implement v1beta leveling, and additional alpha by @cdoern in #3594
- feat(conformance): skip test if breaking change is ack by @cdoern in #3619
- fix: log level by @ehhuang in #3637
- docs: update API conformance test by @reluctantfuturist in #3631
- docs: api separation by @reluctantfuturist in #3630
- docs: adding supplementary markdown content to API specs by @reluctantfuturist in #3632
- chore: add provider-data-api-key support to openaimixin by @mattf in #3639
- chore: Remove debug logging from telemetry adapter by @ehhuang in #3643
- docs: fix broken links by @reluctantfuturist in #3647
- docs: add favicon and mobile styling by @reluctantfuturist in #3650
- docs: fix more broken links by @reluctantfuturist in #3649
- docs: Fix Dell distro documentation code snippets by @ConnorHack in #3640
- refactor(agents): migrate to OpenAI chat completions API by @aakankshaduggal in #3323
- fix: re-enable conformance skipping ability by @cdoern in #3651
- chore!: add double routes for v1/openai/v1 by @leseb in #3636
- docs: Update docs navbar config by @kelbrown20 in #3653
- docs: API spec generation for Stainless by @reluctantfuturist in #3655
- chore: fix agents tests for non-ollama providers, provide max_tokens by @mattf in #3657
- chore: fix/add logging categories by @ehhuang in #3658
- chore: fix precommit by @ehhuang in #3663
- feat(tools)!: substantial clean up of "Tool" related datatypes by @ashwinb in #3627
- fix: responses <> chat completion input conversion by @ehhuang in #3645
- chore: OpenAIMixin implements ModelsProtocolPrivate by @mattf in #3662
- feat: auto-detect Console width by @rhdedgar in #3327
- feat: implement keyword and hybrid search for Weaviate provider by @ChristianZaccaria in #3264
- fix(docs): Correct indentation in documented example for access_policy by @anastasds in #3652
- chore: remove deprecated inference.chat_completion implementations by @mattf in #3654
- docs!: adjust external provider docs by @cdoern in #3484
- feat: Add OpenAI Conversations API by @franciscojavierarceo in #3429
- chore: use remoteinferenceproviderconfig for remote inference providers by @mattf in #3668
- docs: update OG image by @reluctantfuturist in #3669
- feat: add comment-triggered pre-commit bot for PRs by @ashwinb in #3672
- feat(api): add extra_body parameter support with shields example by @ashwinb in #3670
- chore: Add weaviate client to unit group in pyproject.toml and uv.lock by @franciscojavierarceo in #3675
- chore: update CODEOWNERS by @reluctantfuturist in #3613
- chore(tests): normalize recording IDs and timestamps to reduce git diff noise by @ashwinb in #3676
- chore: fix setup_telemetry script by @ehhuang in #3680
- docs: Update links in README for quick start and documentation by @seyeong-han in #3678
- feat(tests): implement test isolation for inference recordings by @ashwinb in #3681
- chore: inference=remote::llama-openai-compat does not support /v1/completion by @mattf in #3683
- chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui by @dependabot[bot] in #3694
- chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui by @dependabot[bot] in #3693
- chore(python-deps): bump requests from 2.32.4 to 2.32.5 by @dependabot[bot] in #3691
- chore(github-deps): bump astral-sh/setup-uv from 6.7.0 to 6.8.0 by @dependabot[bot] in #3686
- chore(github-deps): bump actions/github-script from 7.0.1 to 8.0.0 by @dependabot[bot] in #3685
- chore(python-deps): bump pandas from 2.3.1 to 2.3.3 by @dependabot[bot] in #3689
- chore: use uvicorn to start llama stack server everywhere by @ehhuang in #3625
- feat: allow for multiple external provider specs by @cdoern in #3341
- chore: give OpenAIMixin subclasses a chance to list models without leaking _model_cache details by @mattf in #3682
- chore: turn OpenAIMixin into a pydantic.BaseModel by @mattf in #3671
- chore: remove vLLM inference adapter's custom list_models by @mattf in #3703
- chore: disable openai_embeddings on inference=remote::llama-openai-compat by @mattf in #3704
- chore: remove together inference adapter's custom check_model_availability by @mattf in #3702
- docs: API docstrings cleanup for better documentation rendering by @reluctantfuturist in #3661
- chore: logger category fix by @ehhuang in #3706
- chore: fix closing error by @ehhuang in #3709
- feat(api): Add vector store file batches api by @slekkala1 in #3642
- chore: remove dead code by @ehhuang in #3713
- feat: enable Runpod inference adapter by @justinwlin in #3707
- fix: update pyproject.toml dependencies for vector processing by @skamenan7 in #3555
- fix: refresh log should be debug by @cdoern in #3720
- feat: add refresh_models support to inference adapters (default: false) by @mattf in #3719
- fix: make telemetry optional for agents by @cdoern in #3705
- feat: Enabling Annotations in Responses by @franciscojavierarceo in #3698
- fix: improve model availability checks: Allows use of unavailable models on startup by @akram in #3717
- chore: fix flaky unit test and add proper shutdown for file batches by @slekkala1 in #3725
- fix(scripts): select container runtime for telemetry by @iamemilio in #3727
- fix: fix nvidia provider by @wukaixingxp in #3716
- chore: Revert "fix: fix nvidia provider (#3716)" by @ehhuang in #3730
- chore: remove dead code by @slekkala1 in #3729
- chore!: remove --env from llama stack run by @ehhuang in #3711
- chore: require valid logging category by @ehhuang in #3712
- fix: Raising an error message to the user when registering an existing provider. by @omaryashraf5 in #3624
- chore(github-deps): bump actions/stale from 10.0.0 to 10.1.0 by @dependabot[bot] in #3684
- fix: Update watsonx.ai provider to use LiteLLM mixin and list all models by @jwm4 in #3674
- fix(responses): fix regression in support for mcp tool require_approval argument by @grs in #3731
- fix(tests): ensure test isolation in server mode by @ashwinb in #3737
- chore: Removing Weaviate, PGVector, and Milvus from unit tests by @franciscojavierarceo in #3742
- feat(tests): add --collect-only option to integration test script by @ashwinb in #3745
- chore: print integration tests command by @ehhuang in #3747
- chore: revert "fix: Raising an error message to the user when registering an existing provider." by @leseb in #3750
- fix: add traces for tool calls and mcp tool listing by @grs in #3722
- feat(tests): make inference_recorder into api_recorder (include tool_invoke) by @ashwinb in #3403
- fix(tests): remove chroma and qdrant from vector io unit tests by @ashwinb in #3759
- fix(testing): improve api_recorder error messages for missing recordings by @ashwinb in #3760
- chore!: remove model mgmt from CLI for Hugging Face CLI by @leseb in #3700
- feat: make object registration idempotent by @mattf in #3752
- fix(inference): propagate 401/403 errors from remote providers by @ashwinb in #3762
- fix: update dangling references to llama download command by @ashwinb in #3763
- feat(responses): add usage types to inference and responses APIs by @ashwinb in #3764
- fix: allow skipping model availability check for vLLM by @akram in #3739
- feat(responses)!: add in_progress, failed, content part events by @ashwinb in #3765
- fix: update normalize to search all recordings dirs by @derekhiggins in #3767
- feat: use SecretStr for inference provider auth credentials by @mattf in #3724
- feat: reuse previous mcp tool listings where possible by @grs in #3710
- fix(mypy): fix wrong attribute access by @ashwinb in #3770
- fix(ci): remove responses from CI for now by @ashwinb in #3773
- feat: Add support for Conversations in Responses API by @franciscojavierarceo in #3743
- feat(responses): implement usage tracking in streaming responses by @ashwinb in #3771
- feat: Add /v1/embeddings endpoint to batches API by @varshaprasad96 in #3384
- fix(auth): allow unauthenticated access to health and version endpoints by @derekhiggins in #3736
- chore: refactor (chat)completions endpoints to use shared params struct by @ehhuang in #3761
- feat(api)!: BREAKING CHANGE: support passing extra_body through to providers by @ehhuang in #3777
- chore!: BREAKING CHANGE removing VectorDB APIs by @franciscojavierarceo in #3774
- chore(github-deps): bump astral-sh/setup-uv from 6.8.0 to 7.0.0 by @dependabot[bot] in #3782
- chore(python-deps): bump psycopg2-binary from 2.9.10 to 2.9.11 by @dependabot[bot] in #3785
- chore(python-deps): bump fire from 0.7.0 to 0.7.1 by @dependabot[bot] in #3787
- feat(responses)!: add reasoning and annotation added events by @ashwinb in #3793
- chore(python-deps): bump ollama from 0.5.1 to 0.6.0 by @dependabot[bot] in #3786
- chore(python-deps): bump blobfile from 3.0.0 to 3.1.0 by @dependabot[bot] in #3784
- chore(python-deps): bump black from 25.1.0 to 25.9.0 by @dependabot[bot] in #3783
- chore(ui-deps): bump @types/react from 19.2.0 to 19.2.2 in /llama_stack/ui by @dependabot[bot] in #3790
- chore(ui-deps): bump @types/react-dom from 19.2.0 to 19.2.1 in /llama_stack/ui by @dependabot[bot] in #3789
- chore(ui-deps): bump eslint from 9.26.0 to 9.37.0 in /llama_stack/ui by @dependabot[bot] in #3791
- chore(ui-deps): bump framer-motion from 12.23.12 to 12.23.24 in /llama_stack/ui by @dependabot[bot] in #3792
- chore(ui-deps): bump lucide-react from 0.542.0 to 0.545.0 in /llama_stack/ui by @dependabot[bot] in #3788
- chore!: Safety api refactoring to use OpenAIMessageParam by @slekkala1 in #3796
- feat(api)!: support extra_body to embeddings and vector_stores APIs by @ashwinb in #3794
- feat: Allow :memory: for kvstore by @raghotham in #3696
- fix: record job checking wrong directory by @derekhiggins in #3799
- chore: Auto-detect Provider ID when only 1 Vector Store Provider avai… by @franciscojavierarceo in #3802
- fix: replace python-jose with PyJWT for JWT handling by @leseb in #3756
- fix: Fixed WatsonX remote inference provider by @are-ces in #3801
- refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack by @r3v5 in #3183
- docs: Update CONTRIBUTING: py 3.12 and pre-commit==4.3.0 by @jwm4 in #3807
- chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs by @ehhuang in #3740
- refactor: use extra_body to pass in input_type params for asymmetric embedding models for NVIDIA Inference Provider by @jiayin-nvidia in #3804
- feat: Enable setting a default embedding model in the stack by @franciscojavierarceo in #3803
- chore: Support embedding params from metadata for Vector Store by @slekkala1 in #3811
- feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys by @jperezdealgaba in #3813
- feat(responses)!: improve responses + conversations implementations by @ashwinb in #3810
- fix(vector-io): handle missing document_id in insert_chunks by @skamenan7 in #3521
- chore: mark recordings as generated files by @ehhuang in #3816
- fix(responses): fix subtle bugs in non-function tool calling by @ashwinb in #3817
- chore!: BREAKING CHANGE: remove sqlite from telemetry config by @ehhuang in #3808
- fix(responses): use conversation items when no stored messages exist by @ashwinb in #3819
- feat: Add responses and safety impl extra_body by @slekkala1 in #3781
- fix(responses): fixes, re-record tests by @ashwinb in #3820
- fix(models)!: always prefix models with provider_id when registering by @ashwinb in #3822
- chore: remove test_cases/openai/responses.json by @derekhiggins in #3823
- chore: update agent call by @leseb in #3824
- fix(tests): reduce some test noise by @ashwinb in #3825
- fix(openai_mixin): no yelling for model listing if API keys are not provided by @ashwinb in #3826
- docs: Document known limitations of Responses by @jwm4 in #3776
- fix: test id not being set in headers by @slekkala1 in #3827
- chore!: remove telemetry API usage by @cdoern in #3815
- chore: distrogen enables telemetry by default by @ehhuang in #3828
- fix(telemetry): remove dependency on old telemetry config by @ehhuang in #3830
- feat(ci): add support for docker:distro in tests by @ashwinb in #3832
- fix(perf): make batches tests finish 30x faster by @ashwinb in #3834
- feat(ci): enable docker based server tests by @ashwinb in #3833
- chore: update API leveling docs with deprecation flag by @reluctantfuturist in #3837
- docs: update docstrings for better formatting by @reluctantfuturist in #3838
- test(telemetry): Telemetry Tests by @iamemilio in #3805
- refactor(build): rework CLI commands and build process (1/2) by @cdoern in #2974
- chore: add telemetry setup to install.sh by @ehhuang in #3821
- chore(ui-deps): bump eslint-config-next from 15.5.2 to 15.5.6 in /llama_stack/ui by @dependabot[bot] in #3849
- chore(ui-deps): bump jest-environment-jsdom from 30.1.2 to 30.2.0 in /llama_stack/ui by @dependabot[bot] in #3852
- chore(ui-deps): bump jest and @types/jest in /llama_stack/ui by @dependabot[bot] in #3853
- docs: Documentation update for NVIDIA Inference Provider by @jiayin-nvidia in #3840
- docs: fix sidebar of Detailed Tutorial by @cdoern in #3856
- chore: use dockerfile for building containers by @ehhuang in #3839
- chore: update doc by @ehhuang in #3857
- chore: disable telemetry if otel endpoint isn't set by @ehhuang in #3859
- chore(python-deps): bump ruff from 0.9.10 to 0.14.1 by @dependabot[bot] in #3846
- chore(python-deps): bump sqlalchemy from 2.0.41 to 2.0.44 by @dependabot[bot] in #3848
- fix: nested claims mapping in OAuth2 token validation by @derekhiggins in #3814
- feat: Add instructions parameter in response object by @s-akhtar-baig in #3741
- feat(stores)!: use backend storage references instead of configs by @ashwinb in #3697
- chore: Updating how default embedding model is set in stack by @franciscojavierarceo in #3818
- feat(stainless): add stainless source of truth config by @ashwinb in #3860
- chore(yaml)!: move registered resources to a sub-key by @ashwinb in #3861
- chore: install client first by @ehhuang in #3862
- chore(github-deps): bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #3841
- chore(github-deps): bump astral-sh/setup-uv from 7.0.0 to 7.1.0 by @dependabot[bot] in #3842
- chore(github-deps): bump actions/setup-node from 5.0.0 to 6.0.0 by @dependabot[bot] in #3843
- chore: remove dead code by @ehhuang in #3863
- chore(python-deps): bump weaviate-client from 4.16.9 to 4.17.0 by @dependabot[bot] in #3844
- chore(python-deps): bump fastapi from 0.116.1 to 0.119.0 by @dependabot[bot] in #3845
- chore(ui-deps): bump @tailwindcss/postcss from 4.1.6 to 4.1.14 in /llama_stack/ui by @dependabot[bot] in #3850
- chore(ui-deps): bump @types/node from 24.3.0 to 24.8.1 in /llama_stack/ui by @dependabot[bot] in #3851
- chore: skip shutdown if otel_endpoint is not set by @ehhuang in #3865
- chore: fix main by @ehhuang in #3868
- chore: migrate stack build by @ehhuang in #3867
- chore: add beta group to stainless by @cdoern in #3866
- chore: remove build.py by @ehhuang in #3869
- chore(cleanup)!: kill vector_db references as far as possible by @ashwinb in #3864
- fix(ci): improve workflow logging and bot notifications by @ashwinb in #3872
- chore(cleanup)!: remove tool_runtime.rag_tool by @ashwinb in #3871
- fix(ci): dump server/container logs when tests fail by @ashwinb in #3873
- fix(logging): move module-level initialization to explicit setup calls by @ashwinb in #3874
- chore: update getting_started by @ehhuang in #3875
- revert: "chore(cleanup)!: remove tool_runtime.rag_tool" by @ashwinb in #3877
- chore: update quick_start by @ehhuang in #3878
- fix: fix segfault in load model by @slekkala1 in #3879
- docs: fix the building distro file by @reluctantfuturist in #3880
- fix: remove consistency checks by @slekkala1 in #3881
New Contributors
- @TamiTakamiya made their first contribution in #3003
- @jaideepr97 made their first contribution in #3560
- @anastasds made their first contribution in #3652
- @seyeong-han made their first contribution in #3678
- @justinwlin made their first contribution in #3707
- @iamemilio made their first contribution in #3727
- @jperezdealgaba made their first contribution in #3813
- @s-akhtar-baig made their first contribution in #3741
Full Changelog: v0.2.23...v0.3.0