v0.3.0

@reluctantfuturist reluctantfuturist released this 22 Oct 19:21
· 29 commits to main since this release

Highlights

  • Stable OpenAI-Compatible APIs
  • Llama Stack now separates APIs into stable (/v1/), experimental (/v1alpha/ and /v1beta/), and deprecated (marked with deprecated = True)
  • extra_body/metadata support for APIs that offer functionality beyond the OpenAI implementation
  • Documentation overhaul: Migration to Docusaurus, modern formatting, and improved API docs

What's Changed

  • feat(internal): add image_url download feature to OpenAIMixin by @mattf in #3516
  • chore(api): remove batch inference by @mattf in #3261
  • chore(apis): unpublish deprecated /v1/inference apis by @mattf in #3297
  • chore: recordings for fireworks (inference + openai) by @mattf in #3573
  • chore: remove extra logging by @ehhuang in #3574
  • chore: MANIFEST maintenance by @leseb in #3454
  • feat: Add items and title to ToolParameter/ToolParamDefinition by @TamiTakamiya in #3003
  • feat(ci): use @next branch from llama-stack-client by @ashwinb in #3576
  • chore(ui-deps): bump shiki from 1.29.2 to 3.13.0 in /llama_stack/ui by @dependabot[bot] in #3585
  • chore(ui-deps): bump tw-animate-css from 1.2.9 to 1.4.0 in /llama_stack/ui by @dependabot[bot] in #3583
  • chore(github-deps): bump actions/cache from 4.2.4 to 4.3.0 by @dependabot[bot] in #3577
  • chore: skip nvidia datastore tests when nvidia datastore is not enabled by @mattf in #3590
  • chore: introduce write queue for response_store by @ehhuang in #3497
  • revert: feat(ci): use @next branch from llama-stack-client by @ashwinb in #3593
  • fix: adding mime type of application/json support by @wukaixingxp in #3452
  • chore(api): remove deprecated embeddings impls by @mattf in #3301
  • feat(api): level inference/rerank and remove experimental by @cdoern in #3565
  • chore: skip safety tests when shield not available by @mattf in #3592
  • feat: update eval runner to use openai endpoints by @mattf in #3588
  • docs: update image paths by @reluctantfuturist in #3599
  • fix: remove inference.completion from docs by @mattf in #3589
  • fix: Remove deprecated user param in OpenAIResponseObject by @slekkala1 in #3596
  • fix: ensure usage is requested if telemetry is enabled by @mhdawson in #3571
  • feat(openai_movement): Change URL structures to kill /openai/v1 (part 1) by @ashwinb in #3587
  • feat(files): fix expires_after API shape by @ashwinb in #3604
  • feat(openai_movement)!: Change URL structures to kill /openai/v1 (part 2) by @ashwinb in #3605
  • fix: mcp tool with array type should include items by @ehhuang in #3602
  • feat: add llamastack + CrewAI integration example notebook by @wukaixingxp in #3275
  • chore: unpublish /inference/chat-completion by @mattf in #3609
  • feat: use /v1/chat/completions for safety model inference by @mattf in #3591
  • feat(api): level /agents as v1alpha by @cdoern in #3610
  • feat(api): Add Vector Store File batches api stub by @slekkala1 in #3615
  • fix(expires_after): make sure multipart/form-data is properly parsed by @ashwinb in #3612
  • docs: frontpage update by @reluctantfuturist in #3620
  • docs: update safety notebook by @reluctantfuturist in #3617
  • feat: add support for require_approval argument when creating response by @grs in #3608
  • fix: don't pass default response format in Responses by @ehhuang in #3614
  • fix(logging): disable console telemetry sink by default by @ashwinb in #3623
  • fix: Ensure that tool calls with no arguments get handled correctly by @jaideepr97 in #3560
  • chore: use openai_chat_completion for llm as a judge scoring by @mattf in #3635
  • chore: remove /v1/inference/completion and implementations by @mattf in #3622
  • feat(api): implement v1beta leveling, and additional alpha by @cdoern in #3594
  • feat(conformance): skip test if breaking change is ack by @cdoern in #3619
  • fix: log level by @ehhuang in #3637
  • docs: update API conformance test by @reluctantfuturist in #3631
  • docs: api separation by @reluctantfuturist in #3630
  • docs: adding supplementary markdown content to API specs by @reluctantfuturist in #3632
  • chore: add provider-data-api-key support to openaimixin by @mattf in #3639
  • chore: Remove debug logging from telemetry adapter by @ehhuang in #3643
  • docs: fix broken links by @reluctantfuturist in #3647
  • docs: add favicon and mobile styling by @reluctantfuturist in #3650
  • docs: fix more broken links by @reluctantfuturist in #3649
  • docs: Fix Dell distro documentation code snippets by @ConnorHack in #3640
  • refactor(agents): migrate to OpenAI chat completions API by @aakankshaduggal in #3323
  • fix: re-enable conformance skipping ability by @cdoern in #3651
  • chore!: add double routes for v1/openai/v1 by @leseb in #3636
  • docs: Update docs navbar config by @kelbrown20 in #3653
  • docs: API spec generation for Stainless by @reluctantfuturist in #3655
  • chore: fix agents tests for non-ollama providers, provide max_tokens by @mattf in #3657
  • chore: fix/add logging categories by @ehhuang in #3658
  • chore: fix precommit by @ehhuang in #3663
  • feat(tools)!: substantial clean up of "Tool" related datatypes by @ashwinb in #3627
  • fix: responses <> chat completion input conversion by @ehhuang in #3645
  • chore: OpenAIMixin implements ModelsProtocolPrivate by @mattf in #3662
  • feat: auto-detect Console width by @rhdedgar in #3327
  • feat: implement keyword and hybrid search for Weaviate provider by @ChristianZaccaria in #3264
  • fix(docs): Correct indentation in documented example for access_policy by @anastasds in #3652
  • chore: remove deprecated inference.chat_completion implementations by @mattf in #3654
  • docs!: adjust external provider docs by @cdoern in #3484
  • feat: Add OpenAI Conversations API by @franciscojavierarceo in #3429
  • chore: use remoteinferenceproviderconfig for remote inference providers by @mattf in #3668
  • docs: update OG image by @reluctantfuturist in #3669
  • feat: add comment-triggered pre-commit bot for PRs by @ashwinb in #3672
  • feat(api): add extra_body parameter support with shields example by @ashwinb in #3670
  • chore: Add weaviate client to unit group in pyproject.toml and uv.lock by @franciscojavierarceo in #3675
  • chore: update CODEOWNERS by @reluctantfuturist in #3613
  • chore(tests): normalize recording IDs and timestamps to reduce git diff noise by @ashwinb in #3676
  • chore: fix setup_telemetry script by @ehhuang in #3680
  • docs: Update links in README for quick start and documentation by @seyeong-han in #3678
  • feat(tests): implement test isolation for inference recordings by @ashwinb in #3681
  • chore: inference=remote::llama-openai-compat does not support /v1/completion by @mattf in #3683
  • chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui by @dependabot[bot] in #3694
  • chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui by @dependabot[bot] in #3693
  • chore(python-deps): bump requests from 2.32.4 to 2.32.5 by @dependabot[bot] in #3691
  • chore(github-deps): bump astral-sh/setup-uv from 6.7.0 to 6.8.0 by @dependabot[bot] in #3686
  • chore(github-deps): bump actions/github-script from 7.0.1 to 8.0.0 by @dependabot[bot] in #3685
  • chore(python-deps): bump pandas from 2.3.1 to 2.3.3 by @dependabot[bot] in #3689
  • chore: use uvicorn to start llama stack server everywhere by @ehhuang in #3625
  • feat: allow for multiple external provider specs by @cdoern in #3341
  • chore: give OpenAIMixin subclasses a chance to list models without leaking _model_cache details by @mattf in #3682
  • chore: turn OpenAIMixin into a pydantic.BaseModel by @mattf in #3671
  • chore: remove vLLM inference adapter's custom list_models by @mattf in #3703
  • chore: disable openai_embeddings on inference=remote::llama-openai-compat by @mattf in #3704
  • chore: remove together inference adapter's custom check_model_availability by @mattf in #3702
  • docs: API docstrings cleanup for better documentation rendering by @reluctantfuturist in #3661
  • chore: logger category fix by @ehhuang in #3706
  • chore: fix closing error by @ehhuang in #3709
  • feat(api): Add vector store file batches api by @slekkala1 in #3642
  • chore: remove dead code by @ehhuang in #3713
  • feat: enable Runpod inference adapter by @justinwlin in #3707
  • fix: update pyproject.toml dependencies for vector processing by @skamenan7 in #3555
  • fix: refresh log should be debug by @cdoern in #3720
  • feat: add refresh_models support to inference adapters (default: false) by @mattf in #3719
  • fix: make telemetry optional for agents by @cdoern in #3705
  • feat: Enabling Annotations in Responses by @franciscojavierarceo in #3698
  • fix: improve model availability checks: Allows use of unavailable models on startup by @akram in #3717
  • chore: fix flaky unit test and add proper shutdown for file batches by @slekkala1 in #3725
  • fix(scripts): select container runtime for telemetry by @iamemilio in #3727
  • fix: fix nvidia provider by @wukaixingxp in #3716
  • chore: Revert "fix: fix nvidia provider (#3716)" by @ehhuang in #3730
  • chore: remove dead code by @slekkala1 in #3729
  • chore!: remove --env from llama stack run by @ehhuang in #3711
  • chore: require valid logging category by @ehhuang in #3712
  • fix: Raising an error message to the user when registering an existing provider. by @omaryashraf5 in #3624
  • chore(github-deps): bump actions/stale from 10.0.0 to 10.1.0 by @dependabot[bot] in #3684
  • fix: Update watsonx.ai provider to use LiteLLM mixin and list all models by @jwm4 in #3674
  • fix(responses): fix regression in support for mcp tool require_approval argument by @grs in #3731
  • fix(tests): ensure test isolation in server mode by @ashwinb in #3737
  • chore: Removing Weaviate, PGVector, and Milvus from unit tests by @franciscojavierarceo in #3742
  • feat(tests): add --collect-only option to integration test script by @ashwinb in #3745
  • chore: print integration tests command by @ehhuang in #3747
  • chore: revert "fix: Raising an error message to the user when registering an existing provider." by @leseb in #3750
  • fix: add traces for tool calls and mcp tool listing by @grs in #3722
  • feat(tests): make inference_recorder into api_recorder (include tool_invoke) by @ashwinb in #3403
  • fix(tests): remove chroma and qdrant from vector io unit tests by @ashwinb in #3759
  • fix(testing): improve api_recorder error messages for missing recordings by @ashwinb in #3760
  • chore!: remove model mgmt from CLI for Hugging Face CLI by @leseb in #3700
  • feat: make object registration idempotent by @mattf in #3752
  • fix(inference): propagate 401/403 errors from remote providers by @ashwinb in #3762
  • fix: update dangling references to llama download command by @ashwinb in #3763
  • feat(responses): add usage types to inference and responses APIs by @ashwinb in #3764
  • fix: allow skipping model availability check for vLLM by @akram in #3739
  • feat(responses)!: add in_progress, failed, content part events by @ashwinb in #3765
  • fix: update normalize to search all recordings dirs by @derekhiggins in #3767
  • feat: use SecretStr for inference provider auth credentials by @mattf in #3724
  • feat: reuse previous mcp tool listings where possible by @grs in #3710
  • fix(mypy): fix wrong attribute access by @ashwinb in #3770
  • fix(ci): remove responses from CI for now by @ashwinb in #3773
  • feat: Add support for Conversations in Responses API by @franciscojavierarceo in #3743
  • feat(responses): implement usage tracking in streaming responses by @ashwinb in #3771
  • feat: Add /v1/embeddings endpoint to batches API by @varshaprasad96 in #3384
  • fix(auth): allow unauthenticated access to health and version endpoints by @derekhiggins in #3736
  • chore: refactor (chat)completions endpoints to use shared params struct by @ehhuang in #3761
  • feat(api)!: BREAKING CHANGE: support passing extra_body through to providers by @ehhuang in #3777
  • chore!: BREAKING CHANGE removing VectorDB APIs by @franciscojavierarceo in #3774
  • chore(github-deps): bump astral-sh/setup-uv from 6.8.0 to 7.0.0 by @dependabot[bot] in #3782
  • chore(python-deps): bump psycopg2-binary from 2.9.10 to 2.9.11 by @dependabot[bot] in #3785
  • chore(python-deps): bump fire from 0.7.0 to 0.7.1 by @dependabot[bot] in #3787
  • feat(responses)!: add reasoning and annotation added events by @ashwinb in #3793
  • chore(python-deps): bump ollama from 0.5.1 to 0.6.0 by @dependabot[bot] in #3786
  • chore(python-deps): bump blobfile from 3.0.0 to 3.1.0 by @dependabot[bot] in #3784
  • chore(python-deps): bump black from 25.1.0 to 25.9.0 by @dependabot[bot] in #3783
  • chore(ui-deps): bump @types/react from 19.2.0 to 19.2.2 in /llama_stack/ui by @dependabot[bot] in #3790
  • chore(ui-deps): bump @types/react-dom from 19.2.0 to 19.2.1 in /llama_stack/ui by @dependabot[bot] in #3789
  • chore(ui-deps): bump eslint from 9.26.0 to 9.37.0 in /llama_stack/ui by @dependabot[bot] in #3791
  • chore(ui-deps): bump framer-motion from 12.23.12 to 12.23.24 in /llama_stack/ui by @dependabot[bot] in #3792
  • chore(ui-deps): bump lucide-react from 0.542.0 to 0.545.0 in /llama_stack/ui by @dependabot[bot] in #3788
  • chore!: Safety api refactoring to use OpenAIMessageParam by @slekkala1 in #3796
  • feat(api)!: support extra_body to embeddings and vector_stores APIs by @ashwinb in #3794
  • feat: Allow :memory: for kvstore by @raghotham in #3696
  • fix: record job checking wrong directory by @derekhiggins in #3799
  • chore: Auto-detect Provider ID when only 1 Vector Store Provider avai… by @franciscojavierarceo in #3802
  • fix: replace python-jose with PyJWT for JWT handling by @leseb in #3756
  • fix: Fixed WatsonX remote inference provider by @are-ces in #3801
  • refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack by @r3v5 in #3183
  • docs: Update CONTRIBUTING: py 3.12 and pre-commit==4.3.0 by @jwm4 in #3807
  • chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs by @ehhuang in #3740
  • refactor: use extra_body to pass in input_type params for asymmetric embedding models for NVIDIA Inference Provider by @jiayin-nvidia in #3804
  • feat: Enable setting a default embedding model in the stack by @franciscojavierarceo in #3803
  • chore: Support embedding params from metadata for Vector Store by @slekkala1 in #3811
  • feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys by @jperezdealgaba in #3813
  • feat(responses)!: improve responses + conversations implementations by @ashwinb in #3810
  • fix(vector-io): handle missing document_id in insert_chunks by @skamenan7 in #3521
  • chore: mark recordings as generated files by @ehhuang in #3816
  • fix(responses): fix subtle bugs in non-function tool calling by @ashwinb in #3817
  • chore!: BREAKING CHANGE: remove sqlite from telemetry config by @ehhuang in #3808
  • fix(responses): use conversation items when no stored messages exist by @ashwinb in #3819
  • feat: Add responses and safety impl extra_body by @slekkala1 in #3781
  • fix(responses): fixes, re-record tests by @ashwinb in #3820
  • fix(models)!: always prefix models with provider_id when registering by @ashwinb in #3822
  • chore: remove test_cases/openai/responses.json by @derekhiggins in #3823
  • chore: update agent call by @leseb in #3824
  • fix(tests): reduce some test noise by @ashwinb in #3825
  • fix(openai_mixin): no yelling for model listing if API keys are not provided by @ashwinb in #3826
  • docs: Document known limitations of Responses by @jwm4 in #3776
  • fix: test id not being set in headers by @slekkala1 in #3827
  • chore!: remove telemetry API usage by @cdoern in #3815
  • chore: distrogen enables telemetry by default by @ehhuang in #3828
  • fix(telemetry): remove dependency on old telemetry config by @ehhuang in #3830
  • feat(ci): add support for docker:distro in tests by @ashwinb in #3832
  • fix(perf): make batches tests finish 30x faster by @ashwinb in #3834
  • feat(ci): enable docker based server tests by @ashwinb in #3833
  • chore: update API leveling docs with deprecation flag by @reluctantfuturist in #3837
  • docs: update docstrings for better formatting by @reluctantfuturist in #3838
  • test(telemetry): Telemetry Tests by @iamemilio in #3805
  • refactor(build): rework CLI commands and build process (1/2) by @cdoern in #2974
  • chore: add telemetry setup to install.sh by @ehhuang in #3821
  • chore(ui-deps): bump eslint-config-next from 15.5.2 to 15.5.6 in /llama_stack/ui by @dependabot[bot] in #3849
  • chore(ui-deps): bump jest-environment-jsdom from 30.1.2 to 30.2.0 in /llama_stack/ui by @dependabot[bot] in #3852
  • chore(ui-deps): bump jest and @types/jest in /llama_stack/ui by @dependabot[bot] in #3853
  • docs: Documentation update for NVIDIA Inference Provider by @jiayin-nvidia in #3840
  • docs: fix sidebar of Detailed Tutorial by @cdoern in #3856
  • chore: use dockerfile for building containers by @ehhuang in #3839
  • chore: update doc by @ehhuang in #3857
  • chore: disable telemetry if otel endpoint isn't set by @ehhuang in #3859
  • chore(python-deps): bump ruff from 0.9.10 to 0.14.1 by @dependabot[bot] in #3846
  • chore(python-deps): bump sqlalchemy from 2.0.41 to 2.0.44 by @dependabot[bot] in #3848
  • fix: nested claims mapping in OAuth2 token validation by @derekhiggins in #3814
  • feat: Add instructions parameter in response object by @s-akhtar-baig in #3741
  • feat(stores)!: use backend storage references instead of configs by @ashwinb in #3697
  • chore: Updating how default embedding model is set in stack by @franciscojavierarceo in #3818
  • feat(stainless): add stainless source of truth config by @ashwinb in #3860
  • chore(yaml)!: move registered resources to a sub-key by @ashwinb in #3861
  • chore: install client first by @ehhuang in #3862
  • chore(github-deps): bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #3841
  • chore(github-deps): bump astral-sh/setup-uv from 7.0.0 to 7.1.0 by @dependabot[bot] in #3842
  • chore(github-deps): bump actions/setup-node from 5.0.0 to 6.0.0 by @dependabot[bot] in #3843
  • chore: remove dead code by @ehhuang in #3863
  • chore(python-deps): bump weaviate-client from 4.16.9 to 4.17.0 by @dependabot[bot] in #3844
  • chore(python-deps): bump fastapi from 0.116.1 to 0.119.0 by @dependabot[bot] in #3845
  • chore(ui-deps): bump @tailwindcss/postcss from 4.1.6 to 4.1.14 in /llama_stack/ui by @dependabot[bot] in #3850
  • chore(ui-deps): bump @types/node from 24.3.0 to 24.8.1 in /llama_stack/ui by @dependabot[bot] in #3851
  • chore: skip shutdown if otel_endpoint is not set by @ehhuang in #3865
  • chore: fix main by @ehhuang in #3868
  • chore: migrate stack build by @ehhuang in #3867
  • chore: add beta group to stainless by @cdoern in #3866
  • chore: remove build.py by @ehhuang in #3869
  • chore(cleanup)!: kill vector_db references as far as possible by @ashwinb in #3864
  • fix(ci): improve workflow logging and bot notifications by @ashwinb in #3872
  • chore(cleanup)!: remove tool_runtime.rag_tool by @ashwinb in #3871
  • fix(ci): dump server/container logs when tests fail by @ashwinb in #3873
  • fix(logging): move module-level initialization to explicit setup calls by @ashwinb in #3874
  • chore: update getting_started by @ehhuang in #3875
  • revert: "chore(cleanup)!: remove tool_runtime.rag_tool" by @ashwinb in #3877
  • chore: update quick_start by @ehhuang in #3878
  • fix: fix segfault in load model by @slekkala1 in #3879
  • docs: fix the building distro file by @reluctantfuturist in #3880
  • fix: remove consistency checks by @slekkala1 in #3881

New Contributors

Full Changelog: v0.2.23...v0.3.0