Commit 8f91cff

Shixiaowei02 and kaiyux authored
TensorRT-LLM Release 0.15.0 (#2529)
Co-authored-by: Kaiyu Xie <[email protected]>
1 parent b088016 commit 8f91cff

758 files changed, +1273212 -832844 lines changed

.gitignore

Lines changed: 4 additions & 2 deletions
@@ -29,12 +29,12 @@ dump*/
 config.json
 /*.svg
 cpp/cmake-build-*
-cpp/.ccache/
+cpp/.ccache
 tensorrt_llm/bin
 tensorrt_llm/libs
 tensorrt_llm/bindings.*.so
 tensorrt_llm/bindings.pyi
-tensorrt_llm/bindings/*.pyi
+tensorrt_llm/bindings/**/*.pyi
 *docs/cpp_docs*
 *docs/source/_cpp_gen*
 docs/source/llm-api/*.rst
@@ -55,3 +55,5 @@ cpp/include/tensorrt_llm/executor/version.h
 
 # User config files
 CMakeUserPresets.json
+compile_commands.json
+*.bin

README.md

Lines changed: 29 additions & 10 deletions
@@ -6,23 +6,41 @@ TensorRT-LLM
 
 [![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/TensorRT-LLM/)
 [![python](https://img.shields.io/badge/python-3.10.12-green)](https://www.python.org/downloads/release/python-31012/)
-[![cuda](https://img.shields.io/badge/cuda-12.5.1-green)](https://developer.nvidia.com/cuda-downloads)
-[![trt](https://img.shields.io/badge/TRT-10.4.0-green)](https://developer.nvidia.com/tensorrt)
-[![version](https://img.shields.io/badge/release-0.14.0-green)](./tensorrt_llm/version.py)
+[![cuda](https://img.shields.io/badge/cuda-12.6.2-green)](https://developer.nvidia.com/cuda-downloads)
+[![trt](https://img.shields.io/badge/TRT-10.6.0-green)](https://developer.nvidia.com/tensorrt)
+[![version](https://img.shields.io/badge/release-0.15.0-green)](./tensorrt_llm/version.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)
 
-[Architecture](./docs/source/architecture/overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Results](./docs/source/performance/perf-overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Examples](./examples/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Documentation](./docs/source/)
+[Architecture](./docs/source/architecture/overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Results](./docs/source/performance/perf-overview.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Examples](./examples/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Documentation](./docs/source/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[Roadmap](https://docs.google.com/presentation/d/1gycPmtdh7uUcH6laOvW65Dbp9F1McUkGDIcAyjicBZs/edit?usp=sharing)
 
 ---
 <div align="left">
 
 ## Latest News
-* [2024/09/29] 🌟 AI at Meta PyTorch + TensorRT v2.4 🌟 ⚡TensorRT 10.1 ⚡PyTorch 2.4 ⚡CUDA 12.4 ⚡Python 3.12
-[➡️ link](https://github.com/pytorch/TensorRT/releases/tag/v2.4.0)
+
+* [2024/11/02] 🌟🌟🌟 NVIDIA and LlamaIndex Developer Contest
+🙌 Enter for a chance to win prizes including an NVIDIA® GeForce RTX™ 4080 SUPER GPU, DLI credits, and more🙌
+[➡️ link](https://developer.nvidia.com/llamaindex-developer-contest)
 <div align="center">
-<img src="docs/source/media/image-09-29-2024.png" width="50%">
+<img src="docs/source/media/image-11-02-2024.png" width="50%">
 <div align="left">
 
+* [2024/10/28] 🏎️🏎️🏎️ NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models
+[➡️ link](https://developer.nvidia.com/blog/nvidia-gh200-superchip-accelerates-inference-by-2x-in-multiturn-interactions-with-llama-models/)
+
+* [2024/10/22] New 📝 Step-by-step instructions on how to
+✅ Optimize LLMs with NVIDIA TensorRT-LLM,
+✅ Deploy the optimized models with Triton Inference Server,
+✅ Autoscale LLMs deployment in a Kubernetes environment.
+🙌 Technical Deep Dive:
+[➡️ link](https://nvda.ws/3YgI8UT)
+
+* [2024/10/07] 🚀🚀🚀Optimizing Microsoft Bing Visual Search with NVIDIA Accelerated Libraries
+[➡️ link](https://developer.nvidia.com/blog/optimizing-microsoft-bing-visual-search-with-nvidia-accelerated-libraries/)
+
+* [2024/09/29] 🌟 AI at Meta PyTorch + TensorRT v2.4 🌟 ⚡TensorRT 10.1 ⚡PyTorch 2.4 ⚡CUDA 12.4 ⚡Python 3.12
+[➡️ link](https://github.com/pytorch/TensorRT/releases/tag/v2.4.0)
+
 * [2024/09/17] ✨ NVIDIA TensorRT-LLM Meetup
 [➡️ link](https://drive.google.com/file/d/1RR8GqC-QbuaKuHj82rZcXb3MS20SWo6F/view?usp=share_link)
 
@@ -35,6 +53,9 @@ TensorRT-LLM
 * [2024/09/04] 🏎️🏎️🏎️ Best Practices for Tuning TensorRT-LLM for Optimal Serving with BentoML
 [➡️ link](https://www.bentoml.com/blog/tuning-tensor-rt-llm-for-optimal-serving-with-bentoml)
 
+<details close>
+<summary>Previous News</summary>
+
 * [2024/08/20] 🏎️SDXL with #TensorRT Model Optimizer ⏱️⚡ 🏁 cache diffusion 🏁 quantization aware training 🏁 QLoRA 🏁 #Python 3.12
 [➡️ link](https://developer.nvidia.com/blog/nvidia-tensorrt-model-optimizer-v0-15-boosts-inference-performance-and-expands-model-support/)
 
@@ -61,9 +82,6 @@ TensorRT-LLM
 * [2024/07/02] Let the @MistralAI MoE tokens fly 📈 🚀 #Mixtral 8x7B with NVIDIA #TensorRT #LLM on #H100.
 [➡️ Tech blog](https://developer.nvidia.com/blog/achieving-high-mixtral-8x7b-performance-with-nvidia-h100-tensor-core-gpus-and-tensorrt-llm?ncid=so-twit-928467)
 
-<details close>
-<summary>Previous News</summary>
-
 * [2024/06/24] Enhanced with NVIDIA #TensorRT #LLM, @upstage.ai’s solar-10.7B-instruct is ready to power your developer projects through our API catalog 🏎️. ✨[➡️ link](https://build.nvidia.com/upstage/solar-10_7b-instruct?snippet_tab=Try )
 
 * [2024/06/18] CYMI: 🤩 Stable Diffusion 3 dropped last week 🎊 🏎️ Speed up your SD3 with #TensorRT INT8 Quantization[➡️ link](https://build.nvidia.com/upstage/solar-10_7b-instruct?snippet_tab=Try )
@@ -125,6 +143,7 @@ To get started with TensorRT-LLM, visit our documentation:
 - [Release Notes](https://nvidia.github.io/TensorRT-LLM/release-notes.html)
 - [Installation Guide for Linux](https://nvidia.github.io/TensorRT-LLM/installation/linux.html)
 - [Installation Guide for Windows](https://nvidia.github.io/TensorRT-LLM/installation/windows.html)
+- [Installation Guide for Grace Hopper](https://nvidia.github.io/TensorRT-LLM/installation/grace-hopper.html)
 - [Supported Hardware, Models, and other Software](https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html)
 
 ## Community
