Update on the development branch #2298
Announced by DanBlanaru in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we pushed an update to the development branch (and the Triton backend) on Oct 08, 2024.

This update (#2297) includes:

- Added a draft-target-model speculative decoding example; it can be run through `examples/run.py`, and documentation is in `examples/draft_target_model/README.md`. The feature is also supported by the `ModelRunnerCpp` class.
- Added the `isParticipant` method to the C++ `Executor` API to check if the current process is a participant in the executor instance.
- Additional updates to the `trtllm-build` command.
- Fixed a multimodal build issue: the example passed `strongly_typed=False` to build the fp16 vision engine, but TensorRT 10 made the default `strongly_typed=True`, so fp32 vision engines were built even when the input ONNX files are fp16. This issue is now fixed.
- Fixed `trtllm-build --fast-build` with fake or random weights. Thanks to @ZJLi2013 for flagging it in trtllm-build with --fast-build ignore transformer layers #2135.
- Changes related to `assistant_model`.
- Improved `customAllReduce` performance by using Lamport-style AllReduce + Norm fusion.
- Draft tokens are now sent via `memcpy` over MPI to the target model's process in `orchestrator` mode. This reduces the latency between the end of draft model generation and the beginning of target inference.

Thanks,
The TensorRT-LLM Engineering Team
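
For readers unfamiliar with the draft-target-model scheme mentioned above: a cheap draft model proposes a few tokens, and the target model verifies them in one pass, accepting the longest agreeing prefix. The sketch below is a toy illustration of that acceptance loop only; the two "models" and all function names are hypothetical stand-ins, not the TensorRT-LLM API (see `examples/run.py` and `examples/draft_target_model/README.md` for the real usage).

```python
# Toy sketch of draft-target speculative decoding (greedy acceptance).
# Both "models" below are hypothetical stand-ins, not the TensorRT-LLM API.

def draft_propose(tokens, k):
    # Hypothetical cheap draft model: guesses the next k tokens.
    return [(tokens[-1] + i + 1) % 50 for i in range(k)]

def target_next(tokens):
    # Hypothetical target model: the authoritative next token.
    return (tokens[-1] + 1) % 50

def speculative_step(tokens, k=4):
    """Propose k draft tokens, verify them with the target model, and
    accept the longest prefix the target agrees with, plus one target
    token (the correction on mismatch, or a bonus token otherwise)."""
    draft = draft_propose(tokens, k)
    accepted = []
    ctx = list(tokens)
    for t in draft:
        expect = target_next(ctx)
        if t != expect:
            accepted.append(expect)  # target's correction ends the step
            break
        accepted.append(t)           # draft token verified, keep going
        ctx.append(t)
    else:
        accepted.append(target_next(ctx))  # all accepted: one bonus token
    return tokens + accepted
```

The win is that one verification pass of the target model can emit up to `k + 1` tokens instead of one, which is why reducing the draft-to-target handoff latency (the `memcpy` over MPI change above) matters.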
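
The AllReduce + Norm fusion mentioned above is a CUDA kernel-level optimization; as a purely numerical illustration (not the actual implementation), "fusion" means the cross-rank reduction and the subsequent normalization are computed in one pass rather than as two separate kernels that round-trip through global memory. The helper names below are illustrative only.

```python
import math

def allreduce_sum(shards):
    # Simulate AllReduce: every "rank" ends up with the elementwise sum
    # of all per-rank partial results.
    return [sum(vals) for vals in zip(*shards)]

def rmsnorm(x, eps=1e-6):
    # RMSNorm, a normalization commonly applied right after AllReduce
    # in tensor-parallel transformer blocks.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def fused_allreduce_norm(shards, eps=1e-6):
    """What a fused kernel computes: the reduced tensor is normalized
    while still resident in fast memory, instead of a second kernel
    re-reading it from global memory. Numerically it must match the
    unfused reduce-then-norm composition."""
    reduced = allreduce_sum(shards)   # on GPU: Lamport-synchronized sum
    return rmsnorm(reduced, eps)

per_rank = [[1.0, 2.0], [3.0, 4.0]]  # partial results from two ranks
out = fused_allreduce_norm(per_rank)
```

The fused result is identical to applying the two steps separately; the benefit is bandwidth and kernel-launch savings, not a different mathematical output.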