Major
🥓 Context Parallelism
SFT now supports Context Parallelism (CP), which shards long sequences across GPUs during training. You can now train large language models with arbitrarily long sequence lengths.
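Trainer-side, nothing changes apart from the longer sequence length; CP itself is enabled at launch time through Accelerate's parallelism configuration (see the CP docs added in #3994). A minimal sketch, where the model, dataset, and `max_length` value are illustrative choices:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative long-context SFT setup; with CP enabled at launch, each
# sequence is sharded along the length dimension across the CP ranks.
training_args = SFTConfig(
    output_dir="sft-long-context",
    max_length=65536,  # example value; CP is what makes lengths like this fit
)
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any causal LM
    args=training_args,
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
trainer.train()
```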

🧨 Dynamic Fine-Tuning
Dynamic Fine-Tuning (DFT) is now supported in TRL:
```python
from trl import SFTConfig

training_args = SFTConfig(
    loss_type="dft",
    ...
)
```

by @qgallouedec in #4042
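Conceptually, DFT reweights the standard cross-entropy by the model's own (detached) probability of each target token, down-weighting tokens the model considers unlikely. A sketch of the idea, following the DFT paper rather than TRL's internal implementation (ignore-index handling omitted):

```python
import torch.nn.functional as F

def dft_loss(logits, labels):
    # logits: (batch, seq, vocab); labels: (batch, seq) token ids
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    # Cross-entropy weighted by the detached token probability.
    return -(token_logp.exp().detach() * token_logp).mean()
```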
🪵 Truncated Importance Sampling (TIS) to address rollout-training mismatch
Different implementations are used for rollout generation (vLLM) and model training. This implementation gap implicitly turns on-policy RL into off-policy RL. Truncated Importance Sampling (TIS) is a simple yet effective importance sampling technique for handling this discrepancy, and it is now implemented in GRPO.
```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,
    vllm_importance_sampling_correction=True,  # default True
    vllm_importance_sampling_cap=2.0,  # hyper-parameter C
)
```
by @LeonEricsson in #3867
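Conceptually, TIS multiplies each token's policy-gradient term by the importance ratio between the training policy and the rollout policy, truncated at the cap C to bound variance. A REINFORCE-style illustration of the idea, not TRL's actual GRPO loss:

```python
import torch

def tis_loss(train_logprobs, rollout_logprobs, advantages, cap=2.0):
    # Per-token importance ratio pi_train / pi_rollout, truncated at `cap`
    # (the C hyper-parameter), used as a detached correction weight.
    ratio = torch.exp(train_logprobs - rollout_logprobs).clamp(max=cap)
    return -(ratio.detach() * advantages * train_logprobs).mean()
```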
🥣 [SFTTrainer]: Add Aux Loss for MoE models
Mixture of Experts (MoE) models require an auxiliary load-balancing loss to ensure that the different experts are used evenly. This auxiliary loss is now supported in SFTTrainer; enable it by requesting router logits from the model:
```python
from trl import SFTConfig

training_args = SFTConfig(
    model_init_kwargs={"output_router_logits": True},
    ...
)
```
by @pramodith in #4012
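For many MoE architectures in Transformers (e.g. Mixtral-style configs), the weight of this load-balancing loss is controlled by the model config's `router_aux_loss_coef`, which can be overridden the same way; a sketch, assuming your model exposes that field:

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="sft-moe",  # illustrative
    model_init_kwargs={
        "output_router_logits": True,  # required for the aux loss to be computed
        "router_aux_loss_coef": 0.01,  # load-balancing weight; model-dependent
    },
)
```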
💤 [GRPO/RLOO] Adds an option to sleep vllm when running in colocated mode
When running GRPO (or RLOO) with vLLM in colocated mode, the vLLM engine consumes VRAM during optimization even though it is not being used. There is now an option to put vLLM to sleep during optimization to free up that VRAM.
```python
from trl import GRPOConfig

training_args = GRPOConfig(..., vllm_sleep_enabled=True)
```
by @edbeeching in #3968
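Under the hood this relies on vLLM's sleep/wake-up support, which offloads weights and frees the KV cache between generation phases. A standalone illustration of that vLLM API (not TRL internals), assuming a recent vLLM version:

```python
from vllm import LLM

# The engine must be created with sleep mode enabled.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

llm.sleep(level=1)  # offload weights and free the KV cache during optimization
# ... optimizer steps run here, using the freed VRAM ...
llm.wake_up()       # restore the engine before the next rollout
```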
⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer
You can now use vLLM server mode with OnlineDPOTrainer. Additionally, vision-language models (VLMs) are now supported.
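A minimal sketch, assuming the server is started separately with `trl vllm-serve` and that the OnlineDPO parameters mirror GRPO's vLLM integration (`use_vllm`, `vllm_mode`); check the OnlineDPO docs for the exact names:

```python
from trl import OnlineDPOConfig

# Assumes a vLLM server is already running, e.g.:
#   trl vllm-serve --model Qwen/Qwen2.5-0.5B-Instruct
training_args = OnlineDPOConfig(
    output_dir="online-dpo",  # illustrative
    use_vllm=True,
    vllm_mode="server",
)
```

by @vaelev in #3783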
Comprehensive Paper Index Enhancement with 9 New Algorithm Implementations
The paper index has been significantly enhanced with the addition of nine new algorithm implementations, providing a more comprehensive resource for users.
by @behroozazarkhalili in #3990
Other Notable Changes
- 👷 Added Kernels on the Hub x TRL guide by @sergiopaniego in #3969
- 🌵 Refactor entropy_from_logits for memory efficiency by @qgallouedec in #4013
What's Changed
- ⬆️ Bump dev version by @qgallouedec in #3978
- 👮 Fix GRPO CLI by setting parameters for `get_soft_overlong_punishment` by @qgallouedec in #3972
- 🪃 `args.gradient_checkpointing = False` instead of `args = dataclasses.replace(args, gradient_checkpointing=False)` by @qgallouedec in #3981
- [GRPO] Adds an option to sleep vllm when running in colocated mode by @edbeeching in #3968
- 🎯 Add Trackio integration documentation and update TOC by @qgallouedec in #3971
- ⚖️ Fix scale_rewards issue in GRPO by @Peter-Chou in #3992
- ⏰ fix: add return to shift_tokens_right by @ginkyenglee in #3987
- Add pre-commit and hf-doc-builder as dev dependencies by @albertvillanova in #3993
- [GRPO] Truncated Importance Sampling to address rollout-training mismatch by @LeonEricsson in #3867
- Fixed tags shown problem in memory usage docs by @sergiopaniego in #3999
- ✖️ Support pad-to-multiple-of and padding-free by @qgallouedec in #3996
- 💾 [bugfix] fix PPO save_checkpoint by @hjh0119 in #3998
- [GRPO]: Fix Multi-GPU training for Entropy based masking of tokens. by @pramodith in #3964
- 📏 `torch_dtype` to `dtype` everywhere by @sergiopaniego in #4000
- Comprehensive Paper Index Enhancement with 9 New Algorithm Implementations by @behroozazarkhalili in #3990
- [SFT] fix: collator docstring by @LeonEricsson in #4011
- 👷 Added Kernels on the Hub x TRL guide by @sergiopaniego in #3969
- 🌵 Refactor entropy_from_logits for memory efficiency by @qgallouedec in #4013
- [SFTTrainer]: Add Aux Loss for MoE models. by @pramodith in #4012
- Add missing docstrings in SFTTrainer by @pramodith in #4003
- ⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer by @vaelev in #3783
- Fix typo in GRPO quickstart by @dwisdom0 in #4020
- Align docstring parameters with function definitions by @albertvillanova in #4017
- Fix formatting errors in docstrings by @albertvillanova in #4025
- [doc] Paper index for Truncated Importance Sampling by @LeonEricsson in #4026
- [doc] Group paper index by trainer by @LeonEricsson in #4027
- Add missing trainer docstrings by @albertvillanova in #4030
- Add autodoc for AlignPropTrainer and AlignPropConfig by @albertvillanova in #4033
- 🥓 [docs] add CP docs by @kashif in #3994
- ⚖️ Remove `average_tokens_across_devices` default replacement by @qgallouedec in #4039
- CI hotfix: xfail test_training_with_transformers_paged by @albertvillanova in #4046
- Update transformers minimum version to 4.56.1 by @albertvillanova in #4047
- 🧨 DFT by @qgallouedec in #4042
- Update VLM arch check to `AutoModelForImageTextToText` for DPO and Online DPO by @sergiopaniego in #4049
- 🏂 Fix label shifting logic in `SFTTrainer` for compatibility with CP by @qgallouedec in #4038
- Add autodoc for BestOfNSampler and improve docstrings by @albertvillanova in #4034
- ✨ Improve SFT doc by @qgallouedec in #4005
- 💬 Remove setting chat template in sft script by @qgallouedec in #4037
- 🪪 Update SFTTrainer to handle labels correctly and add configuration example in paper index by @qgallouedec in #4051
- 🗜 Hotfix: avoid passing `quantization_config=None` by @qgallouedec in #4019
- Release: 0.23 by @qgallouedec in #4053
New Contributors
- @Peter-Chou made their first contribution in #3992
- @ginkyenglee made their first contribution in #3987
- @albertvillanova made their first contribution in #3993
- @hjh0119 made their first contribution in #3998
- @vaelev made their first contribution in #3783
- @dwisdom0 made their first contribution in #4020
Full Changelog: v0.22.0...v0.23.0