Major
🥓 Context Parallelism
SFT now supports Context Parallelism (CP), which shards long sequences across GPUs during training. You can now train large language models with arbitrarily long sequence lengths.
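Trainer-side, nothing changes apart from the longer sequence length; CP itself is enabled at launch time through Accelerate's parallelism configuration (see the CP docs added in #3994). A minimal sketch, where the model, dataset, and `max_length` value are illustrative choices:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative long-context SFT setup; with CP enabled at launch, each
# sequence is sharded along the length dimension across the CP ranks.
training_args = SFTConfig(
    output_dir="sft-long-context",
    max_length=65536,  # example value; CP is what makes lengths like this fit
)
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any causal LM
    args=training_args,
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
trainer.train()
```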

🧨 Dynamic Fine-Tuning
Dynamic Fine-Tuning (DFT) is now supported in TRL:
```python
from trl import SFTConfig

training_args = SFTConfig(
    loss_type="dft",
    ...
)
```

by @qgallouedec in #4042
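Conceptually, DFT reweights the standard cross-entropy by the model's own (detached) probability of each target token, down-weighting tokens the model considers unlikely. A sketch of the idea, following the DFT paper rather than TRL's internal implementation (ignore-index handling omitted):

```python
import torch.nn.functional as F

def dft_loss(logits, labels):
    # logits: (batch, seq, vocab); labels: (batch, seq) token ids
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    # Cross-entropy weighted by the detached token probability.
    return -(token_logp.exp().detach() * token_logp).mean()
```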
🪵 Truncated Importance Sampling (TIS) to address rollout-training mismatch
Different implementations are used for rollout generation (vLLM) and model training. This implementation gap implicitly turns on-policy RL into off-policy RL. Truncated Importance Sampling (TIS) is a simple yet effective importance sampling technique for handling this discrepancy, and it is now implemented in GRPO.
```python
from trl import GRPOConfig

training_args = GRPOConfig(
    ...,
    use_vllm=True,
    vllm_importance_sampling_correction=True,  # default True
    vllm_importance_sampling_cap=2.0,  # hyper-parameter C
)
```
by @LeonEricsson in #3867
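Conceptually, TIS multiplies each token's policy-gradient term by the importance ratio between the training policy and the rollout policy, truncated at the cap C to bound variance. A REINFORCE-style illustration of the idea, not TRL's actual GRPO loss:

```python
import torch

def tis_loss(train_logprobs, rollout_logprobs, advantages, cap=2.0):
    # Per-token importance ratio pi_train / pi_rollout, truncated at `cap`
    # (the C hyper-parameter), used as a detached correction weight.
    ratio = torch.exp(train_logprobs - rollout_logprobs).clamp(max=cap)
    return -(ratio.detach() * advantages * train_logprobs).mean()
```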
🥣 [SFTTrainer]: Add Aux Loss for MoE models
Mixture of Experts (MoE) models require an auxiliary load-balancing loss to ensure that the different experts are used evenly. This auxiliary loss is now supported in SFTTrainer; enable it by requesting router logits from the model:
```python
from trl import SFTConfig

training_args = SFTConfig(
    model_init_kwargs={"output_router_logits": True},
    ...
)
```
by @pramodith in #4012
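For many MoE architectures in Transformers (e.g. Mixtral-style configs), the weight of this load-balancing loss is controlled by the model config's `router_aux_loss_coef`, which can be overridden the same way; a sketch, assuming your model exposes that field:

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="sft-moe",  # illustrative
    model_init_kwargs={
        "output_router_logits": True,  # required for the aux loss to be computed
        "router_aux_loss_coef": 0.01,  # load-balancing weight; model-dependent
    },
)
```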
💤 [GRPO/RLOO] Adds an option to sleep vllm when running in colocated mode
When running GRPO (or RLOO) with vLLM in colocated mode, the vLLM engine consumes VRAM during optimization even though it is not being used. There is now an option to put vLLM to sleep during optimization to free up that VRAM.
```python
from trl import GRPOConfig

training_args = GRPOConfig(..., vllm_sleep_enabled=True)
```
by @edbeeching in #3968
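Under the hood this relies on vLLM's sleep/wake-up support, which offloads weights and frees the KV cache between generation phases. A standalone illustration of that vLLM API (not TRL internals), assuming a recent vLLM version:

```python
from vllm import LLM

# The engine must be created with sleep mode enabled.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

llm.sleep(level=1)  # offload weights and free the KV cache during optimization
# ... optimizer steps run here, using the freed VRAM ...
llm.wake_up()       # restore the engine before the next rollout
```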
⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer
You can now use vLLM server mode with OnlineDPOTrainer. Additionally, vision-language models (VLMs) are now supported.
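A minimal sketch, assuming the server is started separately with `trl vllm-serve` and that the OnlineDPO parameters mirror GRPO's vLLM integration (`use_vllm`, `vllm_mode`); check the OnlineDPO docs for the exact names:

```python
from trl import OnlineDPOConfig

# Assumes a vLLM server is already running, e.g.:
#   trl vllm-serve --model Qwen/Qwen2.5-0.5B-Instruct
training_args = OnlineDPOConfig(
    output_dir="online-dpo",  # illustrative
    use_vllm=True,
    vllm_mode="server",
)
```

by @vaelev in #3783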
Comprehensive Paper Index Enhancement with 9 New Algorithm Implementations
The paper index has been significantly enhanced with the addition of nine new algorithm implementations, providing a more comprehensive resource for users.
by @behroozazarkhalili in #3990
Other Notable Changes
- 👷 Added Kernels on the Hub x TRL guide by @sergiopaniego in #3969
- 🌵 Refactor entropy_from_logits for memory efficiency by @qgallouedec in #4013
What's Changed
- ⬆️ Bump dev version by @qgallouedec in #3978
- 👮 Fix GRPO CLI by setting parameters for `get_soft_overlong_punishment` by @qgallouedec in #3972
- 🪃 `args.gradient_checkpointing = False` instead of `args = dataclasses.replace(args, gradient_checkpointing=False)` by @qgallouedec in #3981
- [GRPO] Adds an option to sleep vllm when running in colocated mode by @edbeeching in #3968
- 🎯 Add Trackio integration documentation and update TOC by @qgallouedec in #3971
- ⚖️ Fix scale_rewards issue in GRPO by @Peter-Chou in #3992
- ⏰ fix: add return to shift_tokens_right by @ginkyenglee in #3987
- Add pre-commit and hf-doc-builder as dev dependencies by @albertvillanova in #3993
- [GRPO] Truncated Importance Sampling to address rollout-training mismatch by @LeonEricsson in #3867
- Fixed tags shown problem in memory usage docs by @sergiopaniego in #3999
- ✖️ Support pad-to-multiple-of and padding-free by @qgallouedec in #3996
- 💾 [bugfix] fix PPO save_checkpoint by @hjh0119 in #3998
- [GRPO]: Fix Multi-GPU training for Entropy based masking of tokens. by @pramodith in #3964
- 📏 `torch_dtype` to `dtype` everywhere by @sergiopaniego in #4000
- Comprehensive Paper Index Enhancement with 9 New Algorithm Implementations by @behroozazarkhalili in #3990
- [SFT] fix: collator docstring by @LeonEricsson in #4011
- 👷 Added Kernels on the Hub x TRL guide by @sergiopaniego in #3969
- 🌵 Refactor entropy_from_logits for memory efficiency by @qgallouedec in #4013
- [SFTTrainer]: Add Aux Loss for MoE models. by @pramodith in #4012
- Add missing docstrings in SFTTrainer by @pramodith in #4003
- ⚖️ Add vLLM server mode and VLM support to OnlineDPOTrainer by @vaelev in #3783
- Fix typo in GRPO quickstart by @dwisdom0 in #4020
- Align docstring parameters with function definitions by @albertvillanova in #4017
- Fix formatting errors in docstrings by @albertvillanova in #4025
- [doc] Paper index for Truncated Importance Sampling by @LeonEricsson in #4026
- [doc] Group paper index by trainer by @LeonEricsson in #4027
- Add missing trainer docstrings by @albertvillanova in #4030
- Add autodoc for AlignPropTrainer and AlignPropConfig by @albertvillanova in #4033
- 🥓 [docs] add CP docs by @kashif in #3994
- ⚖️ Remove `average_tokens_across_devices` default replacement by @qgallouedec in #4039
- CI hotfix: xfail test_training_with_transformers_paged by @albertvillanova in #4046
- Update transformers minimum version to 4.56.1 by @albertvillanova in #4047
- 🧨 DFT by @qgallouedec in #4042
- Update VLM arch check to `AutoModelForImageTextToText` for DPO and Online DPO by @sergiopaniego in #4049
- 🏂 Fix label shifting logic in `SFTTrainer` for compatibility with CP by @qgallouedec in #4038
- Add autodoc for BestOfNSampler and improve docstrings by @albertvillanova in #4034
- ✨ Improve SFT doc by @qgallouedec in #4005
- 💬 Remove setting chat template in sft script by @qgallouedec in #4037
- 🪪 Update SFTTrainer to handle labels correctly and add configuration example in paper index by @qgallouedec in #4051
- 🗜 Hotfix: avoid passing `quantization_config=None` by @qgallouedec in #4019
- Release: 0.23 by @qgallouedec in #4053
New Contributors
- @Peter-Chou made their first contribution in #3992
- @ginkyenglee made their first contribution in #3987
- @albertvillanova made their first contribution in #3993
- @hjh0119 made their first contribution in #3998
- @vaelev made their first contribution in #3783
- @dwisdom0 made their first contribution in #4020
Full Changelog: v0.22.0...v0.23.0