TRL is a full-stack library that provides a set of tools to train transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more.
The library is integrated with 🤗 [transformers](https://github.com/huggingface/transformers).

Below is the current list of TRL trainers, organized by method type (⚡️ = vLLM support); a minimal usage sketch follows the list.

<div style="display: flex; justify-content: space-between; width: 100%; gap: 2rem;">

<div style="flex: 1; min-width: 0;">

**Online methods**
- [`GRPOTrainer`] ⚡️
- [`RLOOTrainer`] ⚡️
- [`OnlineDPOTrainer`] ⚡️
- [`NashMDTrainer`] ⚡️
- [`XPOTrainer`] ⚡️
- [`PPOTrainer`]

**Reward modeling**
- [`PRMTrainer`]
- [`RewardTrainer`]

</div>

<div style="flex: 1; min-width: 0;">

**Offline methods**
- [`SFTTrainer`]
- [`DPOTrainer`]
- [`ORPOTrainer`]
- [`BCOTrainer`]
- [`CPOTrainer`]
- [`KTOTrainer`]

**Knowledge distillation**
- [`GKDTrainer`]

</div>

</div>

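All of the trainers above expose a `transformers.Trainer`-style API. As a minimal sketch of what a run looks like (assuming a recent TRL release; the model and dataset ids here are illustrative, not prescribed by this page):

```python
from datasets import load_dataset
from trl import SFTTrainer

# Illustrative dataset; any conversational or plain-text dataset works.
dataset = load_dataset("trl-lib/Capybara", split="train")

# Recent TRL versions accept a model id string directly and load the
# tokenizer for you; pass a config object for anything beyond the defaults.
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
)
trainer.train()
```

Swapping in another trainer from the list, such as [`DPOTrainer`] or [`GRPOTrainer`], follows the same pattern with a method-appropriate dataset and configuration.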

## 🎉 What's New

**✨ OpenAI GPT OSS Support**: TRL now fully supports fine-tuning the latest [OpenAI GPT OSS models](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4)! Check out the: