docs/source/community_tutorials.md (4 additions, 7 deletions)
@@ -29,13 +29,10 @@ Community tutorials are made by active members of the Hugging Face community who
 <details>
 <summary>⚠️ Deprecated features notice for "How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset" (click to expand)</summary>
 
-<Tip warning={true}>
-
-The tutorial uses two deprecated features:
-
-- `SFTTrainer(..., tokenizer=tokenizer)`: Use `SFTTrainer(..., processing_class=tokenizer)` instead, or simply omit it (it will be inferred from the model).
-- `setup_chat_format(model, tokenizer)`: Use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")`, where `chat_template_path` specifies the model whose chat template you want to copy.
-
-</Tip>
+> [!WARNING]
+> The tutorial uses two deprecated features:
+> - `SFTTrainer(..., tokenizer=tokenizer)`: Use `SFTTrainer(..., processing_class=tokenizer)` instead, or simply omit it (it will be inferred from the model).
+> - `setup_chat_format(model, tokenizer)`: Use `SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B")`, where `chat_template_path` specifies the model whose chat template you want to copy.
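
Putting the two replacements together, a minimal sketch of what the migrated tutorial code could look like; the model and dataset identifiers below are illustrative placeholders rather than the tutorial's exact ones:

```python
# Sketch of the non-deprecated API surface described in the notice above.
# Checkpoint and dataset names are placeholders; adapt them to the tutorial.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
dataset = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations", split="train")

# chat_template_path replaces setup_chat_format(model, tokenizer): it copies
# the chat template of the referenced model onto the one being trained.
training_args = SFTConfig(
    output_dir="smollm2-sft",
    chat_template_path="Qwen/Qwen3-0.6B",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # replaces the deprecated tokenizer=... argument
)
trainer.train()
```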

docs/source/dataset_formats.md (48 additions, 66 deletions)
@@ -289,31 +289,28 @@ prompt_only_example = {"prompt": [{"role": "user", "content": "What color is the
 
 For examples of prompt-only datasets, refer to the [Prompt-only datasets collection](https://huggingface.co/collections/trl-lib/prompt-only-datasets-677ea25245d20252cea00368).
 
-<Tip>
-
-While both the prompt-only and language modeling types are similar, they differ in how the input is handled. In the prompt-only type, the prompt represents a partial input that expects the model to complete or continue, while in the language modeling type, the input is treated as a complete sentence or sequence. These two types are processed differently by TRL. Below is an example showing the difference in the output of the `apply_chat_template` function for each type:
-# Output: {'prompt': '<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n'}
-
-# Example for language modeling type
-lm_example = {"messages": [{"role": "user", "content": "What color is the sky?"}]}
-apply_chat_template(lm_example, tokenizer)
-# Output: {'text': '<|user|>\nWhat color is the sky?<|end|>\n<|endoftext|>'}
-```
-
-- The prompt-only output includes a `'<|assistant|>\n'`, indicating the beginning of the assistant’s turn and expecting the model to generate a completion.
-- In contrast, the language modeling output treats the input as a complete sequence and terminates it with `'<|endoftext|>'`, signaling the end of the text and not expecting any additional content.
-
-</Tip>
+> [!TIP]
+> While both the prompt-only and language modeling types are similar, they differ in how the input is handled. In the prompt-only type, the prompt represents a partial input that expects the model to complete or continue, while in the language modeling type, the input is treated as a complete sentence or sequence. These two types are processed differently by TRL. Below is an example showing the difference in the output of the `apply_chat_template` function for each type:
+> # Output: {'prompt': '<|user|>\nWhat color is the sky?<|end|>\n<|assistant|>\n'}
+>
+> # Example for language modeling type
+> lm_example = {"messages": [{"role": "user", "content": "What color is the sky?"}]}
+> apply_chat_template(lm_example, tokenizer)
+> # Output: {'text': '<|user|>\nWhat color is the sky?<|end|>\n<|endoftext|>'}
+> ```
+>
+> - The prompt-only output includes a `'<|assistant|>\n'`, indicating the beginning of the assistant’s turn and expecting the model to generate a completion.
+> - In contrast, the language modeling output treats the input as a complete sequence and terminates it with `'<|endoftext|>'`, signaling the end of the text and not expecting any additional content.
 
 #### Prompt-completion
 
@@ -408,12 +405,9 @@ Choosing the right dataset type depends on the task you are working on and the s
 |[`SFTTrainer`]|[Language modeling](#language-modeling) or [Prompt-completion](#prompt-completion)|
 |[`XPOTrainer`]|[Prompt-only](#prompt-only)|
 
-<Tip>
-
-TRL trainers only support standard dataset formats, [for now](https://github.com/huggingface/trl/issues/2071). If you have a conversational dataset, you must first convert it into a standard format.
-For more information on how to work with conversational datasets, refer to the [Working with conversational datasets in TRL](#working-with-conversational-datasets-in-trl) section.
-
-</Tip>
+> [!TIP]
+> TRL trainers only support standard dataset formats, [for now](https://github.com/huggingface/trl/issues/2071). If you have a conversational dataset, you must first convert it into a standard format.
+> For more information on how to work with conversational datasets, refer to the [Working with conversational datasets in TRL](#working-with-conversational-datasets-in-trl) section.
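
As a concrete illustration of the conversion this tip refers to, a minimal sketch using TRL's `apply_chat_template` helper; the tokenizer checkpoint is an arbitrary chat model chosen for the example:

```python
# Sketch: converting a conversational prompt-completion dataset into the
# standard (plain string) format expected by TRL trainers.
from datasets import Dataset
from transformers import AutoTokenizer
from trl import apply_chat_template

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")  # arbitrary chat model

dataset = Dataset.from_dict({
    "prompt": [[{"role": "user", "content": "What color is the sky?"}]],
    "completion": [[{"role": "assistant", "content": "It is blue."}]],
})

# apply_chat_template renders each example with the tokenizer's chat template,
# returning plain "prompt"/"completion" strings.
dataset = dataset.map(apply_chat_template, fn_kwargs={"tokenizer": tokenizer})
print(dataset[0])
```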
 # 'completion': ['It is blue.<|end|>\n<|endoftext|>', 'In the sky.<|end|>\n<|endoftext|>']}
 ```
 
-<Tip warning={true}>
-
-We recommend using the [`apply_chat_template`] function instead of calling `tokenizer.apply_chat_template` directly. Handling chat templates for non-language modeling datasets can be tricky and may result in errors, such as mistakenly placing a system prompt in the middle of a conversation.
-For additional examples, see [#1930 (comment)](https://github.com/huggingface/trl/pull/1930#issuecomment-2292908614). The [`apply_chat_template`] function is designed to handle these intricacies and ensure the correct application of chat templates for various tasks.
-
-</Tip>
-
-<Tip warning={true}>
-
-It's important to note that chat templates are model-specific. For example, if you use the chat template from [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) with the above example, you get a different output:
-# {'prompt': '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\n',
-# 'completion': 'It is blue.<|im_end|>\n'}
-```
-
-Always use the chat template associated with the model you're working with. Using the wrong template can lead to inaccurate or unexpected results.
-
-</Tip>
+> [!WARNING]
+> We recommend using the [`apply_chat_template`] function instead of calling `tokenizer.apply_chat_template` directly. Handling chat templates for non-language modeling datasets can be tricky and may result in errors, such as mistakenly placing a system prompt in the middle of a conversation.
+> For additional examples, see [#1930 (comment)](https://github.com/huggingface/trl/pull/1930#issuecomment-2292908614). The [`apply_chat_template`] function is designed to handle these intricacies and ensure the correct application of chat templates for various tasks.
+
+> [!WARNING]
+> It's important to note that chat templates are model-specific. For example, if you use the chat template from [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) with the above example, you get a different output:
+> # {'prompt': '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat color is the sky?<|im_end|>\n<|im_start|>assistant\n',
+> # 'completion': 'It is blue.<|im_end|>\n'}
+> ```
+>
+> Always use the chat template associated with the model you're working with. Using the wrong template can lead to inaccurate or unexpected results.
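
To see that model-specificity for yourself, the sketch below renders the same conversation with two different chat templates; it calls `tokenizer.apply_chat_template` directly only to compare templates, and the checkpoints are illustrative (the Llama one is gated):

```python
# Sketch: the same messages rendered with two different chat templates.
from transformers import AutoTokenizer

messages = [
    {"role": "user", "content": "What color is the sky?"},
    {"role": "assistant", "content": "It is blue."},
]

# Illustrative checkpoints; the Llama checkpoint requires gated access.
for checkpoint in ["Qwen/Qwen2.5-0.5B-Instruct", "meta-llama/Meta-Llama-3.1-8B-Instruct"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(checkpoint)
    print(tokenizer.apply_chat_template(messages, tokenize=False))
```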
 
 ## Using any dataset with TRL: preprocessing and conversion
-Keep in mind that the `"chosen"` and `"rejected"` completions in a preference dataset can be both good or bad.
-Before applying [`unpair_preference_dataset`], please ensure that all `"chosen"` completions can be labeled as good and all `"rejected"` completions as bad.
-This can be ensured by checking the absolute rating of each completion, e.g. from a reward model.
-
-</Tip>
+> [!WARNING]
+> Keep in mind that the `"chosen"` and `"rejected"` completions in a preference dataset can be both good or bad.
+> Before applying [`unpair_preference_dataset`], please ensure that all `"chosen"` completions can be labeled as good and all `"rejected"` completions as bad.
+> This can be ensured by checking the absolute rating of each completion, e.g. from a reward model.
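
As background on the helper named in this warning, a minimal sketch of what `unpair_preference_dataset` does with a toy paired dataset (data invented for illustration):

```python
# Sketch: unpairing a paired preference dataset with trl.unpair_preference_dataset.
from datasets import Dataset
from trl import unpair_preference_dataset

dataset = Dataset.from_dict({
    "prompt": ["What color is the sky?"],
    "chosen": ["It is blue."],
    "rejected": ["It is green."],
})

# Each paired row becomes two unpaired rows: the chosen completion labeled
# True and the rejected completion labeled False.
unpaired = unpair_preference_dataset(dataset)
print(unpaired.column_names)  # ['prompt', 'completion', 'label']
```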
-Keep in mind that the `"chosen"` and `"rejected"` completions in a preference dataset can be both good or bad.
-Before applying [`unpair_preference_dataset`], please ensure that all `"chosen"` completions can be labeled as good and all `"rejected"` completions as bad.
-This can be ensured by checking the absolute rating of each completion, e.g. from a reward model.
-
-</Tip>
+> [!WARNING]
+> Keep in mind that the `"chosen"` and `"rejected"` completions in a preference dataset can be both good or bad.
+> Before applying [`unpair_preference_dataset`], please ensure that all `"chosen"` completions can be labeled as good and all `"rejected"` completions as bad.
+> This can be ensured by checking the absolute rating of each completion, e.g. from a reward model.
 
 ### From unpaired preference to language modeling dataset

docs/source/deepspeed_integration.md (2 additions, 5 deletions)
@@ -1,10 +1,7 @@
 # DeepSpeed Integration
 
-<Tip warning={true}>
-
-Section under construction. Feel free to contribute!
-
-</Tip>
+> [!WARNING]
+> Section under construction. Feel free to contribute!
 
 TRL supports training with DeepSpeed, a library that implements advanced training optimization techniques. These include optimizer state partitioning, offloading, gradient partitioning, and more.
 | 8 | 4 | 1 | Multi-GPU to get the best of both worlds |
 
-<Tip>
-
-Having one model per GPU can lead to high memory usage, which may not be feasible for large models or low-memory GPUs. In such cases, you can leverage [DeepSpeed](https://github.com/deepspeedai/DeepSpeed), which provides optimizations like model sharding, Zero Redundancy Optimizer, mixed precision training, and offloading to CPU or NVMe. Check out our [DeepSpeed Integration](deepspeed_integration) guide for more details.
-
-</Tip>
+> [!TIP]
+> Having one model per GPU can lead to high memory usage, which may not be feasible for large models or low-memory GPUs. In such cases, you can leverage [DeepSpeed](https://github.com/deepspeedai/DeepSpeed), which provides optimizations like model sharding, Zero Redundancy Optimizer, mixed precision training, and offloading to CPU or NVMe. Check out our [DeepSpeed Integration](deepspeed_integration) guide for more details.
 
 ## Context Parallelism
 
@@ -176,13 +172,10 @@ These results show that **Context Parallelism (CP) scales effectively with more
-Accelerate also supports **N-Dimensional Parallelism (ND-parallelism)**, which enables you to combine different parallelization strategies to efficiently distribute model training across multiple GPUs.
-
-You can learn more and explore configuration examples in the [Accelerate ND-parallelism guide](https://github.com/huggingface/accelerate/blob/main/examples/torch_native_parallelism/README.md#nd-parallelism).
-
-</Tip>
+> [!TIP]
+> Accelerate also supports **N-Dimensional Parallelism (ND-parallelism)**, which enables you to combine different parallelization strategies to efficiently distribute model training across multiple GPUs.
+>
+> You can learn more and explore configuration examples in the [Accelerate ND-parallelism guide](https://github.com/huggingface/accelerate/blob/main/examples/torch_native_parallelism/README.md#nd-parallelism).

docs/source/experimental.md (4 additions, 10 deletions)
@@ -2,11 +2,8 @@
 
 The `trl.experimental` namespace provides a minimal, clearly separated space for fast iteration on new ideas.
 
-<Tip warning={true}>
-
-**Stability contract:** Anything under `trl.experimental` may change or be removed in *any* release (including patch versions) without prior deprecation. Do not rely on these APIs for production workloads.
-
-</Tip>
+> [!WARNING]
+> **Stability contract:** Anything under `trl.experimental` may change or be removed in *any* release (including patch versions) without prior deprecation. Do not rely on these APIs for production workloads.
 
 ## Current Experimental Features
 
@@ -95,11 +92,8 @@ training_args = GRPOConfig(
 )
 ```
 
-<Tip warning={true}>
-
-To leverage GSPO-token, the user will need to provide the per-token advantage \\( \hat{A_{i,t}} \\) for each token \\( t \\) in the sequence \\( i \\) (i.e., make \\( \hat{A_{i,t}} \\) vary with \\( t \\)—which isn't the case here, \\( \hat{A_{i,t}}=\hat{A_{i}} \\)). Otherwise, the GSPO-Token gradient is just equivalent to the original GSPO implementation.
-
-</Tip>
+> [!WARNING]
+> To leverage GSPO-token, the user will need to provide the per-token advantage \\( \hat{A_{i,t}} \\) for each token \\( t \\) in the sequence \\( i \\) (i.e., make \\( \hat{A_{i,t}} \\) vary with \\( t \\)—which isn't the case here, \\( \hat{A_{i,t}}=\hat{A_{i}} \\)). Otherwise, the GSPO-Token gradient is just equivalent to the original GSPO implementation.
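
To make "varies with \\( t \\)" concrete, here is a purely illustrative sketch of per-token advantages; it is not tied to the experimental trainer's actual API and only shows the shape and semantics of \\( \hat{A_{i,t}} \\):

```python
# Illustrative only: per-token advantages that vary with t, versus the
# GSPO-like case where every token reuses the sequence-level advantage.
import torch

num_completions, seq_len = 4, 8

# One sequence-level advantage per completion i.
seq_advantages = torch.tensor([0.5, -0.2, 1.0, 0.1])

# Broadcasting the scalar over t gives A_hat[i, t] == A_hat[i] for every t,
# which is the case where GSPO-token reduces to plain GSPO.
constant_in_t = seq_advantages[:, None].expand(num_completions, seq_len)

# A made-up per-token scheme: only the first k tokens of each completion
# receive credit, so A_hat[i, t] now genuinely depends on t.
k = 3
token_mask = (torch.arange(seq_len)[None, :] < k).float()
varies_with_t = constant_in_t * token_mask

print(varies_with_t)
```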