Commit b07df79
Squashed commit of the following:
commit ae6837f  Author: Sergio Paniego Blanco <[email protected]>  Date: Mon Oct 6 18:40:18 2025 +0200
    Removed tokenizer/processor creation from example scripts (#4211)
commit 56a8f11  Author: Albert Villanova del Moral <[email protected]>  Date: Mon Oct 6 17:45:44 2025 +0200
    Replace setup with pyproject and fix packaging unintended modules (#4194)
commit 5291015  Author: Sergio Paniego Blanco <[email protected]>  Date: Mon Oct 6 16:04:06 2025 +0200
    Remove `Optional` from `processing_class` in `PPOTrainer` (#4212)
commit 0588b1f  Author: Sergio Paniego Blanco <[email protected]>  Date: Mon Oct 6 15:57:17 2025 +0200
    Updated vLLM integration guide (#4162)
    Co-authored-by: Quentin Gallouédec <[email protected]>
commit 45ee98b  Author: Albert Villanova del Moral <[email protected]>  Date: Mon Oct 6 11:14:54 2025 +0200
    Replace unittest with pytest (#4188)
commit 3800a6e  Author: Albert Villanova del Moral <[email protected]>  Date: Mon Oct 6 11:13:21 2025 +0200
    Hotfix: Exclude transformers 4.57.0 for Python 3.9 (#4209)
    Co-authored-by: Sergio Paniego Blanco <[email protected]>
commit 7ad9ce8  Author: Sergio Paniego Blanco <[email protected]>  Date: Mon Oct 6 11:04:20 2025 +0200
    Remove tokenizer creation from `sft` example script (#4197)
commit 0c2dc14  Author: Albert Villanova del Moral <[email protected]>  Date: Mon Oct 6 08:31:58 2025 +0200
    Remove custome_container for building the docs (#4198)
commit ced8b33  Author: burtenshaw <[email protected]>  Date: Mon Oct 6 08:23:11 2025 +0200
    [DOCS/FIX] lora without regrets - fix lr (#4207)
1 parent b691f39 commit b07df79


58 files changed: +1963 / -1960 lines

.github/workflows/build_documentation.yml

Lines changed: 0 additions & 1 deletion
@@ -14,6 +14,5 @@ jobs:
       commit_sha: ${{ github.sha }}
       package: trl
       version_tag_suffix: ""
-      custom_container: huggingface/transformers-doc-builder
     secrets:
       hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

.github/workflows/build_pr_documentation.yml

Lines changed: 0 additions & 1 deletion
@@ -16,4 +16,3 @@ jobs:
       pr_number: ${{ github.event.number }}
       package: trl
       version_tag_suffix: ""
-      custom_container: huggingface/transformers-doc-builder

MANIFEST.in

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
 include LICENSE
 include CONTRIBUTING.md
 include README.md
-recursive-exclude * __pycache__
+include trl/accelerate_configs/*.yaml
 include trl/templates/*.md
-include trl/accelerate_configs/*.yaml
+recursive-exclude * __pycache__
+prune tests
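This MANIFEST.in change belongs to #4194 ("fix packaging unintended modules"): accelerate configs are now explicitly included and the test suite is pruned from the source distribution. A quick way to confirm the effect is to list the members of a freshly built sdist; the snippet below is only a minimal sketch (it assumes an sdist was already produced with `python -m build --sdist`, and the archive path/version is a placeholder, not taken from this commit).

# Minimal check that the built sdist no longer ships the test suite.
# The filename below is hypothetical; adjust to the actual dist/ artifact.
import tarfile

SDIST = "dist/trl-0.0.0.tar.gz"  # placeholder path and version

with tarfile.open(SDIST, "r:gz") as sdist:
    members = sdist.getnames()

leaked = [name for name in members if "/tests/" in name or name.endswith("/tests")]
print(f"{len(members)} files in sdist, {len(leaked)} under tests/")
assert not leaked, f"tests/ should be pruned by MANIFEST.in: {leaked[:5]}"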

docs/source/lora_without_regret.md

Lines changed: 1 addition & 1 deletion
@@ -376,7 +376,7 @@ Here are the parameters we used to train the above models
 |----------------------------------|---------------------------------------------------|-------------------------------------------------|
 | `--model_name_or_path`           | HuggingFaceTB/SmolLM3-3B                           | HuggingFaceTB/SmolLM3-3B                         |
 | `--dataset_name`                 | HuggingFaceH4/OpenR1-Math-220k-default-verified    | HuggingFaceH4/OpenR1-Math-220k-default-verified  |
-| `--learning_rate`                | 1.0e-6                                             | 1.0e-5                                           |
+| `--learning_rate`                | 1.0e-5                                             | 1.0e-6                                           |
 | `--max_prompt_length`            | 1024                                               | 1024                                             |
 | `--max_completion_length`        | 4096                                               | 4096                                             |
 | `--lora_r`                       | 1                                                  | -                                                |
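The fix from #4207 swaps the two learning-rate cells so that the LoRA column (the one with `--lora_r 1`) reads 1.0e-5 and the full fine-tuning column reads 1.0e-6, i.e. the LoRA run gets the roughly 10x higher learning rate. As a rough sketch of what the corrected values look like in config form (the use of GRPOConfig for these flags and the output_dir values are assumptions; the diff itself only fixes the docs table):

# Rough sketch only: maps the corrected learning-rate cells onto config objects.
from peft import LoraConfig
from trl import GRPOConfig

lora_run = GRPOConfig(
    output_dir="lora-run",         # placeholder
    learning_rate=1.0e-5,          # LoRA column after the fix (was 1.0e-6)
    max_prompt_length=1024,
    max_completion_length=4096,
)
lora_adapter = LoraConfig(r=1)     # `--lora_r 1` from the LoRA column

full_ft_run = GRPOConfig(
    output_dir="full-ft-run",      # placeholder
    learning_rate=1.0e-6,          # full fine-tuning column after the fix (was 1.0e-5)
    max_prompt_length=1024,
    max_completion_length=4096,
)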

docs/source/vllm_integration.md

Lines changed: 348 additions & 44 deletions
Large diffs are not rendered by default.

examples/scripts/dpo_vlm.py

Lines changed: 1 addition & 5 deletions
@@ -85,7 +85,7 @@
     script_args, training_args, model_args = parser.parse_args_and_config()
 
     ################
-    # Model & Tokenizer
+    # Model & Processor
     ################
     dtype = model_args.dtype if model_args.dtype in ["auto", None] else getattr(torch, model_args.dtype)
 

@@ -117,7 +117,6 @@
     processor = AutoProcessor.from_pretrained(
         model_args.model_name_or_path, trust_remote_code=model_args.trust_remote_code, do_image_splitting=False
     )
-    tokenizer = processor.tokenizer
 
     # Set up the chat template
     if model.config.model_type == "idefics2":

@@ -127,8 +126,6 @@
     elif model.config.model_type == "llava":
         processor.chat_template = """{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{% if message['role'] == 'user' %}USER: {% else %}ASSISTANT: {% endif %}{% for item in message['content'] %}{% if item['type'] == 'text' %}{{ item['text'] }}{% elif item['type'] == 'image' %}<image>{% endif %}{% endfor %}{% if message['role'] == 'user' %} {% else %}{{eos_token}}{% endif %}{% endfor %}{% if add_generation_prompt %}ASSISTANT: {% endif %}"""
 
-    if tokenizer.pad_token is None:
-        tokenizer.pad_token = tokenizer.eos_token
     if script_args.ignore_bias_buffers:
         # torch distributed hack
         model._ddp_params_and_buffers_to_ignore = [

@@ -153,7 +150,6 @@
         args=training_args,
         train_dataset=dataset[script_args.dataset_train_split],
         eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None,
-        processing_class=processor,
         peft_config=peft_config,
     )
 
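After this change the script still builds the AutoProcessor for chat-template setup but no longer wires it into the trainer: the pad-token handling and the `processing_class=processor` argument are gone. Below is a minimal sketch of the resulting pattern, reduced to a plain-text DPO run for brevity; that DPOTrainer resolves a processing class on its own when the argument is omitted is an assumption implied by the removal, not shown in this diff, and the model and dataset names are placeholders.

# Minimal sketch of the post-#4211 pattern: no tokenizer/processor plumbing in user code.
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")  # placeholder dataset
training_args = DPOConfig(output_dir="dpo-output")  # placeholder output dir

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id; no processor is built here
    args=training_args,
    train_dataset=dataset,
)
trainer.train()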
examples/scripts/grpo_vlm.py

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@
     parser = TrlParser((ScriptArguments, GRPOConfig, ModelConfig))
     script_args, training_args, model_args = parser.parse_args_and_config()
     ################
-    # Model & Processor
+    # Model
     ################
     dtype = model_args.dtype if model_args.dtype in ["auto", None] else getattr(torch, model_args.dtype)
     training_args.model_init_kwargs = dict(

examples/scripts/gspo_vlm.py

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@
     parser = TrlParser((ScriptArguments, GRPOConfig, ModelConfig))
     script_args, training_args, model_args = parser.parse_args_and_config()
     ################
-    # Model & Processor
+    # Model
    ################
     dtype = model_args.dtype if model_args.dtype in ["auto", None] else getattr(torch, model_args.dtype)
     training_args.model_init_kwargs = dict(

examples/scripts/mpo_vlm.py

Lines changed: 1 addition & 5 deletions
@@ -46,7 +46,7 @@
 import torch
 from datasets import load_dataset
 from PIL import Image
-from transformers import AutoModelForImageTextToText, AutoProcessor
+from transformers import AutoModelForImageTextToText
 
 from trl import (
     DPOConfig,

@@ -97,9 +97,6 @@
         )
     else:
         ref_model = None
-    processor = AutoProcessor.from_pretrained(
-        model_args.model_name_or_path, trust_remote_code=model_args.trust_remote_code
-    )
 
     ################
     # Dataset

@@ -135,7 +132,6 @@ def ensure_rgb(example):
         args=training_args,
         train_dataset=train_dataset,
         eval_dataset=test_dataset,
-        processing_class=processor,
         peft_config=peft_config,
     )
 

examples/scripts/reward_modeling.py

Lines changed: 1 addition & 11 deletions
@@ -57,7 +57,7 @@
 import torch
 from accelerate import logging
 from datasets import load_dataset
-from transformers import AutoModelForSequenceClassification, AutoTokenizer, HfArgumentParser
+from transformers import AutoModelForSequenceClassification, HfArgumentParser
 
 from trl import (
     ModelConfig,

@@ -97,18 +97,9 @@
         model_kwargs["device_map"] = get_kbit_device_map()
         model_kwargs["quantization_config"] = quantization_config
 
-    tokenizer = AutoTokenizer.from_pretrained(
-        model_args.model_name_or_path, trust_remote_code=model_args.trust_remote_code, use_fast=True
-    )
     model = AutoModelForSequenceClassification.from_pretrained(
         model_args.model_name_or_path, num_labels=1, trust_remote_code=model_args.trust_remote_code, **model_kwargs
     )
-    # Align padding tokens between tokenizer and model
-    model.config.pad_token_id = tokenizer.pad_token_id
-
-    # If post-training a base model, use ChatML as the default template
-    if tokenizer.chat_template is None:
-        model, tokenizer = setup_chat_format(model, tokenizer)
 
     if model_args.use_peft and model_args.lora_task_type != "SEQ_CLS":
         logger.warning(

@@ -126,7 +117,6 @@
     ##########
     trainer = RewardTrainer(
         model=model,
-        processing_class=tokenizer,
         args=training_args,
         train_dataset=dataset[script_args.dataset_train_split],
         eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None,
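With this commit the example no longer builds an AutoTokenizer, aligns pad tokens, or calls setup_chat_format; it only constructs the sequence-classification model and hands it to RewardTrainer. Below is a minimal sketch of the reduced flow; that RewardTrainer takes care of tokenizer loading, pad-token alignment, and chat-template setup when `processing_class` is omitted is an assumption implied by the removal, and the model and dataset names are placeholders.

# Minimal sketch of reward_modeling.py after this commit: no tokenizer plumbing in user code.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification
from trl import RewardConfig, RewardTrainer

model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", num_labels=1  # placeholder model id
)
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")  # placeholder preference dataset

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(output_dir="reward-output"),  # placeholder output dir
    train_dataset=dataset,
)
trainer.train()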
