generated from fastai/nbdev_template
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Closed
Labels
Description
Reproduction
from datasets import load_dataset
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from trl import (
SFTConfig,
SFTTrainer
)
model_name = "Qwen/Qwen2.5-3B-Instruct"
dataset_name = "nvidia/OpenMathInstruct-2"
train_split = "train_1M"
raw_datasets = load_dataset(dataset_name, split=train_split)
dataset = raw_datasets.rename_columns({
"problem" : "prompt",
"generated_solution" : "completion"
})
train_eval_dataset = dataset[train_split].train_test_split(test_size=0.1, seed=42)
train_dataset = train_eval_dataset["train"].take(1000)
eval_dataset = train_eval_dataset["test"].take(100)
training_args = SFTConfig(
output_dir="/tmp",
)
model = AutoModelForCausalLM.from_pretrained(model_name)
trainer = SFTTrainer(
model,
args=training_args, ## some training_args
train_dataset=train_dataset,
eval_dataset=eval_dataset,
)
trainer.train()In SFTTrainer'_prepare_dataset() doesn't apply chat_template to prompt-completion dataset. just prompt+completion+EOS
This works, but it doesn't seem to learn anything.(In my case repetition answer happen)
System Info
- Platform: Linux-6.11.0-1013-gcp-x86_64-with-glibc2.39
- Python version: 3.12.9
- TRL version: 0.17.0
- PyTorch version: 2.6.0
- CUDA device(s): NVIDIA L4, NVIDIA L4, NVIDIA L4, NVIDIA L4
- Transformers version: 4.51.3
- Accelerate version: 1.3.0
- Accelerate config: not found
- Datasets version: 3.5.0
- HF Hub version: 0.30.2
- bitsandbytes version: 0.45.5
- DeepSpeed version: 0.16.7
- Diffusers version: 0.33.1
- Liger-Kernel version: 0.5.9
- LLM-Blender version: 0.0.2
- OpenAI version: 1.76.0
- PEFT version: 0.15.2
- vLLM version: 0.8.5.post1
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete