1 parent d1bf560 commit f5c2fec
docs/source/quickstart.md
@@ -32,7 +32,7 @@ def reward_function(completions, **kwargs):
 trainer = GRPOTrainer(
     model="Qwen/Qwen2.5-0.5B-Instruct",  # Start from SFT model
     train_dataset=load_dataset("trl-lib/tldr", split="train"),
-    reward_function=reward_function,
+    reward_funcs=reward_function,
 )
 trainer.train()
 ```
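For context, a minimal sketch of a callable matching the `reward_function(completions, **kwargs)` signature named in the hunk header. The body is hypothetical (the diff does not show it), and it assumes completions arrive as plain strings; the length-based scoring and the `target_len` parameter are illustrative only.

```python
def reward_function(completions, target_len=100, **kwargs):
    """Return one float per completion (hypothetical scoring rule).

    Completions closer to `target_len` characters score higher;
    the best possible score is 0.0. Assumes each completion is a str.
    """
    return [-abs(target_len - len(c)) / float(target_len) for c in completions]


# Quick check of the return shape: one score per completion.
rewards = reward_function(["a" * 100, "a" * 50])

# Per the change in this commit, the callable is passed to the trainer
# under the keyword `reward_funcs`, e.g.
# GRPOTrainer(..., reward_funcs=reward_function)
```

The point of the sketch is only the shape of the contract: a list of completions in, a list of floats of the same length out, with extra keyword arguments accepted and ignored.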