steps_per_generation in GRPOTrainer #4200

kdricci · 2025-10-02T18:08:46Z

kdricci
Oct 2, 2025

I am looking through the GRPOTrainer code. Can anyone help me understand why steps_per_generation is involved in the calculation of repeat_count here, in the construction of the RepeatSampler?

trl/trl/trainer/grpo_trainer.py

Line 695 in e086f07

repeat_count=self.num_iterations * self.args.steps_per_generation,

It seems to be also used as a multiplier for the batch size:

trl/trl/trainer/grpo_config.py

Line 650 in e086f07

    
           self.generation_batch_size = self.per_device_train_batch_size * num_processes * self.steps_per_generation

trl/trl/trainer/grpo_trainer.py

Line 694 in e086f07

batch_size=self.args.generation_batch_size // self.num_generations,

However, my understanding is that each batch gets repeated repeat_count times in the Sampler, based on the following snippets:

trl/trl/trainer/utils.py

Line 1752 in e086f07

    
           indexes = [indexes[i : i + self.batch_size] for i in range(0, len(indexes), self.batch_size)]

trl/trl/trainer/utils.py

Line 1759 in e086f07

for _ in range(self.repeat_count):

So, it seems to me like steps_per_generation contributes to both batch size and number of times the batch is repeated. If this understanding is correct, why is this? If not, what did I miss?

Thanks!
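To make the question concrete, here is a minimal pure-Python sketch of the index order that the quoted snippets imply. It is not the TRL implementation: `sampler_args` and `repeat_sampler_indices` are hypothetical helper names, shuffling is omitted, and two details are assumptions not shown in the snippets above (that `mini_repeat_count` equals `num_generations`, and that incomplete chunks are dropped).

```python
def sampler_args(per_device_train_batch_size, num_processes,
                 steps_per_generation, num_generations, num_iterations):
    """Mirror the arithmetic from the quoted config/trainer lines."""
    generation_batch_size = (
        per_device_train_batch_size * num_processes * steps_per_generation
    )
    return {
        # Number of unique prompts per generation batch.
        "batch_size": generation_batch_size // num_generations,
        # How many times each chunk of prompts is yielded.
        "repeat_count": num_iterations * steps_per_generation,
        # Assumption: each prompt index is repeated once per generation.
        "mini_repeat_count": num_generations,
    }


def repeat_sampler_indices(num_samples, batch_size, repeat_count,
                           mini_repeat_count=1):
    """Simplified, unshuffled stand-in for the sampler's iteration order."""
    indexes = list(range(num_samples))
    chunks = [indexes[i : i + batch_size]
              for i in range(0, len(indexes), batch_size)]
    # Assumption: chunks smaller than batch_size are dropped.
    chunks = [c for c in chunks if len(c) == batch_size]
    order = []
    for chunk in chunks:
        for _ in range(repeat_count):
            for index in chunk:
                order.extend([index] * mini_repeat_count)
    return order


args = sampler_args(
    per_device_train_batch_size=4,
    num_processes=1,
    steps_per_generation=2,
    num_generations=4,
    num_iterations=1,
)
order = repeat_sampler_indices(num_samples=4, **args)
```

With these example values, each chunk of 2 prompts expands to 8 indices (one generation batch) and is yielded twice in a row before the sampler moves on to the next chunk, which is the repetition the question is about.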


qgallouedec · 2025-10-02T22:03:51Z

qgallouedec
Oct 2, 2025
Maintainer

steps_per_generation controls how often you generate. If it's 1, it generates at every step. If it's 2, it generates once every 2 steps, producing enough completions for both of those steps. And so on.


kdricci · 2025-10-02T23:21:07Z

kdricci
Oct 2, 2025
Author

Thank you so much for taking the time to answer! Unfortunately, I'm still not quite understanding the details that I see in the code.

I think the main point of confusion for me is why repeat_count is not simply equal to num_iterations (in the first snippet I cited).

Say we want steps_per_generation=2:

It makes sense to me that our number of prompts per batch should therefore be multiplied by 2 (second & third snippets) -- and we then divide that batch up over 2 iterations via _prepare_inputs.

However, it's unclear to me why we also multiply by 2 the number of iterations for which that doubled batch of prompts gets used (first & last snippets).

What is the reasoning for this? Or, am I wrong that this is what is happening?

Thanks again.

2 replies

qgallouedec Oct 2, 2025
Maintainer

Yes, I get that the implementation details may not be easy to follow. The important point here is that we don't really need the repetition in the sampling, because when the trainer doesn't need to generate, it just ignores the sampled data. See https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L1007-L1008

So if you have steps_per_generation=2, then:

Sample 1 > generate
Sample 1 > ignored
Sample 2 > generate

And so on
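The generate/ignore pattern above can be reproduced with a small simulation. This is a hedged sketch rather than the trainer's actual code: `simulate` is a hypothetical function, the strings stand in for generated completions, and the assumed logic is that the trainer regenerates every `steps_per_generation * num_iterations` steps, buffers the result in `steps_per_generation` slices, and reads one slice per step.

```python
def simulate(num_batches, steps_per_generation, num_iterations=1):
    """Return (batch_id, action, slice_used) for each training step."""
    generate_every = steps_per_generation * num_iterations
    # The sampler hands over each generation batch `generate_every` times.
    stream = [b for b in range(1, num_batches + 1)
              for _ in range(generate_every)]
    log = []
    buffered = None
    for step, batch in enumerate(stream):
        if step % generate_every == 0 or buffered is None:
            # Generate once, then split the result into per-step slices.
            buffered = [f"batch{batch}/slice{i}"
                        for i in range(steps_per_generation)]
            action = "generate"
        else:
            # The incoming (repeated) batch is not used on this step.
            action = "ignored"
        used = buffered[step % steps_per_generation]
        log.append((batch, action, used))
    return log


for batch, action, used in simulate(num_batches=2, steps_per_generation=2):
    print(f"Sample {batch} > {action:8s} (trains on {used})")
```

Under these assumptions the repeated batches only keep the data loader in step with the training loop: every step still trains on a slice, but that slice comes from the buffer filled at the last generation step, not from the repeated batch itself.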

Answer selected by kdricci

kdricci Oct 3, 2025
Author

Gotcha! It's really helpful for me to understand this bit--thanks so much for taking the time to answer!
