How to modify the advantage computation in GRPOTrainer #4525

@Tuziking

I’d like to customize the advantage computation used in the DAPO algorithm. Do I need to reimplement the full GRPOTrainer to do this, or is it sufficient to subclass it and override _generate_and_score_completions, since that method appears to handle the advantage calculation?
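
To make the question concrete, here is a minimal sketch of the kind of group-relative advantage computation I'd like to swap out. It is plain Python, not TRL code: the function name `group_relative_advantages` and the parameters `num_generations`, `scale_by_std`, and `eps` are my own hypothetical choices, and the use of the population standard deviation is an assumption, not necessarily what the trainer does. How this would be wired into GRPOTrainer (for example from a subclass that overrides _generate_and_score_completions) depends on the TRL version and is not shown.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float],
    num_generations: int,
    scale_by_std: bool = True,
    eps: float = 1e-4,
) -> List[float]:
    """Hypothetical helper: center (and optionally scale) rewards per prompt group.

    Assumes `rewards` is laid out so that each consecutive run of
    `num_generations` entries belongs to the same prompt.
    """
    if num_generations <= 0 or len(rewards) % num_generations != 0:
        raise ValueError("len(rewards) must be a positive multiple of num_generations")

    advantages: List[float] = []
    for start in range(0, len(rewards), num_generations):
        group = rewards[start:start + num_generations]
        group_mean = mean(group)
        # Assumption: population std; swap in another statistic to customize.
        group_std = pstdev(group)
        for reward in group:
            advantage = reward - group_mean
            if scale_by_std:
                advantage /= group_std + eps  # eps avoids division by zero
            advantages.append(advantage)
    return advantages


if __name__ == "__main__":
    # Three prompts, two completions each.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0, 2.0, 2.0], num_generations=2))
```

A custom variant would change only the body of the inner loop (for instance, skipping the std scaling), which is why I'm hoping the change can stay local to one method.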

Labels: ❓ question, 🏋 GRPO