I’m looking to customize the advantage computation used in the DAPO algorithm. Do I need to subclass the full GRPOTrainer to do this, or is it sufficient to overwrite the logic in _generate_and_score_completions, since that method appears to handle the advantage calculation?