@W3en2g commented Apr 21, 2025

Description

Implemented GRPO with a remote reward model and added a cookbook in both Chinese and English.

Motivation and Context

This update consists of two main components:

  1. Implemented grpo_remote_rm.py based on the existing ppo_remote_rm.py, adding support for GRPO with a remote reward model.

  2. Created a comprehensive text_to_text_grpo.ipynb cookbook that demonstrates:

    • How to use GRPO for model alignment within the Align Anything framework
    • How to apply grammatical rewards for RLVR (Reinforcement Learning with Verifiable Rewards)
    • How to customize and implement various reward functions
    • Complete examples and best practices for both Chinese and English language processing
  • I have raised an issue to propose this change (required for new features and bug fixes)
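
For reviewers unfamiliar with GRPO, the core idea the cookbook relies on can be sketched as follows: several responses are sampled for the same prompt, each is scored by a reward function, and each reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. This is a minimal illustration only, not the code in grpo_remote_rm.py; the names `toy_format_reward`, `group_relative_advantages`, and `score_group` are hypothetical, and the punctuation check is a stand-in for a real grammatical reward.

```python
from statistics import mean, pstdev
from typing import Callable, List


def toy_format_reward(response: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the response ends with
    sentence-final punctuation (English or Chinese), otherwise 0.0."""
    endings = (".", "!", "?", "。", "！", "？")
    return 1.0 if response.strip().endswith(endings) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    eps keeps the division finite when every response gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def score_group(responses: List[str], reward_fn: Callable[[str], float]) -> List[float]:
    """Score all responses sampled for one prompt and return their advantages."""
    rewards = [reward_fn(r) for r in responses]
    return group_relative_advantages(rewards)


if __name__ == "__main__":
    group = ["Hello world.", "hello world", "你好。", "no punctuation"]
    print(score_group(group, toy_format_reward))
```

Swapping `toy_format_reward` for another callable with the same signature is the assumed way to try a different reward in this sketch; responses that score above the group average receive positive advantages and the rest receive negative ones.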

Test

Please test your changes by running the following command:

cd scripts
bash test/test_text_to_text.sh ./opt PATH_TO_OUTPUT_ROOT_DIR

Here, ./opt is the directory containing the test scripts for the opt model, and PATH_TO_OUTPUT_ROOT_DIR is the path to the output root directory. The test scripts save about 1 GB of data to the output root directory and delete it after the test, so please ensure you have enough free disk space.

Lint

Please run the following command in the root directory to check your code style:

pip install pre-commit
pre-commit run --all-files

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.
