Add trainers taxonomy to docs #4195
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
OnlineDPO doesn't inherit from DPO
lol updated!
Nice!!
I would just raise one concern about maintainability: every time we add/rename/remove a trainer, this image would need to be regenerated and updated accordingly.
What do you think?
I understand your concern and am open to your thoughts. I don't think trainers are added frequently enough for this to be an issue. My intention in adding this taxonomy was to make it easier for beginners to understand the dependencies without having to navigate the code, and to showcase online support via vLLM. Perhaps a simple list like the following could work as well: (⚡ = online support via vLLM)
Alternatively, do you know if the Hub supports Mermaid diagrams such as the following?
graph LR
root[TRL Trainers]
BCO[BCOTrainer]
CPO[CPOTrainer]
DPO[DPOTrainer]
OnlineDPO[OnlineDPO ⚡]
%% Group NashMD and XPO together without visible box
subgraph cluster_online_dpo[ ]
style cluster_online_dpo fill:none,stroke:none
NashMD[NashMD ⚡]
XPO[XPOTrainer ⚡]
end
GRPO[GRPOTrainer ⚡]
KTO[KTOTrainer]
ORPO[ORPOTrainer]
PPO[PPOTrainer]
PRM[PRMTrainer]
Reward[RewardTrainer]
RLOO[RLOOTrainer ⚡]
SFT[SFTTrainer]
GKD[GKDTrainer]
root --> BCO
root --> CPO
root --> DPO
root --> OnlineDPO
OnlineDPO --> NashMD
OnlineDPO --> XPO
root --> GRPO
root --> KTO
root --> ORPO
root --> PPO
root --> PRM
root --> Reward
root --> RLOO
root --> SFT
SFT --> GKD
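One way to ease the maintainability concern raised above would be to generate both the diagram and the simple list from a single table of trainers. The following is only a minimal sketch, not something that exists in this PR: `TRAINERS`, `to_mermaid`, and `to_list` are hypothetical names, the entries are copied from the graph above, and the invisible subgraph that groups NashMD and XPO is left out for brevity.

```python
# Hypothetical single source of truth for the taxonomy, copied from the graph above.
# Each entry: (node id, label, parent node id, online support via vLLM).
ROOT_ID = "root"
TRAINERS = [
    ("BCO", "BCOTrainer", ROOT_ID, False),
    ("CPO", "CPOTrainer", ROOT_ID, False),
    ("DPO", "DPOTrainer", ROOT_ID, False),
    ("OnlineDPO", "OnlineDPO", ROOT_ID, True),
    ("NashMD", "NashMD", "OnlineDPO", True),
    ("XPO", "XPOTrainer", "OnlineDPO", True),
    ("GRPO", "GRPOTrainer", ROOT_ID, True),
    ("KTO", "KTOTrainer", ROOT_ID, False),
    ("ORPO", "ORPOTrainer", ROOT_ID, False),
    ("PPO", "PPOTrainer", ROOT_ID, False),
    ("PRM", "PRMTrainer", ROOT_ID, False),
    ("Reward", "RewardTrainer", ROOT_ID, False),
    ("RLOO", "RLOOTrainer", ROOT_ID, True),
    ("SFT", "SFTTrainer", ROOT_ID, False),
    ("GKD", "GKDTrainer", "SFT", False),
]


def to_mermaid(trainers, root_label="TRL Trainers"):
    """Render the table as Mermaid 'graph LR' source text."""
    lines = ["graph LR", f"    {ROOT_ID}[{root_label}]"]
    # Node declarations first, then edges, mirroring the hand-written graph.
    for node_id, label, _parent, online in trainers:
        suffix = " ⚡" if online else ""
        lines.append(f"    {node_id}[{label}{suffix}]")
    for node_id, _label, parent, _online in trainers:
        lines.append(f"    {parent} --> {node_id}")
    return "\n".join(lines)


def to_list(trainers):
    """Render the same table as a nested bullet list."""
    children = {}
    for node_id, label, parent, online in trainers:
        children.setdefault(parent, []).append((node_id, label, online))

    out = []

    def walk(parent, depth):
        for node_id, label, online in children.get(parent, []):
            out.append("  " * depth + "- " + label + (" ⚡" if online else ""))
            walk(node_id, depth + 1)

    walk(ROOT_ID, 0)
    return "\n".join(out)


if __name__ == "__main__":
    print(to_mermaid(TRAINERS))
    print(to_list(TRAINERS))
```

With a table like this, adding, renaming, or removing a trainer would be a one-line change, and the rendered image or list could be regenerated from it.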
For taxonomy, organizing by method style rather than Python inheritance may be more informative. For example:
wdyt?
Also, at some point I'd like to see the inheritance from GKD to SFT removed. Once this is done, this "inheritance" taxonomy would become even less informative and would just be a list, with the exception of XPO and NashMD.
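To make the point about inheritance concrete, here is a small sketch of how the edges of an inheritance-based taxonomy could be derived from the classes themselves. The classes below are empty stand-ins defined only for illustration, not the real TRL trainers, and `inheritance_edges` is a hypothetical helper.

```python
# Empty stand-in classes for illustration only; these are NOT the real TRL classes.
class SFTTrainer: ...
class GKDTrainer(SFTTrainer): ...
class DPOTrainer: ...
class OnlineDPOTrainer: ...
class XPOTrainer(OnlineDPOTrainer): ...
class NashMDTrainer(OnlineDPOTrainer): ...


def inheritance_edges(classes):
    """Return (parent, child) name pairs for direct bases that are also in `classes`."""
    known = set(classes)
    edges = []
    for cls in classes:
        for base in cls.__bases__:
            if base in known:
                edges.append((base.__name__, cls.__name__))
    return edges


if __name__ == "__main__":
    trainers = [SFTTrainer, GKDTrainer, DPOTrainer,
                OnlineDPOTrainer, XPOTrainer, NashMDTrainer]
    for parent, child in inheritance_edges(trainers):
        print(f"{parent} --> {child}")
```

If the stand-in `GKDTrainer` no longer subclassed `SFTTrainer`, only the two `OnlineDPOTrainer` edges would remain, which is the "just a list" outcome described above.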
afaik, there's no support. Updated: it's now organized by method style!
Better! A few things to fix:
updated!
Cool, thanks!
What does this PR do?
Add the trainers taxonomy to the documentation, including details on inheritance and online support.
I've added the diagram here and, if approved, I'll move it to documentation-images before merging.
Before submitting
Pull Request section?
to it if that's the case.
Who can review?
@qgallouedec