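Hello authors, thank you for your very interesting work (On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting). I would like to run the CHORD code, but I am not sure which datasets are used for SFT and which for RL. Should I simply download datajuicer/Trinity-ToolAce-RL-split and datajuicer/Trinity-ToolAce-SFT-split, as mentioned in the CHORD example, and then replace the corresponding paths in the YAML file?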
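Also, the paper mentions that for the mathematical reasoning task you sampled 5k SFT and 20k RL examples, but I could not find a download path for these files. Have they not been open-sourced yet? Do you plan to share them?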