What are the exact SFT and RL data settings for CHORD? #331

@sdws259

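Hello authors, thank you for your very interesting work (ON-POLICY RL MEETS OFF-POLICY EXPERTS: HARMONIZING SUPERVISED FINE-TUNING AND REINFORCEMENT LEARNING VIA DYNAMIC WEIGHTING). I would like to run the chord code, but I am not sure which datasets it uses for SFT and which for RL. Is it enough to download datajuicer/Trinity-ToolAce-RL-split and datajuicer/Trinity-ToolAce-SFT-split, the two datasets mentioned in the chord example, and then replace the corresponding paths in the yaml file?
In addition, the paper mentions that for the mathematical reasoning task you sampled 5k SFT and 20k RL examples, but I could not find a download path for the corresponding files. Have these files not been open-sourced yet? Do you plan to share them?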
