-
Is this the section you are looking at? They say it is similar to the CRINGE paper, but then go on to outline what they actually did.
-
I think the authors of the self-rewarding LLM paper didn't use standard DPO but Iterative DPO, which comes from another of their papers: https://arxiv.org/pdf/2312.16682.pdf.
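
For anyone skimming, here is a rough sketch of how I understand the difference between a single DPO pass and running DPO in iterations. This is not the authors' code. `dpo_loss` is the usual per-pair DPO objective written from the published formula, and `iterative_dpo`, `build_pairs`, and `train_dpo` are hypothetical names I made up to show the outer loop. How the reference model is chosen at each round is left to `train_dpo` and is an assumption on my part.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)), computed stably as softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def iterative_dpo(model, num_iterations, build_pairs, train_dpo):
    """Hypothetical outer loop: rebuild preference pairs from the current
    model each round, then run an ordinary DPO training pass on them."""
    for _ in range(num_iterations):
        pairs = build_pairs(model)       # pairs generated by the current model
        model = train_dpo(model, pairs)  # one standard DPO pass (assumed helper)
    return model
```

The point is that the loss is the same in both cases; what changes is that the preference data is regenerated from the latest model before each pass instead of being fixed once up front.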