Hey Nathan, your research seems to defend PPO over DPO, but the most recent large models, Llama 3.1 and Nemotron-4, don't use PPO. They just use DPO with rejection sampling. In fact, the Llama 3.1 paper chooses DPO mainly for ease of compute. What are your thoughts on this? Is PPO more relevant for small- to medium-sized LLMs? At large scale, is DPO (with clever rejection sampling) enough?
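For context, the standard DPO objective (Rafailov et al., 2023) is an offline loss over preference pairs $(x, y_w, y_l)$, which makes the "ease of compute" point concrete: unlike PPO, training requires no reward model queries or on-policy rollouts.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

Here $y_w$ and $y_l$ are the preferred and rejected responses, $\pi_{\mathrm{ref}}$ is the frozen reference policy, $\beta$ controls the strength of the implicit KL constraint, and $\sigma$ is the sigmoid function.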
@natolambert
2 months ago
@sumanthbalaji1768 I'll write an update on this soon at www.interconnects.ai/ :)