29 days ago

New ideas in AI: DPO has given us alignment without the overhead

267 views

Scale Venture Partners

Scale Partner Jeremy Kaufmann interviews Archit Sharma and Rafael Rafailov, two of the authors of the 2023 NeurIPS Outstanding Paper “Direct Preference Optimization: Your Language Model is Secretly a Reward Model” (aka the DPO paper).
