Scale Partner Jeremy Kaufmann interviews Archit Sharma and Rafael Rafailov, two of the authors of the 2023 NeurIPS Outstanding Paper “Direct Preference Optimization: Your Language Model is Secretly a Reward Model” (aka the DPO paper).
New ideas in AI: DPO has given us alignment without the overhead