From Fully Connected 2023
Over the past couple of months, CarperAI has built trlX, one of the first open-source reinforcement learning from human feedback (RLHF) implementations capable of fine-tuning large language models at scale. They’ve tested offline reinforcement learning algorithms to reduce compute requirements and explored the practicality of synthetic preference data, finding that the two can be combined to significantly reduce the cost of RLHF.
Building The Next Large Model: trlX: A Framework for Open-Source RLHF