We now have some grounding for judging whether weird ChatGPT behaviors are intended or side effects, shrinking the Overton window of RLHF bugs.
This is AI-generated audio made with Python and 11Labs.
Source code: github.com/natolambert/interc...
Original post: www.interconnects.ai/p/openai...
00:00 OpenAI's Model (behavior) Spec, RLHF transparency, and personalization questions
02:56 Reviewing the Model Spec
08:26 Where RLHF can fail OpenAI
12:23 From Model Specs to personalization
Fig 1: huggingface.co/datasets/natol...
Fig 2: huggingface.co/datasets/natol...
Fig 3: huggingface.co/datasets/natol...
Fig 4: huggingface.co/datasets/natol...
Fig 5: huggingface.co/datasets/natol...
Fig 6: huggingface.co/datasets/natol...