Wonderful episode! AE Studio is a very inspiring team. Has the self-modeling idea been applied when training reward models, and does it have a predictable effect during RLHF?
@mikevaiana
3 days ago
Thanks for the kind words! To answer your question, we haven't applied it to reward models yet.
Comments: 2