Part 1 of a walkthrough of our paper, Progress Measures for Grokking via Mechanistic Interpretability. I'm joined by my co-author Lawrence Chan. In this part, we give an overview of the paper and discuss the key takeaways.
Part 2: • A Walkthrough of Progr...
Part 3: • A Walkthrough of Progr...
If you want to learn more about mechanistic interpretability, check out neelnanda.io/getting-started
Our paper: arxiv.org/abs/2301.05217
Original grokking paper: arxiv.org/abs/2201.02177
AdamW: pytorch.org/docs/stable/gener...
Walkthrough of toy models of superposition: • A Walkthrough of Toy M...
Danny Hernandez paper on scaling laws for repeated data: arxiv.org/abs/2205.10487
Jermyn & Schlegeris on S-Shaped Curves: www.alignmentforum.org/posts/...
Unifying Grokking and Double Descent: arxiv.org/abs/2303.06173
Omnigrok: arxiv.org/abs/2210.01117
0:00 - Intro
0:50 - What is grokking?
9:53 - Mechanistic interpretability
11:47 - Paper overview, modular addition algorithm
15:08 - Progress measures
21:41 - Why this work is bullshit
29:30 - Predicting when it will grok?
33:45 - Why does grokking happen?
40:27 - Lottery ticket hypothesis
42:43 - Conclusion
A Walkthrough of Progress Measures for Grokking via Mechanistic Interpretability: What? (Part 1/3)