This Keras code example shows you how to implement knowledge distillation! Knowledge distillation has led to advances in model compression, training state-of-the-art models, and stabilizing Transformers for computer vision. To build on this example, all you need to do is swap out the teacher and student architectures. I think the demonstration of how to override keras.Model and combine two loss functions, weighted by an alpha hyperparameter, is very useful as well.
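To make the alpha-weighted two-loss setup concrete, here is a minimal NumPy sketch of the kind of objective the Keras example combines: a standard cross-entropy on the hard labels plus a temperature-softened KL divergence between teacher and student logits. The function names, the default `alpha` and `temperature` values, and the `T^2` scaling convention are illustrative assumptions, not a transcription of the video's code.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with optional temperature softening
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.1, temperature=3.0):
    # Hard-label term: cross-entropy between student predictions and labels
    p_student = softmax(student_logits)
    n = len(labels)
    student_loss = -np.mean(np.log(p_student[np.arange(n), labels] + 1e-12))

    # Soft-label term: KL divergence between temperature-softened
    # teacher and student distributions
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                        axis=-1))

    # alpha weights the hard-label loss; (1 - alpha) weights distillation.
    # The T^2 factor (from Hinton et al.) keeps gradient scales comparable.
    return alpha * student_loss + (1 - alpha) * (temperature ** 2) * kl
```

In the Keras example this same weighting lives inside a custom `train_step`, so swapping teacher/student architectures leaves the loss logic untouched.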
Content Links
Knowledge Distillation (Keras Code Examples): keras.io/examples/vision/know...
DistilBERT: arxiv.org/pdf/1910.01108.pdf
Self-Training with Noisy Student: arxiv.org/pdf/1911.04252.pdf
Data-efficient Image Transformers: / data-efficient-image-t...
KL Divergence: en.wikipedia.org/wiki/Kullbac...
0:00 Beginning
0:44 Motivation, Success Stories
2:47 Custom keras.Model
11:18 Teacher and Student models
12:17 Data Loading, Train the Teacher
14:05 Distill Teacher to Student