This was a great explanation! The paper is fairly short and clear-cut, but the additional graph made the presentation way easier to understand!
@leonwitt6830
3 years ago
Amazing Job Kapil! Please continue with your work :)
@squarehead6c1
1 year ago
Dear Mr. Sachdeva! Excellent presentation of this paper! A question: I am currently interested in knowledge engineering and the concepts of explicit knowledge (e.g., logical sentences) and implicit knowledge (such as the kind encoded in ANNs). Does distillation extract explicit knowledge, for instance for the purpose of explainable AI, or is the knowledge still just as encoded in the ANN?
@KapilSachdeva
1 year ago
Very good question. Unfortunately, I do not have enough insight into, and have not come across experiments on, explicit vs. implicit knowledge in relation to knowledge distillation. I would appreciate it if you comment here when you get an answer from your research. Thanks.
@syedkhureshi1879
3 years ago
Great work at explaining the concepts!
@tranquangkhai8329
3 years ago
Excellent explanation! Learn many things from your video. Thank you!
@skauddy755
7 months ago
Very insightful walkthrough. Thank You!
@KapilSachdeva
7 months ago
🙏
@grayleafmorgo
2 years ago
You explained it amazingly. There is another new paper called "Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge" that uses this concept to propose a model called FedGKT in the federated learning field. I suggest you check it out and maybe make a video to clarify their work. Thank you
@KapilSachdeva
2 years ago
🙏
@shruti9457
2 years ago
Thank you for the very clear explanation!
@KapilSachdeva
2 years ago
🙏
@CarlosFajardoA
3 years ago
Great explanation. It helped me understand it better. Thank you for sharing.
@furkatsultonov9976
3 years ago
Excellent explanation! Thank you
@mhozaifakhan
4 years ago
Great explanation. Thanks.
@aishaal-harbi1929
2 years ago
Thank you so much, sir!
@KapilSachdeva
2 years ago
🙏
@mohammedkassahun3899
2 years ago
How is it that you only have 2k followers? I have no words. Even if this was the only video you made, you would deserve a million likes! Thanks a lot, man!
@KapilSachdeva
2 years ago
🙏. I am happy it was of some help.
@sqliu9489
3 years ago
Nice video👍
@goldfishjy95
3 years ago
This is high quality education...thank you so much!
@KapilSachdeva
3 years ago
🙏
@kushalneo
9 months ago
Great Explanation
@KapilSachdeva
9 months ago
🙏
@furqanmalik1425
3 years ago
Very good and admirable demonstration. Do you have a video explaining the paper "Model Compression via Distillation and Quantization" (ICLR 2018) by Antonio Polino, Dan Alistarh, and Razvan Pascanu (Google DeepMind)?
@KapilSachdeva
3 years ago
🙏 Thanks for the kind words. At present, I do not have a video on my channel for this particular paper. I just finished reading it, and it is indeed an interesting paper. Thanks for providing the reference.
@KapilSachdeva
3 years ago
Here is what I have understood by reading this paper.

Background: the end goal is to have a smaller, simpler, shallower network that can be used on resource-constrained devices and/or make predictions faster. The 3 main ideas that exist to achieve this goal are Transfer Learning, Quantization, and Distillation.

Quantization => you reduce the size of the weights, e.g. from float to int or even binary => the math operations become very fast.

Premise behind distillation: during training the network explores various directions to learn, and often those are not required during prediction. I talk about this in my paper reading as well.

The big idea: could we combine Quantization & Distillation? How to combine them (Algorithm 1, page 5):
- Before you compute the distillation loss, create the quantized weights. Note: do not update the quantized weights directly, and do not train the network in a way that forces the weights to be quantized.
- Compute the gradients (i.e. the backward pass).
- Use the gradients to update the "original" full-precision weights.
When the training ends, i.e. at the last training step, you replace the weights with their quantized version.

They have another version of the algorithm, Algorithm 2 (page 6). This second version depends on how one does the quantization. For example, quantization is done by rounding a number either up or down; to round up or down you add a small number. But what should that small number be? This version of the algorithm "learns" the appropriate small number to add; they call these the quantization points. I am not sure I have completely understood this aspect, so make sure to verify it!

Results/Conclusions: they strongly suggest that the distillation loss is better than the normal loss, and they back this up with experiments. Hope this helps!
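To make the Algorithm 1 loop above concrete, here is a toy NumPy sketch (my own simplification, not the paper's implementation: a made-up uniform quantizer, a fixed linear map standing in for the teacher, and a linear student with plain squared error standing in for the distillation loss). The key point it illustrates is that the forward pass uses the quantized weights while the gradient update is applied to the full-precision copy:

```python
import numpy as np

def quantize(w, n_bits=4):
    # Toy uniform quantizer: snap each weight to one of 2**n_bits evenly
    # spaced levels spanning [w.min(), w.max()].
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    scale = (hi - lo) / (2 ** n_bits - 1)
    return lo + np.round((w - lo) / scale) * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_teacher = rng.normal(size=8)        # stand-in "teacher": a fixed linear map
y_teacher = X @ w_teacher             # teacher outputs act as the soft targets

w = np.zeros(8)                       # full-precision student weights
lr = 0.05
for step in range(500):
    w_q = quantize(w)                 # forward pass uses the QUANTIZED weights
    err = X @ w_q - y_teacher         # error against the teacher's outputs
    grad = X.T @ err / len(X)         # gradient computed through w_q ...
    w -= lr * grad                    # ... but applied to the full-precision copy

w_final = quantize(w)                 # at the end, keep the quantized version
```

The student ends up close to the teacher while its deployed weights take only a handful of distinct values, which is the compression the combined scheme is after.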
@anirudhthatipelli8765
1 year ago
Thanks a lot, this was wonderfully explained!
@KapilSachdeva
1 year ago
🙏
@SP-db6sh
3 years ago
The best candid explanation of this cumbersome topic is distilled here!
@KapilSachdeva
3 years ago
Thanks 😀
@ghazalehserati1831
2 years ago
Great and helpful explanations. Thanks a lot.
@KapilSachdeva
2 years ago
🙏
@anasnb2022
4 years ago
Thank you! :)
@lidiyanorman8521
3 years ago
Thank you, great explanations!
@KapilSachdeva
3 years ago
🙏
@ishansharma4900
3 years ago
Thanks a lot!
@KapilSachdeva
3 years ago
🙏
@InquilineKea
1 year ago
THIS IS ACTUALLY SO GOOD
@KapilSachdeva
1 year ago
🙏
@kumarteerath6916
3 years ago
Impressive work! Keep it up.
@KapilSachdeva
3 years ago
🙏
@ogsconnect1312
2 years ago
Well done! Good job!
@KapilSachdeva
2 years ago
🙏
@psychicmario
3 years ago
Excellent explanation
@KapilSachdeva
3 years ago
🙏
@pallaviprakash1090
3 years ago
Loved it
@KapilSachdeva
3 years ago
🙏
@youzheng9546
1 year ago
It's amazing! Thank you for explaining Knowledge Distillation so clearly!
Comments: 50