Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

Four techniques for speeding up your model's inference (a minimal code sketch of each technique follows the chapter list):
0:38 - Quantization
5:59 - Pruning
9:48 - Knowledge Distillation
13:00 - Engineering Optimizations
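A minimal sketch of quantization, here done as post-training dynamic quantization in PyTorch; the toy model, shapes, and the choice of quantize_dynamic are illustrative assumptions, not necessarily what the video demonstrates:

```python
# Sketch: post-training dynamic quantization with PyTorch (illustrative toy model).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Dynamic quantization stores Linear weights as int8; activations are
# quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape)  # same interface, smaller/faster Linear layers
```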
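A minimal sketch of pruning, using magnitude (L1) pruning from torch.nn.utils.prune; the 50% sparsity level and the toy model are assumptions for illustration. Unstructured zeros only become actual speedups with a sparse inference kernel such as SparseDNN (see references below):

```python
# Sketch: magnitude pruning with torch.nn.utils.prune (illustrative settings).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 50% of weights with the smallest magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

sparsity = (model[0].weight == 0).float().mean()
print(f"layer-0 sparsity: {sparsity:.0%}")
```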
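A minimal sketch of one knowledge-distillation training step using Hinton-style soft targets; the temperature, loss weighting, and toy teacher/student models are assumptions for illustration:

```python
# Sketch: a single knowledge-distillation step (soft targets + hard labels).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(32, 10)   # stand-in for a large trained model
student = nn.Linear(32, 10)   # smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

T = 4.0          # temperature: softens the teacher's output distribution
alpha = 0.5      # mix between distillation loss and hard-label loss

x = torch.randn(64, 32)
labels = torch.randint(0, 10, (64,))

with torch.no_grad():
    teacher_logits = teacher(x)
student_logits = student(x)

# KL divergence between softened distributions, scaled by T^2 as in the
# original distillation formulation.
distill = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T
hard = F.cross_entropy(student_logits, labels)
loss = alpha * distill + (1 - alpha) * hard

optimizer.zero_grad()
loss.backward()
optimizer.step()
```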
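The engineering-optimizations chapter could cover many different tricks; as one illustrative example (not necessarily what the video shows), here is a sketch of two common wins, running under torch.inference_mode() and batching requests instead of processing them one at a time:

```python
# Sketch: inference_mode + batching as engineering-level speedups (illustrative).
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
requests = [torch.randn(1, 512) for _ in range(256)]

with torch.inference_mode():
    start = time.perf_counter()
    for r in requests:                 # one forward pass per request
        model(r)
    print("unbatched:", time.perf_counter() - start)

    start = time.perf_counter()
    model(torch.cat(requests, dim=0))  # one forward pass for the whole batch
    print("batched:  ", time.perf_counter() - start)
```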
References:
LLM Inference Optimization blog post: lilianweng.github.io/posts/20...
How to deploy your deep learning project on a budget: luckytoilet.wordpress.com/202...
Efficient deep learning survey paper: arxiv.org/abs/2106.08962
SparseDNN: arxiv.org/abs/2101.07948