▬▬ Papers / Resources ▬▬▬
Colab Notebook: colab.research.google.com/dri...
ViT paper: arxiv.org/abs/2010.11929
Best Transformer intro: jalammar.github.io/illustrate...
CNNs vs ViT: arxiv.org/abs/2108.08810
CNNs vs ViT Blog: towardsdatascience.com/do-vis...
Swin Transformer: arxiv.org/abs/2103.14030
DeiT: arxiv.org/abs/2012.12877
▬▬ Support me if you like 🌟
►Link to this channel: bit.ly/3zEqL1W
►Support me on Patreon: bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: bit.ly/3kJYEdl
►E-Mail: deepfindr@gmail.com
▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from #Uppbeat (free for Creators!):
uppbeat.io/t/92elm/jasmine
License code: SMTWRWLNGHZHH0OC
▬▬ Used Icons ▬▬▬▬▬▬▬▬▬▬
All Icons are from flaticon: www.flaticon.com/authors/freepik
▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:16 ViT Intro
01:12 Input embeddings
01:50 Image patching
02:54 Einops reshaping
04:13 [CODE] Patching
05:35 CLS Token
06:40 Positional Embeddings
08:09 Transformer Encoder
08:30 Multi-head attention
08:50 [CODE] Multi-head attention
09:12 Layer Norm
09:30 [CODE] Layer Norm
09:55 Feed Forward Head
10:05 Feed Forward Head
10:21 Residuals
10:45 [CODE] final ViT
13:10 CNN vs. ViT
14:45 ViT Variants
▬▬ My equipment 💻
- Microphone: amzn.to/3DVqB8H
- Microphone mount: amzn.to/3BWUcOJ
- Monitors: amzn.to/3G2Jjgr
- Monitor mount: amzn.to/3AWGIAY
- Height-adjustable table: amzn.to/3aUysXC
- Ergonomic chair: amzn.to/3phQg7r
- PC case: amzn.to/3jdlI2Y
- GPU: amzn.to/3AWyzwy
- Keyboard: amzn.to/2XskWHP
- Bluelight filter glasses: amzn.to/3pj0fK2
Негізгі бет Vision Transformer Quick Guide - Theory and Code in (almost) 15 min
Пікірлер: 54