I will cover the Vision Transformer in three parts. The first part, which is this video, focuses on patch embedding in the Vision Transformer.
I will go over all the details and explain everything happening inside patch embedding in ViT.
I will also go over what an implementation of patch embedding for the Vision Transformer in PyTorch would look like.
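Before watching, here is a minimal, dependency-free sketch of the core idea the video covers: cutting an image into non-overlapping patches and flattening each patch into a vector. This uses plain Python lists purely for illustration; the actual implementation shown in the video would use PyTorch tensors (typically a Conv2d with stride equal to the patch size) plus the learned projection, CLS token, and positional embeddings.

```python
def image_to_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened patch vectors,
    ordered left-to-right, top-to-bottom, as in ViT."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = []
            for r in range(top, top + patch_size):
                patch.extend(image[r][left:left + patch_size])
            patches.append(patch)
    return patches

# Toy 8x8 single-channel "image" with values 0..63 for illustration.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = image_to_patches(image, 4)
print(len(patches))     # (8/4) * (8/4) = 4 patches
print(len(patches[0]))  # each patch flattened to 4*4 = 16 values
```

In the real ViT each flattened patch (e.g. 16x16x3 = 768 values for a 224x224 RGB image) is then linearly projected to the model's embedding dimension before the CLS token is prepended and positional embeddings are added.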
The second part which goes through attention can be found here -
Attention in Vision Transformer (Part Two) - • ATTENTION | An Image i...
The third part which builds entire transformer and shows how to visualize attention maps and positional embeddings can be found below -
Implementing Vision Transformer (Part Three) - • Image Classification U...
Timestamps:
00:00 Intro
00:56 Need for Patch Embedding in Vision Transformer
01:30 Converting Image into Sequence of Patches
01:59 Patch Embedding Projection
02:45 Positional Information for Patches
03:40 CLS Token
04:10 Patch Embedding Responsibilities
04:40 Patch Embedding Module Implementation
08:02 Outro
Paper Link - tinyurl.com/exai-vit-paper
Implementation will be pushed here after all three videos are out - tinyurl.com/exai-vit-code
Subscribe - tinyurl.com/exai-channel-link
Background Track - Fruits of Life by Jimena Contreras
Email - explainingai.official@gmail.com