The link to the image and its raw file are in the description. If you think I deserve it, please give this video a like and subscribe for more! If you think it's worth sharing, please do so as well. I would love to grow to 100k subscribers this year with your help :) Thank you!
@RanDuan-dp6oz
A year ago
Just gave it a thumbs up! Just curious: what software did you use to draw such a wonderful diagram?
@junningdeng7385
A year ago
Sooooo nice! Where can we find the link to the image? 😂
@CodeEmporium
A year ago
Thanks! I used draw.io to draw the image.
@CodeEmporium
A year ago
The image can be found in the description of the video, on GitHub.
@Sundarkarthik-h3i
9 months ago
But what is the source for the Kannada words that were fed in at the output? How can we get those words in reality? Could you explain if you are willing to? Thank you.
@siddheshdandagavhal9804
A year ago
Most underrated YouTuber. You explain these complex topics with such ease. Many big channels avoid explaining these topics. Really appreciate your work, man.
@CodeEmporium
A year ago
Thanks a lot for the kind words. I try :)
@ShimoriUta77
8 months ago
Bro, for real! Learning ML never felt like a possibility for me, but this guy took me by the hand and is teaching all this for free! I can't even thank this dude enough.
@Wesker-he9cx
2 months ago
Absolute Facts
@menghan9260
A year ago
The way you approach this topic makes it so easy to understand, and I appreciate the pace of your talking. Best content on transformers.
@CodeEmporium
A year ago
You are very welcome. And thanks so much for that Super Thanks. You didn't have to, but it's very appreciated.
@swethanandyala
3 months ago
The best explanations of transformers that I have seen!
@Anirudh-cf3oc
11 months ago
You are the most underrated YouTuber. This is the best video explaining Transformers completely, in the most intuitive way. I started my journey with Transformers with your first Transformers video a few years ago, which was very helpful. Also, I am so happy to see an AI tutorial video using an Indian language. I really appreciate your work.
@Mr.AIFella
A year ago
Your explanation is the most realistic explanation of the Transformer that I've ever seen on the internet. Thanks, dude.
@CodeEmporium
A year ago
That means a lot. Thank you. Please like, subscribe, and share around if you can :)
@asdfasdf71865
A year ago
I like your visualization of the matrices. Those residual connections and positional embeddings were good details to mention here.
@ianrugg
A year ago
Great overview! Thanks for taking the time to put all this together!
@CodeEmporium
A year ago
Thanks so much! My pleasure
@moseslee8761
A year ago
You explain really well! It's quite complex, but as you explained it, it became much clearer. Together with the coding video, it is extremely useful.
@ramakantshakya5478
A year ago
Amazing explanations throughout the series, and top-notch content, as always. Waiting for a detailed explanation/visualisation of the backward pass in the encoder/decoder during training. I would appreciate it if you were thinking along the same lines.
@helloansuman
A year ago
Amazing ❤ Salute to the dedication that went into making this video, the visual explanation, and the knowledge.
@CodeEmporium
A year ago
Thanks so much for watching and commenting!
@amiralioghli8622
A year ago
Thank you so much for taking the time to code and explain the transformer model in such detail. I followed your series from zero to hero. You are amazing, and if possible, please do a series on how transformers can be used for time-series anomaly detection and forecasting. Someone covering that is sorely needed on YouTube!
@ArunKumar-bp5lo
10 months ago
Love the visualization; it makes everything so clear.
@amitsingha1637
A year ago
Bro, all of my confusion vanished like a vanishing gradient. Thanks. Really worth it.
@aintgonhappen
A year ago
Video quality is amazing. Keep it up, buddy!
@CodeEmporium
A year ago
I shall. Thanks so much!
@lakshman587
10 months ago
Thank you so much for all these videos; I have learnt a lot from them!!! I thought you were from Tamil Nadu, but today I got to know that you are from Karnataka!! Where in Karnataka? I'm staying in Bangalore and would love to meet you in person!!!!!
@triloksachin4826
6 months ago
Amazing video, keep up the good work. Thanks for this!!
@Sneha-Sivakumar
11 months ago
This was a brilliant video!! Super comprehensive.
@cyberpunkdarren
6 months ago
Your Kannada written language is really beautiful!
@wireghost897
A year ago
Very well explained. Thank you.
@naveenrs7460
A year ago
Lovely, brother. I am your neighbour, a Tamizhan. Lovely brotherhood.
@CodeEmporium
A year ago
Thanks so much! :)
@enrico1976
8 months ago
That was awesome. Thank you man!!!
@k-c
A year ago
Will have to brush up on my basics and then come back to this.
@CodeEmporium
A year ago
Yea. This can be a lot of info. Hopefully the earlier videos in this playlist will help too
@k-c
A year ago
@@CodeEmporium Your channel is really good! Thanks for all the work.
@soumilyade1057
A year ago
Hopefully the series is completed soon ❤️ Would binge-watch 😁
@CodeEmporium
A year ago
Yep. Maybe 1 or 2 videos left. I am running into some issues, but I'll probably either have them solved or just make a fun community-help video. Either way, it should be good.
@soumilyade1057
A year ago
@@CodeEmporium ♥️♥️ 😌
@KulkarniPrashant
4 months ago
Amazing video! Thank you.
@loplop88
5 months ago
So underrated!
@josephfemia8496
A year ago
If I can recommend next steps for this series: going into BERT, GPT, and DETR would be lovely extensions.
@CodeEmporium
A year ago
I was kind of thinking the same! For now, I have videos on BERT and GPT on the channel if you haven't checked them out. But an architecture deep dive would be fun too :)
@RanDuan-dp6oz
A year ago
@@CodeEmporium Yes, that would be super fun! Also, it would be great if you could introduce how an ML practitioner can fine-tune these complex models.
@davefaulkner6302
6 months ago
Fantastic lecture. The attention layers and their inter-relationships are very well explained. Thank you. However, this and other videos gloss over the use of the fully-connected layers following the attention layer. Using FC layers with language-model embeddings makes little sense to me. Are there 512x50 inputs to the FC, i.e., is the input sentence simply flattened as input to the FC layer?
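For anyone else stuck on this point: in frameworks like PyTorch, no flattening is needed, because a linear layer acts on the last dimension and broadcasts over the batch and sequence dimensions. A minimal sketch (the 30/50/512 shapes are illustrative, not taken from the video):

```python
import torch
import torch.nn as nn

# A linear layer acts on the last dimension only, so a (batch, seq_len, 512)
# tensor passes through with no flattening: every one of the 50 positions
# gets the same 512 -> 2048 transformation, independently.
ff = nn.Linear(512, 2048)
x = torch.randn(30, 50, 512)   # (batch, seq_len, d_model) -- illustrative sizes
print(ff(x).shape)             # torch.Size([30, 50, 2048])
```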
@sarahgh8756
7 months ago
Thank you for all the videos about the transformer. Although I understood the architecture, I still don't know what to use as the decoder input (embedded target) and the mask during the TEST phase.
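On the test-phase question: the standard approach is autoregressive decoding. The decoder input starts as just a start-of-sentence token; each generated token is appended and fed back in, so the causal mask simply grows with the sequence. A hedged sketch, where `model`, `sos_id`, and `eos_id` are hypothetical names, not from the video's code:

```python
def greedy_decode(model, src, sos_id, eos_id, max_len=50):
    # Decoder input begins with only <sos>; the look-ahead mask at each
    # step just covers the tokens generated so far.
    tokens = [sos_id]
    for _ in range(max_len):
        logits = model(src, tokens)       # hypothetical: per-position logits
        next_id = int(logits[-1].argmax())  # most likely next token
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]                     # drop the <sos> marker
```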
@ravikumarnaduvin5399
A year ago
My friend Ajay, your playlist "Transformers from scratch" is great. Your block-diagram representation was very appealing to me. Waiting with great anticipation for the final video. Would you be able to make it available soon?
@CodeEmporium
A year ago
Glad you like it! I am hitting a few roadblocks, though I feel I am 99% there. I'll make a video on this mostly to ask the community, so it should be a fun exercise for everyone too :) Hoping that once that is resolved, we can make the final video :D
@charleskangai4618
7 months ago
Excellent!
@user-pu4iz8wb4d
A year ago
THIS IS AMAZING, helped me a lot, thanks :)
@CodeEmporium
A year ago
Thanks so much for watching and commenting!
@markusnascimento210
A year ago
Very good. In general, articles don't show the dimensions when explaining; it helps a lot. Thanks.
@CodeEmporium
A year ago
My pleasure!
@sharangkulkarni1759
A month ago
Thank you (धन्यवाद)
@DanielTorres-gd2uf
A year ago
Damn, could've used this a few weeks ago for my OMSCS quiz. Solid review though; nice job!
@codeative
A year ago
Very well explained 👍
@CodeEmporium
A year ago
Thanks a ton for commenting and watching :)
@anandgupta2892
A year ago
Very well done 👍
@Diego-nw4rt
A year ago
Great channel and very useful video, thank you very much! I will watch other videos on your channel as well. I have a question: after you perform layer normalization and obtain an output tensor, how do you give a three-dimensional tensor as input to a feed-forward layer? Do you flatten the input? (See the sketch under the similar question above.)
@user-wr4yl7tx3w
A year ago
Really well presented.
@CodeEmporium
A year ago
Thanks a ton! :)
@rafaelgp9072
A year ago
A video like this explaining the LLaMA model would be nice.
@abirbenaissa3717
A year ago
Life saver, thank you
@CodeEmporium
A year ago
You are very welcome
@venkideshk2413
A year ago
Masked multi-head attention is for the decoder, right? Is that a typo in your encoder architecture?
@gabrielnilo6101
A year ago
At 11:08 — I am sorry if I am wrong, but isn't the transposed K matrix 50x30x64?
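For what it's worth, if K has shape (batch=30, seq_len=50, d_k=64), the batched matmul in attention only swaps the last two axes, giving (30, 64, 50) rather than (50, 30, 64). This is easy to check in PyTorch (the shape ordering is an assumption, not taken from the video):

```python
import torch

k = torch.randn(30, 50, 64)          # (batch, seq_len, d_k) -- assumed ordering
print(k.transpose(-2, -1).shape)     # torch.Size([30, 64, 50])
```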
@deeedledeee
A year ago
Great video. At 12:09, how does dividing all the numbers by 8 ensure that small values are not too small and large values are not too large? Wouldn't dividing by 8 just make every number 8 times smaller?
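The purpose of dividing by 8 = √64 is less about shrinking values and more about undoing variance growth: dot products of 64-dimensional unit-variance vectors have variance around 64, which pushes softmax into its saturated, near-one-hot region where gradients vanish. A small standalone NumPy demonstration (not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 64                                  # head dimension; sqrt(64) = 8

q = rng.standard_normal(d_k)              # unit-variance query
keys = rng.standard_normal((10, d_k))     # ten unit-variance keys
scores = keys @ q                         # raw dot products, std ~ 8

def softmax(x):
    e = np.exp(x - x.max())               # subtract max for numerical safety
    return e / e.sum()

print(softmax(scores).round(3))                  # nearly one-hot
print(softmax(scores / np.sqrt(d_k)).round(3))   # noticeably smoother
```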
@paragbhardwaj5753
A year ago
Do a video on this new model called RWKV-LM.
@abulfahadsohail466
A year ago
Could you please apply the transformer you have built to text summarisation? It would be really helpful.
@susmitjaiswal136
A year ago
What is the use of the feed-forward network in the transformer? Please answer.
@wishIKnewHowToLove
A year ago
Concise.
@CodeEmporium
A year ago
Thanks! I try not to bore :)
@CyKeulz
A year ago
Great! Still a bit too hard for me, but I still learned stuff. Question: would it be possible to use the same encoder across multiple languages? Without retraining it after the first time, I mean.
@CodeEmporium
A year ago
I hope the full playlist "Transformers from scratch" helps with pacing this. To your second question: this is a simple transformer neural network, not a typical language model like BERT/GPT. The transformer on its own doesn't typically make use of transfer learning, so some retraining will be required. That said, if you were using the language models, you might just need to fine-tune your parameters on the target language (which is technically training). Or, if you go the GPT-3 route, you could get away without fine-tuning and use meta-learning techniques instead.
@anwarulislam6823
A year ago
Without a BCI, is a multi-head-attention-like process possible with the human brain?
@joegarcia8935
A year ago
Thanks!
@CodeEmporium
A year ago
You are super welcome! I appreciate the donation! Thanks!
@colinmaharaj50
A year ago
Can this be done in pure C++?
@raxn2673
4 months ago
It is highly unlikely that you will respond to this, but if you do, I am grateful. Is this a monetized YouTube channel? If so, is this a monetized video? And if so, has Google Research hit you with a copyright claim for using their "Transformer Architecture" figure for commercial purposes (your monetized video)? I am asking because I want to make my own transformative work of the image (by changing the colors, fonts, style of drawing, etc.) to use in a paid AI course (commercial purposes, obviously) that I want to make. I want to see if Google actually comes after your neck if you use their figure.
@CodeEmporium
4 months ago
I haven’t had problems thus far. And yes, the video is monetized
@jamesroy9027
A year ago
The background music creates a lot of disturbance, especially that pop sound; otherwise, the content delivery is excellent.
@samurock100
8 months ago
1kth like
@creativeuser9086
A year ago
So you're from the Silicon Valley of India. We all know it.
@CodeEmporium
A year ago
Haha kinda yea.
@TheTimtimtimtam
A year ago
First :)
@CodeEmporium
A year ago
Please keep being the first! :)
@phaZZi6461
A year ago
Hi, I really love your complete model overview! At 8:08 you mention that the difference between K, Q, and V isn't very explicit to the model. What would be your personal, intuitive interpretation of what a key vector might extract/learn from an input word?

I find the key concept a bit odd, and I wondered how the authors came up with the idea of training a key vector (/matrix), when previous attention papers only had a value vector, which would be used in both places (K and V) of the equation.

When I think about information retrieval concepts, where we have a search query and documents to be ranked, IIRC the intuition is to compute a dot product to get a similarity/relevance score between them. In my mind, the concept of "how relevant is each document" isn't that far off from "how much attention should I pay to each document". Analogously, I would interpret documents as values, and the idea of a key seems to be absent (unless IR in practice computes a key for each document, basically a key_of(document)-query similarity; then I just answered my own question).

Anyway, I wondered whether it would be possible to simplify the attention mechanism while keeping it conceptually similar. I'm not sure where I should look to learn more about this.
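The IR analogy maps cleanly onto the math: keys act as a learned "index" for each word and values as the content retrieved, and tying the two projections together (W_k = W_v) would recover the older single-projection attention described above. A minimal single-head sketch with separate projections (all shapes and weights illustrative, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8

x = rng.standard_normal((seq_len, d_model))          # one sentence of embeddings
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))

Q, K, V = x @ W_q, x @ W_k, x @ W_v                  # separate learned projections
scores = Q @ K.T / np.sqrt(d_k)                      # query-key relevance, as in IR
scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ V                                    # retrieve a blend of values
print(out.shape)                                     # (5, 8)
```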
@KulkarniPrashant
4 months ago
Thanks!
@CodeEmporium
4 months ago
You are super welcome! Thanks for the donation too!
@fayezalhussein7115
A year ago
amaaazing
@CodeEmporium
A year ago
Thanks so much :)
@prashantlawhatre7007
A year ago
Eagerly waiting for the upcoming videos in the series.
@CodeEmporium
A year ago
Thanks! Probably just 1-2 more long-form videos.
@erikschmidt3067
A year ago
What's in the feed-forward layers? Just an input and an output layer? Are there hidden layers? What are the sizes of the layers?
@CodeEmporium
A year ago
Feed-forward layers are the hidden layers. It's essentially 2,048 neurons in size. You can think of it as mapping a 512-dimensional vector to a 2,048-dimensional vector, and then mapping the 2,048-dimensional vector back to 512 dimensions. All of this to capture additional information about the word.
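In code, that position-wise feed-forward block looks roughly like this (a PyTorch sketch matching the 512 → 2048 → 512 sizes described above; class and layer names are illustrative, not pulled from the video's repository):

```python
import torch
import torch.nn as nn

class PositionwiseFeedForward(nn.Module):
    """Maps each 512-dim word vector up to 2048 dims and back down,
    applied independently at every position in the sequence."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.linear1 = nn.Linear(d_model, d_ff)   # 512 -> 2048
        self.linear2 = nn.Linear(d_ff, d_model)   # 2048 -> 512

    def forward(self, x):                          # x: (batch, seq_len, 512)
        return self.linear2(torch.relu(self.linear1(x)))

out = PositionwiseFeedForward()(torch.randn(30, 50, 512))
print(out.shape)  # torch.Size([30, 50, 512])
```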
Comments: 102