I didn't expect to finally understand transformers in this generative music course. I had watched lots of other videos about transformers but still found them really confusing. I started this course because I'm interested in generative music, so understanding transformers is just a bonus. I will definitely recommend this series to my classmates. Thank you!
@hollowjohnny
9 months ago
This is such a generous and empowering resource. Massive thanks!
@NikolozKordzakhia
3 months ago
I can probably say this video is the best on the whole of KZitem on this topic. I searched a lot, and all I found were very superficial courses. Great job.
@ValerioVelardoTheSoundofAI
3 months ago
Thank you :)
@6little6fang6
A month ago
Mad value in this video. You are such a good expositor.
@ValerioVelardoTheSoundofAI
A month ago
Thank you!
@philtgun
10 months ago
Good video, and good explanations of query, key and value matrices with analogies!
@punyabrotad
4 months ago
Excellent explanation in a very lucid fashion. It was really helpful!
@jeremyuzan1169
4 days ago
The kind Valerio. Thank you!
@Kevoshea
10 months ago
Great work you're doing here Valerio. Really appreciated!
@ValerioVelardoTheSoundofAI
10 months ago
Thanks!
@lubhanshukachhawaha8559
10 months ago
This video just saved my ass, as I was having a hard time understanding transformers for my work assignment to train a transformer model for audio classification. Thank you!!
@ValerioVelardoTheSoundofAI
10 months ago
Amazing!
@_NickTech
5 months ago
Thank you very much! It will significantly help me with my university project!
@ArjoonSuddhoo
28 days ago
Superbly presented!!
@hemhemtheglass391
5 months ago
Best explanation I've found so far. Keep it up!
@ANMOLMISHRA-m8e
10 months ago
Amazing video, I would like it a thousand times if I could!
@neyten._py
10 months ago
Thanks a lot, that's pure gold content!
@ValerioVelardoTheSoundofAI
10 months ago
Thank you!
@НиколайНовичков-е1э
10 months ago
Thanks a lot! You did great work!
@ValerioVelardoTheSoundofAI
10 months ago
Thanks!
@egorge00
10 months ago
Excellent, thanks!
@oldskooltrancer
7 months ago
Thank you so much, Valerio!
@vladimirbosinceanu5778
10 months ago
Thank you, sir!
@ValerioVelardoTheSoundofAI
10 months ago
Please call me Valerio :)
@vladimirbosinceanu5778
10 months ago
@@ValerioVelardoTheSoundofAI Thank you, Valerio! :) Lovely explanation as always.
@kyleworrall680
2 months ago
Valerio, I'm midway through writing my PhD thesis on music generation, and this video is incredibly useful for making sure my explanations make sense; it's also a great source to cite. Thanks for making it! Also, at 1:00:38, why is your model dimension 2 for the cos(pos/10000^(2i/dimension_model)) examples? Just want to make sure I'm not misunderstanding something :) Thanks again!
@ANMOLMISHRA-m8e
10 months ago
59:41 The denominator values in the second column of this matrix seem to be different from the formula. Shouldn't it be 10000^(2*0/3)?
@ValerioVelardoTheSoundofAI
10 months ago
You're right and wrong at the same time. There's a mistake in the video: dimension_model should be 2 instead of 3 (I messed this one up in LaTeX!). There's also a mistake in your formula: "2*0" should be "2*1", as is correctly shown in the video. We're at embedding position 2, that is, i = 1, given 0-indexing. In any case, thank you for pointing this out :)
@jdavibedoya
7 months ago
I believe @user-yf6yf6ki6f has a valid point. The denominator in the second column should be 10000^(2*0/3), and I also noticed a mistake in the third column: it should be 10000^(2*1/3). I think this is how it's implemented in the _get_angles method in the upcoming video.
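The exponent the thread is debating can be checked with a short sketch. Below is a minimal NumPy implementation of sinusoidal positional encoding, assuming the common convention angle(pos, dim) = pos / 10000^(2·(dim//2)/d_model); the function names `get_angles` and `positional_encoding` are illustrative choices, not necessarily the ones used in the video's code.

```python
import numpy as np

def get_angles(pos, dim, d_model):
    # Embedding dimension `dim` uses exponent 2 * (dim // 2) / d_model,
    # so dims 0 and 1 share i = 0, dims 2 and 3 share i = 1, and so on.
    return pos / np.power(10000, (2 * (dim // 2)) / d_model)

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, np.newaxis]   # shape (max_len, 1)
    dim = np.arange(d_model)[np.newaxis, :]   # shape (1, d_model)
    angles = get_angles(pos, dim, d_model)    # broadcast to (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])     # even dims get sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])     # odd dims get cosine
    return pe

# Row 0 (position 0) is always [0, 1, 0, 1, ...]: sin(0) and cos(0) alternate.
pe = positional_encoding(max_len=4, d_model=4)
```

Printing the denominators `np.power(10000, (2 * (np.arange(d_model) // 2)) / d_model)` for a given d_model makes it easy to compare each column against the values shown on screen.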
@hariduraibaskar9056
10 months ago
Awesome explanation. I have a doubt: the embedding matrix I is laid out so that the first row corresponds to the first word in the sequence, and so on. Since that already gives a positional representation of each word, isn't it enough for the transformer model to understand position-related info for all the words in the input sequence?
@ValerioVelardoTheSoundofAI
10 months ago
The self-attention process is inherently position-agnostic: it doesn't consider the order of words. Without positional encodings, the attention mechanism would work the same way regardless of word order, so we can't rely on the row order of the input matrix directly. The model needs an explicit, numerical way to represent word order, and that is the job of the sinusoidal function.
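The position-agnostic claim above can be demonstrated with a toy experiment. This sketch uses a simplified self-attention with identity Q/K/V projections (real layers use learned matrices, but the permutation argument is the same): permuting the input rows just permutes the output rows, so without positional encodings the layer cannot distinguish word orders.

```python
import numpy as np

def self_attention(x):
    # Toy single-head self-attention with identity projections:
    # scores depend only on pairwise dot products, not on row order.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))   # 3 "tokens", embedding size 4
perm = [2, 0, 1]              # a reordering of the tokens

out = self_attention(x)
out_perm = self_attention(x[perm])

# Reordering the input only reorders the output: order carries no signal.
assert np.allclose(out_perm, out[perm])
```

Adding a position-dependent vector to each row before attention (as the sinusoidal encoding does) breaks this symmetry, which is exactly why it's needed.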
@hariduraibaskar9056
10 months ago
@@ValerioVelardoTheSoundofAI Like a blind mouse that can sense the gradient of cheese smell in its environment.
@ValerioVelardoTheSoundofAI
10 months ago
@@hariduraibaskar9056 I love the metaphor :D Quite appropriate!
@dhnguyen68
10 months ago
Is there a part II of the video?
@ValerioVelardoTheSoundofAI
10 months ago
It'll come out tomorrow - stay tuned ;)
@dhnguyen68
10 months ago
@@ValerioVelardoTheSoundofAI Great, thanks for sharing your knowledge.
@thenortheasterwizard16
7 months ago
🤟
@heeeyno
5 months ago
The positional encoding matrix is either a 'clever math trick' or a sign that all of this is a kludgy hack and that we're still very far from actually understanding this crap lol. Like, we're still messing with brimstone and vitriol and haven't been able to describe 'sulfur' yet.
@metroidandroid
10 months ago
You say "easily", but your part 1 video is over 1 hour 😅
@ValerioVelardoTheSoundofAI
10 months ago
I considered various ways to convey this topic:
1. Release a concise 15-minute video, giving viewers a feeling of understanding transformers, yet only skimming the surface;
2. Publish a denser 30-minute video, heavy on mathematics and light on explanation, assuming substantial prior knowledge and making the material challenging;
3. Provide an in-depth, 2+ hour explanation filled with details, with enough time to demystify the more intricate concepts in a user-friendly way.
My choice was the third option. Though it is lengthy, I believe the thorough coverage its length allows makes it inherently simpler to comprehend.
Comments: 41