I discovered that the best way to understand this lecture is to study Andrej's "Let's build GPT: from scratch, in code, spelled out" KZitem video in parallel. Working through that video gave me much better insight into this one. He codes the attention mechanism directly in PyTorch there, and it is fascinating how things just start clicking.😇😀😀
@sapnilpatel1645
A year ago
True.
@DaveJ6515
A year ago
"All pieces clicking in place" is exactly the way I was describing the feeling to my students no later than ten minutes ago. You are definitely right.
@jerryyang7011
A year ago
What a legend Andrej is - the historical context puts quite a bit of "human touch" on Transformers and AI/ML as a whole.
@dr.mikeybee
10 months ago
I always listen when Andrej talks.
@RalphDratman
6 months ago
@@dr.mikeybee I love Andrej
@МихаилЧертушкин-я2с
A year ago
Thank you very much! If possible, please keep posting other lectures from the 2023 playlist, this is awesome! 👍
@ahmedivy
A year ago
Pure Gold Content by a LEGEND Teacher 💖
@jcorey333
A year ago
This was amazing for learning about the historical context of transformers! The audio was a bit low quality, but I'm still glad this was posted.
@shauryaseth8859
A year ago
Andrej is so good that we had Bane sitting in the audience asking questions
@SampadMohanty7
A year ago
It's Megatron, not Bane
@user-wr4yl7tx3w
A year ago
Audio could be better
@HemangJoshi
A year ago
Definitely
@HemangJoshi
A year ago
Even a $10 mic could give better results than this; they didn't even honor Karpathy enough to get a decent mic 🎤. Can't believe Stanford shot a video like this.
@frankyvincent366
A year ago
Yes, there are AI algorithms to improve sound by suppressing room noise... made using transformers 😅
@miyamotomasao3636
A year ago
And in English, too !
@recursion.
A year ago
Dude, I'm pretty sure they know about this. Be grateful that you're getting access to materials from one of the top schools in America.
@wildwind4725
A year ago
The year is 2023, and we have AI models capable of writing a decent essay. At the same time, the audio quality in online presentations is sometimes worse than that of the Apollo missions.
@everydaybob
A year ago
Guys, did Andrew Ng help you with the audio for this lecture? It's usually his trademark to use a "state of the art" mic (filtered by a pillow).
@susdoge3767
4 months ago
This is by far the best video on transformers I have seen, kudos.
@dr.mikeybee
10 months ago
The attention mechanism is a dual-embedding architecture. It looks at the probability of two words being next to each other -- at least it uses something like cosine similarity to compare the tokens in a sentence. That's really the basis. For sequence to sequence translation, we use the fact that language has a definite shape inside a semantic space. Once again, we use something like cosine similarity to find a context signature (vectorized representation) that is closest to the context signature of the sequence in the original language.
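Concretely, standard transformer attention scores token pairs with an unnormalized dot product (not quite cosine similarity, since the vectors are not normalized) and then takes a softmax-weighted average of value vectors. A minimal NumPy sketch of that idea, as a toy illustration rather than code from the lecture:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # pairwise similarity scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)             # softmax over the keys
    return w @ V                                      # weighted average of values

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))       # 3 tokens, 4-dimensional embeddings
out = attention(x, x, x)          # self-attention: Q = K = V = x
print(out.shape)                  # (3, 4)
```

If the rows of Q and K were unit-normalized first, the scores would indeed be cosine similarities, which is the spirit of the comment above.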
@rachadlakis1
3 months ago
It's amazing to see how transformers have revolutionized various fields of Deep Learning. Thank you for sharing this valuable information and the resource links. It's truly fascinating to learn about the advancements in AI and the impact it's making across different domains.
@sumitsp01
A year ago
I was not aware that Megatron was attending this lecture to understand Transformers. He did ask some great questions 😄
@SampadMohanty7
A year ago
This is legendary
@yuktikaura
11 months ago
Epic😀😀
@existenceisillusion6528
7 months ago
Sounded more like Darkseid
@alonsogarrote8898
5 months ago
At what min?
@sumitsp01
5 months ago
@@alonsogarrote8898 Every time someone from the audience asks a question.
@snowman2627
A year ago
Andrej is the best teacher! The node graph analogy is quite intuitive.
@stanfordonline
A year ago
Hi Hao, thanks for watching and for your comment!
@mikeiavelli
A year ago
Andrej starts at 10:16
@dsazz801
10 months ago
Thank you for sharing such a great quality lecture!
@davidsewell4999
A year ago
Is it just my audio, or is Satan always the one asking questions in the audience?
@neuralthink
2 months ago
😂
@nerouchih3529
4 months ago
28:00 A unique view of attention. In this image all 6 nodes are connected to all 6 nodes in the self-attention case, while in cross-attention it would be as if set A sends messages to the nodes in set B. And voila, it's a fully-connected layer! But with tokens passed instead of values.
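That graph view translates directly into array shapes: self-attention draws queries and keys/values from the same node set, while cross-attention draws queries from one set and keys/values from another. A small NumPy sketch with made-up sizes (6 source nodes, 4 target nodes — assumptions for illustration, not the lecture's code):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, keys_values):
    # each query node receives a softmax-weighted message from every key/value node
    d = queries.shape[-1]
    w = softmax(queries @ keys_values.T / np.sqrt(d))
    return w @ keys_values

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 8))        # set A: 6 nodes, 8-dim embeddings
B = rng.normal(size=(4, 8))        # set B: 4 nodes

self_out = attend(A, A)            # self-attention within A: all 6 talk to all 6
cross_out = attend(B, A)           # cross-attention: A sends messages to B
print(self_out.shape, cross_out.shape)   # (6, 8) (4, 8)
```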
@AIautopilot
A year ago
This is the funniest moment of the presentation, at 1:00:22 🤣. Great video; Andrej is so knowledgeable and down to earth.
@laalbujhakkar
5 months ago
Really disappointing audio. It ruins the lecture.
@bpmoran89
5 months ago
Describing RNNs and LSTMs as prehistoric is wild
@23232323rdurian
A year ago
The AUDIO is real choppy..... hard to make out the words spoken... but great lecture
@TheBontenbal
A year ago
Great lecture as always (except for the audio ;-)). Does somebody have a link to Andrej's code? Thank you.
@linlinpan3150
2 months ago
Got one of the greatest technologists of our time, and can't find a microphone from after the year 2000
@christofferweber9432
A year ago
Sad that a great lecture is cut short by questions that could have been taken offline...
@rudraxxadb
2 months ago
Great content and such a beautiful explanation. Question: at 24:43, when the incoming nodes' information is used, shouldn't that be m.value() instead of m.key()? m.value() is what is exposed to others, right?
@sahreenhaider9906
8 months ago
What questions did Megatron ask? I mean, the audio was pretty bad
@sansin-dev
A year ago
It's a pity the audio is so bad
@peteluo5367
9 months ago
Thanks for sharing. This is really useful for me.
@iansnow4698
A year ago
Hi Andrej, it's a great historical view of Attention that you showed there; the email especially is a golden discovery in my eyes. All I could find before went only as deep as Yoshua's papers. I have a question I hope you or someone else can answer here: is there any connection between the key/query/value mechanism in the later paper and the weighted-average-of-BiRNN idea in the email? Or was that simply a new idea in the Attention Is All You Need paper? Best regards, Ian
@НиколайНовичков-е1э
A year ago
Great seminar!
@alielouafiq2552
A year ago
OMG! Just noticed this was released today!
@vimukthirandika872
7 months ago
awesome!
@TheNewton
5 months ago
19:47 So is there a functional difference between calling this use of softmax `attention` rather than the simpler word `search`, beyond trying to be catchy?
@user-xn8dp5zy8t
A year ago
Really bad audio quality; please ensure the speakers have better microphones next time
@gregx8245
7 months ago
Div Garg's audio is so horrible, I'm moving on to other videos at the 1 minute 30 second mark. You guys have a lot to learn about video production. (Have you heard of microphones?)
@saptarshipalchaudhuri5640
A year ago
This really piqued my interest. The seminal papers on the road to developing transformers included here make the introduction just perfect. The audio placed hurdles, though. I usually watch lectures at 2X speed or more; here I could not go beyond 1.5.
@wolpumba4099
6 months ago
*ELI5 Abstract*

*Imagine transformers as super-smart LEGO blocks:*
* *They learn by paying attention:* Transformers figure out what's important in a bunch of information, just like you focus on the right LEGO piece to build something cool.
* *They talk to each other:* Transformers share info, like when you ask a friend to pass a LEGO brick.
* *They can be built in many ways:* You can make different things with LEGOs, and transformers can learn to do different stuff too! They can understand words, make pictures, and even play games.
* *They get better with practice:* The more you build with LEGOs, the better you get. Transformers get smarter the more they learn from examples, like getting better at building a castle after making a few towers first.
* *They need a little help sometimes:* Sometimes you need instructions for a fancy LEGO build. Transformers can also use hints to learn faster, especially when they don't have lots of examples.
* *They like to remember things:* Transformers have a scratchpad, just like you use a notebook to remember steps, so they don't forget important stuff.

*Transformers are changing the world:* They're like the new building blocks for computers, making them understand us and do much cooler things!

*Abstract*

This video explores the remarkable transformer architecture, a foundational building block in modern AI. Transformers were introduced in the 2017 paper "Attention is All You Need" and have revolutionized fields like natural language processing (NLP), computer vision, and reinforcement learning. The video delves into several key aspects of transformers:
* *Core Concepts:* Attention mechanisms, message passing on directed graphs, and the interplay between communication and computation phases within a transformer block.
* *Implementation:* A detailed walkthrough of a minimal transformer implementation (NanoGPT) highlights data preparation, batching, positional encodings, and the essential components of transformer blocks.
* *Transformers Across Domains:* The ease with which transformers adapt to diverse modalities (images, speech, reinforcement learning) underscores their flexibility.
* *Meta-Learning Capabilities:* Transformers exhibit in-context learning or meta-learning capabilities, highlighted by the GPT-3 model. This suggests potential for gradient-like learning within transformer activations.
* *Optimizability and Efficiency:* Transformers are designed to be highly optimizable by gradient descent and computationally efficient on GPUs, key factors in their widespread adoption.
* *Inductive Biases and Memory:* While inherently general, transformers can incorporate inductive biases and expand memory via techniques like scratchpads, demonstrating adaptability.

The video also includes discussions on the historical context of transformers, their relationship to neural networks, and potential future directions in AI.

*Keywords:* Transformers, Attention, Deep Learning, NLP, Computer Vision

See also: kzitem.info/news/bejne/zHmZnnqjfpRioY4
@wolpumba4099
6 months ago
*Summary*

*Introduction to Transformers*
* *0:05* - Welcome and course overview: Introduction to a course focused on transformers in artificial intelligence (AI).
* *0:52* - Instructors introduce themselves: The course instructors share their backgrounds.

*Foundations of Transformers*
* *3:24* - Introduction to transformers: The basics of transformer architecture are explained.
* *3:38* - Explanation of the attention timeline: Discussion of how attention mechanisms developed over time.

*Understanding and Implementing Transformers*
* *3:51* - Transformer Evolution: Progression from RNNs, LSTMs, and simple attention to the dominance of transformers in NLP, vision, biology, robotics, and generative models.
* *10:18* - Andrej Karpathy presents on transformers: Karpathy provides historical context on why transformers are important and their evolution from pre-deep-learning approaches.
* *15:15* - Origins of the Transformer: Exploration of foundational papers on neural machine translation and the introduction of attention to solve the "encoder bottleneck" problem.
* *20:13* - Attention is All You Need: Discussion of the landmark 2017 paper, its innovations, and core concepts behind the transformer (attention, positional encoding, residual networks, layer normalization, multi-headed attention).
* *22:36* - The speaker's view on attention: A unique perspective on attention as a communication phase intertwined with computation.
* *25:13* - Attention as message passing: Explanation of attention as nodes in a graph communicating with "key", "query", and "value" vectors. Python code illustrates the process.
* *30:58* - NanoGPT: Transformer implementation: Introduction of NanoGPT, a minimal transformer the speaker created to reproduce GPT-2, followed by in-depth explanations of its components, data preparation, batching, and block structure.

*Transformers: Applications and Future Directions*
* *52:56* - Transformers Across Domains: How transformers are adapted for images, speech recognition, reinforcement learning, and even biology (AlphaFold).
* *54:26* - Flexibility with Multiple Inputs: The ease of incorporating diverse information into transformers.
* *55:43* - What Makes Transformers Special?: Highlighting in-context learning (meta-learning), potential for gradient-like learning within activations, and the speaker's insights shared via tweets.
* *58:27* - The Essence of Transformers: Three key properties: expressiveness, optimizability, and efficiency on GPUs.
* *59:51* - Transformers as General Purpose Computers Over Text: Analogy comparing powerful transformers to computers executing natural language programs.
* *1:06:28* - Inductive Biases in Transformers: The balance between data and manual knowledge encoding, and how to modify transformer encodings.
* *1:08:42* - Expanding Transformer Memory: The "scratchpad" concept for extending memory.

*Questions and Answers*
* *27:30* - Q&A: Self-Attention vs. Multi-headed Attention: Explaining the differences and purposes.
* *46:12* - Q&A: Dynamic Connectivity in Transformers: Discussion on graph connectivity in transformers.
* *50:20* - Q&A: Future Directions: Exploring beyond autoregressive models and the relation to graph neural networks.
* *1:02:01* - Q&A: RNNs vs. Transformers: Contrasting the limitations of RNNs and the strengths of transformers.
* *1:04:21* - Q&A: Multimodal Inputs: How transformers handle diverse data types.
* *1:10:09* - Q&A: ChatGPT: The speaker's limited exploration of ChatGPT.
* *1:10:41* - Q&A: S4 Architecture and Speaker's Next Steps: Focus on NanoGPT for GPT-like models and interest in building a "Google++" inspired by ChatGPT.

Disclaimer: I used Gemini Advanced 1.0 (2024.03.03) to summarize the video transcript. This method may make mistakes in recognizing words, and it can't distinguish between speakers.
@AQkrafts
A year ago
Audio quality is poor; it distracts from compelling content
@amoghjain
8 months ago
Hello!! Thank you for sharing the talk!! Is it possible to share the slides as well? Thanks
@gabrielepi.3208
A year ago
Hey Stanford, a GPT is not needed to understand that you need some mics in the audience for better audio…
@0xggbrnr
A year ago
Use transformers to improve the audio quality next time.
@1ntrcnnctr608
A year ago
When is auto "mastering"/EQ of audio coming here on YT?
@1ntrcnnctr608
A year ago
@@hyperadapted Yup, yearning for quality these days
@1ntrcnnctr608
A year ago
@@hyperadapted "everyone will have a better learning experience" - 👑
@jbperez808
10 months ago
@4:09 "performance increased every time we fired our linguists..." if you listen closely. The auto-transcript caught more of it than the human one.
@jonathanr4242
A year ago
You think 2011 was bad? I was doing NN image processing at the turn of the century.
@jackgame8841
A year ago
This legend
@KarenLasser-n7i
A year ago
What was the sentence he said before "I have to be very careful"?
@MilesBellas
2 months ago
Process the audio with AI and repost?
@anmolagarwal999
A year ago
Andrej starts at 10:12
@abhijithganesh2064
11 months ago
Heyy! Wonderful video, but please get a better microphone. It is very hard for me to comprehend sometimes.
@NanXiao
A month ago
I think it's better to check Andrej Karpathy's full videos on transformers and neural nets: - kzitem.info/news/bejne/zHmZnnqjfpRioY4si=kpQy5jGrqvT9Ymsz - kzitem.info/news/bejne/t4Ogk2eJaqacqGUsi=DgOju3xZbKQmqzlk
@KarenLasser-n7i
A year ago
Is the guy asking questions using a voice encoder, or does he have a voice that deep cuz he's 12 feet tall?
@mytharamou
7 months ago
Now we need a model to improve the audio. Or just get a decent mic.
@nbme-answers
A year ago
10:15 START
@harriehausenman8623
A year ago
And one would think Stanford could afford microphones for their presentation, instead of the tin cans they obviously used here.
@elenagavrilova3109
A month ago
Hard to understand. And where is Andrej?
@yurcchello
A year ago
Please reupload with better sound quality
@paparaoveeragandham284
10 months ago
nice
@rayson.-.
A year ago
The course is great but the audio could be better LOL
@harshitkumar5147
10 months ago
Where do I get the slides?
@csmac3144a
6 months ago
How can Stanford release something with such abysmal audio quality? Many of us have hearing issues and/or problems processing garbled/muffled speech. This video is just terrible.
@888cromartie
A year ago
Why do Stanford students ask questions via a voice changer? It is very difficult to hear and make out what they are asking.
@RaviAnnaswamy
7 months ago
I think room noise cancellation was on in this recording.
@raminanushiravani9524
A year ago
Anyone know where to find the slides?
@DaveDugganITPro
2 months ago
Why is the audio so poor? Perhaps AI could help improve it.
@sitrakaforler8696
A year ago
Damn, thanks man!
@laurentprat8219
A year ago
Hello, what is the 20 for in the Node class? Is it the size of the embedding vector (only 20 tokens)? (code shown at 25:30)
@faiqkhan7545
11 months ago
A 20-by-20 matrix, initialized randomly, which will be trained via backpropagation.
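For readers trying to reconstruct that demo: a rough from-memory sketch of the lecture's message-passing toy (around 25:30). The names (`Node`, `EMB`, `wk`/`wq`/`wv`) and details are assumptions, not Karpathy's exact code; note that attention scores are computed from keys, while the message actually aggregated is the value, which also speaks to the key-vs-value question raised earlier in the thread:

```python
import numpy as np

EMB = 20  # embedding size used in the demo, per the thread above

class Node:
    def __init__(self, rng):
        self.x = rng.normal(size=EMB)          # this node's private data vector
        # random 20x20 projection matrices; trained by backprop in the real thing
        self.wk = rng.normal(size=(EMB, EMB))
        self.wq = rng.normal(size=(EMB, EMB))
        self.wv = rng.normal(size=(EMB, EMB))

    def key(self):   return self.wk @ self.x   # "what do I have?"
    def query(self): return self.wq @ self.x   # "what am I looking for?"
    def value(self): return self.wv @ self.x   # "what do I reveal to others?"

def aggregate(node, inputs):
    # softmax-weighted average of the *values* of incoming nodes,
    # scored by query-key dot products
    q = node.query()
    scores = np.array([q @ m.key() for m in inputs])
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return sum(wi * m.value() for wi, m in zip(w, inputs))

rng = np.random.default_rng(0)
nodes = [Node(rng) for _ in range(6)]
out = aggregate(nodes[0], nodes)
print(out.shape)  # (20,)
```

In a real transformer these projections are shared linear layers trained end to end; here they are fixed random matrices purely for illustration.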
@Jacob011
A year ago
Seriously, what's with the Megatron voice?
@reechr
A year ago
Why release this 6 months later????
@abdulnim
A year ago
Andrej ignored the transformer in the first slide, but he kept asking questions.
@henry3435
A year ago
Geez, you'd think Stanford would have good mics
@djcardwell
9 months ago
I'm slightly concerned that Darth Vader attended with so many questions.
@aygunvarol
A year ago
I stopped at 12:38
@maxjesch
A year ago
Great content, but PLEASE: GET A PROPER MICROPHONE!
@RalphDratman
6 months ago
22:30
@PplsChampion
10 months ago
11:30 This kitchen sink stuff was my life in 2011 ahahahahaha, simpler times
@think_read
A year ago
Please invest in a proper microphone :)
@MCSGproject
A month ago
Machine learning gods, but they haven't figured out simple audio-visual quality?
@smithnigelw
7 months ago
Andrej starts at about 10 minutes in…
@ИльяЛомоносов-ю3м
A year ago
Vatoadmin gave Andrej a microphone. Now that's a collab!
@gabscar1
8 months ago
Poor volume.
@ヽ̀ゝ́-r3h
13 days ago
The English accent is really hard to understand
@alanzom1503
A year ago
Why do they always have to add that silly Transformers Optimus Prime image whenever they talk about the Transformer architecture? Are we 5 year olds?
@bender101
A year ago
14:49 Excuse me, what? The brain is not "very homogenous and uniform" across the cortex. The cytoarchitecture differs a lot (layers, Brodmann areas), and all of it is intertwined with subcortical nuclei. Cut back on the neuroscience, Andrej, lol
@ishanbhatt6067
11 months ago
Worst audio quality
@NabilGhodbane
A month ago
Audio is a disaster
@opencvitk
11 months ago
The more I watch these "AI" videos, the more this area feels like some closed club of pseudo-babble
@usefulalgorithms659
4 months ago
Me drunk could explain better
@Athens1992
A year ago
What better Friday night than Karpathy explaining transformers, love it!!! Good night from Greece
@stanfordonline
A year ago
Hi George, thanks for watching. We will be releasing more videos from this series soon - stay tuned!
@Athens1992
A year ago
@@stanfordonline Amazing, I love Karpathy's teaching and how easy he makes it seem
@rajatpatel5691
A year ago
@@Athens1992 Totally agree 💯
@harunyigit897
6 months ago
Good night from Turkey too
@soumilyade1057
A year ago
The quality of the audio has ruined an otherwise great lecture 😬 See if it can be improved... thank you ❤
@yuzhou1
6 months ago
Could use a better microphone tbh
@ericgonzales5057
10 months ago
Y'all seriously need to fix the audio on your videos. Nobody wants to watch a YouTube video with audio from a MacBook, Andrej...
@liangcheng9856
A year ago
Sound quality plz.
@rul1175
8 months ago
Impossible to understand what they're saying: "hagahsjsjs jahahdhdkskaja".
@aojing
6 months ago
What is wrong with the voice of the questioners? Was the audio deliberately post-processed by Stanford? 🙃
@niclored
8 months ago
If you don't work on the quality of the audio, everything you did for this presentation is kinda ruined. Please try a better mic, since this is the Stanford account and this video is fairly recent. Audio should not be an issue, and in this video it is.
Comments: 236