So many full courses in great quality, great lecturers AND with normal subtitles... Can someone PLEASE give Stanford University some kind of international prize for knowledge sharing?
@airbup
7 months ago
99% of courses are not online and cost money. I would like them to add more.
@tusharrohilla7154
2 years ago
Amazing lecture; thanks to Stanford for making these lectures public.
@rahullak
8 months ago
Thank you to Stanford and to Prof. Manning for making these lectures available to everyone.
@teogiannilias655
A year ago
Thanks for everything, Stanford University. As an AI master's student, I have to say that having these lectures for free lets me compare and broaden my ideas about NLP, resulting in a deeper intuitive understanding of the subject.
@stanfordonline
A year ago
Hi Teo, thanks very much for your comment and feedback! Happy to hear these lectures were so helpful to your studies.
@nabilisham6133
A year ago
🎯 Key Takeaways for quick navigation:
00:05 🎓 This lecture introduces Stanford's CS224N course on NLP with deep learning, covering topics like word vectors, the word2vec algorithm, optimization, and system building.
01:32 🤯 The surprising discovery that word meanings can be well represented by large vectors of real numbers challenges centuries of linguistic tradition.
02:29 📚 The course aims to teach deep understanding of modern NLP methods, provide insights into human language complexity, and impart PyTorch-based skills for solving NLP problems.
07:15 🗓️ Human language's evolution is relatively recent (100,000 - 1 million years ago), but it has led to significant communication power and adaptability.
10:59 🧠 GPT-3 is a powerful language model capable of diverse tasks due to its ability to predict and generate text based on context and examples.
14:52 🧩 Distributional semantics uses context words to represent word meaning as dense vectors, enabling similarity and relationships between words to be captured.
18:37 🏛️ Traditional NLP represented words as discrete symbols, lacking a natural notion of similarity; distributional semantics overcomes this by capturing meaning through context.
25:19 🔍 Word embeddings, or distributed representations, place words in high-dimensional vector spaces; they group similar words, forming clusters that capture meaning relationships.
27:15 🧠 Word2Vec is an algorithm introduced by Tomas Mikolov and colleagues in 2013 for learning word vectors from a text corpus.
28:11 📚 Word2Vec creates vector representations for words by predicting words' context in a text corpus using distributional similarity.
29:07 🔄 Word vectors are adjusted to maximize the probability of context words occurring around center words in the training text.
31:02 🎯 Word2Vec aims to predict context words within a fixed window size given a center word, optimizing for predictive accuracy.
32:56 📈 The optimization process involves calculating gradients using calculus to adjust word vectors for better context word predictions.
36:33 💡 Word2Vec employs the softmax function to convert dot products of word vectors into probability distributions for context word prediction.
38:51 ⚙️ The optimization process aims to minimize the loss function, maximizing the accuracy of context word predictions.
45:53 📝 The derivative of the log probability of context words involves using the chain rule and results in a formula similar to the softmax probability formula.
49:28 🔢 The gradient calculation involves adjusting word vectors to minimize the difference between observed and expected context word probabilities.
53:34 🔀 The derivative of the log probability formula simplifies into a form where the observed context word probability is subtracted from the expected probability.
58:57 📊 Word vectors for "bread" and "croissant" show similarity in dimensions, indicating they are related.
59:26 🌐 Word vectors reveal similar words to "croissant" (e.g., brioche, baguette), and analogies like "USA" to "Canada" can be inferred.
59:55 ➗ Word vector arithmetic allows analogy tasks, like "king - male + female = queen," and similar analogies can be formed for various words.
01:00:22 🤖 The analogy task shows the ability to perform vector arithmetic and retrieve similar words based on relationships.
01:01:23 🤔 Negative similarity and positive similarity together enable analogies and meaningful relationships among words.
01:03:17 💬 The model's knowledge is limited to the time it was built (2014), but it can still perform various linguistic analogies.
01:04:39 🧠 Word vectors capture multiple meanings and contexts for a single word, like "star" having astronomical or fame-related connotations.
01:05:36 🔄 Different vectors are used for a word as the center and as part of the context, contributing to the overall representation.
01:07:02 🧐 Using separate vectors for center and context words simplifies derivative calculations and results in similar word representations.
01:11:26 ⚖️ The model struggles with capturing antonyms and sentiment-related relationships due to common contexts.
01:12:44 🎙️ The class primarily focuses on text analysis, with a separate speech class covering speech recognition and dialogue systems.
01:18:06 🗣️ Function words like "so" and "not" pose challenges due to occurring in diverse contexts, but advanced models consider structural information.
01:20:25 🧠 Word2Vec offers different algorithms within the framework; optimization details like negative sampling can significantly improve efficiency.
01:23:18 🔁 The process of constructing word vectors involves iterative updates using gradients, moving towards minimizing the loss function.
Made with HARPA AI
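The softmax step summarized at 36:33 (dot products turned into a probability distribution over context words) can be sketched in plain Python; the vocabulary and vectors below are made up for illustration, not taken from the lecture:

```python
import math

# Toy "outside"-word vectors u_w and a center-word vector v_c (made-up numbers).
U = {
    "bread":     [0.9, 0.1],
    "croissant": [0.8, 0.2],
    "moon":      [-0.5, 0.7],
}
v_c = [1.0, 0.0]  # vector for the center word

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def p_context(U, v_c):
    """Softmax over dot products: P(o|c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    scores = {w: math.exp(dot(u, v_c)) for w, u in U.items()}
    Z = sum(scores.values())
    return {w: s / Z for w, s in scores.items()}

probs = p_context(U, v_c)
# The probabilities sum to 1; words whose vectors point the same
# way as v_c ("bread", "croissant") get most of the mass.
```

Note how "bread" and "croissant", whose toy vectors roughly align with v_c, receive higher probability than "moon".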
@ansekao4516
A year ago
What a great lecturer; he understands students, puts himself in our place, and explains the material very nicely. This is literally the first piece of NLP material I have ever seen, and I understood most of it. Thanks a lot!
@stanfordonline
A year ago
Awesome feedback, thanks for your comment!
@wenqianzhao2648
2 years ago
Moved from the Coursera NLP Specialization to here. Definitely amazing to receive such detailed math explanations of all these concepts.
@CarlosEduardo-hp5wg
7 months ago
Is it better here?
@sudhanvasavyasachi2525
20 days ago
Which should I do first, the specialization or CS224N?
@hewas321
8 months ago
Oh my days I love his positive vibes! Also clear explanation of multiple topics. I really appreciate you providing us with such great lectures online for free!
@vanongle9648
A year ago
Hello Stanford Online, I started self-studying machine learning because my university program does not teach AI in depth. I felt I had not reached my full potential, so I have been teaching myself AI for the past 6 months, covering all areas: machine learning, deep learning, and reinforcement learning. Thank you for this free lecture, I really appreciate it.
@gefallenesobst6855
5 months ago
I am so grateful that Stanford has given us all this great gift. Thanks to their great machine learning and AI video series, I am able to build a solid foundation of knowledge and have started my PhD based on that.
@commonsense1019
A year ago
I was exhausted, yet your enthusiasm is what made me stay. Amazing session!
@MenTaLLyMenTaL
2 years ago
Math is not magic, but is as beautiful as magic.
@dazhijiang-fx9he
A year ago
Couldn't ask for more from a lesson! Thank you for sharing the class with everyone 🤩
@progamer1196
A year ago
Really liked the energy and simplicity of the presentation!
@ThaoPham-pe5vj
6 days ago
Thank you Stanford and Professor for the excellent lecture!
@niyousha6868
2 years ago
It is great to watch this and not have to do the homework.
@CarlosEduardo-hp5wg
7 months ago
Hahahaha
@kunalnarang1912
2 years ago
The result at 55:45 is just beautiful!
@FifaBayern710
6 months ago
It's not entirely clear to me why we change the index, except for separating the sums at the end. Does anyone know more? Thanks!!
@AmitGupta2526
9 months ago
At 51:45 he says "we need to change the index to x from w, or else we'll get into trouble" while taking the inner derivative of the exponential term. How can he change the index, when the denominator term will be exactly the same as the derivative of the exponential term and they should cancel each other? Changing the index changes the fundamental definition of P(o|c). Is there something I am missing here?
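For what it's worth: the rename at 51:45 only changes a bound (dummy) summation index, so it does not change P(o|c) at all; it just avoids reusing the letter w, which is already the dummy index of the denominator sum. A sketch of the step, in the lecture's notation:

```latex
\frac{\partial}{\partial v_c}\log\sum_{w=1}^{V}\exp(u_w^\top v_c)
  = \frac{\sum_{x=1}^{V}\exp(u_x^\top v_c)\,u_x}{\sum_{w=1}^{V}\exp(u_w^\top v_c)}
  = \sum_{x=1}^{V}\frac{\exp(u_x^\top v_c)}{\sum_{w=1}^{V}\exp(u_w^\top v_c)}\,u_x
  = \sum_{x=1}^{V}P(x\mid c)\,u_x
```

The two sums do not cancel: differentiating the outer log leaves the denominator sum in place, while the chain rule produces a new sum whose terms each carry an extra factor of u_x.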
@anikettiwari6885
A year ago
This is so amazing. Thank you so much for the wonderful explanation
@djl-1n
26 days ago
The version of the SciPy library seems to be too new for the assignment to work properly: I can't import triu. If someone knows a fix, please comment.
@RahulMadhavan
2 years ago
Do areas of sparsity in the high-dimensional word2vec space mean anything? For example, can you say: some word should exist here, but doesn't?
@lopyus
A year ago
I wonder if words which don't have an equivalent in other languages fit here
@bilalsedef9545
2 years ago
It was a great lesson. Hope the sound quality will be better in the future.
@The-Daily-AI
2 years ago
1:10:50 Why would you average both vectors together? Wouldn't it be useful to keep both vectors, depending on the different tasks that need to be done?
@goanshubansal8035
11 months ago
Hopefully I will be proud after its completion.
@edphi
A year ago
The best video on NLP
@harshitsingh3061
11 months ago
Never seen such a beautiful lecture before!
@jpgunman0708
A year ago
51:39 How do we get this? I don't understand.
@zzq-w1w
5 days ago
I want to know whether the homework answers are provided.
@3018RAHULSILONIYA
A year ago
This might be silly, but after 55:00, when we take ux out of the derivative, why do we lose the transpose operator?
@Ad-qv7ij
9 months ago
Same doubt, did you figure it out by any chance?
@rahul_siloniya
8 months ago
@@Ad-qv7ij I guess there is a little error there. If you try to derive it on your own, you will reach the right expression.
@jiadavid
11 months ago
This is amazing, thank you for uploading this online
@mujumdarshaunakhrishikeshc1076
2 years ago
Sir absolutely loved your explanation. Thank you very much
@nanunsaram
A year ago
00:56:55 Gensim word vectors example 01:05:16 Student Q&A
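The analogy arithmetic shown in the Gensim word-vectors demo (e.g. king − man + woman ≈ queen) can be mimicked in plain Python with hand-made toy vectors; everything below (the vocabulary, the 2-d vectors, the dimension meanings) is invented for illustration:

```python
import math

# Hand-made 2-d vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender".
vecs = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "bread": [-1.0, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """Return the word closest (by cosine) to vec(a) - vec(b) + vec(c),
    excluding the three input words, as in the lecture's demo."""
    target = [x - y + z for x, y, z in zip(vecs[a], vecs[b], vecs[c])]
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

result = analogy("king", "man", "woman")  # nearest remaining word to (1, -1)
```

With these toy vectors, king − man + woman lands exactly on the "queen" vector, so the nearest-neighbor lookup recovers it.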
@goanshubansal8035
11 months ago
In every subtopic they share their learning experience...
@osvaldonavarro3292
A year ago
How are the initial probabilities of the context word vectors calculated? They are mentioned at 55:29, but not how they are determined.
@seeker4430
5 months ago
You could have explained the probability portion of the vector a little more, sir... The differentiation of the vectors is quite straightforward.
@unknownhero6187
2 years ago
When we take the chain-rule derivative, why do we lose the transpose operation? For example, at 53:06 there is just u_x, not u_x^T. Why?
@kelvinwu9844
2 years ago
We can treat that as a gradient. The dot product can be viewed as a multivariable function with input (v_c1, ... , v_cd), and therefore we can calculate the gradient of it w.r.t. each of the components of v_c. Since gradient is the direction that v_c should go in order to increase the value of the dot product, this gradient vector can be added to v_c, so they should have the same shape :)
@namansinghal3685
7 months ago
Reminds me of Sheldon for some reason
@tallfred500
A year ago
On slide 23, the Likelihood is missing a root-T of the double product.
@aman6089
A year ago
Calculus noob question: why don't the two sums (for w from 1 to V) over u_x^T * v_c cancel out at 55:10?
@robertxu18
A year ago
Why is the change in variable at 51:38 necessary? Does it not represent the same quantity whether we use uw or ux?
@raphaelkalandadze9691
A year ago
First of all, thank you so much for this amazing course. I have learned a lot from your lectures. Can I ask when this course will be updated?
@stanfordonline
A year ago
Hi Raphael, thanks for your feedback and question! Our team is looking into adding new lectures for this course in the future :)
@raphaelkalandadze9691
A year ago
@@stanfordonline Sounds like it won't be soon :)
@zainabtareen9583
A year ago
Isn't wt the center word instead of wj on slide 23 (30:52)?
@zutubee
A year ago
wt is the center word
@black-sci
5 months ago
Yes, wt is the center word; j ranges from -m to m.
@zaberraiyan2570
2 years ago
Great Lecture, will finish the entire series
@Galois189
A year ago
I am wondering how the two vectors (Uw, Vw) are determined for each word?
@clutchnoobs4506
2 years ago
I had a question about "observed - expected" around @55:48. Maybe I misunderstand, but isn't the summation of p(x|c)*ux our prediction, therefore making it our observed?
@ItsJayCross319
2 years ago
Yes, it is our prediction, but because that's our prediction, that would be the expected. The word vector we obtain from uo (our actual observed word vector) would be our observed, then we subtract the sum of p(x|c)ux from it to obtain margin of error. In a perfect case, they would subtract to 0, which he explains at 55:44.
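A quick numeric sanity check of this "observed minus expected" gradient is easy to run in plain Python; the toy vectors below are made up (not the lecture's data), and a finite-difference estimate confirms the analytic formula u_o − Σ_x p(x|c) u_x:

```python
import math

# Made-up toy vectors: 3 outside-word vectors u_w and one center vector v_c.
U = [[0.5, 0.1], [-0.3, 0.8], [0.2, -0.4]]
v_c = [0.7, -0.2]
o = 0  # index of the observed context word

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_p(o, v):
    """log P(o|c) = u_o . v  -  log sum_w exp(u_w . v)"""
    return dot(U[o], v) - math.log(sum(math.exp(dot(u, v)) for u in U))

# Analytic gradient: "observed minus expected", u_o - sum_x P(x|c) u_x.
Z = sum(math.exp(dot(u, v_c)) for u in U)
p = [math.exp(dot(u, v_c)) / Z for u in U]
analytic = [U[o][i] - sum(p[x] * U[x][i] for x in range(len(U)))
            for i in range(len(v_c))]

# Numeric gradient via central differences should match it closely.
h = 1e-6
numeric = []
for i in range(len(v_c)):
    plus = v_c[:];  plus[i] += h
    minus = v_c[:]; minus[i] -= h
    numeric.append((log_p(o, plus) - log_p(o, minus)) / (2 * h))
```

In the perfect-prediction case p puts all its mass on word o, and the two terms cancel to zero, exactly as described at 55:44.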
@adeolajoseph7276
9 months ago
Great content, excellent delivery.
@iwantpeace6535
10 months ago
Hi sir, is it possible to use neural networks to learn new dialects and translate new words that belong to unknown dialects of various languages?
@isaacfernandez2243
A year ago
Is there any way to get access to the notebooks shown throughout the course? Thanks!
@yagneshm.bhadiyadra4359
5 months ago
Good content, but explanation-wise it is missing intuitions at some points, especially when the formulas for word vectors are being derived.
@ayoubrayanemesbah8845
10 months ago
At 32:46 it's like computing the entropy, but why? If anyone knows, please feel free to comment.
@goanshubansal8035
11 months ago
I have to learn to listen to the professors like editors to your previous self
@miguelpinheiro8291
A year ago
I really liked this guy
@nanunsaram
11 months ago
Great again!
@zutubee
A year ago
The objective function seeks to maximize the likelihood of the context words given the center word. However, should it not also try to minimize the probability of incorrect context words given the center word?
@zutubee
A year ago
I got the answer: the way the probabilities are calculated ensures this happens via the denominator.
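That shared softmax denominator can be seen numerically with a tiny made-up example: raising the score of the correct word necessarily pushes every other word's probability down, so no separate "minimize incorrect words" term is needed.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    Z = sum(exps)
    return [e / Z for e in exps]

# Made-up dot-product scores u_w . v_c for a 3-word vocabulary;
# word 0 plays the role of the true context word.
before = softmax([1.0, 0.5, 0.2])
after = softmax([2.0, 0.5, 0.2])  # only the true word's score went up

# The other words' probabilities drop purely because the shared
# denominator grew, even though their own scores did not change.
```

So maximizing P(o|c) implicitly shrinks the probability assigned to every incorrect context word.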
@isbestlizard
9 months ago
Wow, this vector idea is interesting. Have we tried getting models to emit nonsense text that nonetheless has similar vectors to real words, and seeing if human brains subconsciously get the same meaning? Computers could be really good at writing poetry o.o
@isbestlizard
9 months ago
Like onomatopoeia and Lewis Carroll dialed up to 11.
@aimatters5600
A year ago
Every time he says something important the video stops. Great.
@jakanader
A year ago
watching on 1.5 speed smooths out the stuttering and is still understandable for the most part
@aamnakhan2784
A year ago
I don't get what theta (the parameters) is here?
@zeinebromthana7336
A month ago
very fruitful!!
@binb3463
2 years ago
It is an amazing lecture
@ohakimedward2852
2 years ago
Great lecture.
@박성현학생항공우주공
11 months ago
How can I get solutions for the assignments of this course? I'm looking for solutions for the Winter 2021 version.
@djl-1n
26 days ago
GitHub
@TolemyKashyap
8 months ago
I have little to no knowledge about machine learning... Can I still start this course? Is it beginner friendly?
@stanfordonline
8 months ago
Hi there, great question! If you are just beginning to learn about machine learning we recommend starting with this course: www.coursera.org/specializations/machine-learning-introduction
@JayaC96
2 years ago
Thank you, great lecture!
@goanshubansal8035
11 months ago
Which are his personal sentences?
@fayzankowshik3625
A year ago
It seems this course is theory-based; where can I learn to code these concepts and algorithms?
@ahmedtryaq7853
A year ago
Coursera
@borgo1633
A year ago
thx for sharing
@goanshubansal8035
11 months ago
How does papa Christopher D. Manning think?
@haticeozbolat0371
A year ago
41:10 I don't understand the gradient. How do we get it? Can anyone reading the comments give me advice? 🤗🤗🤗
@izumiasmr
11 months ago
Check your knowledge of single-variable calculus (derivatives, differentiation, interpretation and applications of the derivative), and then just the basics of multivariable calculus (functions of several variables, partial derivatives). MIT 18.01SC and 18.02SC could be good (and free) resources for picking it up. That is, if you want an understanding of the math under the hood; I'd say that in parallel you could definitely practice with the higher-level applications, just like in this course.
@haticeozbolat0371
10 months ago
Thanks @@izumiasmr
@icer9591
A year ago
So NICE!
@MrSuperrdad
8 months ago
I loved it!!!
@georgeb8637
A year ago
27:51 - word2vec
@TopicalAuthority
2 years ago
Thank you.
@MathewsonKindermann
5 months ago
How can I get the slides (PPT)?
@jinli1835
2 years ago
Is this course suitable for beginners?
@yagneshbhadiyadra7938
5 months ago
Great knowledge it seems, but give this to an Indian youtuber, and he will make a 3 video series out of a single lecture that is easier to understand. #opinion
@dtace5339
4 months ago
LOL TRUE
@sayedmohammedsalimkp5236
A year ago
Link to the textbook?
@jianrongjiao
A month ago
Is there a Chinese version anywhere? Chinese subtitles would also be fine.
@djl-1n
26 days ago
bilibili
@TaoTunesAndWisdom
7 months ago
"um"
@Akina__Chen-y6p
8 months ago
Great
@piewpok3127
A year ago
Day 1.
@piewpok3127
A year ago
lezz go !!
@haroldkumarnaik9971
2 years ago
34:49 What are uo and vc?
@bpmsilva
2 years ago
I think vc is the vector representation of the center word and uo is the vector representation of a context word
@goanshubansal8035
11 months ago
What am I going to get out of this very video? Let's see.
Comments: 124