NOTE: The StatQuest LDA Study Guide is available! statquest.gumroad.com Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@realcirno1750
4 years ago
woohoo
@WIFI-nf4tg
3 years ago
Can you please do something on canonical analysis?
@statquest
3 years ago
@@WIFI-nf4tg I'll keep that in mind.
@falaksingla6242
2 years ago
Hi Josh, love your content. It has helped me learn a lot and grow. You are doing awesome work; please continue to do so. I wanted to support you, but unfortunately your PayPal link seems to be broken. Please update it.
@aayushtheapple
2 years ago
The website shows "Error establishing a database connection"!
@yuniprastika7022
3 years ago
The funny thing is, so many materials from this channel are for university students (like me), but he keeps treating us like kindergarten children. Haha, it feels like I'll never grow up watching your videos, sir! QUADRO BAM SIR, THIS WORLD HAS GOTTEN TOO SERIOUS, THANK YOU FOR BRINGING BACK THE JOY
@statquest
3 years ago
Thank you very much! :)
@daisy-fb5jc
2 years ago
I am a kindergarten kid in this subject : (
@andrejalabama1204
A year ago
@@daisy-fb5jc Same here; I need someone to explain it like I'm a little kid.
@SinoLegionaire
9 months ago
Remember: us adults are just big children.
@bokai5829
5 years ago
Every time I hear the intro music, I know my assignment is due in 2 days.
@statquest
5 years ago
LOL! :)
@bokai5829
5 years ago
@@statquest Thank you very much!
@KonesThe
4 years ago
Hahaha, I'm in the same boat right now.
@HenriqueEC1
4 years ago
Good to know I'm not alone.
@PJokerLP
A year ago
10.5 hours until my machine learning exam. Thank you so much; I feel way better prepared than if I had watched all of my class material.
@haydo8373
7 years ago
Hey what is the intro track called? I couldn't find it on Spotify. . . :D
@hiteshjoshi3061
A month ago
It's their own
@hiteshjoshi3061
A month ago
Listen carefully: the channel name is in it, and it's cool 😂😂👌
@beccalynch4407
2 years ago
Just spent hours so confused, watching my lectures where the professor used only linear algebra and not a single picture. Watched this video and understood it right away. Thank you so much for what you do!
@statquest
2 years ago
Glad it helped!
@robinduan1985
6 years ago
This is amazing! A 15-minute video does way better than my lecturer in a 2-hour class.
@elise3455
3 years ago
While these 15-minute videos are excellent for gaining intuition, you still often need those 2-hour classes to get familiar with the mathematical rigor.
@NoahElRhandour
2 years ago
@@elise3455 No you don't. The math follows super quickly and easily once you've understood what it is about.
@GaganSingh-zz9el
2 years ago
@@NoahElRhandour Yeah, brother.
@Jacob-t1j
A year ago
@@elise3455 No you don't. Math becomes super easy once you understand what you're doing.
@rachelstarmer9835
8 years ago
Awesome! Even I get it and love it! I'm going to share one of your stat-quest posts as an example of why simple explanations in everyday language are far superior to using academic jargon in complex ways to argue a point. Also, it's a great example of how to develop an argument. You've created something here that's useful beyond statistics! Three cheers for the liberal arts education!!!! Three cheers for Stat-Quest!!
@kshitijkumar6610
4 years ago
Are you somehow related to Joshua? :-P
@georgeshibley9529
4 years ago
@@rachelstarmer5073 ha
@seifeldineslam
4 years ago
This was honestly helpful. I am an aspiring behavioral geneticist (aspiring because I am still a biotechnology undergraduate) with really shaky fundamentals in math, especially statistics. Your existence as a YouTube channel is a treasure discovery for me!
@statquest
4 years ago
Thanks! :)
@hlatse98
7 years ago
Brilliant video! Very helpful. Thank you.
@MartinUToob
5 years ago
When's the StatQuest album coming out? (Here come the Grammies!) 🎸👑 Actually, the only reason I watch your videos is for the music. 😍🎶🎵
@maverickstclare3756
5 years ago
"Dr, those cancer pills just make me feel worse" presses red button "wohp waaaaaaaa" "next patient please" :)
@Muzik2hruRain
4 years ago
You, sir, are a life saver. Now for every complicated machine learning topic I look for your explanation, or at least wonder how you would have approached it. Thank you, really.
@statquest
4 years ago
Awesome! Thank you! :)
@ebadulislam123
4 months ago
What's the most frequent phrase said by Josh? a) "Bam" b) "Waa waa" c) "Oh No" d) "I am a geneticist"
@statquest
4 months ago
bam!
@ebadulislam123
4 months ago
@@statquest Double Bam!!
@gauranggarg549
3 years ago
Can't understand a topic, and then you find a StatQuest video on it. TRIPLE BAM!!
@statquest
3 years ago
:)
@hamzaghandi4807
2 years ago
Besides this wonderful explanation, your music is very good!
@statquest
2 years ago
Many thanks!
@phoenixflames6019
A month ago
10/10 intro song, 10/10 explanation. Using PCA, I can reduce these two ratings to just one: 10/10 is enough to rate the whole video. Using LDA, the KZitem chapters feature maximizes the separation between the 2 major components (intro and explanation) of the video.
@statquest
A month ago
BAM!!! :)
@Anmolmovies
6 years ago
Absolutely brilliant. Kudos to you for making it seem so simple. Thanks!
@YuzuruA
4 years ago
Just saw the support vector machines video and was surprised: the goal is almost the same, but the method is 90 degrees different!
@statquest
4 years ago
Yep - a lot of machine learning is all about trying to find ways to separate categories.
@alphabetadministrator
7 months ago
Hello Josh. As always, thank you for your super intuitive videos. I won't survive college without you. I do have an unanswered conundrum about this video, however. For Linear Discriminant Analysis, shouldn't there be at least as many predictors as the number of clusters? Here's why. Say p=1 and I have 2 clusters. In this case, there is nothing I can do to further optimize the class separations. The points as they are on the line already maximize the Fisher criterion (between-class scatter / within-class scatter). While I do not have a second predictor axis to begin with, even if I were to apply a linear transformation to the line to find a new line to re-project the data on, it would only make the means closer together. Extending this reasoning to the 2D case where you used gene x and gene y as predictors and 3 classes: if the 3 classes exist on a 2D plane, there is nothing we can do to further optimize the separation of the means of the 3 classes, because re-projecting the points onto a new tilted 2D plane will most likely reduce the distances between the means. Now, if each scatter lay perfectly vertically, such that the classes were separated distinctly as gene y goes up, then we could re-project the points onto a new line (parallel to the invisible vertical class-separation line) to further minimize each class's scatter, but this kind of case is very rare. Given my reasoning, my intuition is that an implicit assumption of LDA is that there need to be at least as many predictors as the number of classes to separate. Is my intuition valid?
@statquest
6 months ago
I believe your question might be answered in this video on PCA tips: kzitem.info/news/bejne/0IiszaVvb2iqjZw
@hannav7125
3 years ago
Fact: none of you skipped the intro.
@statquest
3 years ago
This is one of my favorites. :)
@whasuklee
5 years ago
Came for my midterm tomorrow, stayed for the intro track.
@Azureandfabricmastery
4 years ago
Hi Josh, this was helpful for understanding the differences between PCA and LDA and how LDA actually works internally. You're indeed making life easier with visual demonstrations for students like me :) God bless and thank you!
@statquest
4 years ago
Glad it was helpful!
@xujerry7891
7 years ago
Hi, Joshua. Thank you for your videos; they're really helpful. I have a question: when you use LDA to categorize n categories, does that mean you need (n-1) axes to separate the points? In that case, how can I visualize them?
@kwamenani7775
2 years ago
Hi Josh, is it possible to make your account accept gifts? I feel like I owe you a lot. I failed a couple of ML theory interviews in the past, but I binge-watched the videos on your channel within 2 months and have landed multiple offers.
@statquest
2 years ago
Triple bam! Congratulations on getting some job offers. If you want to support StatQuest, you can donate here: www.paypal.com/paypalme/statquest
@alis5893
3 years ago
Josh, you are an amazing teacher. I have learned so much from you. A big thank you from the bottom of my heart. God bless you.
@statquest
3 years ago
My pleasure!
@aneeshmenon12
8 years ago
Wowww... too good, dear Starmer... nothing to say... you are incredible... I am eagerly waiting for your next video...
@amrit20061994
3 years ago
"But what if we used data from 10k genes?" "Suddenly, being able to create 2 axes that maximize the separation of three categories is 'super cool'." Well played, StatQuest, well played!
@statquest
3 years ago
Thanks!
@daisy-fb5jc
2 years ago
I wish I could show this video to my professor and teach her how to give understandable lectures. Just a wish.
@statquest
2 years ago
:)
@merida3975
2 years ago
The song at the beginning made my day, even though I had picked the wrong Linear Discriminant Analysis tutorial for data science. Just awesome. Love it a lot. We need more and more funny teachers like you.
@statquest
2 years ago
Thanks!
@zn4q3oi18zx
4 years ago
I really enjoyed your video! But it seems "Linear Discriminant Analysis" in this video actually means Fisher's linear discriminant?
@statquest
4 years ago
What you say is true - however, to quote the Wikipedia article on LDA: "The terms Fisher's linear discriminant and LDA are often used interchangeably" en.wikipedia.org/wiki/Linear_discriminant_analysis
@AdityaDodda
4 years ago
@StatQuest: Thank you for the video. It is very helpful. Would it be fair to say that PCA is unsupervised but LDA is supervised?
@statquest
4 years ago
Yes, that is correct.
@sridharyamijala4739
A year ago
Another excellent video, just as great as the one on PCA. I read a professor's view on most of the models and algorithms in ML; he recommended understanding the concepts well, so that we know where to apply them, and not worrying too much about the actual computation at that stage. The great thing about your videos is that you explain the concepts very well.
@statquest
A year ago
Thank you very much! :)
@r12-ux7us
A year ago
Linear Discriminant Analysis sounds like a function of the Minority Report system (i.e., the movie).
@statquest
A year ago
:)
@neelkhare6934
4 years ago
Wow, that is one of the best explanations of LDA. It helped me get an intuitive idea of LDA and what it actually does in classification. Thank you!
@statquest
4 years ago
Hooray! Thank you! :)
@neelkhare6934
4 years ago
Can you make a video on quadratic discriminant analysis?
@statquest
4 years ago
@@Sachin-vr4ms Which part? Can you specify minutes and seconds in the video?
@statquest
4 years ago
@@Sachin-vr4ms I'm sorry that it is confusing, but let me try to explain: At 9:46, imagine rotating the black line a bunch of times, a few degrees at a time, and using the equation shown at 8:55 to calculate a value at each step. The rotation that gives us the largest value (i.e. there is a relatively large distance between the means and a relatively small amount of scatter in both clusters) is the rotation that we select. If we have 3 categories, then we rotate an "x/y-axis" a bunch of times, a few degrees each time, and calculate the distances from the means to the central point and the scatter for each category and then calculate the ratio of the squared means and the scatter. Again, the rotation with the largest value is the one that we will use. Does that help?
@statquest
4 years ago
@@Sachin-vr4ms I'm glad it was helpful, and I'll try to include more "how to do this in R and python" videos.
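The rotation search Josh describes above can be sketched in a few lines of Python for the two-category case; the toy gene measurements and the one-degree step are made up for illustration:

```python
import math

# Toy 2-D data: two categories that overlap on both raw axes
cat_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (2.0, 2.5)]
cat_b = [(3.0, 1.0), (4.0, 2.0), (5.0, 3.0), (4.0, 1.5)]

def criterion(theta, a, b):
    """Fisher criterion (mu_a - mu_b)^2 / (s_a^2 + s_b^2) for the
    1-D projection onto the direction (cos(theta), sin(theta))."""
    dx, dy = math.cos(theta), math.sin(theta)
    pa = [x * dx + y * dy for x, y in a]
    pb = [x * dx + y * dy for x, y in b]
    ma, mb = sum(pa) / len(pa), sum(pb) / len(pb)
    sa = sum((p - ma) ** 2 for p in pa)  # scatter of category a
    sb = sum((p - mb) ** 2 for p in pb)  # scatter of category b
    return (ma - mb) ** 2 / (sa + sb)

# Rotate the axis one degree at a time and keep the best rotation
best_theta = max((math.radians(t) for t in range(180)),
                 key=lambda t: criterion(t, cat_a, cat_b))
```

By construction, `best_theta` scores at least as well on the criterion as projecting onto the raw x-axis (theta = 0) or y-axis (theta = 90 degrees).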
@art.ventures
4 years ago
Who else finished the video and immediately restarted it just to listen to the song?
@statquest
4 years ago
This is one of my best! :)
@art.ventures
4 years ago
@@statquest Nice!!!
@cnbmonster1042
3 years ago
Amazing! I subscribed after watching your video only twice!
@statquest
3 years ago
Wow, thanks!
@nuttapatchaovanapricha
10 months ago
Very useful and intuitive, and a sick intro track as usual! xD
@statquest
10 months ago
I think this might be my favorite intro.
@assortedtea902
5 years ago
What the heck is a gene transcript? I really hate it when these things are mentioned casually and the listener is assumed to already know them. NO, I don't know what a gene transcript is. Now I have to pause the video and google gene transcripts. Ugghhh
@jahanvi9429
2 years ago
The song in the introduction is always awesome, thanks lol! And a very useful video.
@statquest
2 years ago
Thanks!
@woodworkingaspirations1720
A year ago
This lecture has no instance of "bam" or "double Bam"
@statquest
A year ago
It's an old one. :)
@SeqBioMusic
7 years ago
Awesome! It would be good to give some differences between PCA and LDA. For example, PCA studies X; LDA studies X->Y.
@mrunalwaghmare
5 months ago
Love your vids, they are simple to understand.
@statquest
5 months ago
Glad you like them!
@happylearning-gp
2 years ago
Excellent Tutorial, Thank you very much
@statquest
2 years ago
Glad you liked it!
@blownspeakersss
6 years ago
Why does LDA here seem totally different from how LDA is presented in the ISLR textbook? In ISLR, we simply assume P(X | Y = k) is Gaussian for all k classes. Then we literally just plug estimates into Bayes' theorem. So now we have an estimate for P(Y = k | X), which is the desired probability for classifying a feature vector X into a class k. Then we take the log of Bayes' theorem, with our estimates, and we get a linear discriminant function that is used for classifying. The way it's presented in ISLR is similar to how it's presented in this lecture: kzitem.info/news/bejne/wKNtuoGhs4yrqKg I just don't see why the same topic of LDA seems vastly different in this video? Edit: Apparently there are different "discriminant rules". The one I'm referring to is called the Bayes discriminant rule. The type presented here is called Fisher's linear discriminant rule.
@statquest
6 years ago
You seem to have figured it out. This video describes LDA as Fisher originally presented it way back in the day, as an attempt to maximize variance between groups and minimize variance within groups, not the Bayesian approach.
@blownspeakersss
6 years ago
Yep, thanks. This also clears up why Wikipedia says that LDA is similar to ANOVA. At first I didn't see how the two were similar, but it makes sense now.
@dpcarlyle
3 years ago
Thank you for the amazing explanation :) you make it so much fun....
@statquest
3 years ago
My pleasure!
@foramjoshi3699
3 years ago
The 96 people who disliked this are the patients for whom the medicines did not work :P
@statquest
3 years ago
True!
@MohammedNoureldin
4 years ago
Man, I like you! Thanks a lot, you helped me understand PCA and LDA without even making an "Owa Owa" once! :D
@statquest
4 years ago
Thanks! :)
@arpitagupta4474
2 years ago
I was able to grasp this topic without being scared. Kudos to this channel.
@statquest
2 years ago
Thank you!
@tomaszberent801
5 years ago
The video shows how LDA reduces dimensions, and we can clearly see a newly constructed axis (like with PCA) which, in LDA, maximizes the separation. That was very clear! How does this line relate to the line that actually separates the two categories on the original XY plane that you refer to at 2:48 in your video? After all, it is this line (do we call it a discriminant function?) which is usually used to show the separation. The latter is intuitively understood as a separation border; the former explains how we reduced the dimension. What is the link between the two?
@statquest
5 years ago
That's a good question. There are a few options for coming up with a threshold that allows you to classify new observations into one of the categories in your training dataset. The simplest is to transform the new observation using the transformation that the training dataset created, and then measure the euclidean distance between the new observations and the center of each classification. The classification that is closest to the new observation is used to classify the new observation.
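A minimal sketch of the classification rule described in that reply, with made-up class centers in the reduced (1-D) space; in one dimension, the Euclidean distance is just the absolute difference:

```python
# Hypothetical class centers already computed in the LDA-reduced space
centers = {"works": -2.1, "does not work": 1.8}

def classify(projected_value, centers):
    """Assign the label of the nearest class center (Euclidean distance;
    in 1-D that is the absolute difference)."""
    return min(centers, key=lambda label: abs(projected_value - centers[label]))
```

A new observation would first be transformed with the same projection learned from the training data, and then `classify` picks the nearest center, e.g. `classify(-1.5, centers)` returns `"works"`.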
@EGlobalKnowledge
2 years ago
A wonderful explanation. Thank you
@statquest
2 years ago
You are welcome!
@armansh7978
4 years ago
Awesome. I can only say bravo, man, bravo. Thank you very much.
@statquest
4 years ago
Thanks!
@worldofbrahmatej2023
5 years ago
Excellent! You are a better teacher than many overrated professors out there :)
@statquest
5 years ago
Thank you! :)
@arungandhi5612
3 years ago
You are very cool, bro. I aced my work at my research institute because of youuuuuuuu
@statquest
3 years ago
That's awesome!!! So glad to hear the videos are helpful. :)
@wtfJonKnowNothing
9 months ago
These days I'm more inclined to listen to and love the song than the lecture :)
@statquest
9 months ago
:)
@sanketbadhe3572
5 years ago
I just watched all your videos for the intro track :P ... awesome tracks and nicely explained videos.
@statquest
5 years ago
Awesome! :)
@ArghyadeepPal
4 years ago
Waiting for your new album lol..
@statquest
4 years ago
It's out: joshuastarmer.bandcamp.com/
@ArghyadeepPal
4 years ago
@@statquest Wow, thanks. Btw, your videos are helping me a lot in my last-minute preparations for a test. Thanks Josh!!
@statquest
4 years ago
bam! Good luck on your exam! :)
@Dr.CandanEsin
4 years ago
It took a lot of time and effort, but it was worth it. The best explanation I have watched after six weeks of searching. I cordially thank you.
@statquest
4 years ago
Thanks! :)
@elizabeths3989
3 years ago
You are about to be the reason I pass my qualifying exam in bioinformatics 🙏🙏
@statquest
3 years ago
Good luck!!! BAM! :)
@salmankhan-cu9hn
3 years ago
you are the best. Thanks for such a good explanation :)
@statquest
3 years ago
You're welcome 😊
@PV10008
5 years ago
I really like the systematic way you approach each topic and anticipate all the questions a student might have.
@mahdimantash313
2 years ago
I really can't thank you enough for that... you did in 16 minutes what I couldn't do in 4 hours. Keep up the good work!! And thank you again!!!
@statquest
2 years ago
Thanks!
@LaKtJ
6 years ago
Regarding LDA for 3 categories, how do you maximize the distance between the central point and each category's central point? These points are always the same, aren't they? So how do you maximize something that does not change?
@LaKtJ
6 years ago
Yes, I understand it now. Thanks!
@rohil1993
4 years ago
This explains the beauty of LDA so well! Thank you so much!
@statquest
4 years ago
Awesome! Thank you very much! :)
@AliceWandering-c8w
4 years ago
Thanks for the video! It really helps! May I check: are PCA and LDA similar in the sense that they both reduce dimensions, but PCA is unsupervised learning while LDA is supervised learning?
@statquest
4 years ago
That's exactly right! :)
@ChaminduWeerasinghe
3 years ago
Best explanation I've ever seen on ML. This is the first time I've watched an ML YouTube video without rewinding :| .. Keep it up, bro..
@statquest
3 years ago
Wow, thanks!
@בועזמונטיליה
2 years ago
Hey, thanks for an amazing explanation! I have a question: in the video you mentioned that the mean difference is squared in order to prevent negative values. You could just as well use the absolute value; is there a reason to prefer squaring?
@statquest
2 years ago
From the perspective of doing the math, the square function is usually much easier to work with than the absolute value.
@theultimatereductionist7592
6 years ago
I know in any dimension this is an NP complete problem because it has the same cardinality of solution space as the subset sum problem: which is 2^N where N= number of data points, therefore 2^N is the number of all subsets one must check: i.e. project all possible subsets down to the new axis. Of course, once on a single ordered 1D axis, since the data points will now have a fixed order relative to each other, one has at most N places to make the separation.
@RaghuMittal
7 years ago
Great video! I initially couldn't understand LDA looking at the math equations elsewhere, but when I came across this video, I was able to understand LDA very well. Thanks for the effort.
@yuhaooo8143
6 years ago
Is it that for n categories, we construct n-1 axes? Thanks for the reply :)
@statquest
6 years ago
Correct!
@namanjha4964
11 months ago
Thank you very, very, very much. You bring joy to me.
@statquest
11 months ago
Thank you!
5 years ago
I think I love you! Thanks for these amazing videos! They're helping me understand a lot of things for my PhD!
@goksuntuncayengin7104
3 years ago
Hi Josh, thanks for the video! I want to ask whether LDA always determines a (dimensions-1)-D space or not (a line for 2 points, a plane for 3 points, etc.).
@statquest
3 years ago
The number of dimensions is determined by the number of categories, not the amount of data. If you have 2 categories, you get 1 LD. If you have 3 categories, you get 2 LDs. If you have more categories, you get num. categories - 1 LDs.
@scottsun9413
A year ago
Really great videos; they saved me in my data science classes. I'm applying to a graduate program at UNC; I hope I have the opportunity to meet the content creators sometime in the future.
@statquest
A year ago
Best of luck!
@adejumobiidris2892
2 years ago
Thank you so much for helping me find a faster solution to the confusion that had taken control of my head for 72 hours.
@statquest
2 years ago
Happy to help!
@hanaibrahim1563
2 years ago
Amazing. Thank you for this excellent video. It explained everything super clearly to me, in a super concise manner, without all the academic jargon getting in the way.
@statquest
2 years ago
Glad it was helpful!
@republic2033
6 years ago
Thank you, very educational and entertaining!
@statquest
6 years ago
You're welcome! :)
@tarunbirgambhir3627
5 years ago
Can you provide an example where high variance in the data from PCA is more important than high ‘separability’ of the data from LDA for a classification problem?
@statquest
5 years ago
I often use PCA to answer the question "are the data what I think they are, or were they mislabeled". PCA is useful when you want to use an "objective" method to see how your data clusters. For example, when I'm worried that I mislabeled my data, PCA would find clusters without using the labels. However, LDA requires knowing how I want to separate things - so I can't use it to determine if I labeled things correctly. Does that make sense?
@tarunbirgambhir3627
5 years ago
StatQuest with Josh Starmer In other words, you mean PCA is used for unsupervised classification... makes sense.
@statquest
5 years ago
@@tarunbirgambhir3627 Exactly.
@charlespang5515
5 years ago
@@statquest Would you consider cluster analysis another method to check your labeling? I do that when I want to validate the manual labeling of my project data. I would love to hear the views of a real-life practitioner like yourself.
@statquest
5 years ago
@@charlespang5515 Yes, any unsupervised method will work. Hierarchical or correlation clustering is another approach that does the same thing.
@oklu_
3 years ago
thank you for your kind, slow, and detailed explanation😭
@statquest
3 years ago
You’re welcome 😊!
@rmiliming
2 years ago
Very clearly explained, and the video is very enjoyable to watch too! StatQuest has everything needed to learn machine learning algorithms and statistics well.
@statquest
2 years ago
Thank you!
@alialsaady5
5 years ago
Thank you for the explanation; it's pretty clear. But there is something I don't understand. When you have 3 categories, LDA creates 2 axes to separate the data. But what if you have 4 categories or more? How many axes will LDA create to separate the data?
@statquest
5 years ago
LDA always creates one axis fewer than the total number of categories. So if you have 30 categories, you'll get 29 axes. That said... the axes are given in order of importance. So the first axis is the most important one. The second axis is the second most important one. etc. etc. etc. So it is often the case that even though LDA gives you 29 axes (if you have 30 categories), you only need the first 2 or 3.
@alialsaady5
5 years ago
@@statquest So the first axis is the most important one because it separates the categories the best? And the second axis is the second most important one because it separates the categories second best?
@statquest
5 years ago
@@alialsaady5 Yes. Technically, the first axis accounts for the most variation among the categories. The second axis accounts for the second most variation among the categories. etc. etc. etc.
@alialsaady5
5 years ago
@@statquest Thank you. Do you happen to have the sources where this information can be found? I want to use it in my thesis.
@statquest
5 years ago
@@alialsaady5 You can cite the videos. I can't remember exactly where I got all of the information for this video - probably wikipedia and The Elements of Statistical Learning - but I also read the original manuscripts and I scour the internet for examples and derivations.
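The "number of categories minus one" rule discussed in this thread can be checked numerically: the between-class scatter matrix is a sum of rank-1 terms whose mean-difference vectors sum to zero, so with 3 categories it has at most 2 independent directions, and its determinant in 3-D is exactly zero. A stdlib-only sketch with hypothetical class means:

```python
# 3 classes measured on 3 features; the class means are made up,
# and the overall mean is their (equal-weight) average
overall = [2.0, 2.0, 2.0]
class_means = [[1.0, 2.0, 3.0], [2.0, 1.0, 2.0], [3.0, 3.0, 1.0]]
n_per_class = 10

# Between-class scatter S_B = sum over classes of n_k (m_k - m)(m_k - m)^T
diffs = [[mk[i] - overall[i] for i in range(3)] for mk in class_means]
S_B = [[n_per_class * sum(d[i] * d[j] for d in diffs) for j in range(3)]
       for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along row 0)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

Because the three `diffs` vectors sum to the zero vector, `det3(S_B)` comes out to 0, i.e. only 2 discriminant axes carry any between-class variation.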
@nishisaxena4831
4 years ago
Much better than my university lecture, which I listened to twice but couldn't understand... this was awesome, thanks!
@statquest
4 years ago
Hooray! I'm glad the video was helpful. :)
@rodrigolivianu9531
4 years ago
Great video! Just wanted to point out that LDA is a classifier, which involves a few more steps than the procedure described here, such as the assumption that the data is Gaussian. The procedure described here is only the feature extraction/dimensionality reduction phase of LDA.
@statquest
4 years ago
You are correct! I made this video before I was aware that people had adapted LDA for classification. Technically we are describing "Fisher's Linear Discriminant". That said, using LDA for classification is robust to violations to the gaussian assumptions. For more details, see: sebastianraschka.com/Articles/2014_python_lda.html
@rodrigolivianu9531
4 years ago
StatQuest with Josh Starmer That said, I must admit I am having a really hard time understanding how the Fisherian and Bayesian approaches lead to the same conclusion via completely different routes. If you have any source on that, it would be of enormous help for my sanity haha
@swordchen7385
7 years ago
On the one hand, we try to maximize the distance between the two groups' means; on the other hand, we need to minimize the variance within each group. This separates the two sets of data as much as possible. Do I understand right?
@h_2577
3 years ago
So this is the first video I am watching and it starts with the song "Statqueeeest". 😂❤
@statquest
3 years ago
Bam! :)
@h_2577
3 years ago
@@statquest and this video is very very useful. So thank u. 🙏
@statquest
3 years ago
@@h_2577 Double bam! :)
@michaelyang7657
4 years ago
Dude you're an Alpha, better than most of my professors
@statquest
4 years ago
Thank you! :)
@HelloWorld-ji9fp
4 years ago
Thanks Josh...Very informative video....
@statquest
4 years ago
Thank you! :)
@gaboceron100
3 years ago
Very illustrative, thanks for the video!
@statquest
3 years ago
Thanks!
@saiakhil1997
4 years ago
I really liked how you compared the processes of PCA and LDA. This video showed me a different way to view LDA.
@statquest
4 years ago
Bam!
@greina6945
7 years ago
Very nice explanation. The only issue I have is that the first and second axes for both PCA and LDA are not Gene 1 and Gene 2. They are instead some linear combination of Gene 1 and Gene 2. So in a 10,000 gene space, you will get some combination of some of the 10,000 genes that clearly separate the two groups. For example, LD1 could be one third of Gene 12 plus one third of gene 45 plus one sixth of gene 456 plus one sixth of gene 1,234.
@Ellie_ho
4 years ago
best video. very clear. Thanks.
@statquest
4 years ago
Thanks! :)
@ravipandey3097
6 years ago
For a two-class problem, we can go with a single axis; a three-class problem has to go with at least 2 axes (three mean points); and similarly, a four-class problem has to have at least three axes. Is my understanding right?
@statquest
6 years ago
That is correct! :)
@vinceb8041
5 years ago
@@statquest This video was amazing! I have a similar question regarding the space in which the newly created axes lie: if we have 2 genes and, say, 4 categorical response levels (e.g. the drug works, works a little, harms a little, harms only), then LDA would give us 3 axes, because the central points of the groups together define a tetrahedron (thus 3 dimensions). Doesn't this mean that we traded a 2D graph for a 3D graph, because our new axes are now 3 and all perpendicular to each other? And by extension, what if we have >4 levels and the new axes become impossible to display? How does that benefit us in separating the data?
@statquest
5 years ago
@@vinceb8041 Just because you can't draw a picture of the result doesn't mean that it isn't useful. You can do LDA with 5 categories and get a 4-dimensional result (which you can't draw on paper), and still use it to classify new samples. Just apply the same transformation to the new samples and classify them using k-nearest neighbors. Even though you can't see it, you can still calculate the Euclidean distances and determine which previously classified samples are closest to the new samples. Does that make sense?
@vinceb8041
5 years ago
@@statquest Thanks for the reply, I think I understand what you mean. So the LDA doesn't necessarily produce an output you can visualize but the output still has the useful properties we need for further analysis?
@statquest
5 years ago
@@vinceb8041 Yes! We can still use the output to classify new data.
@AbdoulahLy
3 months ago
Waouwwwww, wondrous, simple and clear.
@statquest
3 months ago
Thanks! :)
@Sam1998Here
2 months ago
Hi, thank you for your explanation. It seems like the LDA you have explained is different from the LDA explanation I found in the following video: kzitem.info/news/bejne/qoOcsox7iJ2kepg. I understand the math behind that video, but I am wondering how and why your explanation of LDA is equivalent to that video's explanation of LDA? From my understanding, the math suggests LDA looks at pre-labeled data, calculates appropriate means and covariance(s) for each class label, and draws decision boundaries based on which class label for a given point would give the highest likelihood function value. How does that, as you said, maximize (\mu_1 - \mu_2)^2 / (s_1^2 + s_2^2)? Also, what would the function we are trying to maximize look like when we have more than two labels? Could you also refer me to a paper that I may find helpful? I am comfortable with reading rigorous math. Thanks so much
@statquest
2 months ago
This video covers the dimension reduction method that is also called "Fisher's linear discriminant". The LDA in the other video is a generalization of this method. For more details, see en.wikipedia.org/wiki/Linear_discriminant_analysis
@whenmathsmeetcoding1836
4 years ago
Your videos are, as always, awesome.
@statquest
4 years ago
Thanks! :)
@neillunavat
4 years ago
I am so glad this channel has grown to around 316k subscribers. Very well explained. The best of the best.
@statquest
4 years ago
Wow, thank you!
@chemicalbiomedengine
5 years ago
Always excited when I look for a topic and it's available on StatQuest.
@statquest
5 years ago
Awesome! :)
@XShollaj
3 years ago
BAM BAM Josh, you're the man
@statquest
3 years ago
Thanks!
@BillHaug
2 years ago
I would agree that "awesome song..." is an appropriate label.
@statquest
2 years ago
bam!
@BillHaug
2 years ago
@@statquest I would even say double bam... btw... "we're going to do a lot of maths step by step" = triple bam
@statquest
2 years ago
@@BillHaug Awesome!!! I love that you're a connoisseur of StatQuest themes!!!
@tatidutra
2 years ago
Really helpful video, thank you! ;)
@statquest
2 years ago
Glad it was helpful!
@AtharvaShirode-ff8es
2 months ago
You are just superb!! 8 years on, and still the most concise and best explanation.
Comments: 898