In this video tutorial we learn how to fit a polynomial regression model and assess it in R using the partial F-test, with examples. For a more in-depth explanation of linear regression, check our series on linear regression concepts in R (bit.ly/2z8fXg1). Like to support us? You can donate (statslectures.com/support-us), share our videos, leave us a comment, give us a like, or write us a review! Either way, we thank you!
@hannukoistinen5329
6 months ago
Actually it is pretty much linear. You can always use a log transform to make it more linear and then run the tests.
@dioszegigergo6770
7 years ago
Dear Marin and Ladan, hats off! Clearly explained with such deep knowledge and human understanding! Thank you very, very much! You=lm(teacher~knowledge+I(statwizard^3)), a talent="TRUE" in your field. Enjoy your life with your family, and if you find the time and opportunity, a new series guiding Us=lm(astronauts~strayed+I(how^2)) through the space of R is highly welcome! All the best, Gergo Dioszegi
@alfredkik3675
3 years ago
Your videos helped me complete my MSc degree successfully - thank you very much for your very informative videos!
@LuisFuentes1771
6 years ago
Hello Mike, with this video I've finished your course of videos on the introduction to R, and I don't have the words to express my gratitude. Thanks to your amazing work I've entered the world of data science, and I will continue diving into this wonderful field full of possibilities. Since I'm a student of Economics this will be incredibly useful. You have helped me immensely without asking for anything, as I'm sure you have for thousands of other people who feel equally thankful. The world needs more people like you, and I will try to continue the chain of helping others. Sincerely, from Universidad Carlos III, Madrid, Luis
@nareshpandey103
4 years ago
Thank you for the best tutorial; providing the dataset made it even more beneficial
@hapsyottahuang8529
5 years ago
Really excellent tutorial series. Thank you very much.
@marinstatlectures
5 years ago
you're welcome :)
@richaagrawal6677
7 years ago
Hello Mike, I am glad you managed to teach all of us with such an explanatory, step-by-step approach. I watched all the videos in your series and I wish more people could take advantage of your knowledge and skill. Thank you so much. Looking forward to more. Regards, Richa
@anuragkumar2295
7 years ago
All the videos are very informative and interactive. Thank you very much, Professor. :)
@neofjcn01
3 years ago
THANKS MAN, THIS VIDEO WAS SUPER USEFUL
@huangb6403
7 years ago
Thanks for your linear regression series. So helpful!
@ivanalejandrovaciohernande5512
15 days ago
Hi Mike, what test should I perform on the data prior to selecting a polynomial model? Great video, man
@alirostami8794
3 years ago
Excellent. Thank you so much for this helpful video. I'm waiting for a new tutorial video.
@matgg8207
2 years ago
This dude is a fxking life saver
@anuragkumar2295
7 years ago
Waiting for your new tutorials on R programming. :)
@angeld5093
8 years ago
Thank you so much. You are way better than my teacher
@marinstatlectures
8 years ago
Thank you +Anqi Dai
@patrysia0000
4 years ago
Great video! Thank you!
@marinstatlectures
4 years ago
you're welcome :)
@siddharthadas86
7 years ago
Fantastic series. Very clear and crisp explanations. Thank you very much again for this. Would it be possible to make some videos on longitudinal data and logistic regression?
@seeklikas
5 years ago
Very clear, my friend. Thumbs up
@marinstatlectures
5 years ago
thanks :)
@user-bh2rd1dz1z
4 years ago
Superb. Helped me a lot. Thank you!
@meeadhadi5209
A day ago
Hi. Thank you for your great explanation. The page for Dataset & R Script doesn't exist and the provided link doesn't work.
@siddhft3001
3 years ago
This video is gold!
@eminaker6080
2 years ago
You're the best!
@md.masumbillah8222
2 years ago
Well explained, thanks for the upload
@govindsharma9871
4 years ago
Great help. Thanks a lot.
@seyedomidshirdelan8609
7 years ago
Thanks for well-featured videos
@marinstatlectures
7 years ago
you're welcome :)
@HeatherRoseMusician
5 years ago
Great video! Very helpful :-)
@marziehsafari2440
2 years ago
That was an interesting video comparing first- and second-order polynomials for linear models; I really liked it. However, I am dealing with a mixed model right now and need to do the same comparison for the first and second order polynomial for it, and this approach does not work for me. Do you have a tutorial video for mixed models as well? Thanks a lot.
@munsirali1896
6 years ago
Hello respected Professor Mike Marin, I really appreciate your great tutorials on R. I have watched all of your lectures and am very grateful for this helpful lecture series; I hope it will continue in the future. Wishing you a happy and healthy life. Thank you very much, stay blessed!
@gutzbenj
7 years ago
Very nice! Thank you :)
@marinstatlectures
7 years ago
you're welcome +Benjamin Gutzmann
@ahmet3592
2 years ago
Thank you for the nicely explained tutorial. I have a question regarding the polynomial function: why do we use the argument raw=T in this case? I understand that multicollinearity is a general problem in this situation, since x and x^2 are correlated, and the solution usually presented is to set raw=F, i.e. to use only orthogonal polynomials. But why do orthogonal polynomials solve the problem of multicollinearity? I'm lost in this field. I hope you can help me out.
@ericdu8490
4 years ago
Thanks for the tutorial. But may I ask how to use the poly() function in multivariable regression? :D
@sabasaghatchi3408
5 years ago
Hi Mike. Thanks for your great videos. When adding a polynomial predictor, how does its interpretation change?
@marinstatlectures
5 years ago
Well, it becomes harder to interpret the effect of that variable: the effect of the X variable is being modelled using a polynomial, so the effect is not linear... the effect of a 1-unit increase in X on Y is not the same everywhere. One way to provide an interpretation is to take a value for x, calculate Y, then calculate the value of Y for x+1... do this for a few different values of x, and this will tell you the effect of a 1-unit increase in x, for specific values of X. That's one way to go, if you want to talk about the effect of a 1-unit increase in X on Y.
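The "predict at x and at x+1" approach described above can be sketched in R like this (LungCapData and the variable names Height/LungCap follow the video series; the specific heights chosen are arbitrary):

```r
# Quadratic model of lung capacity on height (LungCapData from the video)
model2 <- lm(LungCap ~ Height + I(Height^2), data = LungCapData)

# Effect of a 1-unit increase in Height, evaluated at a few heights:
# predict at x and at x + 1, then take the difference
heights <- c(55, 60, 65, 70)
at_x  <- predict(model2, newdata = data.frame(Height = heights))
at_x1 <- predict(model2, newdata = data.frame(Height = heights + 1))
at_x1 - at_x   # the "slope" differs at each height
```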
@irondia73
4 years ago
Hi Marin, when polynomial terms are in the model, how do you interpret the coefficients in a meaningful way? Intuitively, the coefficient makes sense for just "height", but what about "height^2"? Or "height^3" and so forth... thanks!
@marinstatlectures
4 years ago
In this case, the coefficient doesn't have a simple interpretation. That is because the relationship between X and Y isn't assumed to be something simple like a line (which has a slope with a nice, simple interpretation). With X and X^2, the change in Y for a 1-unit change in X is NOT the same everywhere, so you can't have a simple interpretation. If you want to interpret the model coefficients, there are other options for addressing a non-linearity. One that works and maintains a simple interpretation is to "categorize" the numeric variable (convert it from numeric into a set of categories). We have a separate video on the different ways to address a non-linearity, focusing on the concepts. I'm linking it here in case you want to explore that: kzitem.info/news/bejne/1YWw3XmsaYKgjaA
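The "categorize the numeric variable" option mentioned above might look roughly like this in R (a sketch; LungCapData and its variables follow the video series, and the quartile-based cut points are illustrative, not from the video):

```r
# Convert Height from numeric into categories (here, quartile groups)
LungCapData$HeightCat <- cut(LungCapData$Height,
                             breaks = quantile(LungCapData$Height),
                             include.lowest = TRUE)

# Each category now gets its own coefficient, each with a simple
# "difference in mean Y vs the reference group" interpretation
model_cat <- lm(LungCap ~ HeightCat, data = LungCapData)
summary(model_cat)
```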
@chaitrabellur8743
2 years ago
Thank you!
@dessywl
5 years ago
Is polynomial regression the same as orthogonal polynomial regression? Thanks!
@camila_braz
3 years ago
thank you!!
@diegovirano
6 years ago
Thanks a lot for the videos... very helpful. I have a question, for the community as well: is there any way to automate the process of selecting the best regression model, instead of comparing the models by hand? I have a scenario with a lot of variables.
@marinstatlectures
6 years ago
Hi Diego, there are, but I wouldn't fully recommend them. Here's a brief summary. You can read more about things like *step-wise selection* or *all-subsets*; the key phrase is *automated model selection procedures*. The "stepwise" approach alternates steps (forward and backward), adding and removing variables until you hit a steady state where you cannot add or remove any more. This also requires specifying a "maximum model" (e.g. will you consider one-way interactions, two-way interactions, etc.). The "all possible subsets" approach considers every possible model with every possible subset of variables, and chooses the one with the lowest AIC (or lowest BIC if you prefer that). The drawback is that these are purely automated, don't allow input from the user, and are a sort of "black-box" approach. As an example of what I mean: suppose you have data for a group of school-aged children, where one variable is "age" and another is "grade they are in". These variables contain almost the same information, but are not exactly the same. The one selected into your model will be based mostly on chance... I would prefer to have some control over which is in the model (I would personally choose age, as I think it is more meaningful than "grade they are in"). Automated procedures also allow variables to be included or excluded based on chance correlations: by chance, some meaningless variables will always end up correlated with your "Y" variable, and automated procedures will end up including these. I prefer an approach where, if I KNOW conceptually that a variable is correlated with Y, I make sure it gets into my model; similarly, if I KNOW something is not correlated with Y, I exclude it. Automated procedures let "chance" take a lot of control over your model building and variable selection.
MY PERSONAL STANCE is that automated selection procedures can be useful as an exploratory tool, to help discover which variables may or may not be important, but I would always revise the model and its variables from there. I wouldn't let an algorithm choose my model; I combine "what I already know" with "what the data is telling me". Hope that's helpful...
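For the exploratory use described above, base R's step() does AIC-based stepwise selection. A minimal sketch (mtcars is used here just as a stand-in dataset; the formula choices are illustrative):

```r
# Exploratory automated selection with step(); revise the result by hand
full <- lm(mpg ~ ., data = mtcars)   # the "maximum model"
null <- lm(mpg ~ 1, data = mtcars)   # intercept-only starting model

# Stepwise search in both directions, choosing by AIC
step_model <- step(null, scope = formula(full),
                   direction = "both", trace = FALSE)
summary(step_model)  # then revisit using subject-matter knowledge
```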
@diegovirano
6 years ago
Hi. Thanks a lot for your answer. It is really helpful in fact. For sure many people will also get a benefit from this answer. I will look for more information related to the topic as suggested. I use it as exploratory tool as well, with a lot of variables sounds like a good idea. Thanks a lot again.
@swiss.girl.travels3301
8 years ago
Hi Mike, I just watched your playlist about regression models in R and it was very helpful! So far you have worked with the lm() function in R, but there are so many others, like glm(), or lmer() and glmer() from the lme4 package. What are the differences between those models? Certainly it depends somehow on your data, but how can I find out which model I should use for my analysis? It would be great if you have a tip on what I should focus on... Thank you in advance!
@marinstatlectures
8 years ago
Hi +Fa Fa, the others are for entirely different regression models. *lm* fits a linear regression (y/outcome is numeric, and assumed normal). *glm* is for generalized linear models, which are a whole class of models on their own. For example, logistic regression is a generalized linear model (for a y/outcome that is binary, and assumed binomial), and Poisson regression is a generalized linear model (for a y/outcome that is a count or rate, and assumed to follow a Poisson distribution); there are many other GLMs. *lme* models are linear mixed-effects models, often used for longitudinal data. Each of these is a very large topic on its own; in a traditional stats department there are usually multiple courses on GLMs, a full course on longitudinal data analysis, and so forth, so I cannot do them justice in a few short paragraphs. The short answer is that the right choice depends mostly on the type of data (and, more importantly, the type of outcome (y) variable you are working with). The example I use in the videos has an outcome/y of lung capacity (numeric/continuous) assumed to be normally distributed, so I'm using linear regression. I hope that helps clarify some things.
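The distinction above can be sketched with minimal calls (mtcars is used purely as a stand-in; the variable choices are illustrative, not substantive models):

```r
# Linear regression: numeric outcome, assumed normal
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic regression (a GLM): binary outcome, binomial family
fit_log <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Poisson regression (a GLM): count outcome, poisson family
fit_poi <- glm(carb ~ wt, data = mtcars, family = poisson)

# Linear mixed-effects model for longitudinal/grouped data (lme4 package);
# sleepstudy ships with lme4 -- uncomment if the package is installed
# library(lme4)
# fit_lme <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
```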
@anthonyalanis8119
3 years ago
I noticed that the summary output for the cubic model had large p-values for all the coefficients, but the multiple R-squared still seemed large, the residual error seemed low, and the overall F-statistic was large too, so we would reject the null (all coefficients = 0). QUESTION: What should we say about each coefficient, since their individual p-values are so high?
@witsqafa
4 years ago
Hi, thanks, your video is really useful for me. I have a question: one of my regression coefficients' p-values is not significant when I apply both linear and non-linear regression models. Do you have any suggestions for my case? Thanks in advance!
@witsqafa
4 years ago
Does it even matter if I keep on using the models?
@Dr_Finbar
A year ago
When do you use an orthogonal polynomial rather than a raw polynomial?
@anaswahid8520
4 years ago
Sir, in LungCap vs Height, shouldn't you first check the correlation coefficient r? If r = 0, there is no linear relationship, which means you can then go for polynomial regression. But why did you fit a polynomial regression here? r is not 0, so why did you move to the polynomial regression concept?
@marinstatlectures
4 years ago
Hi, polynomial regression is not for when the correlation is 0, it is an option when there is a relationship, but not a linear one (maybe a bit of a curved/non-linear relationship). We have a video that talks a bit about this: kzitem.info/news/bejne/1YWw3XmsaYKgjaA
@nickfire2k376
4 years ago
Thanks a lot
@shudanhao8643
A year ago
Thanks. But the data we can download is different from the data used in the video.
@Teodorast
6 years ago
Is there any way to control for a variable inside the model? For example, controlling for age
@KhaLed-pb4pu
5 years ago
What about the F-statistic's p-value (2.2e-16)? What is its significance or importance compared to the other p-values for height and height^2? Which one should we consider?
@marinstatlectures
5 years ago
To be honest, none of them are particularly enlightening. The F-stat p-value tests the overall significance of the model; that is somewhat helpful, but it tests whether ALL coefficients are 0, so essentially whether your model is significantly better than just guessing the mean y-value for everyone (is it better than nothing). The p-values for height and height^2 can be misleading, as those variables are correlated (and can be correlated with other variables in the model), so their p-values can get inflated by this collinearity. The best way to test the significance of variables is to compare models with and without a variable included. We have a separate video talking about this here: kzitem.info/news/bejne/qJWlyKamj2lhhIY
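The "compare models with and without a variable" approach above is the partial F-test done with anova(); a sketch (LungCapData and the variable names follow the video series):

```r
# Partial F-test: compare the model without and with Height^2
reduced <- lm(LungCap ~ Height, data = LungCapData)
full    <- lm(LungCap ~ Height + I(Height^2), data = LungCapData)

# H0: the extra term's coefficient is 0; a small p-value means
# the fuller model has significantly less unexplained error
anova(reduced, full)
```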
@khairulnizam7439
7 years ago
Hello, hi Mike Marin. I have a question and hopefully you can help me answer it. First, is this method of polynomial regression in R applicable if I have 3 variables (2 independent variables and 1 dependent variable), and how do I develop it? Second, can all data use this method, or is there a way to verify whether the data are suitable for it or not? Hopefully you can help me, Mike. Thank you.
@CaptainCalculus
3 years ago
I think R has changed the way it would treat it if you just put x^2 into the equation since this video was made
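For anyone unsure what the comment above refers to: in an R model formula, `^` denotes formula crossing rather than arithmetic, so a bare `x^2` does not add a quadratic term; it must be wrapped in I(). A small sketch with simulated data:

```r
set.seed(1)
d <- data.frame(x = runif(50, 0, 10))
d$y <- 1 + 2 * d$x - 0.3 * d$x^2 + rnorm(50)

# In a formula, ^ means crossing, and x crossed with itself is just x:
lm(y ~ x + x^2, data = d)        # fits only x, NOT a quadratic
lm(y ~ x + I(x^2), data = d)     # I() protects ^, so this fits x squared
lm(y ~ poly(x, 2, raw = TRUE), data = d)  # an equivalent quadratic fit
```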
@wakjiratesfahun3682
3 years ago
Please upload new tutorials regarding quadratic regression.
@nirmalpurohit4067
7 years ago
Hi Mike, I have seen somewhere that we have to divide the data into two groups: one for development and a second for validation/testing. So is it necessary to validate the model before presenting it to business peers? Please advise. Regards, Nirmal
@marinstatlectures
7 years ago
Hi Nirmal, it depends on your reason for fitting a model. If you are using the model to make predictions (a predictive model), then you probably want to do some sort of validation of the model (to ensure it makes good and reliable predictions). There are lots of packages in R for different sorts of validation. Key phrases to research are "cross validation" and "leave one out validation"; when you search those topics you will come across different validation methods. Cross validation is probably what you want to research the most. Good luck!
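A minimal hand-rolled k-fold cross-validation along the lines suggested above, in base R only (mtcars and the mpg ~ wt + hp formula are stand-ins, not from the video):

```r
set.seed(1)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels

cv_mse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]              # fit on k-1 folds
  test  <- mtcars[folds == i, ]              # evaluate on the held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)
  mean((test$mpg - predict(fit, newdata = test))^2)  # fold MSE
})
mean(cv_mse)  # cross-validated estimate of prediction error
```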
@nirmalpurohit4067
7 years ago
Hi Mike, thanks for your time. I request that you make a video on validating a model; I know you would do it in 5 minutes, where others would take hours to explain the same things. I hope you will consider my request. Again, thanks for making R easy and fast to understand. Cheers from India. Regards, Nirmal
@oacho3
4 years ago
Hi Mike and YouTubers, I need to fit two sigmoid curves to the same dataset, one for the control points and one for the treatment points. If I subset the x axis, it gives me an error; if I do not subset, it gives me only one line to fit all the points. Do you have any suggestions to solve this? Thank you!
@lv274
5 years ago
Doesn't the inclusion of Height^2 and Height^3 in the model cause multicollinearity? BTW you make excellent content, Thank You.
@marinstatlectures
5 years ago
Thanks! Yes, including X^2, X^3, ... introduces collinearity between X, X^2, etc. This may or may not be an issue. Common solutions are to "center the X variable" (i.e. include X and (centered X)^2), which can help reduce the collinearity between the two, or to use "orthogonal polynomials". Collinearity between X and X^2 really only serves to inflate the SEs of the coefficients for X and X^2 (it doesn't really affect the coefficients themselves, or the shape of the model fit), so in that sense it is not such a big issue.
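The centering and orthogonal-polynomial fixes mentioned above are easy to see numerically (simulated heights are used here as a stand-in for the video's data):

```r
set.seed(1)
x <- rnorm(100, mean = 65, sd = 5)   # e.g. heights; simulated stand-in

cor(x, x^2)          # raw terms: very highly correlated

xc <- x - mean(x)    # centering greatly reduces the correlation
cor(xc, xc^2)

P <- poly(x, 2)      # orthogonal polynomials (poly() default, raw = FALSE)
cor(P[, 1], P[, 2])  # uncorrelated by construction: essentially 0
```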
@betzthomas9693
4 years ago
In polynomial regression, do we take the log of the Y value? E.g. lm(log(Y) ~ poly(X, 2, raw = T)).
@werekorden
4 years ago
Good question, I am looking for something similar. I need to fit a polynomial regression with log10 of the x values using poly, and I don't get it.
@alexanderbutler2565
6 years ago
Hi, I'm getting the error message "Error in xy.coords(x, y, setLab = FALSE) : 'x' and 'y' lengths differ" when trying to add the regression lines to the original plot, but my data are the same length. Any advice?
@marinstatlectures
6 years ago
Hi, if you expand a bit on the exact commands you've entered, I may be able to figure out the issue. The problem is that one of the variables (X and Y) you are trying to plot has more elements than the other... but without knowing the code you've entered, I'm not able to figure out where the error is.
@TKSGL89
8 years ago
Hi Mike Marin, I'm so sad you stopped making videos! I have a question and I hope you can help me (you might start a new series about it :)). How do I treat historical data? I have daily data for 200 years. I have to plot them all first and then plot only the maximum for each year. And what can I do if some years have 365 days and others 366? I hope you understand what I mean. Thank you in advance!
@TKSGL89
8 years ago
I checked the ts() function but ran into trouble dealing with it (especially with frequency)
@marinstatlectures
8 years ago
Hi +TKSGL89, thanks! We haven't actually stopped making videos... life has just gotten busy and we've had to slow down a bit, but we plan on making videos for the foreseeable future! We've got a few different ones in the works, and a list of topics we want to cover that is WAY too long... there are so many cool topics that could be covered, just no time!! That's time series data you've got there, so you'll want to use time series methods (looks like you've started there, with the ts() function). I won't have time to make anything helpful for you soon, but I'd suggest searching around for resources on time series in R. As for picking out the max for each year, there are different ways to do that, and some of it depends on exactly how your data are laid out, but you should have the *variable* of interest as well as a *year* variable. To find the max for a year, you would use something like *max(variable[year==2015])*, and this could be done for every year. You can do this in more efficient ways (using apply statements, or other approaches) once you've coded it in a simple way. Hope that helps get you started!
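A sketch of the per-year maximum idea above, done for every year at once with tapply() (the data frame and column names here are invented stand-ins for the commenter's daily series):

```r
# Simulated daily data over 200 years; 'year' and 'value' are illustrative
daily <- data.frame(year  = rep(1800:1999, each = 365),
                    value = rnorm(200 * 365))

# Max for a single year, as in the reply above
max(daily$value[daily$year == 1815])

# Max for every year at once
yearly_max <- tapply(daily$value, daily$year, max)
plot(as.numeric(names(yearly_max)), yearly_max,
     type = "l", xlab = "Year", ylab = "Yearly maximum")
```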
@TKSGL89
8 years ago
Great to hear you haven't stopped! Thank you for these explanations and for being so quick! I'll be waiting for next videos :) see you soon
@divyaakella8881
6 years ago
What if we have around 8 independent variables? How do we determine the x^2 / x^3 terms?
@marinstatlectures
6 years ago
I'm not sure what you mean by this. If you clarify, I may be able to help.
@balazsgonczy3564
5 years ago
Why is it good to have orthogonal polynomials? Is it needed in the model?
@balazsgonczy3564
5 years ago
And yeah, your link is broken. Not available.
@marinstatlectures
5 years ago
Hi, it isn't completely necessary, but what that does is reduce the collinearity between the predictors in the model... because X and X^2 will be highly correlated, and thus their SEs will get inflated. Orthogonal polynomials address this.
@marinstatlectures
5 years ago
try this one: statslectures.com/r-scripts-datasets
@콘충이
4 years ago
wow wow
@tolisBFMV
5 years ago
I think there is something I am missing here. When running anova on the two models, I understand the null hypothesis (no significant difference), but what about the alternative? If there is no significant difference, couldn't it be that the full model is worse? My question, more clearly: with anova, are these always the hypotheses for the models, i.e. the alternative is that the full model is better? Or could the alternative hypothesis be that the full model is worse? Thank you.
@marinstatlectures
5 years ago
Yes, the alternative is always that the full model is "better" (it has significantly less unexplained error). Adding an unnecessary variable can never increase the SSE (the unexplained error). This test checks whether the larger model has significantly lower unexplained error (lower SSE).
@montserratbelinchon3341
6 years ago
Does anybody know what to do if the first-order polynomial beta for my predictor has a non-significant p-value while the second-order beta is significant?
@alexslappey2290
6 years ago
How do you decide what degree of polynomial you should go to?
@marinstatlectures
6 years ago
To do a formal test of polynomial terms, you can compare a model that uses just "X" to one that also includes "X^2", to test if the model with "X^2" is significantly "better". If it is, you can then compare the model with X, X^2 to a model with X, X^2, X^3 and test whether that model is a significant improvement, and continue until the model stops improving. To do this test you can use the "Partial F-test" or "Likelihood Ratio Test"; we have a video showing that here: kzitem.info/news/bejne/qJWlyKamj2lhhIY You can also decide conceptually which degree makes sense and begin from there. I work in health research, and most of the time we don't want to go beyond X^2 or maybe up to X^3, as beyond that usually isn't realistic. (e.g.) Some things have a sort of exponential growth, and including X^2 may be appropriate; at times including X^3 to allow another inflection may be relevant... but past that, there aren't many things where you could conceptually justify a relationship up to the power of X^4. The most important part of model building is that your model is conceptually sound; don't rely purely on statistical testing, and make sure your model also makes sense in context.
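The sequential comparisons described above can be sketched with nested anova() calls (LungCapData and the variable names follow the video series):

```r
# Fit increasing polynomial degrees of Height
m1 <- lm(LungCap ~ Height, data = LungCapData)
m2 <- lm(LungCap ~ poly(Height, 2, raw = TRUE), data = LungCapData)
m3 <- lm(LungCap ~ poly(Height, 3, raw = TRUE), data = LungCapData)

anova(m1, m2)  # is degree 2 a significant improvement over degree 1?
anova(m2, m3)  # is degree 3 a significant improvement over degree 2?
# stop at the first degree where the comparison is no longer significant
```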
@____darrah____
4 years ago
What about the multivariate case?
@marinstatlectures
4 years ago
I'm a bit unclear what you are asking, but if you are asking how to include a polynomial term in a model that also has many X variables, it would look like this: lm(y ~ X1 + I(X1^2) + X2 + X3 + ...), the exact same as shown in this video, except including other variables as well. Hope that answers your question.
@SyedKollol
7 years ago
Is it possible to find the VIF in R?
@marinstatlectures
7 years ago
Yes, but it's not in base R (at least to my knowledge). There are packages you can install that will give you the VIF and other related things.
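One such package is car, whose vif() works directly on an lm fit; a short sketch (mtcars and the formula are stand-ins, not from the video):

```r
# vif() lives in the car package (install.packages("car") if needed)
library(car)

model <- lm(mpg ~ wt + hp + disp, data = mtcars)
vif(model)   # values well above roughly 5-10 flag problematic collinearity
```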
@semihardbagels
4 years ago
Your script link doesn't work.
@marinstatlectures
4 years ago
Hi @Semihardbagels, it is fixed now. Let us know if you have any trouble accessing the files.
@hannukoistinen5329
A month ago
Never studied statistics? This stuff is absolutely linear :). Not even any outliers.
Comments: 100