I don't believe that I have ever watched one of your videos that I didn't come away with some new nugget. Thanks, Julia!
@hesamseraj
1 year ago
As always, thank you for such a great screencast.
@tofreddy
1 year ago
I stumbled into your channel. Thank you for the teachable moment.
@carvalhoribeiro
1 year ago
Very, very useful. Thank you so much, Julia!
@djangoworldwide7925
1 year ago
Hey, `rsample::validation_set()` does not exist anymore. As of 24-06-2023 we can use validation_split/time_split/group_validation_split. I had a feeling it was `validation_split()` anyway, but I wonder: should I be using the dev version?
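A minimal sketch of the `validation_split()` route mentioned in this comment, assuming an installed rsample release that still provides that function. As far as I know, releases from 1.2.0 onward add `initial_validation_split()` and `validation_set()` and soft-deprecate `validation_split()`, so check your installed version. The `mtcars` data and the `car_*` names are stand-ins, not the dataset or objects from the video.

```r
library(rsample)

set.seed(123)

# Stand-in data: mtcars ships with R and is not the dataset from the video
car_split <- initial_split(mtcars, prop = 0.8)
car_train <- training(car_split)

# Carve a single validation set out of the training data
car_val <- validation_split(car_train, prop = 0.8)

# The result is an rset with one row, so it can be used wherever resamples are expected
car_val

# Rows used for fitting and for validation, respectively
n_analysis <- nrow(analysis(car_val$splits[[1]]))
n_assessment <- nrow(assessment(car_val$splits[[1]]))
```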
@CaribouDataScience
1 year ago
Thanks, that was interesting!
@wilrivera2987
1 year ago
Dream job: to work at Posit.
@geralgariza7199
1 year ago
nice work! well done!
@anselmekouame1913
1 year ago
Hi Julia, how might multicollinearity affect a machine learning model? If multicollinearity is found, should we remove variables that are highly correlated?
@JuliaSilge
1 year ago
If you are using a linear model, correlated features can be a big problem! In cases like that, you would want to remove features that are highly correlated with other ones, or use something like PCA. Check out feature engineering approaches like these:
recipes.tidymodels.org/reference/step_corr.html
recipes.tidymodels.org/reference/step_pca.html
Tree-based models tend to do OK with correlated features and it often doesn't really help to handle them in a special way. Just crank it on through the model!
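The two recipe steps linked in this reply can be sketched roughly as follows. This is an illustration under assumptions, not the code from the video: `mtcars` is a stand-in dataset, and the 0.8 correlation threshold and 3 principal components are arbitrary choices you would tune for your own data.

```r
library(recipes)

# Stand-in data: several mtcars predictors (cyl, disp, hp, wt) are strongly correlated

# Option 1: drop predictors that are highly correlated with other predictors
rec_corr <- recipe(mpg ~ ., data = mtcars) |>
  step_corr(all_numeric_predictors(), threshold = 0.8)

# Option 2: replace the predictors with a few principal components
# (normalize first so no predictor dominates because of its scale)
rec_pca <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  step_pca(all_numeric_predictors(), num_comp = 3)

# prep() estimates each step; bake(new_data = NULL) returns the processed training data
corr_data <- rec_corr |> prep() |> bake(new_data = NULL)
pca_data <- rec_pca |> prep() |> bake(new_data = NULL)

names(corr_data)
names(pca_data)
```

In a full workflow, either recipe would typically be combined with a model specification so that the preprocessing is estimated on the training data only.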
@anselmekouame1913
1 year ago
@JuliaSilge Thanks a bunch.
@omoniyitemitope6113
6 months ago
Hi, I have a dataset with 35 variables and want to run some regressions (RF, xgboost, etc.) on it. I am new to R and want to know if you have any online training that I can register for.
@JuliaSilge
6 months ago
I recommend that you work through this: www.tidymodels.org/start/ And then take a look at this book: www.tmwr.org/ Good luck!
@omoniyitemitope6113
6 months ago
@JuliaSilge Thanks so much for your response. I followed one of your screencasts and got an rsq of 0.37 for the RF model. Is there anything I can do to improve the fit of my model?
@JuliaSilge
6 months ago
@omoniyitemitope6113 This definitely depends on the specifics of your situation! I recommend that you check out a resource like *Tidy Modeling with R* for digging deeper on the model building process: www.tmwr.org/
@omoniyitemitope6113
6 months ago
@JuliaSilge Thanks for your response. I will go through it. I did something whose statistical implications I don't understand: I took the log of my dependent variable and fit an RF, and to my surprise the % variance explained came out as 99.74. This looks too good to be true to me.
@danielhallriggins9008
5 months ago
Thanks Julia, love your videos! To get a more accurate sense of performance, would it be helpful to use {spatialsample} to account for spatial autocorrelation?
@JuliaSilge
5 months ago
That would be a great thing to do! This dataset doesn't have explicitly spatial information in it (just county FIPS code) so you would need to join some spatial info together with the original dataset.
@konormccracken
1 year ago
Always grateful for these videos! Though the grating little economist in me screamed a bit when you discounted the fixed-effect of "county" here 🫥
@JuliaSilge
1 year ago
Ah yep! The xgboost algorithm does not have the ability to incorporate fixed effects the way that a multilevel model does, say like those from multilevelmod: multilevelmod.tidymodels.org/ However, we could still use a resampling approach that takes into account how a given county is in this dataset a bunch of times, to avoid overly optimistic performance estimates. We'd want to switch out `initial_split()` for `group_initial_split()` and `validation_split()` for `group_validation_split()`: rsample.tidymodels.org/reference/validation_split.html
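The grouped resampling swap described in this reply might look something like the sketch below. The `toy` data frame and its `county` column are hypothetical stand-ins for the video's dataset, and the `prop` values are arbitrary. Newer rsample releases may emit a deprecation message for `group_validation_split()` and point you to `group_initial_validation_split()` instead, so treat this as one possible version-dependent approach.

```r
library(rsample)

set.seed(123)

# Hypothetical stand-in data: 20 counties, each appearing 10 times
toy <- data.frame(
  county = rep(sprintf("county_%02d", 1:20), each = 10),
  x = rnorm(200),
  y = rnorm(200)
)

# Keep every row for a given county on the same side of the train/test split
toy_split <- group_initial_split(toy, group = county, prop = 0.8)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)

# Counties that appear in both sets; with a grouped split there should be none
shared_counties <- intersect(toy_train$county, toy_test$county)
shared_counties

# Grouped validation split inside the training data, again keeping counties intact
toy_val <- group_validation_split(toy_train, group = county, prop = 0.8)
val_analysis <- analysis(toy_val$splits[[1]])
val_assessment <- assessment(toy_val$splits[[1]])
```

Because no county straddles a split, performance measured on the held-out rows is not inflated by the model having already seen other rows from the same county.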
Comments: 20