Best video on the internet on XGBoost, you just saved my paper. Thanks a lot :)
@bkrai
A year ago
You're welcome!
@nyasha767
A year ago
I agree 100% with you.
@abhinavmishra7786
6 years ago
I got much greater clarity on the concept of the XGBoost model and parameter usage from this video. Thanks a lot, Sir.
@bkrai
6 years ago
Thanks for the comments!
@Viewfrommassada
6 years ago
Thanks a lot, Prof! You sent me the link to this video and it REALLY helps. But just as someone suggested in the comments, the parameters in the model are very KEY, and a more detailed explanation of them and of the algorithm as a whole would REALLY REALLY be APPRECIATED too. I am blessed to be a subscriber of your videos!
@bkrai
6 years ago
Thanks for your comments and suggestion!
@gavinwebster8737
5 years ago
Best clarity so far on XGBoost; it helped a lot in my final project and in learning more about this algorithm compared to GBM.
@bkrai
5 years ago
Thanks for the comments!
@amaanraza2704
6 years ago
Hi Bharatendra, I derive a lot of value from your tutorials, which strike the right balance between being simple and very useful. Love them!
@bkrai
6 years ago
Thanks for your feedback and comments!
@kartikrayaprolu9076
4 years ago
Such an elaborate explanation. Please keep posting such videos; they will be very useful for the community. I've benefited a lot from this video.
@bkrai
4 years ago
Thank you, I will.
@mayoordhokia
4 years ago
After weeks of searching for videos on using XGBoost to predict a continuous variable, I could not find any decent ones, nor were any of them as well explained (and entertaining) as your videos. Please make one for the community? Best wishes from London, UK
@bkrai
4 years ago
Thanks for the suggestion and comments, I'm adding this to my list of future videos.
@vijaypalmanit
6 years ago
Thank you so much, this is the video I have been looking for for a long time and I didn't find anything as good; you have explained everything in detail, and it's interesting too.
@bkrai
6 years ago
Thanks for the comments!
@flamboyantperson5936
6 years ago
Respect to you, Sir. The kind of knowledge you are sharing from Massachusetts is very, very helpful. Thank you so very much, Sir.
@bkrai
6 years ago
Thanks!
@shadrackbadia1158
A year ago
Very easy to follow, no errors in the code, just great.🤓🙂
@bkrai
A year ago
Great to hear!
@k5555-b4f
7 years ago
We can also increase the range on the y-axis by using the following lines:
plot(e$iter, e$train_mlogloss, col = "blue", type = "l", ylim = c(0, 1))
lines(e$iter, e$test_mlogloss, col = "green")
legend("topright", legend = c("Training Error", "Testing Error"), lty = c(1, 1), col = c("blue", "green"))
But I guess for the purposes of this video not using the ylim parameter could be intentional and warranted. Thank you for the great video, as always.
@bkrai
7 years ago
Thanks!
@happylearning-gp
2 years ago
Thank you for this tutorial. Awesome. The step-by-step explanations made things much easier to understand.
@bkrai
2 years ago
You're very welcome!
@bkrai
2 years ago
You may also find this useful: kzitem.info/news/bejne/qKOhrqp6rGJ4em0
@happylearning-gp
2 years ago
@@bkrai Thank you very much
@happylearning-gp
2 years ago
@@bkrai Thank you very much. When you find time, kindly have a look at my channel on R. Everything is like a standalone application: kzitem.info/rock/DmEAmoLuyE0h61aGpthGvA
@bkrai
2 years ago
You are welcome!
@anderswigren8277
6 years ago
You are a skillful tutor. Keep going, and Happy New Year!
@bkrai
6 years ago
Happy New Year 2018!
@prahladbhat9516
4 years ago
This helped so much on a classification project I am doing. Many thanks!
@bkrai
4 years ago
You're very welcome!
@tonyjames5763
3 years ago
@@bkrai ee
@bhavanabhardwaj5253
6 years ago
Hello Sir, can you please share an example where the response variable is continuous?
@tamaraabzhandadze2712
3 years ago
That was a very good tutorial! I wonder if and how we could use cross-validation for choosing the eta, gamma, iteration, etc. parameters. I would be happy to have any suggestions.
@faisalmohammed672
4 years ago
Thank you for the tutorial. Given that you have a binary target, I was wondering why you haven't used objective = "binary:logistic" and eval_metric = "logloss". Is there a downside to using "multi:softprob" for a binary classification problem, when it is typically used for multiclass classification where n > 2? I'd appreciate it if you could help clarify this.
@anigov
6 years ago
Thank you, Sir, for making it so easy.
@MSS864
3 years ago
I am enjoying watching your videos, starting from the simplest to the more complicated ones! Thank you, Dr. Rai, for your great explanation. I have one question, though: when you divide the data into train and test sets, you are using data[ind==1, ] and data[ind==2, ]. It is not clear to me how this magically works. What I see is data[x, y], where the only values that x can take are blank and integers from 1 to 400, and the only values y can take are blank and integers from 1 to 4. Can you explain to me what is going on? Or is there anything I am missing?
@bkrai
3 years ago
You can refer to this for an explanation: kzitem.info/news/bejne/s3il0KVmfXNyhm0
@mecobio2
5 years ago
The code has room for improvement. For instance, in the splitting of the data, instead of using sample() you can use createDataPartition(), in order to preserve the proportion of the categories in the Y variable. The accuracy improves from 0.7066667 to 0.7375. Another improvement is to use, say, 10-fold cross-validation instead, using the caret R package with train().
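The splitting part of the suggestion above might look like the sketch below. The data frame name `data` and the factor response `admit` are assumed from the tutorial, not taken from the comment.

```r
# Sketch of a stratified split with caret (assumes the tutorial's `data`
# with factor response `admit`); createDataPartition() preserves the
# class proportions of admit in both subsets, unlike a plain sample().
library(caret)
set.seed(1234)
idx   <- createDataPartition(data$admit, p = 0.8, list = FALSE)
train <- data[idx, ]
test  <- data[-idx, ]
```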
@bkrai
5 years ago
Thanks for sharing!
@swathinandakumar415
3 years ago
Thank you, Sir, for explaining the model so well. I am doing something similar with my data. How can I show the probabilities of the predictors (similar to the one in the decision tree)?
@upskillwithchetan
4 years ago
Thank you, sir! Awesome explanation skills, with depth on the algorithm.
@bkrai
4 years ago
Thanks for your comments and for finding it useful!
@tadessemelakuabegaz9615
2 years ago
I have seen your lectures on logistic regression and randomForest as well. They are awesome. Do we require cross-validation in these ML methods? I haven't observed any cross-validation step in your lectures on LR, RF, and XGBoost.
@bkrai
2 years ago
I've split the data into train and test. But no harm in doing CV.
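For readers who do want CV, xgboost ships its own cross-validation helper. A hedged sketch, reusing the tutorial's `train_matrix` (an xgb.DMatrix) and its multi:softprob setup; these names are assumptions from the video, not shown in the thread:

```r
# Hedged sketch: 10-fold cross-validation with xgb.cv() on the
# tutorial's training DMatrix and parameter list.
library(xgboost)
cv <- xgb.cv(params = list(objective = "multi:softprob",
                           eval_metric = "mlogloss",
                           num_class = 2),
             data = train_matrix, nrounds = 100, nfold = 10,
             early_stopping_rounds = 10, verbose = 0)
cv$evaluation_log   # mean train/test mlogloss per boosting iteration
```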
@angappanmaruthachalam3054
4 years ago
Your explanation is awesome!
@bkrai
4 years ago
Thanks!
@SmartMrSteve
4 years ago
Thanks for the amazing tutorial on XGBoost! I can't believe you make every application of machine learning so easy. I really want your help in figuring out how to apply XGBoost to time-to-event data. There are very limited resources on XGBoost with the Cox model. Do you have any suggestions? Thanks.
@bkrai
4 years ago
I don't have one at this time, but I have added it to my list.
@gowrikaruppusami7757
4 years ago
Very excellent explanation, lots of thanks. I have one doubt: is it possible to use image data, especially satellite data?
@bkrai
4 years ago
For image data, deep learning is more effective. You can explore the 'deep learning' playlist on this channel.
@harishnagpal21
4 years ago
Thanks for the model. A big help for me.
@bkrai
4 years ago
Thanks for the comments!
@hans4223
5 years ago
Simply awesome and excellent.
@bkrai
5 years ago
Thanks for the comments!
@sebastianvarela2190
5 years ago
Hi Sir, let me ask you a question. In a binary classification context, how do you predict when it is not possible to know the values of the target or outcome variable, in a forecasting scenario? I mean, you need to forecast a result and have a new dataset without the response variable; that is, you don't know if a student will be admitted or not, but you need to make a prediction using XGBoost. I tried to do this by setting, in the "test set" (the new dataset without the response variable), an outcome variable with a fixed value (0, for instance) to be able to run XGBoost; however, the prediction is pretty inaccurate. Thanks very much!
@flamboyantperson5936
6 years ago
Sir, I have one request: before you start working on the data, kindly give a 2-3 minute introduction to the topic, its applications in real-world scenarios, and why it's better than other techniques, so we can understand what exactly it is and where we can implement it. Thank you, Sir.
@bkrai
6 years ago
Thanks for the suggestion!
@irbobable
6 years ago
Fantastic tutorial, thank you!
@ramp2011
5 years ago
Would you consider using caret and calling xgboost there directly? Is there a benefit to using this direct method versus using caret? Thank you.
@bkrai
5 years ago
That should also work fine. As long as we use the same method, model performance is not likely to be significantly different.
@tathagataghosh5390
A year ago
Sir, can you please make a video on a stacking model for different DL models? Thanks a lot for the informative videos, sir.
@evansumido6191
2 years ago
Hi sir, what line of code should I add if I want to see a confusion matrix that also displays the 95% CI and the test p-value? Great lecture, thank you.
@tadessemelakuabegaz9615
2 years ago
I watched your lectures on these models, but they are all classification models.
@bkrai
2 years ago
You can refer to: kzitem.info/news/bejne/l4mD2J57sHiahI4
@manaspradhan2166
6 years ago
Thank you, sir. This is very helpful.
@bkrai
6 years ago
Thanks!
@Didanihaaaa
6 years ago
First, I should thank you for providing such a helpful educational channel. Thanks a lot, Sir. I have a question regarding factor variables. Should I turn all integer values into factors? Because I got the error "xgb.DMatrix(data = as.matrix(train), label = train_label) : REAL() can only be applied to a 'numeric', not a 'integer'". Could you please explain how you chose the rank column to turn into a factor and matrix variable? Best regards,
@bkrai
6 years ago
I used rank as an example of dealing with factor variables. If you have any factor variable in your dataset, you can handle it in a similar manner.
@abhibhavsharma8706
4 years ago
Thank you, Sir. Please also give guidance on how to install the LightGBM package in R and its uses.
@bkrai
4 years ago
Thanks, I've added it to my list.
@missakboyajian6446
6 years ago
Hi, thanks for the video. I have a problem, I think. When I do feature importance, I am getting the target column along with it. My target column is 'dismissed' and I put it in the first column. This is how I am loading it: train
@bkrai
6 years ago
I think lines 3 to 6 are not needed.
@ConsuelaPlaysRS
6 years ago
Thank you! I wish you would use caret more, though.
@bkrai
6 years ago
Thanks for the suggestion!
@sebastianvarela2190
5 years ago
Hi Sir, your videos are great. Let me ask you a question: I have read that it is possible to implement survival analysis (Cox regression) with the xgboost package, indicating "survival:cox" as the learning-task parameter. I haven't found any tutorial on this issue. Do you know if it is necessary to do extra work, for example to specify the time variable somewhere else? Thanks in advance.
@anshagarwal7020
4 years ago
Thank you for the tutorial, it really helped my understanding. I have a question: why can't we do dummy encoding for categorical variables in XGBoost?
@bkrai
4 years ago
You may try. It should work fine.
@jojo23srb
6 years ago
Q: What's stopping someone from just changing all their variables to numeric types and skipping the one-hot encoding process altogether? Does it hurt the prediction?
@bkrai
6 years ago
I would suggest trying both and comparing the results.
@黄聪-x4d
6 years ago
Thank you for sharing.
@akd9977
6 years ago
Thank you for explaining clearly. If I have five character independent variables in the data frame and I don't want to drop them, how can I proceed with this concept? That is, how would the character data be converted to numeric data?
@bkrai
6 years ago
You can do one-hot encoding as shown in the video.
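For character columns specifically, that one-hot step could look like the sketch below. The data frame `df`, its `city` column, and the response `admit` are hypothetical names used for illustration only.

```r
# Hedged sketch: one-hot encode character/factor predictors with
# sparse.model.matrix(), as in the video. `df`, `city`, and `admit`
# are made-up names; substitute your own.
library(Matrix)
df$city <- as.factor(df$city)                    # characters must become factors
X <- sparse.model.matrix(admit ~ . - 1, data = df)
colnames(X)   # each factor level now has its own 0/1 indicator column
```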
@OrcaChess
6 years ago
Hello Bharatendra Rai, did you make a video about setting up feature selection in R? It would be very useful for the case where you have lots of features/inputs and you want to find out which of them are relevant, to determine a feature subset for the classifier. Kind regards, Jonathan
@bkrai
6 years ago
I'll be doing one in August.
@OrcaChess
6 years ago
Bharatendra Rai, looking forward to it! 👍 Thank you for your deep and to-the-point data science tutorials. In Karlsruhe, I recommend them to every student who wants to run ML models in R.
@bkrai
6 years ago
Thanks for your comments and recommendations!
@gurgenhovakimyan329
4 years ago
Thank you very much. You helped me a lot.
@bkrai
4 years ago
Thanks for the comments!
@deannanuboshi1387
2 years ago
Great video! Do you know how to get a confidence or prediction interval for XGBoost in R? Thanks
@bkrai
2 years ago
You can get more details here: kzitem.info/news/bejne/yXmCsYGfk3SFpYo
@zacs7971
4 years ago
Hello Professor, thank you for this video. I'm receiving this error after attempting the same line of code you have in line 22. Any ideas on how to resolve it? Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : The length of labels must equal to the number of rows in the input data
@bkrai
4 years ago
The message itself provides the clue: "length of labels must equal to the number of rows in the input data".
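A quick way to check for that mismatch, sketched with the tutorial's `trainm`/`train_label` names (assumed, since the commenter's own object names aren't shown):

```r
# Hedged debugging sketch for the error above: the label vector must have
# exactly one entry per row of the feature matrix.
library(xgboost)
nrow(trainm)                                   # rows in the feature matrix
length(train_label)                            # must match nrow(trainm)
stopifnot(nrow(trainm) == length(train_label))
train_matrix <- xgb.DMatrix(data = as.matrix(trainm), label = train_label)
```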
@tadessemelakuabegaz9615
2 years ago
Hi Rai, I hope everything is going well. I am currently working on an ML algorithm with a continuous outcome variable. I am new to regression models. I want to develop randomForest and XGBoost regressions. Can I ask for any reference videos and code related to a regression algorithm using randomForest and XGBoost?
@bkrai
2 years ago
Refer to: kzitem.info/news/bejne/yXmCsYGfk3SFpYo
@vishwajitsen1434
5 years ago
Can you please upload videos on LSTM in Keras in R for numerical, categorical, and multiclass outcomes? It would be really great.
@bkrai
5 years ago
Thanks for the suggestion! It's on my list for future videos.
@harishnagpal21
5 years ago
Thanks for the video. In what scenarios should we use eXtreme Gradient Boosting?
@bkrai
5 years ago
You can use it for better accuracy and a faster run compared to many other methods.
@harishnagpal21
5 years ago
Thanks a lot :)
@OrcaChess
6 years ago
Thank you so much for your instructive and insightful tutorial! I have one question: do I only need one-hot encoding for my inputs/features? What about the outputs: is XGBoost able to forecast a categorical variable as a label? Or should I one-hot encode my labels as well? Kind regards, Jonathan
@bkrai
6 years ago
For XGBoost, the response variable also needs to be numeric. In the example that I used, admit is a factor variable, but since it has two values, 0 and 1, in numeric form, we didn't do anything. For further explanation about variables, you can also refer to this link: cran.r-project.org/web/packages/xgboost/vignettes/discoverYourData.html
@OrcaChess
6 years ago
Thank you very much for your explanations and the link! In multi-class cases, which approach do you think is more suitable? Suppose we have one categorical variable with 10 classes (0 to 9), where every number is a class. What do you think is better? 1. Make one model to forecast this categorical variable, getting 10 different probabilities which sum to 1. 2. Make 10 different models which forecast yes or no (0 and 1) for each of the 10 classes; in the end, we take the model with the highest probability for the yes case as the forecast. Thanks in advance, Jonathan
@harishnagpal21
4 years ago
I have one query. In this example we know the response variable in the test set, as we divided the actual data 80/20. But in real life, as in Kaggle competitions, we need to predict on a test set given by Kaggle where the response variable must be predicted. How would that fit into the above code, i.e., how do we predict on an actual test set in XGBoost? Thanks in advance.
@bkrai
4 years ago
This code will not change much. But you will definitely have to make some adjustments before you can correctly submit your file on Kaggle. You can refer to this example: kzitem.info/news/bejne/laKak46cq3WUY6Q
@tadessemelakuabegaz9615
2 years ago
Hi Rai, great job. I have one question: how can we construct ROC & AUC for the XGBoost model?
@bkrai
2 years ago
See if this helps. It has more detailed coverage: kzitem.info/news/bejne/x6qgtKmGpIKCdWk
@tadessemelakuabegaz9615
2 years ago
@@bkrai Thank you so much
@bkrai
2 years ago
You are welcome!
@shinuignatious308
6 years ago
Thank you so much, sir, for your in-depth tutorials. Sir, could you please post a GitHub link for the code as well?
@bkrai
6 years ago
Link to the code is in the description area below the video.
@bkrai
5 years ago
Link to GitHub: github.com/bkrai/Top-10-Machine-Learning-Methods-With-R
@navdeepagrawal7819
2 years ago
Sir, how can we optimize hyperparameters in the case of the XGBoost algorithm?
@bkrai
2 years ago
Refer to this: kzitem.info/news/bejne/qKOhrqp6rGJ4em0
@ft753
4 years ago
Thanks very much for this tutorial; it definitely made things easier to understand. I have a question regarding "objective" = "multi:softprob" in the parameter section. The admission problem in the example is a binary (logistic) problem, right? So why should we use multi:softprob instead of binary:logistic? If I try the model with the binary:logistic input, my model fails. It would be great if you could help me out on when to use which objective. Thanks.
@bkrai
4 years ago
Multi works for 2 or more levels.
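For a two-level target, the binary objective the commenter mentions would be set up roughly as below; `train_matrix` and `test_matrix` are assumed from the tutorial. Note that binary:logistic returns one probability per row, so num_class and the reshaping of the prediction vector are dropped.

```r
# Hedged sketch: the same model with a binary objective instead of
# multi:softprob; predictions are P(admit = 1) directly.
library(xgboost)
bst <- xgb.train(params = list(objective = "binary:logistic",
                               eval_metric = "logloss"),
                 data = train_matrix, nrounds = 100)
p <- predict(bst, newdata = test_matrix)
pred_class <- ifelse(p > 0.5, 1, 0)
```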
@adarsha1981
6 years ago
Hi Bharatendra, nice and very useful video. I have a question: in my case I have around 4.5 lakh (450,000) observations and 250 features. I am trying to run XGBoost; it's taking some time, that's OK, but I am not able to run XGBoost successfully. Note: my data is highly class-imbalanced, with 75% 0's and 25% 1's. Do you suggest using XGBoost here? Thanks!
@bkrai
6 years ago
I would suggest taking care of the class imbalance problem (CIP) before running XGBoost. It will improve accuracy significantly. Here is the link for CIP: kzitem.info/news/bejne/qaVosaCss5yWmpw
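Alongside resampling, xgboost itself has a knob for imbalance; a sketch assuming a 0/1 `train_label` vector as in the tutorial (the commenter's own data isn't shown):

```r
# Hedged sketch: offset a 75%/25% class imbalance with scale_pos_weight,
# typically set to (number of negatives) / (number of positives).
neg <- sum(train_label == 0)
pos <- sum(train_label == 1)
params <- list(objective = "binary:logistic",
               eval_metric = "auc",
               scale_pos_weight = neg / pos)   # about 3 for a 75/25 split
```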
@supriyashinde5128
4 years ago
Thank you so much for the tutorial. I have a question: how do I plot the ROC curve and get the AUC on the same dataset? Can you provide the code for the ROC curve and AUC?
@bkrai
4 years ago
Here is the link: kzitem.info/news/bejne/2qaFl3iGfn2KeaQ
@eliecerecology
5 years ago
Thanks for the video. I have a question: why didn't you use "objective" = "binary:logistic"?
@bkrai
5 years ago
Yes, that should be more appropriate.
@popezee2029
6 years ago
Thanks for the instructive video, Sir. I am using a test set that does not contain the dependent variable, because I am supposed to predict that column in a regression problem. How should I edit the script for test_label and the watchlist? Thank you.
@bkrai
6 years ago
You can try this: new_matrix
@deepakpanigrahi9601
4 years ago
Hi Sir, can we create a confusion matrix here instead of a table? I am not able to create one. Could you please guide me?
@bkrai
4 years ago
It includes a confusion matrix too.
@gabrielidalino1102
4 years ago
If you want to use the CONFUSION MATRIX FUNCTION, here is the code:
pred <- p %>% data.frame() %>% mutate(label = test_label, max_prob = max.col(., "last") - 1)
CM <- table(Prediction = pred$max_prob, Actual = pred$label)
library(caret)
confusionMatrix(CM)
@OrcaChess
6 years ago
Hello, is it possible to change the cutoff of the XGB model's prediction? In my model-evaluation phase, I had a case where the AUC in the ROC curve of one model was higher than that of another model despite a clearly worse confusion matrix and accuracy. My guess is that this could be a cutoff issue. Kind regards, Jonathan
@bkrai
6 years ago
The ROC curve already makes use of various cutoffs to draw the curve. With one cutoff value we would just get one point, not a curve. Looking at the two curves can give you a better idea of the reasons behind the AUC difference.
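A common way to draw that curve and then apply a single cutoff is the pROC package; a sketch assuming `p` holds predicted probabilities for class 1 and `test_label` the observed 0/1 labels (names taken from the tutorial):

```r
# Hedged sketch: ROC curve and AUC with pROC, plus one explicit cutoff.
library(pROC)
r <- roc(test_label, p)               # response first, predictor second
auc(r)                                # area under the curve
plot(r)                               # the curve itself sweeps all cutoffs
pred_03 <- ifelse(p > 0.3, 1, 0)      # a cutoff other than the default 0.5
table(Prediction = pred_03, Actual = test_label)
```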
@musasall5740
6 years ago
Excellent!
@chaitanyakmr
5 years ago
Thanks a lot for the explanation.
@bkrai
5 years ago
Thanks for the comments!
@foram224
5 years ago
I have one question: if you have created a sparse matrix for the train and test sets, then why are you using as.matrix on trainm in xgb.DMatrix? A sparse matrix can also be used directly. I am confused about xgb.DMatrix and the step before it, which is sparse.model.matrix. Another question: what if your response variable is in position 43, not 1? Do you still use -1 in the sparse matrix? Thank you so much for the video, it's really nice, but I have these questions because of my dataset. Hoping for your reply. Thanks.
@bkrai
5 years ago
For the 1st question, I would suggest trying it and seeing if it works. If it works, then you are fine. I didn't fully understand the 2nd question. Are you referring to code line 43?
@foram224
5 years ago
@@bkrai I appreciate your reply. For my dataset, if I use as.matrix on sparse.model.matrix, it gives me an error. So I am better off using the sparse.model.matrix variable directly in xgb.DMatrix. That is all clear now. You are getting mlogloss, but I was getting merror. I used the same parameters as yours.
@jojo23srb
6 years ago
Thanks for the video! A quick question though: what's the motivation behind the 'prob' vector in 'ind
@bkrai
6 years ago
prob is the probability. For more details about data partitioning, you can look at this link: kzitem.info/news/bejne/wolntWx7onl9l5w Also, date variables are handled differently. Probably I'll do a video about it later.
@nguyenphananhhuy416
6 years ago
Is there any example of using XGBoost to predict a continuous outcome? It seems that this video is for the classification case.
@Viewfrommassada
6 years ago
Hi Prof., I have come again with a question, since I am learning a lot from your videos. Could you please explain the 'eta' parameter in xgboost well? Also, I want to report the AUC metric for my xgboost model and I need your guidance; I have seen examples on Google but I get an error when I try. I am making a presentation on xgboost soon. Your help will be appreciated.
@bkrai
6 years ago
eta is the learning rate. When it is high, computation is faster, but you may miss the optimum. When it is low, computation is slower, but there is a better chance of hitting the optimum. Depending on the data size and problem, we try various values to explore what is best for a given problem. For AUC you can try this: kzitem.info/news/bejne/2qaFl3iGfn2KeaQ
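The "try various values" advice could be automated with a small sweep; a sketch reusing the tutorial's `train_matrix` (assumed) with a binary objective:

```r
# Hedged sketch: compare a few learning rates by cross-validated logloss.
library(xgboost)
for (eta in c(0.01, 0.1, 0.3)) {
  cv <- xgb.cv(params = list(objective = "binary:logistic",
                             eval_metric = "logloss", eta = eta),
               data = train_matrix, nrounds = 200, nfold = 5,
               early_stopping_rounds = 10, verbose = 0)
  cat("eta =", eta, " best test logloss:",
      min(cv$evaluation_log$test_logloss_mean), "\n")
}
```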
@sanjayursal5330
4 years ago
If I construct a model with Rank as numeric, the importance graph gives it the highest importance, while in the case of factorization, GRE and GPA are shown as important. So the question then is: by changing a numeric to a factor, is its influence/importance diminished in the model?
@bkrai
4 years ago
It is more important to use the correct format. Since 'rank' is not a numeric or continuous variable, it should not be treated like one.
@liwenling1287
2 years ago
Thanks, Rai, for your helpful tutorial! It really helped me to understand and run XGBoost in R. I have a question: if I want to handle a regression problem, can I use the same code, or are there parameters I should modify? Hope to hear from you soon.
@bkrai
2 years ago
You can see an example here: kzitem.info/news/bejne/yXmCsYGfk3SFpYo You can also get some practice by doing this competition: kzitem.info/news/bejne/paRmmGyeqomfiHY
@liwenling1287
2 years ago
@@bkrai, really helpful! Thanks again for your detailed tutorial. Wish you all the best!
@bkrai
2 years ago
You are welcome!
@tadessemelakuabegaz9615
2 years ago
Dear Rai, I hope you are doing well. I have one question. I am building a machine learning model using the randomForest and XGBoost algorithms. My data is a survey of samples derived from a large population. It has a sampling weight, which is the number of individuals in the population each respondent in the sample represents. How can I apply this sampling weight in my ML model? The data also contains strata and clusters. Do I have to keep the sampling weight, strata, and cluster variables with my features?
@phuongk.kttp-mtnguyenkieul2761
5 years ago
Thank you for your valuable video. I have a question: the bst_model step does not work. My data has 122 classes. When I run it, R displays the error: label must be in [0, num_class). I tried many nrounds values in the range 0 to 122, but it hasn't worked. Hope to get your response. Many thanks!
@bkrai
5 years ago
I think 122 is too many classes. Make sure you have enough data for each class, otherwise there could be issues.
@phuongk.kttp-mtnguyenkieul2761
5 years ago
@@bkrai Do you have any solution to handle it, Dr.?
@bkrai
5 years ago
It's difficult to say much without looking at the data.
@adarsha1981
6 years ago
Hi Bharatendra, I tried searching for the bagging/boosting and SMOTE videos in your playlists. Aren't they out yet? If not, waiting to see them :)
@bkrai
6 years ago
Not yet.
@AnantaPradhan
5 years ago
Dear Professor, how can we find the weight of each independent variable?
@bkrai
5 years ago
You can get it from the feature importance.
@saipri
3 years ago
Is there a video on checking the model using chi-square?
@send2milan
5 years ago
Sir, please create a video on CatBoost using the catboost package (not catboost via caret). There are few examples on KZitem.
@bkrai
5 years ago
Thanks for the suggestion, I've added it to my list.
@send2milan
5 years ago
Thank you, sir. And please highlight the explanation of its parameters.
@nithinmamidala
6 years ago
Please give an explanation of the algorithm so that it's easier to understand.
@sheeqariff7974
6 years ago
Hi sir, your video is very good and easy to understand. I have one question: what classifier algorithm is used in the xgboost package for the classification case? I read on another website that the package includes "tree learning algorithms". Is it a decision tree algorithm? Thank you in advance for your clarification.
@kartikrayaprolu9076
4 years ago
Hi Sir, why have you used "-1" in the sparse.model.matrix function? Does it specify that the "first column" is not to be included, or does it exclude only one column, i.e., the "response" variable?
@dhavalpatel1843
4 years ago
No. The number of classes is 2, so if we put -1, those classes become 0 and 1; in this case 0 is not admitted and 1 is admitted.
@bkrai
4 years ago
Thanks for the update!
@bkrai
4 years ago
Here is an update: "-1" removes an extra column which this command creates as the first column.
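That extra column is the intercept; a tiny, self-contained illustration with a made-up two-row data frame (not from the video):

```r
# Hedged illustration: model.matrix() adds an (Intercept) column of 1s
# by default; the "-1" in the formula drops it.
df <- data.frame(gre = c(380, 660), rank = factor(c(3, 1)))
colnames(model.matrix(~ ., data = df))       # includes "(Intercept)"
colnames(model.matrix(~ . - 1, data = df))   # intercept column removed
```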
@haanda47
6 years ago
Sir, can you please upload a video on adaptive boosting in R? Thanks in advance.
@bkrai
6 years ago
Thanks for the suggestion, I've added it to my list.
@TheLoggic
6 years ago
Cheers, amazing video, mate!
@hilaav7449
5 years ago
Thank you, it was very helpful!!
@bkrai
5 years ago
Thanks for the comments!
@abiani007
4 years ago
How can I use this for regression? Where do I make changes? Please confirm.
@bkrai
4 years ago
Yes, it can be done. You can refer to this: www.datatechnotes.com/2020/08/regression-example-with-xgboost-in-r.html
@happylearning-gp
2 years ago
I have a basic question. In logistic regression using the glm() function, we get a model with the predictors considered in it. But here, I don't know which predictors are considered in bst_model. Could you please guide me on how to extract those predictors from bst_model? Thank you very much.
@mateuszbielik2912
2 years ago
Hello, great tutorial, it helped me a lot. I got to the point where I see the error plot, and both the train and test data have exactly the same lines, one on top of the other. Plus, when I want to find the minimum iteration for test_mlogloss, I am getting this message: [1] iter train_mlogloss test_mlogloss (or 0-length row.names). What can be the reason? :/
Thanks for the great explanation. I set nc = length(unique(train_label)), which is 2. Still I'm getting this error while training: Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) : [16:37:40] amalgamation/../src/objective/multiclass_obj.cc:78: Check failed: label_error >= 0 && label_error < nclass SoftmaxMultiClassObj: label must be in [0, num_class), num_class=2 but found 2 in label.
@bkrai
6 years ago
It's difficult to say where the error is from this. You need to look at an earlier part of the code.
@meghalgandhi4357
6 years ago
Thanks for your response. I'm following your every step. In my dataset, I have 9 variables, of which 8 are independent. And like your data, I also have the column to predict in the first position. In short, everything is the same, yet I am getting the error; I don't know why. For now, I have left this approach and am using XGBoost from the caret package. But I am really looking forward to working with the xgboost package and resolving these issues.
@hmachira1
6 years ago
Thank you so much
@rachelfan4664
5 years ago
Hi Rai, my test data doesn't have the response variable; I need to predict it. What should I do with all the test_matrix stuff?
@bkrai
5 years ago
You can artificially create it and fill it with zeros.
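That workaround could be sketched as below, reusing the tutorial's object names (`testm`, `bst_model`; both assumed). The zero labels are placeholders only; predict() never looks at them, though any test error reported via the watchlist would then be meaningless.

```r
# Hedged sketch: build the test DMatrix with dummy zero labels so an
# unlabeled test set can still be scored.
library(xgboost)
test_label  <- rep(0, nrow(testm))
test_matrix <- xgb.DMatrix(data = as.matrix(testm), label = test_label)
p <- predict(bst_model, newdata = test_matrix)
```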
@rachelfan4664
5 years ago
Bharatendra Rai, thanks sir, will try.
@jairjuliocc
6 years ago
Very useful, thank you!
@bkrai
6 years ago
Thanks for the comments!
@datascience8272
6 years ago
Hello Sir, in a real scenario, where we have separate test data with no dependent variable, how will sparse.model.matrix work?
@prabhanjantattar32
6 years ago
Isn't it better to run sparse.model.matrix on the entire dataset first and then partition it into train and test? That way, new factor levels won't appear in the test dataset, right? Or am I missing something?
@bkrai
6 years ago
Probably that should work fine too. No harm in trying.
@utkarshprajapati9876
6 years ago
Hi Sir, nice and very useful video. I want to ask: when I use the XGBoost algorithm, do I no longer need to use linear and logistic regression?
@utkarshprajapati9876
6 years ago
I want to use the XGBoost algorithm on this problem: www.kaggle.com/c/house-prices-advanced-regression-techniques
@bkrai
6 years ago
It's better to try more methods and then see which one performs better.
@utkarshprajapati9876
6 years ago
@@bkrai Okay sir, thanks.
@utkarshprajapati9876
6 years ago
@@bkrai Sir, you are really a great man.
@sunilbobb
6 years ago
Hi Sir, can you please teach how to use CatBoost and its significance?
@bkrai
6 years ago
Thanks for the suggestion, I've added this to my list.
@upskillwithchetan
4 years ago
Hi Sir, I have a confusion. At 4:18 you mentioned that we put -1 because "Admit" is the first column in the dataset, but according to this blog www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/ , "-1" removes an extra column which this command creates as the first column. Please confirm.
@bkrai
4 years ago
You are right. Once 'admit' appears before the ~ symbol, it is automatically left out.
@AbhijeetSingh-xd9wz
5 years ago
Sir, while running the model, while assigning train_matrix I get an error: setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : the length of labels must be equal to the number of rows in the input data
@AbhijeetSingh-xd9wz
5 years ago
Could you please help me out with this error?
@bkrai
5 years ago
It says the label and input data do not have equal length. Please check your data.
@SaranathenArun11E214
6 years ago
Sir, any videos on bootstrap aggregation or stacking? Please help.
@bkrai
6 years ago
Thanks for the suggestion, I've added this to my list.
@SaranathenArun11E214
6 years ago
Thanks so much, sir, and I appreciate your help.
@Viewfrommassada
6 years ago
Also, Prof. Rai, I am building an ensemble model of random forest and XGBoost with R. My response variable has 2 levels, 'Low' and 'High', and it is a factor in R. Without converting these to 0's and 1's, can I build the model? Also, some of my predictor variables have levels A, B, C, D, and E, and they are detected by R as factors. Do I have to convert these to zeros and ones even though they are factors before I use them?
@bkrai
6 years ago
When you use random forest, you do not need to convert a categorical independent or dependent variable to numeric. But you definitely need numeric variables when using xgboost.
@Viewfrommassada
6 years ago
Your explanation helped a lot, thanks. I am building an ensemble of random forest and XGBoost on a classification problem. I have imbalanced data, so I used your video to balance ONLY my training data (I hope that's all I need to do in terms of balancing?). After balancing, I applied your one-hot encoding tutorial to both my balanced train data and my unbalanced test data. My XGBoost is running well, though I am yet to test it. BUT the problem is the random forest. When I pass the data through the RF, I get the error message below:
Error in t.default(x) : argument is not a matrix
In addition: Warning messages:
1: In randomForest.default(x, y, mtry = mtryStart, ntree = ntreeTry, : The response has five or fewer unique values. Are you sure you want to do regression?
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'externalptr'
What could be the solution? Your help will be greatly appreciated, Prof. Rai!
Comments: 272