This is the first video of yours I have come across, and it's by far the best I have found on this topic. Will be bingeing everything you have to offer from now on. Thanks for all the content, man!
@nazmuzzamankhan4764
3 years ago
I really liked the way you explained the steps with numbers. It helped me a lot to understand the notations of the equations.
@SebastianRaschka
3 years ago
Glad to hear that it was useful!
@yerhoam
11 months ago
Thank you for the great explanation! I liked the way you say "prediction" :)
@hassandanamazraeh5975
2 years ago
A great course. Thank you very much.
@SebastianRaschka
2 years ago
Thanks for the kind words! Glad to hear it was useful!
@rohitgarg776
2 years ago
Thanks, explained very nicely
@newbie8051
A year ago
Well, I understood the gradient boosting part: we focus on the residuals and keep building trees to lower the loss of the previously made trees. But I couldn't grasp how XGBoost achieves this via parallel computation. Guess I'll have to read the paper :)
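For anyone else stuck on the same point: the sequential residual-fitting loop can be sketched in a few lines (a minimal sketch with squared-error loss; the hyperparameters n_trees, lr, and max_depth are illustrative choices, not from the video). As for the parallelism question, XGBoost's parallelism happens within the construction of each tree, where candidate splits across features are evaluated in parallel; the boosting rounds themselves are still sequential.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, lr=0.1, max_depth=3):
    """Minimal gradient boosting for regression with squared-error loss."""
    f0 = float(np.mean(y))                  # constant start: the mean minimizes squared error
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                # negative gradient of 1/2 * (pred - y)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)              # fit what the current ensemble still gets wrong
        pred = pred + lr * tree.predict(X)  # add the new tree with shrinkage
        trees.append(tree)
    return f0, trees

def predict_gradient_boosting(f0, trees, X, lr=0.1):
    return f0 + lr * sum(tree.predict(X) for tree in trees)
```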
@urthogie
8 months ago
Why does the tree in step 2 not have a third decision node to split Waunake and Lansing?
@asdf_600
2 years ago
Very nice video :) I was wondering why, for gradient boosting, we fit the derivative instead of the residual? Intuitively that's what I would do :/
@SebastianRaschka
2 years ago
Good question. If we consider the squared error loss, 1/2 (yhat - y)^2, its derivative with respect to yhat is (yhat - y), which is (up to sign) exactly what people refer to as the residual in a linear regression context. In other words, the derivative looks like the residual, so fitting the residual is essentially fitting the (negative) derivative. If the loss is not the squared error loss, the derivative may look different, which is why it's called a "pseudo-residual" in general. However, we could also just call it the loss derivative and not use the term pseudo-residual at all; I think using it is just a convention in gradient boosting contexts.
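Spelling out the reply above (this is the standard derivation, not something specific to the video): with the squared-error loss,

```latex
L(y, \hat{y}) = \tfrac{1}{2}(\hat{y} - y)^2, \qquad
\frac{\partial L}{\partial \hat{y}} = \hat{y} - y, \qquad
-\frac{\partial L}{\partial \hat{y}} = y - \hat{y} \quad \text{(the residual)}
```

so fitting the negative gradient and fitting the residual coincide. For a different loss such as the absolute error L = |yhat - y|, the negative gradient is sign(y - yhat), which is no longer the residual itself; hence the more general term "pseudo-residual".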
@just4onecomment
3 years ago
Hi Professor, thank you very much for the educational video! Do you have any thoughts on how this stepwise additive model compares to fitting a very large model with many parameters in a "stepwise" fashion based on gradient descent? For example, freezing and additively training subnetworks of a neural model.
@SebastianRaschka
3 years ago
Interesting question. There's something called layerwise pre-training in the context of neural networks, which is somewhat similar to what you describe: training one layer at a time. The difference is really the structure of the model, though, because it uses fully connected layers rather than trees. But yeah, it's an interesting thought.
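A rough sketch of the "freeze and additively train subnetworks" idea from this exchange (the block architecture, sizes, and training loop below are illustrative assumptions, not something shown in the video):

```python
import torch
import torch.nn as nn

def add_and_train_block(blocks, X, y, lr=1e-2, steps=200):
    """Grow a model one small MLP block at a time, boosting-style:
    earlier blocks stay frozen, and only the new block is trained."""
    with torch.no_grad():  # frozen contribution of all previously trained blocks
        current = sum(b(X) for b in blocks) if blocks else torch.zeros_like(y)
    new_block = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(new_block.parameters(), lr=lr)  # only the new block's weights update
    for _ in range(steps):
        opt.zero_grad()
        loss = ((current + new_block(X) - y) ** 2).mean()  # squared error on the additive sum
        loss.backward()
        opt.step()
    blocks.append(new_block)
    return blocks
```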
@muhammadlabib3744
A year ago
I'm still wondering about minute 13:19: why did you choose age >= 30 as the root node? Is that based on the residuals or something else?
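On this question: in gradient boosting, each tree (its root included) is grown on the current residuals, with splits chosen by the usual CART criterion, i.e., the threshold that minimizes the squared error when each side predicts its mean residual. A minimal single-feature sketch (variable names are illustrative; whether age >= 30 wins depends on the video's example data):

```python
import numpy as np

def best_split(x, residuals):
    """Return the threshold on one feature minimizing the split's squared error."""
    best_thresh, best_sse = None, np.inf
    for thresh in np.unique(x):
        left, right = residuals[x < thresh], residuals[x >= thresh]
        if len(left) == 0 or len(right) == 0:
            continue  # skip degenerate splits with an empty side
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_thresh, best_sse = thresh, sse
    return best_thresh, best_sse
```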
Comments: 19