A machine learning algorithm may benefit from data that is "standardized". If one column in a dataframe has a completely different scale than the rest, that column can dominate the model, which may lead to overfitting or poor predictions. To mitigate this, you may consider scaling your dataset. This video gives a full introduction to what scaling does to your pipeline.
00:00 Introduction
01:01 Making the problem harder
02:54 Pipelines
04:57 Scaling Data
11:33 Prediction Surface
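For readers who want to try the idea from the "Pipelines" and "Scaling Data" chapters before watching, here is a minimal sketch. It is not the code from the video: the synthetic data, the choice of `KNeighborsClassifier`, and all variable names are assumptions made for illustration. It only shows how a `StandardScaler` can be placed in front of a model with `make_pipeline`, and compares that against the same model on unscaled data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption, not the dataset used in the video):
# one informative column on a small scale, one pure-noise column on a huge scale.
rng = np.random.default_rng(0)
n = 400
informative = rng.normal(size=n)
noisy = rng.normal(scale=1000.0, size=n)
X = np.column_stack([informative, noisy])
y = (informative > 0).astype(int)

# Simple holdout split: first 300 rows for training, last 100 for testing.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Without scaling, distances are dominated by the large-scale noise column.
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# With scaling inside a pipeline, the scaler is fit on the training data only
# and then applied automatically whenever the pipeline predicts.
scaled = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5),
).fit(X_train, y_train)

unscaled_score = unscaled.score(X_test, y_test)
scaled_score = scaled.score(X_test, y_test)
print(f"accuracy without scaling: {unscaled_score:.2f}")
print(f"accuracy with scaling:    {scaled_score:.2f}")
```

Keeping the scaler inside the pipeline means its mean and standard deviation are learned during `fit` and reused at prediction time, so the test data never influences the scaling statistics.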
To learn more about scikit-learn scaling algorithms, you may appreciate this guide:
scikit-learn.org/stable/auto_...
The code for all of our videos can be found in this GitHub repository:
github.com/probabl-ai/youtube...
The code for this specific episode can be found here:
github.com/probabl-ai/youtube...