A real-world use case: extract, transform (regular expressions, CSV), and join data from Kaggle through the Python API. Topics covered in this track:
Install the Kaggle API
Create an account on kaggle.com if you do not already have one
Download an access token from the Kaggle website and save it as kaggle.json in the ~/.kaggle directory
Choose a dataset, describe the use case for its data, and jot down the steps involved in achieving it
Connect to and authenticate with the Kaggle API
Pull the data files from this dataset
Understand the three data files
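
The setup steps above can be sketched roughly as follows, assuming the client was installed with `pip install kaggle` and the token is saved at ~/.kaggle/kaggle.json. The function name `download_dataset`, the `target_dir` default, and the dataset slug are illustrative placeholders, not names taken from the video.

```python
from pathlib import Path


def download_dataset(dataset_slug, target_dir="data"):
    """Authenticate with the saved token and download a dataset's files.

    dataset_slug is a placeholder of the form "<owner>/<dataset-name>".
    """
    # Imported lazily so this module can be loaded even when the kaggle
    # package or the credentials file is not available yet.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # reads the token from ~/.kaggle/kaggle.json
    # Download every file in the dataset and unzip it into target_dir
    api.dataset_download_files(dataset_slug, path=target_dir, unzip=True)
    # Return the file names so the data files can be inspected next
    return sorted(p.name for p in Path(target_dir).iterdir())
```

Calling `download_dataset("<owner>/<dataset-name>")` with the slug of the chosen dataset should leave the data files in the `data` directory, ready to be opened and examined.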
Let us extract the year from the movie title
Regex simulator - regex101.com/r/5XjNqh/1
Find the distinct number of genres in the movies data
Join all the data files into one data frame
Keeping the ratings data as the primary data, let us add the Twitter details and movie details
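
A minimal pandas sketch of the transform and join steps above. The sample rows are made up, and every column name except rating_timestamp is an assumption, as are the "Title (1995)" title format and the "|" genre separator. The regular expression here is illustrative and may differ from the one in the regex101 link.

```python
import pandas as pd

# Made-up sample rows standing in for the three data files
movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Toy Story (1995)", "Heat (1995)", None],
    "genres": ["Animation|Comedy", "Action|Crime", "Drama"],
})
users = pd.DataFrame({"user_id": [10, 11], "twitter_id": [111, 222]})
ratings = pd.DataFrame({
    "user_id": [10, 10, 11, 11],
    "movie_id": [1, 2, 1, 3],
    "rating": [8, 6, 9, 7],
    "rating_timestamp": [1362062307, 1362062400, 1362148800, 1362235200],
})

# Extract the year: four digits in parentheses at the end of the title
year_text = movies["title"].str.extract(r"\((\d{4})\)\s*$", expand=False)
movies["year"] = pd.to_numeric(year_text, errors="coerce").astype("Int64")

# Distinct genres: split the delimited string, flatten, count unique values
genre_count = movies["genres"].str.split("|").explode().nunique()

# Ratings is the primary table, so left joins keep every rating row
combined = (
    ratings
    .merge(users, on="user_id", how="left")
    .merge(movies, on="movie_id", how="left")
)
```

Left joins are used so that a rating is kept even when its user or movie is missing from the other files. Those rows simply carry empty values in the added columns.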
Analyse data
- How many user accounts exist?
- How many movies don't have titles?
- For each movie, what is the count, min, max and average rating?
- Sort the above stats by count of ratings
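
One way the analysis questions above might be answered with pandas, shown on a made-up joined data frame. The column names are assumptions, and sorting in descending order of count is a choice made for this sketch.

```python
import pandas as pd

# Made-up rows standing in for the joined ratings, user and movie data
combined = pd.DataFrame({
    "user_id": [10, 10, 11, 11, 12],
    "movie_id": [1, 2, 1, 3, 1],
    "title": ["Toy Story (1995)", "Heat (1995)", "Toy Story (1995)",
              None, "Toy Story (1995)"],
    "rating": [8, 6, 9, 7, 10],
})

# Number of distinct user accounts that appear in the data
user_count = combined["user_id"].nunique()

# Number of distinct movies whose title is missing
untitled_movies = combined.loc[combined["title"].isna(), "movie_id"].nunique()

# Count, min, max and average rating per movie, most-rated movies first
rating_stats = (
    combined.groupby("movie_id")["rating"]
    .agg(["count", "min", "max", "mean"])
    .sort_values("count", ascending=False)
)
```

Counting distinct values of `movie_id` rather than rows avoids counting the same untitled movie once per rating.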
Exercise:
- Convert rating_timestamp to date
- For each movie, find the dates when the minimum and maximum ratings were given
Happy Learning !!! Pawan Yaddanapudi
Track 3 | Python Data Engineering | Extract & Transform data through Kaggle API