Links to related videos:
Postgres installation: kzitem.info/news/bejne/x6CPz4uOgXtogKQ
Jupyter installation: kzitem.info/news/bejne/o2Z9k2hqnaN5hYI&t
Build ETL pipeline with Python: kzitem.info/news/bejne/xZyl26OecoOViKw
Pandas DS tutorial: kzitem.info/news/bejne/jqCb2K6coWSThmU
Pandas Exploratory DA: kzitem.info/news/bejne/2mqJmJuEm55qeJg
@rafaelg8238
5 months ago
Nice video, congrats. I have an example in a prod environment: I iterate a SQL file over 100 databases, collect the data, create a column with the name of each database, generate a dataframe, and then insert it back into Postgres in a staging area, all using Python. Depending on the complexity of the query and the volume of data in the dataframe, it can take a long time. Could you create a video with real cases like these, showing advanced techniques to process the data as quickly as possible?
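The workflow described above (run one query against many databases, tag each row with its source database, and land everything in a staging table) can be sketched with the standard library alone. In this sketch, in-memory SQLite databases stand in for the 100 Postgres databases, and all table and column names (`sales`, `staging_sales`, `source_db`) are illustrative assumptions, not the commenter's actual schema:

```python
import sqlite3

QUERY = "SELECT id, amount FROM sales"

def make_db(rows):
    # Illustrative stand-in for one of the ~100 source databases.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return con

databases = {
    "db_a": make_db([(1, 10.0), (2, 20.0)]),
    "db_b": make_db([(1, 5.0)]),
}

# Run the same query against every database, tagging each row
# with the name of the database it came from.
staged = []
for name, con in databases.items():
    for row in con.execute(QUERY):
        staged.append(row + (name,))

# Insert the combined result into a staging table.
staging = sqlite3.connect(":memory:")
staging.execute(
    "CREATE TABLE staging_sales (id INTEGER, amount REAL, source_db TEXT)"
)
staging.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", staged)
count = staging.execute("SELECT COUNT(*) FROM staging_sales").fetchone()[0]
print(count)  # 3
```

With real Postgres you would swap `sqlite3.connect` for a driver such as `psycopg2`, but the shape of the loop is the same.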
@BiInsightsInc
5 months ago
Hi Rafael, the speed depends on the amount of data and the resources available. I suggest you explore a multi-threaded library: try Polars instead of Pandas, or Spark (setup required). The Pandas implementation is inherently single-threaded, which means only one of your CPU cores can be utilized at any given time. Polars is a multi-threaded query engine and is effective at processing tasks concurrently.
@mugumemalte8667
4 months ago
Thanks a lot, sir! What's the best way to create a data pipeline using Mage with the goal of building a dashboard that gets its data from APIs, say with Mage, MySQL, and Metabase? Is it really feasible?
@BiInsightsInc
4 months ago
Anything is possible. Mage is a new data pipeline tool that lets you extract data from APIs, among other sources. You can use its loader to load data into a relational database such as MySQL. I'm covering a similar tool, dlt; you can build similar pipelines with it. I will cover Mage in the future. Anyway, here is a link to a sample Mage ETL pipeline: docs.mage.ai/guides/load-api-data#4-export-data-to-duckdb
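Stripped of the Mage-specific block decorators, the API-to-relational-database step that the linked guide covers reduces to "parse the API response, then write it to a table the dashboard can query". A stdlib sketch of that core, with a hard-coded JSON payload standing in for the API response and SQLite standing in for MySQL (both are illustrative assumptions):

```python
import json
import sqlite3

# Stand-in for an API response body; in a real pipeline this would
# come from an HTTP call (e.g. via requests or urllib).
API_PAYLOAD = json.dumps([
    {"user": "ana", "signups": 4},
    {"user": "bob", "signups": 7},
])

def extract():
    # Extract step: parse the (mock) API response into records.
    return json.loads(API_PAYLOAD)

def load(records, con):
    # Export step: write records to a relational table.
    con.execute("CREATE TABLE IF NOT EXISTS signups (user TEXT, signups INTEGER)")
    con.executemany("INSERT INTO signups VALUES (:user, :signups)", records)

con = sqlite3.connect(":memory:")
load(extract(), con)
total = con.execute("SELECT SUM(signups) FROM signups").fetchone()[0]
print(total)  # 11
```

In the Mage/MySQL/Metabase stack the question describes, Mage's loader and exporter blocks take the place of `extract` and `load`, MySQL replaces the SQLite connection, and Metabase points at the resulting table.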
Comments: 6