Part 2 of the series is here 👀 kzitem.info/news/bejne/tKacq4WHkXR4gIY
@d.-m.-c
8 months ago
Airflow/dagster demo would be great!
@motherduckdb
8 months ago
Noted!!
@hendrikschmidt8485
8 months ago
Yes please, also hosting it locally vs. remote, plus triggering the data ingestion (Python script) and the dbt job :)
@JornadaDeDados
8 months ago
Incredible content. I'm a content creator in Brazil, and everyone here loves your work! We hosted a workshop on DuckDB in December, and many people are now using it.
@motherduckdb
8 months ago
Amazing! Feel free to reach out to events@motherduck.com if we can help in any way with this!
@bharatchitnavis9592
26 days ago
I am blown away by your first video in the series 🤯 So much packed into the 41-minute video 🤩 I'm sure this will help many aspiring data engineers around the world. One of the best tutorials I've seen… worth every penny; well, I'd pay for content like this. Definitely going to share it with my entire team and anybody who will listen 🥇🎉💫
@motherduckdb
20 days ago
Aw, I'm glad you enjoyed it, and thank you for all your kind words!!!
@JacobUkokobili
7 months ago
I would like to see the end-to-end project that will include Data orchestration and Cloud deployment
7 months ago
That was a great video. I spent the weekend looking for good practices for building pipelines, and everything came together with DuckDB and these videos. Thanks, and looking forward to the next videos.
@afroken2069
6 months ago
Thank you so much for a great tutorial series. I'm looking forward to Part 3 - Dashboarding. I am also looking forward to a video on Python Runtime tools and data orchestration. Thank you again!
@very_sank_u1147
8 months ago
This was amazing. I learnt so much from your structured process.
@motherduckdb
8 months ago
Glad it was useful!
@denisgarden1
5 months ago
It would be great if you made a video on Prefect and DuckDB.
@motherduckdb
3 months ago
We did one here: kzitem.info/news/bejne/p2-vrql3iGdhg4Ysi=nbNL7gy1g8wkeQ7z :)
@mohammedsafiahmed1639
7 months ago
fantastic
@mladenmladenov6269
7 months ago
Great video, thank you for this. I have a question regarding validation and testing: would pandas DataFrame validation work by using pandera with the actual pydantic model?
@motherduckdb
6 months ago
It should! I do like pandera and use it here and there. I didn't want to add it to this tutorial so as not to increase the complexity. Raw pydantic with a Python function does the job 👍 - Mehdi
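The "raw pydantic with a Python function" approach mentioned above could look something like the sketch below. The model fields and the `validate_dataframe` helper are invented for illustration (they are not the ones from the video), and it assumes pandas and pydantic are installed:

```python
import pandas as pd
from pydantic import BaseModel, ValidationError

# Hypothetical model, for illustration only.
class PyPiDownload(BaseModel):
    project: str
    downloads: int

def validate_dataframe(df: pd.DataFrame, model: type[BaseModel]) -> list[str]:
    """Validate each row of the DataFrame against the pydantic model.

    Returns a list of human-readable error messages (empty if all rows pass).
    """
    errors = []
    for i, record in enumerate(df.to_dict(orient="records")):
        try:
            model(**record)
        except ValidationError as exc:
            errors.append(f"row {i}: {exc.errors()[0]['msg']}")
    return errors

df = pd.DataFrame({"project": ["duckdb", "pandas"], "downloads": [100, "oops"]})
print(validate_dataframe(df, PyPiDownload))  # reports the bad "downloads" value in row 1
```

pandera essentially packages this pattern (plus richer checks and lazy reporting) behind a declarative schema, which is why either approach works here.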
@bartallarsson332
16 hours ago
Dagster please!! :-)
@vedanthbaliga7686
7 months ago
Thanks for the video tutorial. When will the next video be released?
@motherduckdb
7 months ago
Soonish! :) Thanks for watching!
@andy_ap
6 months ago
@motherduckdb You can get the schema from the schema tab in tabular or JSON form, and it will be copied to the clipboard.
@motherduckdb
8 months ago
What would you like to see as part of this end-to-end project? Data orchestration? Cloud deployment? Let us know!
@tsambruni
8 months ago
What about DuckDB / Google Cloud Storage integration? Is it on the roadmap?
The tutorial could also explain how to use Docker; it doesn't make much sense for those who don't use it.
@motherduckdb
8 months ago
Thanks for the feedback! The setup with a devcontainer is optional; any Python environment will work! The video is quite long already, so covering that prerequisite knowledge goes a bit beyond the initial scope of the ingestion pipeline. But we could do a dedicated tutorial on it!
@rcamis
8 months ago
Nice
@mohammedsafiahmed1639
7 months ago
What extension enabled those crazy autocompletions at the beginning?
@motherduckdb
7 months ago
GitHub Copilot!
@coding3438
7 months ago
@motherduckdb I was hoping it was a free tool, but I had my doubts! Thanks for the reply! Looking forward to the remaining videos and much more.
@bilalnaseem94
3 months ago
How did you set up dev containers?
@motherduckdb
3 months ago
1) If you meant defining the .devcontainer.json: VS Code provides templates out of the box. In the command palette (Cmd+Shift+P on Mac), select Reopen in Container > Add configuration ... and you'll be prompted with different templates (Python, Node.js, etc.) that you can easily customize.
2) If you meant just running it: you need Docker Desktop installed, then open the command palette and select "Reopen in Container" - this will detect the current .devcontainer.json configuration and open the project inside the dev container.
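For reference, a minimal `.devcontainer/devcontainer.json` along those lines might look like the sketch below. The image tag, extension list, and post-create command are assumptions for illustration, not the exact configuration from the video:

```json
{
  "name": "python-ingestion",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  },
  "postCreateCommand": "pip install -r requirements.txt"
}
```

With this file in place, "Reopen in Container" pulls the image, installs the listed extensions inside the container, and runs the post-create command once the workspace is mounted.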
@bilalnaseem94
2 months ago
@motherduckdb Thanks. I also had another question: why was there a need for the models.py file and pydantic, and on top of that, tests for it? I understand it's used to validate data types, but can you please provide a more detailed explanation? Apologies if it's a noob question. Also, I really liked your approach of using a Makefile and tests; most other tutorials don't do this. This is the best tutorial I have encountered so far. Please make more tutorials, as there isn't a good data engineering course online. Please make a course, or more tutorials would also be great.
@motherduckdb
2 months ago
Thanks for your kind words! We'll do our best! To answer your question: you want to validate that you've defined the right "model" for your data, hence the tests against the models defined in models.py. These tests check that the model definition is correct and that it throws an error when the data doesn't match what we expect.
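Concretely, such a test might look like the sketch below. The model and field names are invented stand-ins for what would live in models.py, assuming pydantic and pytest are installed:

```python
import pytest
from pydantic import BaseModel, ValidationError

# Hypothetical model, standing in for one defined in models.py.
class FileDownload(BaseModel):
    timestamp: str
    country_code: str
    url: str

def test_valid_record_passes():
    # A complete record should construct without raising.
    FileDownload(timestamp="2024-01-01", country_code="US", url="https://pypi.org")

def test_incomplete_record_raises():
    # Missing required fields should be rejected by the model.
    with pytest.raises(ValidationError):
        FileDownload(timestamp="2024-01-01")
```

The second test is the important one: it proves the model actually rejects malformed data, so a schema drift in the source shows up as a loud error rather than silently corrupt rows downstream.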
@Levy957
8 months ago
pandas did all the work
@motherduckdb
8 months ago
Well, actually, Arrow is the magic that makes it possible to retrieve the data in pandas :) But we didn't do any compute using pandas.
@bharatchitnavis9592
26 days ago
If you were paying attention, DuckDB rescued pandas' ass
Comments: 39