This was super insightful. Would love more stuff like this!
@SeattleDataGuy
6 months ago
glad you enjoyed it!
@hansmandler7284
6 months ago
Yeah, that's literally what I did last weekend :) Good to see the professionals do it the same way I did.
@SeattleDataGuy
6 months ago
What were you doing? Reading from an S3 bucket?
@ansonnn_
6 months ago
Thanks again for the amazing video, as always. We use Athena as our main "engine" (not sure if that's the right term) and connect it directly to Apache Superset for our dashboards. Our datasets are mostly in Hudi format, with very few in Parquet. We always query our datasets from S3 using PySpark. I don't think adding another huge data warehouse solution like Snowflake or BigQuery makes sense for us. Or are we missing something crucial here? Just some thoughts...
@PrinciplesOrDie
6 months ago
You could've used a Glue Crawler to create the tables faster; you can always alter the DDL in Athena later if you don't like the way it was put together.
@SeattleDataGuy
6 months ago
100%! I just wanted to go through the CSV-in-S3 option this time. But I'm planning to cover AWS Glue and its various concepts (the ETL, the Data Catalog, etc.) in a future video. This is meant to be a series, so I'm trying to limit how much I add per video.
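For anyone following the manual CSV-in-S3 route discussed above, here is a minimal sketch of the kind of Athena table definition involved. It only builds the DDL string; you would paste the result into the Athena query editor. The helper name `build_athena_csv_ddl`, the database, table, columns, and the bucket path are all hypothetical, chosen for illustration. A Glue Crawler, as suggested above, would infer a similar table definition automatically.

```python
from typing import Sequence, Tuple


def build_athena_csv_ddl(
    database: str,
    table: str,
    columns: Sequence[Tuple[str, str]],
    s3_location: str,
    skip_header: bool = True,
) -> str:
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited files in S3.

    Hypothetical helper for illustration only: it does not quote or
    validate identifiers or column types.
    """
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    # Athena reads every object under the prefix, so point LOCATION at a "folder".
    if not s3_location.endswith("/"):
        s3_location += "/"
    # One "name type" line per column, comma-separated.
    column_lines = ",\n".join(f"  {name} {col_type}" for name, col_type in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        # Skip the CSV header row so it is not read as a data row.
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl + ";"


if __name__ == "__main__":
    # Hypothetical database, table, columns and bucket, for illustration only.
    print(build_athena_csv_ddl(
        "analytics",
        "orders",
        [("order_id", "string"), ("amount", "double"), ("order_date", "date")],
        "s3://example-bucket/raw/orders",
    ))
```

Plain `ROW FORMAT DELIMITED` does not handle quoted fields that contain commas; for files like that, Athena's OpenCSVSerDe is the usual alternative.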
@SeattleDataGuy
6 months ago
If you want to learn more about data engineering, sign up for my newsletter at seattledataguy.substack.com/ or join the Discord at discord.gg/2yRJq7Eg3k