Best ever tutorial on AWS-Glue. Thank you very much :)
@CalceyTech
3 years ago
Glad it was helpful!
@inspiremindsetv
2 years ago
@@CalceyTech Do you have any courses on coding or how to write clean code? I love your code.
@CalceyTech
2 years ago
@@inspiremindsetv Thanks for the kind words! We are a software development services provider, and maintaining high coding standards is obviously key. This is one of the books we can recommend - www.amazon.com/dp/0132350882/ref=emc_b_5_t Also, the articles/books and resources mentioned here are highly recommended for a better understanding of best practices and industry standards related to enterprise software development - martinfowler.com/tags/clean%20code.html
@okelitsenyathi8515
4 years ago
Exactly what I was looking for. Thanks!
@CalceyTech
3 years ago
Glad I could help
@shovan3112
4 years ago
Thank you. Nice, detailed step-by-step walkthrough. Liked it. Please upload more videos on AWS Glue & PySpark coding for various transformations.
@CalceyTech
3 years ago
Will upload soon
@adesojialu6208
A year ago
Yes, I second this.
@racingnerd1425
4 years ago
Thank you!!! I got so much important information from this video.
@CalceyTech
3 years ago
Glad it was helpful!
@VamsiDhar-jt3lp
A year ago
very helpful
@philsongtg
3 years ago
Hi Manuka - This is a good video, because normally no one touches the scripting part; even the AWS documentation is missing that. So kudos to you for covering the grey area. Also, I have been trying to emulate certain ETL use cases from Informatica in Glue, and I am wondering if it makes sense to first create a dynamic frame from the Glue catalog --> convert it to a dataframe --> do all the transformations like trimming, date format conversion, creating new columns (with case-statement logic), etc. --> then finally convert it back to a dynamic frame and write it back to the catalog table. Does that sound reasonable?
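The flow described in the question above could be sketched roughly like this. This is only a sketch that runs inside an AWS Glue Spark job; the database, table, and column names ("my_db", "orders", "customer_name", etc.) are placeholders, not anything from the video:

```python
# Catalog -> DynamicFrame -> DataFrame -> transformations -> DynamicFrame -> catalog.
# Runs only in an AWS Glue job; all names below are placeholders.
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext
from pyspark.sql.functions import trim, to_date, when, col

glue_context = GlueContext(SparkContext.getOrCreate())

# 1. Read from the Glue Data Catalog as a DynamicFrame
dyf = glue_context.create_dynamic_frame.from_catalog(database="my_db", table_name="orders")

# 2. Convert to a Spark DataFrame for the transformations
df = dyf.toDF()

# 3. Transformations: trimming, date format conversion, a CASE-style derived column
df = (df
      .withColumn("customer_name", trim(col("customer_name")))
      .withColumn("order_date", to_date(col("order_date"), "MM/dd/yyyy"))
      .withColumn("order_size",
                  when(col("amount") > 1000, "large").otherwise("small")))

# 4. Convert back to a DynamicFrame and write it back via the catalog
out_dyf = DynamicFrame.fromDF(df, glue_context, "out_dyf")
glue_context.write_dynamic_frame.from_catalog(frame=out_dyf, database="my_db",
                                              table_name="orders_clean")
```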
@Warrior_praful
3 years ago
Awesome, man. Keep sharing, keep learning.
@CalceyTech
3 years ago
Thanks a ton
@shovan3112
4 years ago
Thank you very much for the nice explanation, loved it. I would like to request one more detailed video on Glue ETL with SCD Type 1 (merge/upsert) logic, S3 to S3.
@hayekianman
4 years ago
Check out the aws-data-wrangler library released by AWS: github.com/awslabs/aws-data-wrangler. Many use cases are covered there, including upserts.
@sushantdewulker
4 years ago
Hi, thanks for the great video. I have learnt that AWS recently released a full setup to execute Glue jobs locally. It would be a great help if you could do the same setup in Visual Studio Code and create a video on it. There are no documentation or setup steps anywhere.
@CalceyTech
4 years ago
@sushant Thanks for the information. We could definitely give it a try. Stay tuned!
@pk-wanderluster
3 years ago
I’m also looking for guidance on how to configure open-source JARs with a Glue job. Could you please make a video on it?
@eric321ification
2 years ago
This was very helpful
@udaynayak4788
A year ago
Thank you for covering detailed info. Can you please cover a UDF in AWS Glue, i.e. how to create it, then register and call it?
@CalceyTech
A year ago
Thanks for the suggestion!
@wcmad7250
4 years ago
Great video
@mayurthakkar3066
3 years ago
Excellent, excellent video. There are not many good videos that explain basic AWS Glue programs. I am pretty new to AWS Glue and am trying to create an upsert job that will insert-else-update my data in a Redshift table, where the source is a CSV file on S3. Can you please post a video that explains how upserts work in Glue? Thanks in advance!
@CalceyTech
3 years ago
Hi Mayur, glad you enjoyed the video. We will definitely take your request into consideration.
@ranmehra
4 years ago
Is it possible to debug the code you developed in the video in VS Code or the PyCharm IDE, by setting AWS creds locally in the IDE instead of running it in the console?
@jhontorres9519
4 years ago
Thank You!!!
@SoumilShah
3 years ago
Hello, I have a question: how can you run a Glue job locally to make sure it works? I saw some articles about running Glue jobs on Docker.
@vivekprasad6342
3 years ago
Good work
@CalceyTech
3 years ago
Thank you! Cheers!
@redolfmahlaule9893
3 years ago
Let's say my data is residing in PostgreSQL. How can you connect it to Glue and then to S3?
@Videos-rj1ek
2 years ago
Hi Manuka, can you make a video on moving Glue code from dev to QA and prod?
@CalceyTech
2 years ago
Hi there, we will tackle this in one of our upcoming videos, so please make sure to follow us on KZitem.
@marcin2x4
3 years ago
Do you have a tutorial on how to set up a local environment? Getting the awsglue package, etc. Thanks!
@95SUJITH
3 years ago
do you have an answer to this ?
@marcin2x4
3 years ago
@@95SUJITH I'm trying to install the environment on Win10 but no luck... (github.com/awslabs/aws-glue-libs/issues/82)
@95SUJITH
3 years ago
@@marcin2x4 I think you can only run it in Linux
@CalceyTech
3 years ago
Not yet! Coming soon!
@marcin2x4
2 years ago
It seems that once aws-glue-libs is installed, Glue scripts are to be placed there as well. For me this fixed the ModuleNotFoundError, even though everything was installed.
@VenkatSamaOfficial
4 years ago
Hi, how do you create Informatica-style workflows in Glue?
@vivekdbit
4 years ago
Hi friend, great work. I have 1 question for you. I have to get data from 2 tables with a join, e.g. users_table and users_add_table (with a one-to-one mapping), joined on user_id. Which of the following is the best way?
1. Get users_table_df and users_add_table_df, then Join.apply on user_id to get the final DataFrame
2. glueContext.read.format("jdbc") .option("driver", jdbc_driver_name).......... .option("dbtable", YOUR_QUERY)
In the 2nd approach I have written SQL joins in YOUR_QUERY.
@CalceyTech
4 years ago
I think performance-wise the 2nd option is better. In the dataframe approach, there are a lot of Python objects involved in computing the result, and it executes 2 DB queries to bring back the final result. In the SQL approach, everything is done in-memory by the database.
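The 2nd option above pushes the join down to the database by passing a subquery instead of a table name in the JDBC `dbtable` option. A minimal sketch of how those options could be assembled; the host, credentials, and table/column names are placeholders, and with Spark you would pass the resulting dict to `spark.read.format("jdbc").options(**opts).load()`:

```python
# Sketch of option 2: push the join into the database via the JDBC "dbtable" option.
# Connection details below are placeholders, not real endpoints.

def jdbc_pushdown_options(url, driver, query, user, password):
    """Build JDBC options where `query`, wrapped as an aliased subquery,
    takes the place of a plain table name, so the join runs in the database."""
    return {
        "url": url,
        "driver": driver,
        "dbtable": f"({query}) AS joined",  # Spark only sees the joined result
        "user": user,
        "password": password,
    }

opts = jdbc_pushdown_options(
    url="jdbc:mysql://example-host:3306/app_db",  # placeholder endpoint
    driver="com.mysql.cj.jdbc.Driver",
    query=("SELECT u.user_id, u.name, a.city "
           "FROM users_table u "
           "JOIN users_add_table a ON u.user_id = a.user_id"),
    user="etl_user",
    password="***",
)
print(opts["dbtable"])
```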
@snehgoyal5617
2 years ago
Hi Manuka, I need to copy JSON file data from AWS Elasticsearch to an S3 bucket using Glue. Can you please help me with that?
@CalceyTech
2 years ago
You should be able to achieve it using the same flow as in this tutorial, with some changes to the data extraction step. Use a JDBC driver for Elasticsearch instead.
@nandkishoringavale2845
4 years ago
Very helpful. Just one thing: I have four SQL tables and want to create four Parquet files, one per table. Should I create four Python script jobs for that, or handle it in one script file using a loop? Please advise.
@CalceyTech
4 years ago
You can simply run the job with a single script instead of running multiple scripts. Complete the read of each source table and the write of its Parquet file one after another, to avoid losing all the read data if the job fails.
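The single-script approach suggested above could be sketched as a loop like the following. The table names and S3 paths are placeholders, and `read_table`/`write_parquet` are stand-ins for the actual Glue/Spark calls (e.g. `create_dynamic_frame.from_catalog` and a Parquet write):

```python
# One script, one loop: each source table is fully read and its Parquet file
# written before moving to the next table, so a failure mid-job does not
# discard the work already completed for earlier tables.

def read_table(table_name):
    # Stand-in for glueContext.create_dynamic_frame.from_catalog(...)
    return f"data-from-{table_name}"

def write_parquet(data, s3_path):
    # Stand-in for a Glue/Spark write with format="parquet"
    return f"wrote {data} to {s3_path}"

TABLES = {
    "customers": "s3://my-bucket/parquet/customers/",  # placeholder paths
    "orders":    "s3://my-bucket/parquet/orders/",
    "products":  "s3://my-bucket/parquet/products/",
    "payments":  "s3://my-bucket/parquet/payments/",
}

results = []
for table, path in TABLES.items():
    data = read_table(table)                   # complete the read...
    results.append(write_parquet(data, path))  # ...then the write, per table
```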
@nandkishoringavale2845
4 years ago
@@CalceyTech Thanks for the reply.
@vivekdbit
4 years ago
One of my columns in MySQL is of JSON datatype. How do I flatten it along with my other columns' data?
@CalceyTech
4 years ago
First, convert the dynamic frame to a Spark DataFrame:
---------------------------------
datasource0 = datasource0.toDF()
---------------------------------
Then add a new column to the Spark DataFrame. Use a Spark user-defined function to extract the value from the JSON object:
---------------------------------
from pyspark.sql.functions import udf, col
getNewValues = udf(lambda val: val + 1)  # extract the value from the JSON here instead of val + 1
datasource0 = datasource0.withColumn('New_Col_Name', getNewValues(col('existing_col')))
---------------------------------
@vandenjain2399
3 years ago
Hi, I am trying to perform ETL in Glue and am using the snowflake-connector-python module. It shows a module error, as it cannot import the module. Can you please tell me how I can use custom Python libs in Glue? Thanks
@CalceyTech
3 years ago
Hi Vanden, the Snowflake community blog provides several examples of how to use their Python connector and JDBC connector in an AWS Glue job. You'll find the proper ways to do it and discussions of issues with their modules: community.snowflake.com/s/article/AWS-Glue-Job-in-Python-Shell-using-Wheel-and-Egg-files community.snowflake.com/s/article/How-To-Use-AWS-Glue-With-Snowflake
@upzk7752
4 years ago
How will this be different if I create the job with the type set to Python shell? Can you demonstrate that as well?
@CalceyTech
4 years ago
You can use the Python shell for general-purpose Python scripts. You can use these jobs to schedule and run tasks that don't require an Apache Spark environment.
@prabhakarachyuta6397
4 years ago
Hello bro, I have created the job and updated the script as per your tutorial, but I am getting an error saying "connection timed out". Please see the full error message: "com.amazon.support.exceptions.GeneralException: [Amazon](500150) Error setting/closing connection: Connection timed out." Please advise what else I missed. Thanks
@debanjanbose8205
4 years ago
I am also getting the same error. Can anyone tell me what to do?
@CalceyTech
4 years ago
Actually, this issue is not related to your Glue script. It seems the AWS environment that is running your Glue script does not have permission to access your external or internal database using the DB information you have provided in the script. Because of that, AWS automatically cuts off the connection with a timeout error.
@luliu5094
3 years ago
Really useful tutorial! Thank you! Just wondering what I should do if I need to import another Python script, such as `import script2`. How should I set that up in the job config? I've tried storing the script in an S3 bucket and adding its location under 'Security configuration, script libraries, and job parameters (optional)' -> 'Python library path', but it gave me the error `ModuleNotFoundError: No module named 'script2'`. Does anyone know how to fix this? Thanks.
@CalceyTech
3 years ago
If your Python script is just a single Python file (ibb.co/HB6grhL), upload it to the S3 bucket and add the S3 path as the Glue job's Python library path (ibb.co/PgPrLCx). First, make sure your Glue job's IAM role has access to S3. Then you need to add import statements at the top of the script file to use definitions from the external script file (ibb.co/23Kx0vp).
@luliu5094
3 years ago
@@CalceyTech Thank you very much
@mithaleemohapatra5153
4 years ago
I am new to AWS Glue. Can you please create a video on how to get the AWS Glue lib into the local VS Code IDE?
@CalceyTech
4 years ago
First, take a clone of github.com/awslabs/aws-glue-libs. Open an empty workspace in VS Code. Then copy the "awsglue" folder from the cloned repository into the VS Code workspace, as I have done in the video.
@mithaleemohapatra5153
4 years ago
@@CalceyTech Thank you for your reply. Yes, I have done that and am getting an error: "ModuleNotFoundError: No module named 'dynamicframe'". Do I need to install the Spark distribution locally? I have already installed the PySpark client.
@VenkatSamaOfficial
4 years ago
I want the data to be picked up from my on-premises DB and then put into an on-premises DB.
@CalceyTech
4 years ago
AWS Glue ETL jobs can interact with a variety of data sources inside and outside of the AWS environment. For optimal operation in a hybrid environment, AWS Glue might require an additional network, firewall, or DNS configuration. Have a look: aws.amazon.com/blogs/big-data/how-to-access-and-analyze-on-premises-data-stores-using-aws-glue/
@RR.SANCHARI
5 years ago
Can you give a training?
@CalceyTech
4 years ago
Hi @reddy, since we are a customer-centric software development company, we can't really focus on that, but don't hesitate to ask if you need anything clarified.
@bhavanivani448
2 years ago
Hi, can you please share the site URL to get the Python script?
@CalceyTech
2 years ago
I've used AWS Glue for the demo, and the code was written in AWS Glue's script editor. Here are the references to follow:
AWS Glue - aws.amazon.com/glue/
AWS Glue Labs Git - github.com/awslabs/aws-glue-libs
AWS Glue PySpark Extensions - docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-extensions.html
Apache Spark JDBC Data Sources - spark.apache.org/docs/latest/sql-data-sources-jdbc.html
@marlonholland955
5 years ago
I've been writing my ETL scripts in the Amazon web console. How can I do this in the VS Code IDE like you? I'm pretty new to programming.
@CalceyTech
5 years ago
Hey Marlon, there is no direct way to deploy your Python script from an IDE on your local machine. What I have done is create a workspace in VS Code with the AWS Glue Python library files, which gives us the advantage of IntelliSense. Then, after the implementation is done, I just copy and paste the script into the AWS Glue console.
@marlonholland955
5 years ago
@@CalceyTech Hi, thanks for the reply! That's what I meant to ask, sorry. My actual question now is: how can I get the awsglue library into VS Code in order to get the advantage of IntelliSense? I tried pip install awsglue...
@CalceyTech
5 years ago
@@marlonholland955 First, clone the AWS Glue Python library repository - github.com/awslabs/aws-glue-libs. Then copy the awsglue folder from the cloned content into your working workspace. Finally, create a new Python file (custom-script.py) within the workspace. After that, you'll be able to use Python imports from AWS Glue within your custom script files in your workspace.
@kingstonxavier
4 years ago
Thanks for the video, dude. Where can I download the JAR file? Could you please post the link in a comment?
@CalceyTech
4 years ago
You can download it from here: search.maven.org/artifact/mysql/mysql-connector-java/8.0.15/jar
@kingstonxavier
4 years ago
@@CalceyTech Thank you. I am still getting an error: "An error occurred while calling o70.load. Communication link failure". Any suggestions?
@surendraagraharapu7061
A year ago
This code is not visible, please share the code.
@CalceyTech
A year ago
Hi Surendra, you can find the code in the attached link: gist.github.com/manukaprabath95/72816c32b3f0fcadc5260180f39889d0
Comments: 81