This channel is a goldmine for PySpark data engineers.
@manjulakumarisammidi1833
11 months ago
Instead of caching the dataframe at 14:17, defining bad_data_df before good_data_df will also work; it's just another approach. Thanks for the video, sir.
@anandattagasam7037
1 year ago
Thanks for your brief explanation. I would go with the 4th option (badRecordsPath) instead of the 5th (columnNameOfCorruptRecord).
@arshiyakub17
1 year ago
Thank you so much for the video on this. I have been searching for this for a long time and finally got what I needed from this video.
@Jgiga
2 years ago
Thanks for sharing
@Technology_of_world5
1 year ago
Good message, thank you a lot 👍
@mohitupadhayay1439
5 months ago
Can we do the same for XML and JSON files?
@sravankumar1767
2 years ago
Nice explanation 👌 👍 👏
@mesukanya9828
1 year ago
Thank you so much... very well explained :)
@TRRaveendra
1 year ago
Thank you 🙏
@jobiquirobi123
2 years ago
Just found your tutorials; they look pretty nice, thank you!
@TRRaveendra
2 years ago
Thank You 👍
@muruganc2350
1 year ago
Thanks. Good to learn!
@shayankabasi160
2 years ago
Very nice
@Basket-hb5jc
6 months ago
Very valuable
@mehmetkaya4330
1 year ago
Thank you for the great tutorials!
@TRRaveendra
1 year ago
Thanks for watching my channel's videos.
@srijitachaturvedi7738
2 years ago
Does this approach work when reading JSON data instead of CSVs?
@TRRaveendra
2 years ago
Yes, for normal JSON you can use the same option. For multiline JSON, use option("multiline","true"); otherwise it will create the default _corrupt_record column.
@ketanmehta3058
1 year ago
Excellent! Clearly explained each and every option to load the data. @TechLake Can we use this option with JSON data as well?
@bharathsai232
1 year ago
Permissive mode is not detecting malformed date types. I mean, if we have a date like 2013-02-30, a Spark read in permissive mode does not flag it as bad data.
@mohitupadhayay1439
4 months ago
We still could not find the proper reason why the records went into the corrupt column when the columns are very large.
@YOGESHMULEY-n1j
4 months ago
I got: the query returns no records.
@saisaranv
1 year ago
Hi TechLake team, thanks for the wonderful video; it helped a lot. Can you please help me with 2 errors I am facing right now? 1. "cannot cast string into integer type", even after a specific schema is defined. 2. Complex JSON flattening (I had gone through video 13, but my data is too complex in nature to flatten). Would appreciate your help, please.
@TRRaveendra
1 year ago
Ping me your schema or sample data at tgrappstech@gmail.com and I can verify.
@saisaranv
1 year ago
@@TRRaveendra Done, please check once. Thank you for your reply :)
@hannawg7747
2 years ago
Hi sir, do you provide training on Azure ADB/ADF?
@TRRaveendra
2 years ago
Yes, I do. Please reach me at tgrappstech@gmail.com
@chriskathumbi2292
2 years ago
Hello, good video. I have a question concerning Spark. When I use local data like Parquet and CSV, make a temp view or just use normal Spark, and try to use distinct/group by or window functions, I get an error; I've seen this on my Windows/Linux machines and in a Docker container. What could be causing this?
@TRRaveendra
2 years ago
What kind of error are you getting? Is it related to the data file path, missing columns, or a wrong GROUP BY query?
@chriskathumbi2292
2 years ago
@@TRRaveendra If I use df.show() and the df contains a group by, window function, or distinct: Py4JJavaError: An error occurred while calling o69.showString.
@chriskathumbi2292
2 years ago
@@TRRaveendra Funny thing is that Google Colab, where I have to install PySpark on launch, doesn't have this issue.
@chimorammohan8392
2 years ago
@@chriskathumbi2292 This might be a code error; please share the code.
Bro, thanks for your inputs. Can you please help me with how to handle this?

empid,fname|lname@sal#deptid
1,mohan|kumar@5000#100
2,karna|varadan@3489#101
3,kavitha|gandan@6000#102

Expected output:

empid,fname,lname,sal,deptid
1,mohan,kumar,5000,100
2,karna,varadan,3489,101
3,kavitha,gandan,6000,102
Comments: 36