Very helpful! My lead data scientist was not helping me, and I finally found this video after a lot of searching. I was working on an assignment to change companies, and while doing it I got the idea for appending values into the rows of a particular column, which I had been stuck on at my current company. Thanks a lot!! Please make a video on JSON files, converting unstructured data to structured data. And you are using Jupyter Notebook, which makes it so much easier to understand than other videos.
@DataScienceGarage
4 years ago
Thanks a lot for the comment! JSON is a good idea; I will keep it in mind and make such a tutorial in the near future.
@basicmaths3443
4 years ago
@DataScienceGarage Waiting for it! I've already subscribed and pressed the bell icon. Keep up the great job of helping us. In industry everyone pulls your leg when you try to rise, so this is a big relief. Thanks and best wishes to you, from India.
@dipsikhaphukan5563
4 years ago
I want to extract information in a structured format from a doctor's prescription, such as the medicine name, when to take each medicine, and the quantity. Can you please help?
@amangautam2658
4 years ago
One of the best explanations!! Kudos to you!!
@DataScienceGarage
4 years ago
Thanks for the kind words! Best wishes!
@azharshaikh8197
3 years ago
Thanks, it really helped. But what if we want to fetch a date written like "December 31, 2009"? How can we write the regex for it? Please help.
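A possible answer, offered only as a sketch: the pattern below assumes the full English month name, a one- or two-digit day, a comma, and a four-digit year. Abbreviated months ("Dec 31, 2009") or a day-first order would need a different pattern; the name `MONTH_NAME_DATE` and the sample sentence are hypothetical.

```python
import re

# Assumed format: full English month name, 1-2 digit day, comma, 4-digit year
MONTH_NAME_DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s*\d{4}\b"
)

text = "The fiscal year ended on December 31, 2009 as planned."
match = MONTH_NAME_DATE.search(text)
if match is not None:  # search returns None when nothing matches
    print(match.group())  # December 31, 2009
```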
@amangautam2658
4 years ago
After executing the code, I get this error when I call .group() on the date line: 'NoneType' object has no attribute 'group'. The code runs without error when I leave out .group(). I don't know how to solve this.
@adrianajimenezambel5527
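3 years ago
In my case the DataFrame was called df. I solved it like this:
```python
import re

# date_pattern, index_description and index_date are assumed to be defined earlier
for row in range(len(df)):
    match = re.search(date_pattern, df.iat[row, index_description])
    if match is not None:
        # Only call .group() when re.search actually found a date
        df.iat[row, index_date] = match.group()
    # Rows without a match are left unchanged
```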
@milikeumed5668
2 years ago
@adrianajimenezambel5527 Thank you so much!
@ibteshamchowdhury2060
2 years ago
@adrianajimenezambel5527 God bless!!!!!
@kkarthikkumar6250
2 years ago
Can anyone help me out? I just need to find the position of a date (in a format such as 05/02/2020 or Nov 5 2004) in a string in Python. Please let me know if you find an answer.
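One way to get positions, sketched under assumptions: `re.finditer` yields match objects, and each one exposes `.start()` and `.end()` (character offsets into the string). The pattern only covers the two formats named in the question (two-digit `dd/mm/yyyy` and an abbreviated month like `Nov 5 2004`); the helper name `find_date_positions` is hypothetical.

```python
import re

# Hypothetical pattern for the two formats in the question:
# 05/02/2020 (two-digit day/month, four-digit year) and "Nov 5 2004"
DATE_PATTERN = re.compile(
    r"\b\d{2}/\d{2}/\d{4}\b"
    r"|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2}\s+\d{4}\b"
)


def find_date_positions(text):
    """Return (matched_text, start, end) for every date found in text."""
    return [(m.group(), m.start(), m.end()) for m in DATE_PATTERN.finditer(text)]


print(find_date_positions("Signed 05/02/2020, renewed Nov 5 2004."))
```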
@yuzhiyan99
5 years ago
Thank you so much for the video. Everything else works perfectly for me except the .group() part. I keep getting this message: "AttributeError: 'NoneType' object has no attribute 'group'". Any suggestions on this issue?
@郭育誠-b1r
4 years ago
You can check whether your re.search() call returns None.
@suryakunapareddy3193
4 years ago
How should I solve this issue?
@suryakunapareddy3193
4 years ago
@郭育誠-b1r What should I do if it does? I also want to use group().
@郭育誠-b1r
4 years ago
@suryakunapareddy3193 First filter out the cases where re.search() returns None; then you can call group() on the remaining results.
@amangautam2658
4 years ago
@郭育誠-b1r How do I remove the None results? Any link or video suggestion?
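To illustrate the "check for None first" advice, here is a minimal sketch in plain Python. It assumes a `dd/mm/yyyy` pattern like the one in the revised solution further down this thread; the sample strings are hypothetical. `re.search` returns `None` when nothing matches, and `None` has no `.group()`, which is exactly where the AttributeError comes from.

```python
import re

date_pattern = r"\d{2}/\d{2}/\d{4}"  # assumed dd/mm/yyyy pattern
texts = ["made payment on 04/11/2019", "no date in this row"]

dates = []
for text in texts:
    match = re.search(date_pattern, text)
    if match is None:
        # No date here: calling match.group() would raise
        # AttributeError: 'NoneType' object has no attribute 'group'
        dates.append(None)
    else:
        dates.append(match.group())

print(dates)  # ['04/11/2019', None]
```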
@LuizPerciliano_78
3 years ago
My friend, my DataFrame has rows that will not be filled: for example, not every row contains a date, so that column should stay empty. However, when using .group() I get the following error: AttributeError: 'NoneType' object has no attribute 'group'. Still, the other rows do get the search results. Do you know how to handle this? Thank you for the excellent class.
@anuran6180
2 years ago
Simply handle the exception: wrap the .group() call in a try statement, catch AttributeError, and write pass under it. That way only the rows that contain a date are filled, and the rest keep NaN values.
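A minimal sketch of that try/except approach. One assumption worth stating: `pass` only leaves NaN behind if the column already holds NaN, so the sketch initialises the target column with NaN (as an `object` column, so strings can be written into it). The sample data, column names and pattern are hypothetical.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"description": ["Meeting with clients (07/06/2014)", "no date here"]})
date_pattern = r"\d{2}/\d{2}/\d{4}"

# Pre-fill with NaN so rows skipped by `pass` keep NaN; object dtype accepts strings
df["date"] = pd.Series(np.nan, index=df.index, dtype=object)
index_description = df.columns.get_loc("description")
index_date = df.columns.get_loc("date")

for row in range(len(df)):
    try:
        df.iat[row, index_date] = re.search(date_pattern, df.iat[row, index_description]).group()
    except AttributeError:
        pass  # re.search returned None for this row, so the NaN stays

print(df)
```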
@yadunandanacharya8951
5 years ago
Given an address such as "No.3/B, 8th Main, Nandhini Layout, Bangalore - 560096, Near Mahalakshmi Layout", can I extract the PIN code and the area name (e.g. "Nandhini Layout") from the Address column using pandas and the re library, as you have shown above, sir? And please tell me how.
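A possible approach, sketched under assumptions: Indian PIN codes are six digits, so `\b(\d{6})\b` is a reasonable first try. For the area, the sketch assumes it is the first comma-separated part made only of letters and spaces that ends in "Layout"; real addresses vary a lot, so this heuristic would need tuning. The column names `Pincode` and `Area` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Address": [
    "No.3/B, 8th Main, Nandhini Layout, Bangalore - 560096, Near Mahalakshmi Layout"
]})

# Six digits with word boundaries, so it won't match inside a longer number
df["Pincode"] = df["Address"].str.extract(r"\b(\d{6})\b", expand=False)
# Assumption: the area is the first comma-separated part ending in "Layout"
df["Area"] = df["Address"].str.extract(r"(?:^|,\s*)([A-Za-z ]+? Layout)\b", expand=False)

print(df[["Pincode", "Area"]])
```

Because `str.extract` returns only the first match, "Near Mahalakshmi Layout" is ignored here; `str.extractall` would return both.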
@KuftuKa
3 years ago
Could you do a video on how to do the same with lambda expressions?
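Until such a video exists, here is one hedged way to do it with a lambda: `Series.apply` calls the lambda on each description, and `next(iter(findall(...)), None)` picks the first match or falls back to `None`, so rows without a date don't raise. The data and pattern are illustrative assumptions, not the video's code.

```python
import re

import pandas as pd

df = pd.DataFrame({"description": [
    "made payment on 04/11/2019",
    "Meeting with clients (07/06/2014)",
    "no date in this row",
]})
date_pattern = re.compile(r"\d{2}/\d{2}/\d{4}")

# First match per row, or None when findall returns an empty list
df["date"] = df["description"].apply(
    lambda text: next(iter(date_pattern.findall(text)), None)
)
print(df)
```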
@DataScienceGarage
3 years ago
Thanks for the suggestion. I will keep it in mind; it is on the list for the next videos.
@cordularaecke
4 years ago
Sorry, I assumed the description column would be the 'index'. Revised solution:
```python
import pandas as pd

data = {
    'description': [
        'made payment on 04/11/2019',
        'Meeting with clients (07/06/2014)',
        'Christmas party will take place on 20/12/2018',
        'Valentine day is on 14/02/2018 this year',
        'Easter was in 21/04/2018 this year',
        '17/06/2019 was a hot day in Lithuania',
        'My birthday is on 28/05/2019, not quite long ago'
    ],
    'values': [2000, 0, 1400, 140, 740, 20, 175]
}
df = pd.DataFrame(data)
# Extract the first dd/mm/yyyy date of each description into a new first column 'date'
df.insert(0, 'date', df.description.str.extract(r'(\d{2}\/\d{2}\/\d{4})', expand=False))
df
```
@DataScienceGarage
4 years ago
Yep, maybe. But you are free to use your own custom dataset for your purposes.
@erenhan
3 years ago
This is great, thanks for the explanation.
@DataScienceGarage
3 years ago
Thanks for watching! :)
@sriramcharankola1832
2 years ago
What if there are multiple regex patterns? Can someone write the code here?
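If "multiple patterns" means extracting several different fields, a sketch could keep the patterns in a dict and run `str.extract` once per entry. The field names, patterns and sample rows below are hypothetical; each pattern needs exactly one capturing group so that `expand=False` returns a Series.

```python
import pandas as pd

df = pd.DataFrame({"description": [
    "made payment of 2000 EUR on 04/11/2019",
    "invoice INV-123 paid on 07/06/2014",
]})

# Hypothetical field -> pattern mapping, one capturing group each
patterns = {
    "date": r"(\d{2}/\d{2}/\d{4})",
    "amount": r"(\d+) EUR",
    "invoice": r"(INV-\d+)",
}
for column, pattern in patterns.items():
    # Rows where the pattern finds nothing get NaN in that column
    df[column] = df["description"].str.extract(pattern, expand=False)

print(df)
```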
@varshadevgankar8242
4 years ago
Hi! I want a function in Python that identifies which columns contain dates. Please help me out with this.
@abhishekranjane960
4 years ago
Are you also applying for the Internshala Lannet internship, by any chance?
@varshadevgankar8242
4 years ago
@abhishekranjane960 I didn't get your question.
@ContentsOfTable
4 years ago
Suggestions: sniff out the column names for the keyword "date" or similar language relevant to the document, and sniff out the data types of the columns to see if that gets you to the goal.
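The two suggestions above, plus a content check, could be combined roughly like this. It is a sketch, not a robust detector: the function name, the default pattern and the 0.5 threshold are assumptions to tune for your data, and the name heuristic will also flag columns like `order_date` even if their values are not dates.

```python
import pandas as pd


def find_date_columns(df, pattern=r"\d{1,2}/\d{1,2}/\d{2,4}", min_share=0.5):
    """Return names of columns that look like they hold dates.

    A column qualifies if it already has a datetime dtype, its name contains
    "date", or at least `min_share` of its non-null values contain `pattern`.
    """
    found = []
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_datetime64_any_dtype(col) or "date" in str(name).lower():
            found.append(name)
            continue
        values = col.dropna().astype(str)
        if len(values) and values.str.contains(pattern, regex=True).mean() >= min_share:
            found.append(name)
    return found


sample = pd.DataFrame({
    "description": ["paid on 04/11/2019", "meeting on 07/06/2014"],
    "values": [2000, 0],
    "created": pd.to_datetime(["2019-11-04", "2014-06-07"]),
    "order_date": ["pending", "pending"],  # flagged by name only
})
print(find_date_columns(sample))
```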
@and_and1
4 years ago
Hello, I get this error: "RecursionError: maximum recursion depth exceeded". How can I solve it?
@and_and1
4 years ago
That was my own mistake, but I still have the same issue as @Yuzhi Yan. Any suggestions?
@and_and1
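4 years ago
```python
import re

# date_pattern, index_description and index_date are assumed to be defined earlier
for row in range(len(df)):
    try:
        date = re.search(date_pattern, df.iat[row, index_description]).group()
        df.iat[row, index_date] = date
    except AttributeError:
        # re.search returned None (no date in this row), so store an empty string
        df.iat[row, index_date] = ''
```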
@mariaanasbludovice6859
3 years ago
Excellent video! Thank you. But what if I have more than one date/code per row? How can I extract both into the same row?
@singisheroo
3 years ago
Hey Maria, I have a similar use case. Did you find a solution for this?
@mariaanasbludovice6859
3 years ago
@singisheroo Hey Sharath, yes. I basically used a for loop and saved all the codes in a list, keeping the row ID. Then I merged the results into the main table. I can try to explain this better if you'd like.
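A vectorized variant of that loop-and-merge idea, offered as a sketch with hypothetical data: `str.findall` keeps every match of a row in one list, while `str.extractall` returns one row per match with the original row ID in the index, which can be unstacked into `date_0`, `date_1`, … columns and joined back.

```python
import pandas as pd

df = pd.DataFrame({"description": [
    "from 04/11/2019 to 07/06/2020",
    "only 20/12/2018",
    "no date",
]})
pattern = r"(\d{2}/\d{2}/\d{4})"

# Option 1: all matches of a row in one list (empty list when none)
df["dates"] = df["description"].str.findall(pattern)

# Option 2: one row per match, indexed by (original row, match number)
all_dates = df["description"].str.extractall(pattern)
# Turn match numbers into columns and join back on the original row ID
wide = all_dates[0].unstack().add_prefix("date_")
result = df.join(wide)
print(result)
```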
@georgesmith3022
5 years ago
Maybe you could also convert them to datetime in order to do some sorting.
@DataScienceGarage
5 years ago
Yes, it is possible. Pandas has a special function that converts strings to datetime format in a DataFrame: pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
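A short sketch of that conversion, assuming the extracted strings are day-first like the sample data in this thread (20/12/2018 can only be dd/mm). Passing an explicit `format` avoids month/day ambiguity, and `errors="coerce"` turns unparseable values into `NaT` instead of raising. The column names follow the thread's example; the values are a subset of it.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["04/11/2019", "07/06/2014", "20/12/2018"],
    "values": [2000, 0, 1400],
})

# Day-first strings -> datetime64; anything not matching the format becomes NaT
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")
df = df.sort_values("date")
print(df)
```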
@basicmaths3443
4 years ago
@DataScienceGarage I tried datetime; it gives the correct output only if the string contains dates and no other numbers.
@basicmaths3443
4 years ago
@DataScienceGarage Can you tell me how to extract all types of date formats in one line of code? Formats like 02-Jan-1993, 02/02/1993, and all others.
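"All formats" can't really be covered by one pattern, but here is a hedged sketch for the two formats named above. It extracts the date text first (so other numbers in the string, like "ref 42", don't get in the way of parsing), using one named group per format, then parses each group with its own explicit format and merges the results. The day-first reading of `02/02/1993` and the group names are assumptions.

```python
import pandas as pd

s = pd.Series(["paid 02-Jan-1993 in full", "due 02/02/1993, ref 42", "no date"])

# One regex, one named alternative per format: dd-Mon-yyyy and dd/mm/yyyy
pattern = r"(?P<named>\d{2}-[A-Z][a-z]{2}-\d{4})|(?P<numeric>\d{2}/\d{2}/\d{4})"
parts = s.str.extract(pattern)

# Parse each alternative with its own format, then fill gaps from the other
dates = pd.to_datetime(parts["named"], format="%d-%b-%Y", errors="coerce").combine_first(
    pd.to_datetime(parts["numeric"], format="%d/%m/%Y", errors="coerce")
)
print(dates)
```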
@whatistrending3311
4 years ago
The code ran into an error. It says: AttributeError: 'str' object has no attribute 'iat'
@DataScienceGarage
4 years ago
Yes, a 'str' object does not have .iat. Could you please show where I use .iat on a 'str' object? .iat should be applied to a pandas DataFrame.
@KirillBezzubkine
4 years ago
3:47 - aaaaand...CHOO
@LuizPerciliano_78
3 years ago
Awesome video, thanks, my dear friend.
@LashaGoch
3 years ago
Thank you!!!
@kgravikumar
3 years ago
Nice
@erick2051
4 years ago
Good explanation, but you are iterating through the DataFrame in the most inefficient way possible. Series.str.find would be better, and even DataFrame.apply would be an improvement.
@DataScienceGarage
4 years ago
Yes, you are correct. I chose this approach for demonstration purposes only.
@erick2051
4 years ago
@DataScienceGarage Great then! Thank you very much.
@gauravarora7344
4 years ago
It's amazing.
@DataScienceGarage
4 years ago
Thanks!
@wirechair
4 years ago
What if the dates appear in all of these forms: d/m/yy, dd/m/yy, d/mm/yy and dd/mm/yy?
@ContentsOfTable
4 years ago
There are probably multiple ways to deal with this, but one suggestion is to tune your regex (kzitem.info/news/bejne/pmqX3HaOm4CAepg) to the needs of the document being processed so you don't get false matches. This can be as simple as changing the regex to \d{1,2}\/\d{1,2}\/\d{2,4} or as complicated as using the everything-matcher:
(?:(?:31 ?(\/|-|\.) ?(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec))) ?\1 ?|(?:(?:29|30) ?(\/|-|\.) ?(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)) ?\2 ?))(?:(?:1[6-9]|[2-9]\d)?\d{2})|(?:29 ?(\/|-|\.) ?(?:0?2|(?:Feb)) ?\3 ?(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))|(?:0?[1-9]|1\d|2[0-8]) ?(\/|-|\.) ?(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec))) ?\4 ?(?:(?:1[6-9]|[2-9]\d)?\d{2})|(?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep)|(?:1[0-2]|(?:Oct|Nov|Dec))|(?:0[1-9]|1[012])) ?(\/|-|\.) ?(?:0[1-9]|[12][0-9]|3[01]) ?\5 ?(?:(?:1[6-9]|[2-9]\d)?\d{2})|(?:(?:1[6-9]|[2-9]\d)?\d{2}) ?(\/|-|\.) ?(?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep)|(?:1[0-2]|(?:Oct|Nov|Dec))|(?:0[1-9]|1[012])) ?\6 ?(?:0[1-9]|[12][0-9]|3[01])
Choose your own adventure.
@wirechair
4 years ago
Peter Worden, whoaaaaaaaaaaa, thank you sir!! Much appreciated!!
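To see the simple option from this thread in action, here is a sketch applied to the four forms in the question. It adds `\b` word boundaries (an addition, not part of the suggested regex) so the pattern doesn't match inside longer digit runs, and it assumes day-first order when parsing; `datetime.strptime` accepts one- or two-digit values for `%d` and `%m`, and `%y` reads two-digit years. The sample sentence is hypothetical.

```python
import re
from datetime import datetime

# Simple pattern from the reply above, with added \b boundaries
short_date = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

text = "Seen on 3/7/21, 14/7/21, 3/12/21 and 14/12/21."
found = short_date.findall(text)
print(found)  # ['3/7/21', '14/7/21', '3/12/21', '14/12/21']

# Assumption: day-first with two-digit years (d/m/yy)
parsed = [datetime.strptime(d, "%d/%m/%y").date() for d in found]
print(parsed)
```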
@DataScienceGarage
5 years ago
If you found this video useful, I recommend checking out another one in parallel: Pandas tips. DataFrames. Creating subsets. Sorting data. Summarizing data: kzitem.info/news/bejne/25l_z4upo5yofWk
Comments: 61