The practice dataset and SQL statements for this video tutorial are available here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@woolfel
2 жыл бұрын
One important thing to ask is "what causes duplicate records in the database?" In large applications where data comes from multiple sources, "de-duping" before insert is a real problem. It's good to know how to deleting duplicates, but in some cases that's not a viable option. For example, if the table has 100 million rows, deleting duplicates could be quite expensive and shouldn't be run during business hours. When I ask SQL questions in interviews, I'm not looking for "how do I use group by for de-duping." I want to see the candidate thinking about the larger problem and taking time to understand business needs. Cleaning up dupes after the fact doesn't scale and candidates that ask "what is causing dupes and what's the impact" are the developers I want. A developer that only knows how to group by, but never bothers to ask "why is this happening and what is the root problem" aren't people I will hire.
@LearnatKnowstar
2 жыл бұрын
Love your thoughts! Thank you for sharing this with us 👍
@redguard128
2 жыл бұрын
Depends what are you interviewing for. Asking a simple developer about business problems doesn't make much sense. If you want to hire a tech lead, team lead, project manager, architect or a consultant, then yes.
@woolfel
2 жыл бұрын
@@redguard128 if I'm interviewing an entry level developer, I still ask the question for a few reasons. The first one is to expose the candidate to important issues they will eventually have to deal with. The second is to emphasize SQL isn't just for programming sake, it's to manage data and solve functional needs of the application. If applications aren't checking for dupes before inserting data, your database is going to become a pile of garbage very quickly.
@redguard128
2 жыл бұрын
@@woolfel For me a developer is an executive role. He has to do what I tell him/her to do. The "Why"s and "How"s isn't their concern. Sometimes the business decides that duplicates is what they want so a developer that prematurely solves a problem, actually deals more damage than fixes. Some businesses run on circular logic, repeating themselves, defining settings everywhere, modifying global variables in functions, running multiple databases with duplicate data, having too low spec or too high spec servers, etc.
@woolfel
2 жыл бұрын
@@redguard128 that's one way to do it. I work in the consulting world and growing our developers is very important to me. The faster the developer learns, the less hand holding I need to do and it makes the entire team more productive. I've worked in fortune 500 world long enough to know hiring a bunch of low level developers who can't grow causes more problems. In healthcare and finance sector, duplicate data causes huge data integrity issues. I would say 90% of the ETL work in fortune 500 deal with dirty data. When issues happen it's because of dirty data (bad references, missing data and dupes). Many of our customers waste 3-12months dealing with dirty data every year. A developer that isn't thinking about these issues and constantly growing will become obsolete. I have seen fulltime employees (aka not consultants) work this way. I question is that a good thing to teach people? Who wants to stay a low level engineer forever and be someone else's slave? Who wants to work at a job where the tech lead treats them like a pair of fingers?
@surajpatil3579
2 жыл бұрын
Thanks for making this interview questions series.. cleared my major doubts
@LearnatKnowstar
2 жыл бұрын
Glad to hear that! Thank you for your support.
@whitecrowuk575
2 жыл бұрын
Best is to use row_number() and partition by to create sequence column on duplicates - applicable to all kind of duplicates ( identical rows especially)
@MahaLakshmi_vlogs
2 жыл бұрын
Yes
@KS211000
2 жыл бұрын
True
@gauraosayasikamal4683
2 жыл бұрын
Good way of teaching
@LearnatKnowstar
2 жыл бұрын
Thank you
@shitalgavasane8699
5 ай бұрын
Output sequence of below query should be - select firstname,lastname,count(*) from employee gropy by fistnmae,lastname- output- firstname lastname count(*) Adam ownes 2 Mark wills 1 natasha lee 2 ruley jones 1
@visaalakshiselvaraj5572
3 жыл бұрын
Wonderful 😍 thanks a lot 🙏
@javeedakramshaik7052
4 жыл бұрын
I appreciate your work mam and videos are good explanatory
@LearnatKnowstar
4 жыл бұрын
Thank you
@skdonsingh
3 жыл бұрын
From rank function it's better .. thanks
@Analystmind
2 жыл бұрын
You are amazing teacher
@LearnatKnowstar
2 жыл бұрын
Thank you 🙏
@kayk1388
2 жыл бұрын
Deleting from CTE deletes data from the source table????????? How?
@hnaidu.pro21
4 ай бұрын
Yes, you can try the same. WITH CTE_dup AS ( SELECT EmpID, FirstName, LastName, ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY EmpID) AS rownum FROM [dbo].[tblDuplicate] ) DELETE FROM CTE_dup WHERE rownum > 1; SELECT * FROM [dbo].[tblDuplicate]
@kebincui
2 жыл бұрын
very good; thanks
@LearnatKnowstar
2 жыл бұрын
Thank you
@praldandagi9574
2 жыл бұрын
Video is visible but explanation is superb
@user-mx8pw4wt7e
Жыл бұрын
Really appreciate your effort.. If possible please add table script as well, it will helpful for beginner's. Thank you!
@LearnatKnowstar
Жыл бұрын
Thank you. We have started adding the table scripts in our latest videos!
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are now available here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@ersuresh4488
Жыл бұрын
as per my understanding if we have >2 duplicate records in a table then rank() and denserank() will not not work here in this case we have to use row_number() only!!!
@omkarkhandekar7430
3 жыл бұрын
Excellent Mam😍
@LearnatKnowstar
3 жыл бұрын
Thank you
@papachoudhary5482
2 жыл бұрын
Thanks
@LearnatKnowstar
2 жыл бұрын
Thank you
@niteshsingh4377
Жыл бұрын
please upload dataset as well to follow along.
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@kristyowens2284
4 жыл бұрын
Please post more SQL queries.
@LearnatKnowstar
4 жыл бұрын
Will be posting more soon.
@rajasekharm9049
2 жыл бұрын
To delete duplicates just we can go through distinct * from emp we can delete duplicates from entire table
@narendrachowdary991
2 жыл бұрын
Good knowledge Bro
@buttercup9884
2 жыл бұрын
Not really in the example these records were not duplicates as they differed at EmployeeID - distinct would work only is all fields would be the same. There are other ways of getting rid of duplicates as using rowids something like (using the example where duplicates existed at first,last name, phone and emal DELETE FROM EMPLOYEE1 WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM EMPLOYEE1 GROUP BY FIRSTNAME,LASTNAME,PHONE,EMAIL);
@danieljust295
3 жыл бұрын
The question is how to delete duplicates from table, not how to display duplicates or how to display unique - two different things. The correct answer is DELETE FROM WHERE max() … , even though this is inefficient. It’s about how to ingest data into table without duplicates.
@Datapassenger_prashant
2 жыл бұрын
It is also about how much time query is taking to execute
@danieljust295
2 жыл бұрын
prashant verma Right, but the execution time this is secondary problem, first is the functionality, then optimization.
@Datapassenger_prashant
2 жыл бұрын
@@danieljust295 absolutely right, if we just consider the test question thn yes your approach was the simplest..
@danieljust295
2 жыл бұрын
prashant verma I add also that “deleting duplicates” is ambiguous phrase. Does it mean to completely remove duplicated rows from the table or leave unique row in case there are multiple rows with the same values (duplicates).
@Datapassenger_prashant
2 жыл бұрын
@@danieljust295 that's a question we should ask before proceeding with the query. However, in an interview we don't get chance. So here we can assume.. remove the duplicate from the table. As she said in her video we can keep the latest one
@spedeveryday9699
2 жыл бұрын
Very Helpful. Thank you
@LearnatKnowstar
2 жыл бұрын
Thank you
@marcuslee2026
2 жыл бұрын
Excellent. Thank you!
@LearnatKnowstar
2 жыл бұрын
Glad it was helpful!
@garlaamar8973
3 жыл бұрын
Thanks u mam for sharing this.🙏
@LearnatKnowstar
3 жыл бұрын
Thank you
@LearnatKnowstar
4 жыл бұрын
Have you been asked a SQL query interview question that you couldn't answer? Let us know in the comments below and we will answer those in our upcoming videos!
@sachingaware6822
3 жыл бұрын
in my last interview interviewer ask me what is difference between count(*) and count(1) and which one is performance wise better so can u please make video on this.
@hellocartoons1735
2 жыл бұрын
Rollback; before commit;
@akhildas3743
3 жыл бұрын
Can we use dense rank also?
@naveenreddy2059
Жыл бұрын
ne style bavundi akka
@tiagosilva856
2 жыл бұрын
You are a bless in my life 😘
@shashidharr6595
2 жыл бұрын
Nice 👍helpful
@LearnatKnowstar
2 жыл бұрын
Thank you
@EternalEvanesce
Жыл бұрын
What is difference between roll_number and rank function? Both can be used interchangeably??
@pankajbhatt7373
2 жыл бұрын
So deleting data with in cte will delete data in table too , how? Does this happens in derived table and views also?
@lucas1365
2 жыл бұрын
Same question
@ouramazingnature
2 жыл бұрын
Hello there, Can you please do a video on how to add a new large dataset (million of rows) to an exsiting table without deleting the data in the table. Provided that column names and data types are the same in old table and new added data. Thanks
@NoonAndKnight
2 жыл бұрын
may not be best method, but create a loop to insert batch records. 20k columns at a time. make sure you set the loop to end once finished
@Beast70
2 жыл бұрын
Best method, drop indexes, create a loop based on optimal size of batch, typically 20k to 500k, using bulk insert, once complete re-apply indexes.
@LearnatKnowstar
2 жыл бұрын
Very good approaches have been mentioned in the comments. Thank you
@TechnoSparkBigData
5 ай бұрын
How deleting the records from CTE is deleting the rows from main table?
@akhildas3743
3 жыл бұрын
For finding the duplicate values we can use 'having count >1', it wont be possible to use a delete funtion here and the having clause as in a subquery?
@LearnatKnowstar
3 жыл бұрын
It will delete all occurrences of the duplicate records. The method explained retains one occurrence of the duplicate records.
@akhildas3743
3 жыл бұрын
@@LearnatKnowstar thank u
@narayanagottipati5980
4 жыл бұрын
Hi madam can you please make a video on how to recover accidentally deleted data from the table. Thanks in advance
@LearnatKnowstar
4 жыл бұрын
That's a great question. Will definitely plan a video soon.
@sarmasvali6147
2 жыл бұрын
We can use roll back command, I think, if we delete the data accidentally. But this command can't be executed for DDL commands
@m00050
Жыл бұрын
Very helpful. I am totally new. Learning SQL. Question, what platform is this where u are running sql queries?
@LearnatKnowstar
Жыл бұрын
This is SQL Server. You can download the software for free from Microsoft website.
@arshadmohammed1090
2 жыл бұрын
In the last approach you mentioned, we are deleting the values from the cte right ? Not from the main table ?
@LearnatKnowstar
2 жыл бұрын
Deleting from CTE will delete it from the underlying table.
@arshadmohammed1090
2 жыл бұрын
@@LearnatKnowstar got it. I wasn't aware of this fact about cte back then.
@shivakumarj4080
2 жыл бұрын
Hello, This video is very helpful. But I have one question, The maximum employee ID is the duplicate one right? But you are deleting min of employee ID. Could you please clarify that?
@LearnatKnowstar
2 жыл бұрын
yes, you can delete max of employee id considering it as a duplicate. It was just assumed in the example that we want to retain the max employee id.
@sonyguptaagrawal6028
Жыл бұрын
please provide practice database ... that you have used in this video
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are available here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@sonyguptaagrawal6028
Жыл бұрын
@@LearnatKnowstar thank you
@unknownnaanu
2 жыл бұрын
Select rank() over ( partition by firstname order by employeeid) as employeenumber, firstname,lastname,phone,email from employee; Please correct me if i am wrong .
@parsuramkumar2826
Жыл бұрын
Thanks for the video, can you pls tell me how the datasets used here can be found or accessed ?
@LearnatKnowstar
Жыл бұрын
The practice datasets are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@harikrishnasai6104
2 жыл бұрын
We can use rownumber also right ? While we deleting the duplicates in cte
@LearnatKnowstar
2 жыл бұрын
Yes
@sweetysweetyvghb
3 жыл бұрын
Mam...instead of employee id , can we use rowid here and order by rowid ...because in many tables practically column like emp I'd won't be present.
@LearnatKnowstar
3 жыл бұрын
You will need to choose a key column that identifies a unique record in the table
@sweetysweetyvghb
3 жыл бұрын
@@LearnatKnowstar can I not use rowid ?
@LearnatKnowstar
3 жыл бұрын
Rowid would be unique for each row in the table. Each duplicate row will have its own rowid and hence rowid can not be used. You need to identify a key column that represents a unique record to business
@ShubhamRai06
3 жыл бұрын
what if there are more no of column and just have different timestamp , or user but having duplicate values( key columns) then in that case how can we delete duplicate using row_number...
@LearnatKnowstar
3 жыл бұрын
You just need to use key columns in partition by clause
@benarjiyt994
2 жыл бұрын
@@LearnatKnowstar 0
@shashidharr6595
Жыл бұрын
👍👍👍
@LearnatKnowstar
Жыл бұрын
Thank you
@asutoshnayak1391
3 жыл бұрын
If the data set is bigger suppose 10000 rows then how would you remove duplicate from that without seeing which one is duplicate and to remove ? Please tell me.
@ishitvasingh9902
Жыл бұрын
Hi, I have a query please reply to what is wrong in it, my employ table contains, id, name, sal, email with dup_emp as (select *,dense_rank() over (partition by email order by id desc) as dens_rnk from employ e) delete from dup_emp where dens_rnk >1 now this code is showing this error, SQL Error [42P01]: ERROR: relation "dup_emp" does not exist Position: 127 i am selecting everything , including the cte, and then executing the query
@sorbasishshaw5161
Жыл бұрын
with Employee_CTE as (Select *, RANK() over (partition by FirstName, Lastname order by EmployeeID desc) as Rank from Employee); delete from Employee_CTE where Rank > 1; whenever I am typing this block of code in my oracle db (11g), I am getting an error ORA-00923: FROM keyword not found where expected. can somebody please help me in this matter?
@LearnatKnowstar
Жыл бұрын
You do not need to terminate the CTE with a semi colon.
@jaydeeppatidar4189
2 жыл бұрын
Question:- You have created CTE and deleted records from CTE then how data got deleted from the original table? Will wait for answer from anyone.
@LearnatKnowstar
2 жыл бұрын
This is a feature of CTE. If you delete from CTE , it will delete from the underlying table 👍
@jaydeeppatidar4189
Жыл бұрын
@@LearnatKnowstar thank you!
@geraldsegun3297
2 жыл бұрын
When I try to duplicate your example on mysql, i get the error Error Code: 1288. The target table employye_cte of the DELETE is not updatable . This is the query I am trying to run with employye_cte as (select firstname, lastname, employeenumber, row_number() over (partition by lastname order by employeenumber) as rownumber from employees1) delete from employye_cte where rownumber = '2' What am I missing? Thanks
@ivanbesando556
2 жыл бұрын
this is also my problem, i found that mysql couldn't delete a subquery. do you find the solution for this?
@geraldsegun3297
2 жыл бұрын
@@ivanbesando556 not so far yet
@aakashvaish3757
2 жыл бұрын
Can you please share the DDL of all the questions mentioned ?
@LearnatKnowstar
Жыл бұрын
The DDLs are available her e- know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@souravmoha2224
2 жыл бұрын
Hi Could you please share the sample data.Thanks
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@GopiVardhan
10 ай бұрын
Where is dataset to download
@velo1337
2 жыл бұрын
delete from dbo.employee where employeeid not in (select max(employeeid) from dpo.employee group by firstname, lastname)
@abiodun.alawal8533
Жыл бұрын
Does anyone know how i can have access to the datasources used in this video?
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@sakesh404
2 жыл бұрын
Dear Mam, please zoom,writings are not readable. 🙏🏼
@LearnatKnowstar
2 жыл бұрын
Sure.Noted. In latest videos, the font is enlarged.
@gabrielsalomao5897
3 жыл бұрын
Doesnt work, table not updateable
@Sayon____bhattacharjee
4 жыл бұрын
DEAR MADAM , PLEASE HELP....... WITH NEW_TABLE AS ( SELECT ID, F_N, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RANK FROM EMPLOYEES ) DELETE FROM NEW_TABLE WHERE RANK>1; ---------------------------------------------------------------------------- ERROR MSG: ORA-00928: missing SELECT keyword 00928. 00000 - "missing SELECT keyword" *Cause: *Action: Error at Line: 6 Column: 1
@sarmasvali6147
2 жыл бұрын
You have used row num in beginning and rank command at the end. How it will execute. Use either rownum or rank.
@harshitchoudhary4693
3 жыл бұрын
Where can I get these practice tables?
@LearnatKnowstar
3 жыл бұрын
You can practice with tables in the Microsoft Adventure Works database.
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@ajay5028
2 жыл бұрын
Video not cleared...your voice is clered... Not able to see words
@vikastiwari6564
2 жыл бұрын
I appreciate your way of explanations. I just loved it...I would say thank you but I want you to make a video on common expiration which are very important in SQL. one more question for you it that. Suppose there are two table and name is table A and table B Table A having Table B having ID | STUDENT NAME ID | SUJECT | MARKS 1 A 2 ENGLISH 40 2 B 4 ENGLISH 60 3 C 5 MATHS 100 4 6 SCIENCE 80 Find out student name who got max mark? I had been asked this question. plz solve here so other people can also get to know. Thank you very much in advance Mam....I will keep on waiting for answer of above question.
@LearnatKnowstar
2 жыл бұрын
Thank you. Please see the below video- It has a similar query to the one in your comment. kzitem.info/news/bejne/u2lqvmWXZqyArJw
@kayk1388
2 жыл бұрын
What is in table A?
@hanam6138
2 жыл бұрын
I hope my answers right, if wrong please let me know tks. Select A.student name from A inner join B on A.id = B.id where max(b.marks)
@jaitiwari241
Жыл бұрын
Where is sql code to practice
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@vishnuvamshi4645
2 жыл бұрын
PUT QUERY IN DESCRIPTION
@LearnatKnowstar
Жыл бұрын
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@ashokreddy3366
2 жыл бұрын
Rownum is best instead of these
@geraldmedrano8594
2 жыл бұрын
My answer: Google
@LearnatKnowstar
2 жыл бұрын
Google might lead you here 👍
@shantanu6940
Жыл бұрын
waste of my time
@chaitanyakumar6790
2 жыл бұрын
Hi , Actually after having the query SELECT *, (RANK() OVER (PARTITION BY Firstname,Lastname ORDER BY EmployeeId asc)) AS Rank1 FROM dbo.Employee1, I am unable to get the data in the order of EmployeeId, its getting in the order of Firstname alphabetical order, Can you please let me know the issue
@jamesgg9950
2 жыл бұрын
DELETE FROM aliasB FROM dbo.employee1 aliasA INNER JOIN dbo.employee1 aliasB ON aliasA.FirstName = aliasB.FirstName AND aliasA.LastName = aliasB.LastName AND aliasA.EmployeeID < aliasB.EmployeeID -- ONLY valid if PK supports '
Пікірлер: 147