Practice Complex Queries: Find Duplicate Records

  1. Write a SQL query to fetch all the duplicate records in a table.

    Table Name: USERS

    Note: A record is considered a duplicate if the same user name and email combination appears more than once.

    Approach: Partition the data by user name and email, and give a row number to each row within its partition. If a (user_name, email) pair exists more than once, it will have multiple row numbers. Rows whose row number is other than 1 identify the duplicate records.

create table users
(
    user_id   int primary key,
    user_name varchar(30) not null,
    email     varchar(50)
);

insert into users values (1, 'Sumit', 'sumit@gmail.com');
insert into users values (2, 'Reshma', 'reshma@gmail.com');
insert into users values (3, 'Farhana', 'farhana@gmail.com');
insert into users values (4, 'Robin', 'robin@gmail.com');
insert into users values (5, 'Robin', 'robin@gmail.com');
insert into users values (6, 'Farhana', 'farhana@gmail.com');
insert into users values (7, 'Farhana', 'farhana@gmail.com');

Expected output: one duplicate row each for Farhana and Robin.

-- Solution: find duplicates using the ROW_NUMBER() window function
select *
from (
    select
        user_id,
        user_name,
        email,
        row_number() over (partition by user_name, email order by user_id desc) as row_id
    from users
) dup  -- the derived table needs an alias in MySQL, SQL Server, and PostgreSQL
where row_id = 2;
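
Filtering on row_id = 2 returns exactly one row per duplicated (user_name, email) pair, which matches the expected output above. If the goal is to fetch every extra copy instead (for example, both of the duplicate Farhana rows), the same query with row_id > 1 should do it:

-- Variation: fetch every duplicate copy, not just one row per duplicated pair
select *
from (
    select
        user_id,
        user_name,
        email,
        row_number() over (partition by user_name, email order by user_id desc) as row_id
    from users
) dup
where row_id > 1;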
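
As an alternative that avoids window functions, a plain GROUP BY with HAVING can surface the duplicated pairs; this is a minimal sketch, and note it returns one summary row per pair rather than the individual duplicate records:

-- Alternative: list each duplicated (user_name, email) pair with its count
select user_name, email, count(*) as copies
from users
group by user_name, email
having count(*) > 1;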
