Complex SQL Queries: Find duplicate records

Please refer all my below post and other solutions also mentioned in this post

Practice Complex queries- find duplicate records

–find duplicate record ALL WAYS

SELECT * FROM USERS;

–USING ROWNUM

SELECT * FROM(
SELECT USER_ID,USER_NAME,EMAIL,
ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN
FROM USERS
) WHERE RN=2;

–OUTPUT:IF WE HAVE TO FETCH ONLY DUPLICATE RECORDS ONCE ALL ROWS WONT BE RETURNS, BELOW IS THE SOLUTION

–RETURN ALL DUPLICATE RECORDS

SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN (
SELECT USER_NAME,EMAIL FROM(
SELECT USER_ID,USER_NAME,EMAIL,
ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN
FROM USERS
) WHERE RN=2
);

–USING GROUP BY
SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN (
SELECT USER_NAME,EMAIL FROM(
SELECT USER_NAME,EMAIL, COUNT(1) CNT
FROM USERS
GROUP BY USER_NAME,EMAIL
) WHERE CNT>1
);

 

Popular posts from this blog

Window function in PySpark with Joins example using 2 Dataframes (inner join)

Complex SQL: fetch the users who logged in consecutively 3 or more times (lead perfect example)

Credit Card Data Analysis using PySpark (how to use auto broadcast join after disabling it)