Complex SQL Queries: Find duplicate records

Please refer all my below post and other solutions also mentioned in this post

Practice Complex queries- find duplicate records

–find duplicate record ALL WAYS

SELECT * FROM USERS;

–USING ROWNUM

SELECT * FROM(
SELECT USER_ID,USER_NAME,EMAIL,
ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN
FROM USERS
) WHERE RN=2;

–OUTPUT:IF WE HAVE TO FETCH ONLY DUPLICATE RECORDS ONCE ALL ROWS WONT BE RETURNS, BELOW IS THE SOLUTION

–RETURN ALL DUPLICATE RECORDS

SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN (
SELECT USER_NAME,EMAIL FROM(
SELECT USER_ID,USER_NAME,EMAIL,
ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN
FROM USERS
) WHERE RN=2
);

–USING GROUP BY
SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN (
SELECT USER_NAME,EMAIL FROM(
SELECT USER_NAME,EMAIL, COUNT(1) CNT
FROM USERS
GROUP BY USER_NAME,EMAIL
) WHERE CNT>1
);

 

Popular posts from this blog

How to change column name in Dataframe and selection of few columns in Dataframe using Pyspark with example

What is Garbage collection in Spark and its impact and resolution

Credit Card Data Analysis using PySpark (how to use auto broadcast join after disabling it)