Posts

Showing posts from May, 2022

Pyspark RDD examples

from pyspark.sql import SparkSession sc = SparkSession.builder.master(‘local[1]’)\ .appName(‘RDD_Methods_Examples’)\ .getOrCreate() print(sc.version) Output: 3.2.1 rddNum = sc.parallelize([1,2,3,4,5,6,7,8,9,10]) rddNum = rddNum.map(lambda x : x+10) rddNum = rddNum.filter(lambda x : x % 2 == 0) print(rddNum.reduce(lambda a,b : a+b)) Output: 80 nameRdd = sc.parallelize([‘Deepak’,’Simmi’,’Simran’,’Sukhwinder’,’Sanki’,’ShotTemper’]) rddNum = nameRdd.filter(lambda name : name.startswith(‘S’)) print(rddNum.collect()) rddNum = nameRdd.filter(lambda name : not name.startswith(‘S’)) print(rddNum.collect()) ['Simmi', 'Simran', 'Sukhwinder', 'Sanki', 'ShotTemper']['Deepak'] #union example rddNum = sc.parallelize([1,2,3,4,5,6,7,8,9,10,30,21,45,23,22,77,44]) divisibleByTwo = rddNum.filter(lambda x : x%2 == 0) divisibleByThree = rddNum.filter(lambda x : x%3 == 0) print(divisibleByTwo.collect()) print(divisibleByThree.collect()) rddUnion = divisibleByTw

Everything about Binary Tree and its all traversal techniques (recursive and itterative) with examples

package org.dpq.ds.tree;import java.util.Stack;public class Tree<T>{ public static void main(String[] args) { TreeNode<Integer> root = new TreeNode<Integer>(1); root.setLeft(new TreeNode<Integer>(2)); root.setRight(new TreeNode<Integer>(3)); root.getLeft().setLeft(new TreeNode<Integer>(4)); root.getLeft().setRight(new TreeNode<Integer>(5)); root.getRight().setLeft(new TreeNode<Integer>(6)); root.getRight().setRight(new TreeNode<Integer>(7)); Tree<Integer> tree = new Tree<Integer>(); //Tree// 1// / \// 2 3// /\ /\// 4 5 6 7 //expected result for inorder(LNR) 4 2 5 1 6 3 7 //expected result for preorder(NLR) 1 2 4 5 3 6 7 //expected result for preorder(NLR) 4 5 2 6 7 3 1 System.out.println("recursive inorder \n"); tree.inOrder(root); System.out.println("recursive pre

COMPLEX QUERIES: ALL POSSIBLE QUERIES

–find duplicate record ALL WAYS SELECT * FROM USERS; –USING ROWNUM SELECT * FROM( SELECT USER_ID,USER_NAME,EMAIL, ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN FROM USERS ) WHERE RN=2; –OUTPUT:IF WE HAVE TO FETCH ONLY DUPLICATE RECORDS ONCE ALL ROWS WONT BE RETURNS, BELOW IS THE SOLUTION –RETURN ALL DUPLICATE RECORDS SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN ( SELECT USER_NAME,EMAIL FROM( SELECT USER_ID,USER_NAME,EMAIL, ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN FROM USERS ) WHERE RN=2 ); –USING GROUP BY SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN ( SELECT USER_NAME,EMAIL FROM( SELECT USER_NAME,EMAIL, COUNT(1) CNT FROM USERS GROUP BY USER_NAME,EMAIL ) WHERE CNT>1 ); –second last record –USING WINDOW FUNCTION SELECT * FROM( SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER, HIRE_DATE, JOB_ID, SALARY, MANAGER_ID, DEPARTMENT_ID, ROW_NUMBER() OVER ( ORDER BY EMPLOYEE_ID DESC) AS RN FROM EMPLOYEES ) WHERE RN=2; –

COMPLEX QUERY: Prefer the account id with the least value in case of same number of unique patients

Note : Prefer the account id with the least value in case of same number of unique patients Table Name : PATIENT_LOGS Approach : First convert the date to month format since we need the output specific to each month. Then group together all data based on each month and account id so you get the total no of patients belonging to each account per month basis. Then rank this data as per no of patients in descending order and account id in ascending order so in case there are same no of patients present under multiple account if then the ranking will prefer the account if with lower value. Finally, choose upto 2 records only per month to arrive at the final output. SOLUTION: SELECT ACCOUNT_ID, MONTH, PATINET_PER_MONTH FROM( SELECT ACCOUNT_ID,MONTH,PATINET_PER_MONTH, ROW_NUMBER() OVER (PARTITION BY MONTH ORDER BY ACCOUNT_ID DESC) AS RN FROM ( SELECT ACCOUNT_ID,MONTH, COUNT(2) PATINET_PER_MONTH FROM( select DISTINCT ACCOUNT_ID, PATIENT_ID,TO_CHAR(date1,’MONTH’) MONTH from patient_logs ) GROU

Complex SQL: fetch all the records when London had extremely cold temperature for 3 consecutive days or more

From the weather table, fetch all the records when London had extremely cold temperature for 3 consecutive days or more   Note : Weather is considered to be extremely cold when its temperature is less than zero. Table Name : WEATHER Approach : First using a sub query identify all the records where the temperature was very cold and then use a main query to fetch only the records returned as very cold from the sub query. You will not only need to compare the records following the current row but also need to compare the records preceding the current row. And may also need to compare rows preceding and following the current row. Identify a window function which can do this comparison pretty easily.   SOLUTION: SELECT * FROM( SELECT ID, CITY, TEMPERATURE, DAY, CASE WHEN TEMPERATURE<0 AND TEMPERATURE > LEAD(TEMPERATURE) OVER (ORDER BY ID) AND TEMPERATURE > LEAD(TEMPERATURE,2) OVER (ORDER BY ID) THEN ‘YES’ WHEN TEMPERATURE<0 AND TEMPERATURE < LAG(TEMPERATURE) OVER (ORDER BY ID

COMPLEX SQL: write a SQL query to interchange the adjacent student names.(LEAD and LAG functions perfect example)

From the students table, write a SQL query to interchange the adjacent student names. Note : If there are no adjacent student then the student name should stay the same. Table Name : STUDENTS Approach : Assuming id will be a sequential number always. If id is an odd number then fetch the student name from the following record. If id is an even number then fetch the student name from the preceding record. Try to figure out the window function which can be used to fetch the preceding the following record data. If the last record is an odd number then it wont have any adjacent even number hence figure out a way to not interchange the last record data. Create table create table students ( id int primary key, student_name varchar(50) not null ); Data population : insert into students values(1, ‘James’); insert into students values(2, ‘Michael’); insert into students values(3, ‘George’); insert into students values(4, ‘Stewart’); insert into students values(5, ‘Robin’); Solution: SELECT ID,S

Complex SQL: write a SQL query to interchange the adjacent student names

Note : If there are no adjacent student then the student name should stay the same. Table Name : STUDENTS Approach : Assuming id will be a sequential number always. If id is an odd number then fetch the student name from the following record. If id is an even number then fetch the student name from the preceding record. Try to figure out the window function which can be used to fetch the preceding the following record data. If the last record is an odd number then it wont have any adjacent even number hence figure out a way to not interchange the last record data. –Table Structure: drop table students; create table students ( id int primary key, student_name varchar(50) not null ); insert into students values(1, ‘James’); insert into students values(2, ‘Michael’); insert into students values(3, ‘George’); insert into students values(4, ‘Stewart’); insert into students values(5, ‘Robin’); select * from students; Solution:

Complex SQL: fetch the users who logged in consecutively 3 or more times (lead perfect example)

From the login_details table, fetch the users who logged in consecutively 3 or more times. Table Name : LOGIN_DETAILS Approach : We need to fetch users who have appeared 3 or more times consecutively in login details table. There is a window function which can be used to fetch data from the following record. Use that window function to compare the user name in current row with user name in the next row and in the row following the next row. If it matches then fetch those records. –Table Structure: drop table login_details; create table login_details( login_id int primary key, user_name varchar(50) not null, login_date date); delete from login_details; insert into login_details values (101, ‘Michael’, current_date), (102, ‘James’, current_date), (103, ‘Stewart’, current_date+1), (104, ‘Stewart’, current_date+1), (105, ‘Stewart’, current_date+1), (106, ‘Michael’, current_date+2), (107, ‘Michael’, current_date+2), (108, ‘Stewart’, current_date+3), (109, ‘Stewart’, current_date+3), (110, ‘

Complex SQL Queries: Find duplicate records

Please refer all my below post and other solutions also mentioned in this post Practice Complex queries- find duplicate records –find duplicate record ALL WAYS SELECT * FROM USERS; –USING ROWNUM SELECT * FROM( SELECT USER_ID,USER_NAME,EMAIL, ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN FROM USERS ) WHERE RN=2; –OUTPUT:IF WE HAVE TO FETCH ONLY DUPLICATE RECORDS ONCE ALL ROWS WONT BE RETURNS, BELOW IS THE SOLUTION –RETURN ALL DUPLICATE RECORDS SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN ( SELECT USER_NAME,EMAIL FROM( SELECT USER_ID,USER_NAME,EMAIL, ROW_NUMBER() OVER (PARTITION BY USER_NAME,EMAIL ORDER BY USER_ID) AS RN FROM USERS ) WHERE RN=2 ); –USING GROUP BY SELECT * FROM USERS WHERE (USER_NAME,EMAIL) IN ( SELECT USER_NAME,EMAIL FROM( SELECT USER_NAME,EMAIL, COUNT(1) CNT FROM USERS GROUP BY USER_NAME,EMAIL ) WHERE CNT>1 );  

PERFECT SQL BLOG

https://techtfq.com/blog/learn-how-to-write-sql-queries-practice-complex-sql-queries#google_vignette=

COMPLEX SQL PRACTICE:From the doctors table, fetch the details of doctors who work in the same hospital but in same specialty.

From the doctors table, fetch the details of doctors who work in the same hospital but in different specialty. Table Name : DOCTORS Approach : Use self join to solve this problem. Self join is when you join a table to itself. drop table doctors; create table doctors ( id int primary key, name varchar(50) not null, speciality varchar(100), hospital varchar(50), city varchar(50), consultation_fee int ); insert into doctors values(1, ‘Dr. Shashank’, ‘Ayurveda’, ‘Apollo Hospital’, ‘Bangalore’, 2500); insert into doctors values(2, ‘Dr. Abdul’, ‘Homeopathy’, ‘Fortis Hospital’, ‘Bangalore’, 2000); insert into doctors values(3, ‘Dr. Shwetha’, ‘Homeopathy’, ‘KMC Hospital’, ‘Manipal’, 1000); insert into doctors values(4, ‘Dr. Murphy’, ‘Dermatology’, ‘KMC Hospital’, ‘Manipal’, 1500); insert into doctors values(5, ‘Dr. Farhana’, ‘Physician’, ‘Gleneagles Hospital’, ‘Bangalore’, 1700); insert into doctors values(6, ‘Dr. Maryam’, ‘Physician’, ‘Gleneagles Hospital’, ‘Bangalore’, 1500); SOLUTION: USING

COMPLEX SQL PRACTICE:-From the doctors table, fetch the details of doctors who work in the same hospital but in different specialty.

From the doctors table, fetch the details of doctors who work in the same hospital but in different specialty. Table Name : DOCTORS Approach : Use self join to solve this problem. Self join is when you join a table to itself. drop table doctors; create table doctors ( id int primary key, name varchar(50) not null, speciality varchar(100), hospital varchar(50), city varchar(50), consultation_fee int ); insert into doctors values(1, ‘Dr. Shashank’, ‘Ayurveda’, ‘Apollo Hospital’, ‘Bangalore’, 2500); insert into doctors values(2, ‘Dr. Abdul’, ‘Homeopathy’, ‘Fortis Hospital’, ‘Bangalore’, 2000); insert into doctors values(3, ‘Dr. Shwetha’, ‘Homeopathy’, ‘KMC Hospital’, ‘Manipal’, 1000); insert into doctors values(4, ‘Dr. Murphy’, ‘Dermatology’, ‘KMC Hospital’, ‘Manipal’, 1500); insert into doctors values(5, ‘Dr. Farhana’, ‘Physician’, ‘Gleneagles Hospital’, ‘Bangalore’, 1700); insert into doctors values(6, ‘Dr. Maryam’, ‘Physician’, ‘Gleneagles Hospital’, ‘Bangalore’, 1500); SOLUTION: USING

SQL COMPLEX QUERY- find max Nth and min Nth salary department wise from employee table

Again there are many way to do this and also we can use a few window functions to achieve the same result. As an added challenge, try out solving this query using a different window function and then comment out your query. Question seems to be last post where we have calculated max or min department wise salary but when it comes to end than that logic won’t work instead we can use windows row_num() function Solution: –nth highest or nth lowest salary from each department from Employee table –highest SELECT * FROM( SELECT ROW_NUMBER() OVER (partition by department_id ORDER BY salary) AS MIN_SALARY_RN, ROW_NUMBER() OVER (partition by department_id ORDER BY salary DESC) AS MAX_SALARY_RN, EMP.* FROM employees EMP ) WHERE MIN_SALARY_RN=1 OR MAX_SALARY_RN=1; –second highest SELECT * FROM( SELECT ROW_NUMBER() OVER (partition by department_id ORDER BY salary) AS MIN_SALARY_RN, ROW_NUMBER() OVER (partition by department_id ORDER BY salary DESC) AS MAX_SALARY_RN, EMP.* FROM employees EMP ) WHER

SQL COMPLEX QUERY- find max and min salary department wise from employee table

Table Name : EMPLOYEE Approach : Write a sub query which will partition the data based on each department and then identify the record with maximum and minimum salary for each of the partitioned department. Finally, from the main query fetch only the data which matches the maximum and minimum salary returned from the sub query. Expected Output Again there are many way to do this and also we can use a few window functions to achieve the same result. As an added challenge, try out solving this query using a different window function and then comment out your query. –heighest or minimum salary from each department SELECT * FROM EMPLOYEES EMP JOIN ( SELECT SALARY,EMPLOYEE_ID, min(salary) OVER (partition by department_id) AS MIN_SALARY_RN, max(salary) OVER (partition by department_id) AS MAX_SALARY_RN FROM employees EMP ) emp_new ON EMP.EMPLOYEE_ID = EMP_NEW.EMPLOYEE_ID AND (EMP.SALARY = EMP_NEW.MIN_SALARY_RN OR EMP.SALARY = EMP_NEW.MAX_SALARY_RN);

Practice complex SQL queries-find second last record

Write a SQL query to fetch the second last record from employee table. Table Name : EMPLOYEE Approach : Using window function sort the data in descending order based on employee id. Provide a row number to each of the record and fetch the record having row number as 2. Solution-1 select * from employees where employee_id in( select MAX(employee_id)-1 as employee_id from employees ); — there is issue in this query, lets say employee_id 205 is not present then this query will fail Solution-2 –PERFECT SOLUTION USING WINDOW FUNCTION WITHOUT PARTION BY SELECT * FROM( SELECT ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID DESC) AS RN, EMP.* FROM employees EMP) WHERE RN=2;

Practice Complex queries- find duplicate records

Write a SQL Query to fetch all the duplicate records in a table. Table Name : USERS Note : Record is considered duplicate if a user name and his email is present more than once. Approach : Partition the data based on user name and then give a row number to each of the partitioned user name. If a user name exists more than once then it would have multiple row numbers. Using the row number which is other than 1, we can identify the duplicate records. create table users ( user_id int primary key, user_name varchar(30) not null, email varchar(50)); insert into users values (1, ‘Sumit’, ‘sumit@gmail.com’); insert into users values (2, ‘Reshma’, ‘reshma@gmail.com’); insert into users values (3, ‘Farhana’, ‘farhana@gmail.com’); insert into users values (4, ‘Robin’, ‘robin@gmail.com’); insert into users values (5, ‘Robin’, ‘robin@gmail.com’); insert into users values (6, ‘Farhana’, ‘farhana@gmail.com’); insert into users values (7, ‘Farhana’, ‘farhana@gmail.com’); expected output should be Far

Kafka useful commands on Macbook or linux

Start up the Zookeeper. ./zookeeper-server-start.sh ../config/zookeeper.properties Add the below properties in the server.properties listeners=PLAINTEXT://localhost:9092auto.create.topics.enable=false Start up the Kafka Broker ./kafka-server-start.sh ../config/server.properties How to create a topic ? ./kafka-topics.sh –create –topic test-topic –bootstrap-server localhost:9092 –replication-factor 1 –partitions 4 Output: Created topic test-topic. Describe Topic command ./kafka-topics.sh –describe –topic test-topic –bootstrap-server localhost:9092 output: Topic: test-topic TopicId: O-uBj0D_R6aMhKMsTUgqhg PartitionCount: 4 ReplicationFactor: 1 Configs: segment.bytes=1073741824 Topic: test-topic. Partition: 0   Leader: 0. Replicas: 0. Isr:   0 Topic: test-topic. Partition: 1. Leader: 0. Replicas: 0. Isr:   0 Topic: test-topic. Partition: 2. Leader: 0. Replicas: 0. Isr:   0 Topic: test-topic. Partition: 3. Leader: 0. Replicas: 0. Isr:   0 How to instantiate a Console Producer? Without Key