From the login_details table, fetch the users who logged in consecutively 3 or more times.

Table Name: LOGIN_DETAILS

Approach: We need to fetch the users who appear 3 or more times consecutively in the login_details table. There is a window function (LEAD) which can be used to fetch data from the following record. Use that window function to compare the user name in the current row with the user name in the next row and in the row following the next row. If all of them match, fetch those records.

--Table Structure:
drop table login_details;
create table login_details(
    login_id int primary key,
    user_name varchar(50) not null,
    login_date date);

delete from login_details;
insert into login_details values
(101, 'Michael', current_date),
(102, 'James', current_date),
(103, 'Stewart', current_date+1),
(104, 'Stewart', current_date+1),
(105, 'Stewart', current_date+1),
(106, 'Michael', current_date+2),
(107, 'Michael', cur...
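Based on the approach described above, a minimal sketch of the query is shown below. It assumes login_id defines the login order, and the subquery and column aliases (t, next_user, next_to_next_user) are illustrative only; adjust to your database's dialect as needed.

--Solution sketch: compare each row's user_name with the next two rows using LEAD
select distinct user_name
from (
    select user_name,
           lead(user_name)    over (order by login_id) as next_user,
           lead(user_name, 2) over (order by login_id) as next_to_next_user
    from login_details
) t
where user_name = next_user
  and user_name = next_to_next_user;

The DISTINCT ensures that a user with a run of four or more consecutive logins is reported only once.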
Input details:
● File has json records
● Each record has fields:
  ○ user_id
  ○ card_num
  ○ merchant
  ○ category
  ○ amount
  ○ ts

Below analysis to be done.

Sample data:
+------+--------+---------+--------+----------+-------+
|amount|card_num| category|merchant|        ts|user_id|
+------+--------+---------+--------+----------+-------+
|   243|   C_108|     food|   M_102|1579532902|  U_104|
|   699|   C_106|cosmetics|   M_103|1581759040|  U_103|
|   228|   C_104| children|   M_110|1584161986|  U_103|
+------+--------+---------+--------+----------+-------+

Application: Here we will disable auto broadcast join, then join two data frames using an explicit broadcast hint and inspect the execution plan, which should still use a broadcast join.

Solution:
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.master('local[2]')\
    .appName('RDD_Methods_Examples')\
    .getOrCreate()

print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
pr...
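The solution above is cut off mid-statement. A minimal self-contained sketch of the same idea follows; the JSON file path, the DataFrame names, and the small in-line lookup DataFrame are assumptions added for illustration and are not from the original.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.master('local[2]') \
    .appName('broadcast_join_example') \
    .getOrCreate()

# Disable automatic broadcast joins so only an explicit hint can trigger one
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

# Large transactions DataFrame read from the JSON file (path assumed)
txn_df = spark.read.json("card_transactions.json")

# Small lookup DataFrame built in-line for illustration
categories = spark.createDataFrame(
    [("food", "essential"), ("cosmetics", "discretionary"), ("children", "essential")],
    ["category", "category_type"])

# Explicit broadcast hint: the plan should show BroadcastHashJoin
# even though auto broadcast is disabled
joined = txn_df.join(broadcast(categories), on="category", how="inner")
joined.explain()
joined.show(5)

Because spark.sql.autoBroadcastJoinThreshold is set to -1, Spark will not broadcast any table automatically, so a BroadcastHashJoin in the explain() output confirms that the explicit broadcast() hint is what forced it.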