Tech Studio Online

Technology Notes by Deepak Bhardwaj

PERFECT SQL BLOG

Get link
Facebook
X
Pinterest
Email
Other Apps

- May 20, 2022

https://techtfq.com/blog/learn-how-to-write-sql-queries-practice-complex-sql-queries#google_vignette=

Interview Questions SQL

Get link
Facebook
X
Pinterest
Email
Other Apps

What is Garbage collection in Spark and its impact and resolution

- September 17, 2021

Garbage Collection Spark runs on the Java Virtual Machine (JVM). Because Spark can store large amounts of data in memory, it has a major reliance on Java’s memory management and garbage collection (GC). Therefore, garbage collection (GC) can be a major issue that can affect many Spark applications. Common symptoms of excessive GC in Spark are: Slowness of application Executor heartbeat timeout GC overhead limit exceeded error Spark’s memory-centric approach and data-intensive applications make it a more common issue than other Java applications. Thankfully, it’s easy to diagnose if your Spark application is suffering from a GC problem. The Spark UI marks executors in red if they have spent too much time doing GC. Spark executors are spending a significant amount of CPU cycles performing garbage collection. This can be determined by looking at the “Executors” tab in the Spark application UI. Spark will mark an executor in red if the executor has spent more than 10% of the time in gar...

Complex SQL: fetch the users who logged in consecutively 3 or more times (lead perfect example)

- May 22, 2022

From the login_details table, fetch the users who logged in consecutively 3 or more times. Table Name : LOGIN_DETAILS Approach : We need to fetch users who have appeared 3 or more times consecutively in login details table. There is a window function which can be used to fetch data from the following record. Use that window function to compare the user name in current row with user name in the next row and in the row following the next row. If it matches then fetch those records. –Table Structure: drop table login_details; create table login_details( login_id int primary key, user_name varchar(50) not null, login_date date); delete from login_details; insert into login_details values (101, ‘Michael’, current_date), (102, ‘James’, current_date), (103, ‘Stewart’, current_date+1), (104, ‘Stewart’, current_date+1), (105, ‘Stewart’, current_date+1), (106, ‘Michael’, current_date+2), (107, ‘Michael’, cur...

Window function in PySpark with Joins example using 2 Dataframes (inner join)

- October 14, 2022

Input details: #● File has json records #● Each record has fields: #○ user_id #○ card_num #○ merchant #○ category #○ amount #○ ts ### Below analysis to be done Sample data: +------+--------+---------+--------+----------+-------+|amount|card_num| category|merchant| ts|user_id|+------+--------+---------+--------+----------+-------+| 243| C_108| food| M_102|1579532902| U_104|| 699| C_106|cosmetics| M_103|1581759040| U_103|| 228| C_104| children| M_110|1584161986| U_103| Solution: from pyspark.sql.functions import col # this can be done without using window function creditCardData = spark.read.json(“card_transactions.json”) useridMaxSpendDF = df.groupby(‘user_id’).max(‘amount’) useridMaxSpendDF=useridMaxSpendDF.withColumnRenamed(“max(amount)”,”max_amount”) useridMaxSpendDF=useridMaxSpendDF.withColumnRenamed(“user_id”,”m_user_id”) cond = [creditCardData.user_id == useridMaxSpen...

Views

Labels

Apache Kafka10
Big data103
Cassandra1
Data Structures9
Database7
Design Patterns28
HDFS5
Hive7
Interview Questions119
Java90

Java 101
Java 112
Java 122
Java 813
Java 97
Oracle16
PySpark18
Python4
REST API2
Scala8
Spark80
Spark Performance Tuning Questions42
Spring31
Spring Boot1
Spring Integration26
SQL20

Show more Show less

PERFECT SQL BLOG

Popular posts from this blog

What is Garbage collection in Spark and its impact and resolution

Complex SQL: fetch the users who logged in consecutively 3 or more times (lead perfect example)

Window function in PySpark with Joins example using 2 Dataframes (inner join)