Predicate pushdown in Spark

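->What is predicate pushdown?
Predicate pushdown is an optimization technique in which filter conditions (predicates) are applied as early as possible, ideally at the data source, so that Spark processes only the required data. It applies to Spark queries that define filters in WHERE conditions.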

->How does it optimize?
Predicate pushdown limits the number of files and partitions that Spark reads while querying, thus reducing disk I/O.
Also, querying bucketed data with predicate pushdown produces results faster, with less shuffle.
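
To make the idea of reading fewer files concrete, here is a toy model in plain Python. It is not Spark's implementation: the DataFile class, the two scan functions, and the sample values are all hypothetical and exist only for illustration. The sketch assumes each file carries min/max statistics for a column, so a reader can skip any file in which no row can match the filter.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for one data file holding a single integer column."""
    name: str
    rows: list

    @property
    def max_value(self):
        # Stand-in for the max statistic a columnar file format may store.
        return max(self.rows)


def scan_without_pushdown(files, threshold):
    """Read every file, then filter the rows afterwards."""
    files_read = 0
    result = []
    for f in files:
        files_read += 1
        result.extend(r for r in f.rows if r > threshold)
    return result, files_read


def scan_with_pushdown(files, threshold):
    """Skip files whose statistics show that no row can match."""
    files_read = 0
    result = []
    for f in files:
        if f.max_value <= threshold:
            continue  # no row in this file satisfies value > threshold
        files_read += 1
        result.extend(r for r in f.rows if r > threshold)
    return result, files_read


files = [
    DataFile("part-0", [1, 5, 9]),
    DataFile("part-1", [10, 15, 19]),
    DataFile("part-2", [20, 25, 29]),
]

rows_a, read_a = scan_without_pushdown(files, 18)
rows_b, read_b = scan_with_pushdown(files, 18)
print(rows_a == rows_b, read_a, read_b)  # prints: True 3 2
```

Both scans return the same rows, but the second one never opens part-0, which is the saving in disk I/O described above.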

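->How to determine if predicate pushdown is being used in queries?
Use the explain method on a Dataset (or EXPLAIN in Spark SQL) and check the scan node of the physical plan. For file sources such as Parquet, the predicates that were pushed down are listed under PushedFilters.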

 
