Predicate pushdown in Spark
->What is predicate pushdown?
Predicate pushdown is an optimization technique that applies filters as early as possible, ideally at the data source, so that Spark processes only the required data. It is applied to Spark queries by defining filters in WHERE conditions.
->How does it optimize?
Predicate pushdown limits the number of files and partitions that Spark reads while querying, thus reducing disk I/O.
Querying bucketed data with predicate pushdown also produces results faster, with less shuffle.
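The effect on files read can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Spark code: the file layout, the FILES dictionary and the function names are all hypothetical, and real Spark does this inside its data source scan.

```python
# Toy model only: this is NOT Spark code. Paths and data are hypothetical.
# Each "file" sits in a partition directory named after its year value.
FILES = {
    "sales/year=2021/part-0.parquet": [{"year": 2021, "amount": 10}],
    "sales/year=2022/part-0.parquet": [{"year": 2022, "amount": 25},
                                       {"year": 2022, "amount": 40}],
    "sales/year=2023/part-0.parquet": [{"year": 2023, "amount": 15}],
}


def partition_year(path):
    """Extract the partition value from a path like 'sales/year=2022/part-0.parquet'."""
    return int(path.split("year=")[1].split("/")[0])


def scan_then_filter(year):
    """Without pushdown: read every file, then filter the rows."""
    files_read = list(FILES)
    rows = [row for path in files_read for row in FILES[path] if row["year"] == year]
    return rows, len(files_read)


def scan_with_pushdown(year):
    """With pushdown: use the predicate to skip files before reading them."""
    files_read = [path for path in FILES if partition_year(path) == year]
    rows = [row for path in files_read for row in FILES[path]]
    return rows, len(files_read)


rows_a, read_a = scan_then_filter(2022)
rows_b, read_b = scan_with_pushdown(2022)
print(read_a, read_b)  # 3 1
```

Both functions return the same rows, but the pushdown version opens one file instead of three, which is where the disk I/O saving comes from.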
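->How to determine if predicate pushdown is being used in queries?
Use the explain method on a Dataset (or EXPLAIN in Spark SQL) and inspect the physical plan. For file sources such as Parquet, filters handed to the scan are listed under PushedFilters, and partition-level filters under PartitionFilters.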
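To check a plan programmatically rather than by eye, a small PySpark-oriented helper could look like the sketch below. The helper names are hypothetical, `spark` is assumed to be an active SparkSession, and the text pattern assumes the "PushedFilters: [...]" entry that file-source scans print in the physical plan.

```python
import re


def explain_sql(spark, query):
    """Return the plan text that Spark SQL's EXPLAIN produces for `query`.

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    """
    # EXPLAIN returns a single-row, single-column DataFrame holding the plan.
    return spark.sql("EXPLAIN " + query).collect()[0][0]


def pushed_filters(plan_text):
    """Return the contents of each 'PushedFilters: [...]' entry in a plan string."""
    # Simple pattern: it stops at the first ']' and so truncates filters
    # that themselves contain brackets (for example In lists).
    return re.findall(r"PushedFilters: \[(.*?)\]", plan_text)
```

With a Parquet-backed view named people (hypothetical), pushed_filters(explain_sql(spark, "SELECT * FROM people WHERE age > 30")) would typically return something like IsNotNull(age), GreaterThan(age,30). An empty entry suggests the filter was not pushed to the scan.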