Spark scenario-based questions
Consider you have a 40-node Spark cluster (32 cores x 128 GB per node).
You are processing roughly 3.75 TB of data.
The processing involves filtering, aggregations, joins, etc.
You are getting out-of-memory errors when you run your Spark job.
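
For context, below is a minimal sketch of how such a job might be configured and structured. The executor sizing (5 cores and roughly 20 GB plus overhead per executor), the shuffle partition count, and the table/path names are illustrative assumptions for this sketch, not details from the scenario or answers to the questions that follow.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative session configuration for a 40-node cluster (32 cores x 128 GB per node).
# The executor sizing here (5 cores, 20 GB heap + 4 GB overhead) is an assumption,
# not a prescribed setting.
spark = (
    SparkSession.builder
    .appName("scenario-oom-job")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "20g")
    .config("spark.executor.memoryOverhead", "4g")
    .config("spark.sql.shuffle.partitions", "2000")
    .getOrCreate()
)

# Hypothetical inputs standing in for the roughly 3.75 TB dataset.
events = spark.read.parquet("/data/events")        # large fact table (assumed path)
customers = spark.read.parquet("/data/customers")  # smaller dimension table (assumed path)

# Filtering, a join, and an aggregation, mirroring the processing described above.
result = (
    events.filter(F.col("event_date") >= "2024-01-01")
          .join(customers, on="customer_id", how="inner")
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total_amount"))
)

result.write.mode("overwrite").parquet("/data/output/customer_totals")
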
Question 1:
===========
What are all the possible reasons for out-of-memory errors?
Question 2:
==========
What are the ways to identify the exact issue?
Question 3:
==========
What are the ways to fix it?