Out of Memory Exceptions in Spark

  • Out of Memory Exceptions
    • Driver Memory Exceptions
      • Exception due to Spark driver running out of memory
      • Job failure because the Application Master that launches the driver exceeds memory limits
    • Executor Memory Exceptions
      • Exception because executor runs out of memory
      • FetchFailedException due to executor running out of memory
      • Executor container killed by YARN for exceeding memory limits

Out of Memory Exceptions:

Spark jobs might fail due to out-of-memory exceptions at the driver or the executor end. When troubleshooting these exceptions, you should understand how much memory and how many cores the application requires; these are the essential parameters for optimizing a Spark application. Based on the resource requirements, you can modify the Spark application parameters to resolve the out-of-memory exceptions.
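
For example, these requirements are usually declared when the job is submitted. The spark-submit sketch below assumes a YARN cluster and uses placeholder memory, core, and executor values together with a hypothetical application class and JAR; size the values from your own job's requirements:

    # All values and names below are placeholders.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 4G \
      --executor-memory 8G \
      --executor-cores 4 \
      --num-executors 10 \
      --class com.example.MyApp \
      my-app.jar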

Driver Memory Exceptions:

Exception due to Spark driver running out of memory:
  • Description: When the Spark driver runs out of memory, exceptions similar to the following exception occur.

    Exception in thread "broadcast-exchange-0" java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
  • Resolution: Set a higher value for the driver memory, using one of the following commands in Spark Submit Command Line Options on the Analyze page:

    • --conf spark.driver.memory=<XX>g

      OR

    • --driver-memory <XX>G
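
    For example, either option can be added to the spark-submit invocation, as in the sketch below; the memory value, class, and JAR are placeholders. The second command shows the workaround mentioned in the error message, which disables automatic broadcast joins instead of raising driver memory:

      # Raise the driver memory (placeholder values and names):
      spark-submit --driver-memory 8G --class com.example.MyApp my-app.jar

      # Alternative from the error message: disable broadcast joins entirely.
      spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1 \
        --class com.example.MyApp my-app.jar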

Job failure because the Application Master that launches the driver exceeds memory limits:
  • Description: A Spark job may fail when the Application Master (AM) that launches the driver exceeds the memory limit and is eventually terminated by YARN. The following error occurs:

    Diagnostics: Container [pid=<XXXXX>,containerID=container_<XXXXXXXXXX>_<XXXX>_<XX>_<XXXXXX>] is running beyond physical memory limits. Current usage: <XX> GB of <XX> GB physical memory used; <XX> GB of <XX> GB virtual memory used. Killing container

  • Resolution: Set a higher value for the driver memory, using one of the following commands in Spark Submit Command Line Options on the Analyze page:

    • --conf spark.driver.memory=<XX>g

      OR

    • --driver-memory <XX>G

    As a result, a higher value is set for the AM memory limit.
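
    For example, assuming the job runs in YARN cluster mode (where the driver runs inside the AM, so the driver memory setting also determines the AM container size), the submission might look like this sketch with placeholder values:

      # Cluster mode: raising driver memory raises the AM container size.
      spark-submit --master yarn --deploy-mode cluster \
        --driver-memory 8G --class com.example.MyApp my-app.jar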

Executor Memory Exceptions:

Exception because executor runs out of memory:

  • Description: When the executor runs out of memory, the following exception might occur.

    Executor task launch worker for task XXXXXX ERROR Executor: Exception in task XX.X in stage X.X (TID XXXXXX) java.lang.OutOfMemoryError: GC overhead limit exceeded

  • Resolution: Set a higher value for the executor memory, using one of the following commands in Spark Submit Command Line Options on the Analyze page:

    • --conf spark.executor.memory=<XX>g

      OR

    • --executor-memory <XX>G
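
    For example, either form below raises the memory available to each executor JVM; the values and names are placeholders:

      spark-submit --executor-memory 8G --class com.example.MyApp my-app.jar

      # Equivalent form using --conf:
      spark-submit --conf spark.executor.memory=8g \
        --class com.example.MyApp my-app.jar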

FetchFailedException due to executor running out of memory:

  • Description: When the executor runs out of memory, the following exception may occur.

    ShuffleMapStage XX (sql at SqlWrapper.scala:XX) failed in X.XXX s due to org.apache.spark.shuffle.FetchFailedException: failed to allocate XXXXX byte(s) of direct memory (used: XXXXX, max: XXXXX)

  • Resolution: From the Analyze page, perform the following steps in Spark Submit Command Line Options:

    1. Set a higher value for the executor memory, using one of the following commands:

      • --conf spark.executor.memory=<XX>g

        OR

      • --executor-memory <XX>G

    2. Increase the number of shuffle partitions, using the following command: --conf spark.sql.shuffle.partitions=<XXX>
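
    For example, both steps can be combined in a single submission. The sketch below uses placeholder values; spark.sql.shuffle.partitions defaults to 200, and raising it spreads the shuffle across more, smaller partitions so each task fetches less data:

      # Placeholder values: more executor memory plus more shuffle partitions.
      spark-submit \
        --executor-memory 8G \
        --conf spark.sql.shuffle.partitions=400 \
        --class com.example.MyApp my-app.jar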

Executor container killed by YARN for exceeding memory limits:

  • Description: When the container hosting the executor needs more memory for overhead tasks or executor tasks, the following error occurs.

    org.apache.spark.SparkException: Job aborted due to stage failure: Task X in stage X.X failed X times, most recent failure: Lost task X.X in stage X.X (TID XX, XX.XX.X.XXX, executor X): ExecutorLostFailure (executor X exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. XX.X GB of XX.X GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead

  • Resolution: Set a higher value for spark.yarn.executor.memoryOverhead based on the requirements of the job. The executor memory overhead value increases with the executor size (approximately 6-10% of the executor memory). As a best practice, modify the executor memory value accordingly.

    To set a higher value for executor memory overhead, enter the following command in Spark Submit Command Line Options on the Analyze page: --conf spark.yarn.executor.memoryOverhead=XXXX

    Note

    For Spark 2.3 and later versions, use the new parameter spark.executor.memoryOverhead instead of spark.yarn.executor.memoryOverhead.

    If increasing the executor memory overhead value or executor memory value does not resolve the issue, you can either use a larger instance, or reduce the number of cores.

    To reduce the number of cores, enter the following in the Spark Submit Command Line Options on the Analyze page: --executor-cores=XX. Reducing the number of cores can waste memory, but the job will run.
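
    For example, a submission that raises the memory overhead and reduces the cores per executor might look like the sketch below. The values are placeholders; the overhead is given in MB, and on Spark 2.3 and later the property name would be spark.executor.memoryOverhead instead:

      # Placeholder values: 2 GB overhead per executor and fewer cores.
      spark-submit \
        --executor-memory 8G \
        --conf spark.yarn.executor.memoryOverhead=2048 \
        --executor-cores 2 \
        --class com.example.MyApp my-app.jar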
