Everything about YARN and its modes
Yet Another Resource Negotiator (YARN) is Hadoop's compute framework. It runs on top of HDFS, which is Hadoop's storage layer.
YARN follows a master-slave architecture. The master daemon is called the ResourceManager and the slave daemon is called the NodeManager. Application life cycle management is handled separately by the ApplicationMaster. The ApplicationMaster can be spawned on any slave node and stays alive for the lifetime of its application.
When Spark runs on YARN, the ResourceManager performs the role of the Spark master and the NodeManagers work as executor nodes.
Each Spark executor runs as a YARN container.
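Each executor is a YARN container, so the memory YARN reserves for it is more than the executor heap alone. The following Python code is a minimal sketch of that arithmetic. It assumes the following defaults; verify each one against your own spark-defaults.conf and yarn-site.xml:
- Spark's executor memory overhead is max(384 MB, 10% of executor memory).
- The scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB by default).

CPU cores and other scheduler policies are ignored. The helper function names are hypothetical.

```python
import math

# Assumed defaults; check spark-defaults.conf and yarn-site.xml on a real cluster.
OVERHEAD_FACTOR = 0.10          # assumed Spark executor memory overhead factor
MIN_OVERHEAD_MB = 384           # assumed minimum overhead in MB
YARN_MIN_ALLOCATION_MB = 1024   # assumed yarn.scheduler.minimum-allocation-mb


def container_request_mb(executor_memory_mb: int, overhead_mb: int = None) -> int:
    """Memory one executor asks YARN for: JVM heap plus off-heap overhead."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + overhead_mb


def normalized_container_mb(request_mb: int,
                            min_allocation_mb: int = YARN_MIN_ALLOCATION_MB) -> int:
    """Round a request up to the next multiple of the scheduler's minimum allocation."""
    return math.ceil(request_mb / min_allocation_mb) * min_allocation_mb


def executors_per_node(node_memory_mb: int, executor_memory_mb: int) -> int:
    """How many executor containers fit in one NodeManager's memory budget."""
    container_mb = normalized_container_mb(container_request_mb(executor_memory_mb))
    return node_memory_mb // container_mb


if __name__ == "__main__":
    # A NodeManager offering 24 GB to containers, with 4 GB executors:
    # 4096 + 409 overhead = 4505 MB, rounded up to 5120 MB per container.
    print(executors_per_node(24576, 4096))
```

The sketch shows that a 4 GB executor can occupy a 5 GB container. The overhead and rounding are why fewer executors may fit on a node than a simple heap-size division suggests.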
Spark applications on YARN run in two modes:
- yarn-client: The Spark driver runs in the client process, outside the YARN cluster. The ApplicationMaster is used only to negotiate resources from the ResourceManager.
- yarn-cluster: The Spark driver runs inside the ApplicationMaster, which a NodeManager spawns on a slave node.
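The yarn-cluster mode is recommended for production deployments. The yarn-client mode is better for development and debugging, when you want to see the driver's output immediately. Neither mode needs a Spark master URL, because the ResourceManager address is picked up from the Hadoop configuration (the directory that HADOOP_CONF_DIR or YARN_CONF_DIR points to). The master parameter is simply yarn-client or yarn-cluster. From Spark 2.0 onward, these two values are deprecated in favor of --master yarn combined with --deploy-mode client or --deploy-mode cluster.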
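The following Python code is a small sketch of how the two modes translate into a spark-submit invocation. It accepts either the legacy master values (yarn-client, yarn-cluster) or the newer deploy-mode names, and always emits the Spark 2.x+ form.

The flags used (--master, --deploy-mode, --class, --num-executors, --executor-memory) are standard spark-submit options. The function name, the default resource values, the jar name and the class name are hypothetical. No ResourceManager address is passed, because spark-submit reads it from the Hadoop configuration.

```python
from typing import List, Optional

# Legacy YARN master values mapped to their Spark 2.x+ deploy modes.
LEGACY_MASTERS = {
    "yarn-client": "client",
    "yarn-cluster": "cluster",
}


def spark_submit_args(app: str, mode: str, main_class: Optional[str] = None,
                      num_executors: int = 2, executor_memory: str = "2g",
                      app_args: Optional[List[str]] = None) -> List[str]:
    """Build a spark-submit argument list for running an application on YARN."""
    deploy_mode = LEGACY_MASTERS.get(mode, mode)
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown YARN mode: {mode!r}")
    # No ResourceManager URL here: it comes from HADOOP_CONF_DIR / YARN_CONF_DIR.
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if main_class:
        args += ["--class", main_class]
    args += ["--num-executors", str(num_executors),
             "--executor-memory", executor_memory,
             app]
    # Application arguments must come after the jar or script path.
    args += app_args or []
    return args


if __name__ == "__main__":
    # Hypothetical production-style submission in cluster mode.
    cmd = spark_submit_args("wordcount.jar", "yarn-cluster",
                            main_class="com.example.WordCount")
    print(" ".join(cmd))
```

Building an argument list rather than a single string keeps it safe to pass to subprocess.run without shell quoting issues. Switching from yarn-cluster to yarn-client only changes the --deploy-mode value.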