What are applications, jobs, stages and tasks in Spark?

We get a lot of questions about the differences between Spark applications, jobs, stages and tasks. We also see a lot of misunderstanding about these topics among new learners and experienced Spark developers alike.

Task

A task is the smallest execution unit in Spark. A task executes a series of instructions on a single partition of data. For example, reading data, filtering it and applying map() can be combined into one task. Tasks are executed inside an executor, as in the sketch below.
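As a minimal PySpark sketch (the session setup, app name and numbers are illustrative), the chain below is pipelined into one task per partition because none of the steps needs a shuffle:

# Minimal sketch: narrow operations (filter, map) are pipelined into one task per partition.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("task-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(100), numSlices=4)   # 4 partitions -> 4 tasks
result = (rdd.filter(lambda x: x % 2 == 0)      # each task filters its own partition...
             .map(lambda x: x * 10)             # ...then maps it, in the same task
             .collect())                        # action that actually runs the tasks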

Stage

A stage comprises several tasks, and every task in the stage executes the same set of instructions, each on a different partition of the data.
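A quick way to see this (a sketch assuming the spark session from the previous example) is that the number of tasks in a stage matches the number of partitions the stage processes:

# Sketch: a stage launches one task per partition.
df = spark.range(0, 1000, numPartitions=8)      # 8 partitions
print(df.rdd.getNumPartitions())                # prints 8 -> the stage runs 8 tasks,
                                                # all executing the same instructions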

Job

A job comprises several stages. When Spark encounters a transformation that requires a shuffle, it creates a new stage. Transformations like reduceByKey(), join() etc. trigger a shuffle and therefore start a new stage. Spark also creates a stage when you read a dataset.
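For example (a hedged sketch reusing the SparkContext from above), reduceByKey() introduces a shuffle, so the single job below is split into two stages: one that reads and maps the pairs, and one that aggregates them after the shuffle.

# Sketch: one action, one job, two stages separated by a shuffle.
pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)], numSlices=2)
counts = pairs.reduceByKey(lambda x, y: x + y)  # shuffle boundary -> new stage
counts.collect()                                # the action submits one job with two stages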

Application

An application comprises several jobs. A job is created whenever you execute an action such as write(), count() or collect().
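In the rough sketch below (the output path is illustrative), each action submits its own job, so this one application runs at least two jobs:

# Sketch: every action triggers a separate job within the same application.
df = spark.range(0, 10000)
df.count()                                               # action -> job 1
df.write.mode("overwrite").parquet("/tmp/demo_output")   # action -> job 2 (hypothetical path)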

Summary

A Spark application can have many jobs. A job can have many stages. A stage can have many tasks. A task executes a series of instructions.
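You can see this hierarchy yourself in the Spark UI (http://localhost:4040 when running in local mode) by drilling down from Jobs to Stages to Tasks. The sketch below, with an illustrative input path, labels the work so it is easy to find in the UI:

# Sketch: label the job, run an action, then inspect Jobs -> Stages -> Tasks in the Spark UI.
sc.setJobDescription("wordcount demo")           # names the next job in the UI
(sc.textFile("/tmp/input.txt")                   # hypothetical input path
   .flatMap(lambda line: line.split())
   .map(lambda w: (w, 1))
   .reduceByKey(lambda a, b: a + b)              # shuffle -> second stage
   .collect())                                   # action -> one job, two stages, many tasks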
