Spring Batch Interview Questions

Q: Explain the Spring Batch framework.
A: 
Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch builds upon the productivity, POJO-based development approach, and general ease of use capabilities people have come to know from the Spring Framework, while making it easy for developers to access and leverage more advanced enterprise services when necessary.

Q: When should you use Spring Batch?
A: 
Consider an environment where users have to do a lot of batch processing. This is quite different from a typical web application, which has to work 24/7. In classic environments it’s not unusual to do the heavy lifting at night, for example, when there are no regular users on the system. Batch processing includes typical tasks like reading and writing files, transforming data, reading from or writing to databases, creating reports, and importing and exporting data. Often these steps have to be chained together, or you have to create more complex workflows in which you define which job steps can run in parallel and which have to run sequentially. That’s where a framework like Spring Batch can be very handy.

Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques. Simple as well as complex, high-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.

Q: Explain the Spring Batch framework architecture.
A:

[Figure: Spring Batch layered architecture]

This layered architecture highlights three major high-level components: Application, Core, and Infrastructure. The Application contains all batch jobs and custom code written by developers using Spring Batch. The Batch Core contains the core runtime classes necessary to launch and control a batch job. It includes things such as JobLauncher, Job, and Step implementations. Both Application and Core are built on top of a common infrastructure. This infrastructure contains common readers and writers, and services such as the RetryTemplate, which are used both by application developers (ItemReader and ItemWriter) and by the core framework itself.

Q: How does Spring Batch work?
A:

  • JobStep – A Step that delegates to a Job to do its work. This is a great tool for managing dependencies between jobs, and also for modularising complex step logic into something that is testable in isolation. The job is executed with parameters that can be extracted from the step execution, so this step can also be used as the worker in a parallel or partitioned execution.
  • ItemReader – Strategy interface for providing the data. Implementations are expected to be stateful and will be called multiple times for each batch, with each call to read() returning a different value and finally returning null when all input data is exhausted. Implementations need not be thread-safe, and clients of an ItemReader need to be aware that this is the case. A richer interface (e.g. with a look ahead or peek) is not feasible because we need to support transactions in an asynchronous batch.
  • ItemProcessor – Interface for item transformation. Given an item as input, this interface provides an extension point which allows for the application of business logic in an item-oriented processing scenario. Note that while it’s possible to return a different type than the one provided, it’s not strictly necessary. Furthermore, returning null indicates that the item should not be processed any further.
  • ItemWriter – Basic interface for generic output operations. A class implementing this interface is responsible for serializing objects as necessary. Generally, it is the responsibility of the implementing class to decide which technology to use for mapping and how it should be configured. The write method is responsible for making sure that any internal buffers are flushed. If a transaction is active, it will also usually be necessary to discard the output on a subsequent rollback. The resource to which the writer is sending data should normally be able to handle this itself.
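
The reader, processor and writer contracts described above can be sketched in plain Java. The interfaces below (SketchReader, SketchProcessor, SketchWriter) are hypothetical stand-ins written for this illustration; they are not the Spring Batch interfaces, whose exact signatures vary between versions. The sketch only shows the two conventions from the list: the reader returns null when the input is exhausted, and a processor that returns null filters the item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the contracts described above (not Spring Batch types).
interface SketchReader<T> { T read(); }                 // returns null when input is exhausted
interface SketchProcessor<I, O> { O process(I item); }  // returning null filters the item out
interface SketchWriter<T> { void write(List<T> items); }

class ReadProcessWriteSketch {

    // Stateful reader: each call to read() advances through the list, then returns null.
    static <T> SketchReader<T> listReader(List<T> source) {
        Iterator<T> it = source.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Drives one pass: read until null, drop items the processor rejects, write the rest.
    static <I, O> void run(SketchReader<I> reader,
                           SketchProcessor<I, O> processor,
                           SketchWriter<O> writer) {
        List<O> output = new ArrayList<>();
        for (I item = reader.read(); item != null; item = reader.read()) {
            O processed = processor.process(item);
            if (processed != null) {
                output.add(processed);
            }
        }
        writer.write(output);
    }

    // Example: upper-case every non-blank line; blank lines are filtered by returning null.
    static List<String> demo(List<String> lines) {
        List<String> sink = new ArrayList<>();
        SketchProcessor<String, String> upperOrSkip =
                line -> line.trim().isEmpty() ? null : line.toUpperCase();
        SketchWriter<String> collect = sink::addAll;
        run(listReader(lines), upperOrSkip, collect);
        return sink;
    }
}
```

Calling ReadProcessWriteSketch.demo with the lines "a", " ", "b" yields the two items "A" and "B"; the blank line never reaches the writer.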

Q: What is the difference between Step, Tasklet and Chunk in Spring Batch?

Well that’s actually a good question. Here’s an example of configuration:

<job id="sampleJob" job-repository="jobRepository">
    <step id="step1" next="step2">
        <tasklet transaction-manager="transactionManager">
            <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
        </tasklet>
    </step>
    <step id="step2">
        <tasklet ref="myTasklet"/>
    </step>
</job>

You have a Job, this job is made of steps. Most of the time, these steps are successive. You define in what order your work must be done with steps: you do step 1, then step 2, then step 3, you can do step 4 if step 3 failed, or go directly to step 5, etc.

What is done in a Step is represented by a tasklet: the tasklet does the task.

In Spring Batch, you’ll mostly do chunk oriented processing: with a reader, a processor, and a writer. From the official documentation:

“Chunk oriented processing refers to reading the data one at a time, and creating ‘chunks’ that will be written out, within a transaction boundary.”

But you can make your own tasklet, and set it in your step. For example, a tasklet that executes a SQL query.

So, the steps are ordered in a job, and each step contains a tasklet, which does a task. One of those tasklets (and probably the most used one) is the chunk oriented processing tasklet.
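
To make the commit-interval idea concrete, here is a minimal plain-Java sketch of a chunk loop. It is an illustration under assumptions, not Spring Batch's implementation: ChunkLoopSketch and its method names are hypothetical, the reader is modelled as a java.util.Iterator, and the transaction that would wrap each chunk is not modelled at all.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class ChunkLoopSketch {

    // Reads items one at a time, groups them into chunks of at most commitInterval
    // items, and hands each full (or final partial) chunk to the writer.
    // Returns the number of chunks written.
    static <T> int processInChunks(Iterator<T> reader, int commitInterval,
                                   Consumer<List<T>> writer) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        int chunksWritten = 0;
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == commitInterval) {
                writer.accept(new ArrayList<>(chunk)); // copy so the writer owns the chunk
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) { // leftover items form a final, smaller chunk
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    // Returns the size of each chunk produced for itemCount items.
    static List<Integer> chunkSizes(int itemCount, int commitInterval) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) {
            items.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        processInChunks(items.iterator(), commitInterval, chunk -> sizes.add(chunk.size()));
        return sizes;
    }
}
```

With a commit interval of 10, as in the configuration above, 25 input items are written as three chunks of 10, 10 and 5 items.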

Q: Have you implemented a Spring Batch Tasklet? What was the use case?
A:
Tasklet is a simple interface with just one method, execute. Using it, we can perform single tasks such as executing queries or deleting files.

Q: What is a Tasklet in Spring Batch?
A: 
Spring Batch provides a Tasklet interface, which is called to perform a single task only, such as cleaning up, deleting, or setting up resources before or after a step execution.
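
The "one method, one task" idea can be sketched in plain Java. SketchTasklet below is a hypothetical single-method interface written for this illustration; the real Spring Batch Tasklet.execute method takes framework arguments and returns a status object rather than a boolean. The example task is the file cleanup use case mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical single-method interface modelled on the description above.
interface SketchTasklet {
    boolean execute(); // true when the single task finished successfully
}

class FileCleanupTasklet implements SketchTasklet {
    private final File target;

    FileCleanupTasklet(File target) {
        this.target = target;
    }

    @Override
    public boolean execute() {
        // A file that is already gone still counts as a successful cleanup.
        return !target.exists() || target.delete();
    }
}

class TaskletSketch {
    // Creates a temporary file, runs the tasklet once, and reports whether the file is gone.
    static boolean cleanupRemovesFile() {
        try {
            File temp = File.createTempFile("batch-sketch", ".tmp");
            boolean ok = new FileCleanupTasklet(temp).execute();
            return ok && !temp.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```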

Q: How can we schedule a Spring Batch Job?
A: 
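Spring Batch does not include a scheduler of its own. A job is launched by an external trigger, for example a cron job that starts the application, or a scheduler such as Quartz or Spring’s @Scheduled support that calls the JobLauncher.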
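
As a rough illustration of "something outside the job triggers it on a schedule", the sketch below uses the JDK's ScheduledExecutorService as a stand-in for a cron trigger. SchedulingSketch and runPeriodically are hypothetical names, the fixed-rate period replaces a real cron expression, and the Runnable stands in for whatever code would launch the batch job.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SchedulingSketch {

    // Runs jobLaunch repeatedly at a fixed period until it has run `runs` times
    // or the timeout expires. Returns true if all runs happened in time.
    static boolean runPeriodically(Runnable jobLaunch, int runs,
                                   long periodMillis, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                jobLaunch.run();       // a real setup would launch the batch job here
                remaining.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow(); // stop triggering once the demo is over
        }
    }
}
```

In a production setup the trigger would normally pass fresh job parameters (for example the run date) on each launch, so that every scheduled run is a new logical run of the job.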

What is a batch job?

A batch job is a program which typically:

  1. Deals with large amounts of data in an offline mode.
  2. Reads data from one or multiple sources. Sources can be flat files, databases, streaming messages, etc.
  3. Executes actions on the data. Actions can be transformation, validation, aggregation, applying business rules, etc.
  4. Writes data to one or more destinations. Destinations can be the same as or different from the source.

Why do we use Spring Batch for Batch processing?

  1. The Spring Batch framework was created by a team at Accenture that had years of experience in building batch processing systems and wanted to share that knowledge.
  2. The project was developed in collaboration with the Spring team, so it works seamlessly with Spring.
  3. The patterns used in the framework are production-tested patterns that are available out of the box.

We can create our own library also but why reinvent the wheel, right?

What are some of the Spring Batch framework terms?

Here are some of the keywords and concepts used in Spring Batch:

  1. Job
  2. Job Instance
  3. Job Execution
  4. Step
  5. Step Execution
  6. Item Reader
  7. Item Processor
  8. Item Writer
  9. Job Launcher

What is JobInstance in Spring Batch framework?

A JobInstance represents a logical job run, which is then executed.

Here is the formula for it:

JobInstance = Job + JobParameters

So, one JobInstance differs from another in the JobParameters it is supplied with.

For example:

For a job such as LoaderJob that runs once a day, one JobInstance is the run for Jan 1, 2020, and a second JobInstance of the same Job is the run for Jan 2, 2020.
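
The formula can be modelled as a small value object whose identity is the job name plus its parameters. This is a plain-Java sketch, not the Spring Batch JobInstance class: JobInstanceKey, forDate and the "runDate" parameter name are all hypothetical names chosen for the illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical value object: a logical run identified by job name plus parameters.
final class JobInstanceKey {
    private final String jobName;
    private final Map<String, String> parameters;

    JobInstanceKey(String jobName, Map<String, String> parameters) {
        this.jobName = jobName;
        this.parameters = new TreeMap<>(parameters); // defensive, sorted copy
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof JobInstanceKey)) {
            return false;
        }
        JobInstanceKey that = (JobInstanceKey) other;
        return jobName.equals(that.jobName) && parameters.equals(that.parameters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, parameters);
    }

    // Convenience factory for a job whose only parameter is the run date.
    static JobInstanceKey forDate(String jobName, String runDate) {
        Map<String, String> params = new TreeMap<>();
        params.put("runDate", runDate);
        return new JobInstanceKey(jobName, params);
    }
}
```

Two keys built for "LoaderJob" with the same run date compare as equal, meaning the same logical run, while changing the date produces a different key, meaning a new logical run.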

What is JobParameters in Spring Batch?

JobParameters is a Spring Batch object which encapsulates the configuration required to run a JobInstance.

What is JobExecution in Spring Batch?

It is again a Spring Batch object which represents an attempt to run a JobInstance.

For example:

The Jan 1, 2020 JobInstance failed twice for some reason before it completed successfully on the third attempt.

So, 1 JobInstance will have 3 JobExecutions.

A JobExecution object holds the statistics, status and other information related to the Job throughout its lifetime.
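
The relationship "one JobInstance, many JobExecutions" can be sketched as a map from a logical run to its list of attempts. JobExecutionSketch is a hypothetical plain-Java model, not the Spring Batch JobRepository; the string key "LoaderJob|2020-01-01" is just an illustrative way to name the instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class JobExecutionSketch {
    enum Status { FAILED, COMPLETED }

    // Execution history per logical run, keyed by a simple "jobName|parameters" string.
    private final Map<String, List<Status>> executionsByInstance = new LinkedHashMap<>();

    // Records one attempt (one execution) for the given instance and returns its status.
    Status attempt(String instanceKey, BooleanSupplier work) {
        Status status = work.getAsBoolean() ? Status.COMPLETED : Status.FAILED;
        executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>()).add(status);
        return status;
    }

    List<Status> history(String instanceKey) {
        return executionsByInstance.getOrDefault(instanceKey, new ArrayList<>());
    }

    // Reproduces the example above: two failed attempts, then a successful third.
    static List<Status> janFirstExample() {
        JobExecutionSketch repo = new JobExecutionSketch();
        String key = "LoaderJob|2020-01-01";
        int[] calls = {0};
        while (repo.attempt(key, () -> ++calls[0] >= 3) != Status.COMPLETED) {
            // retry the same instance until it completes
        }
        return repo.history(key);
    }
}
```

The history returned by janFirstExample has three entries for the single instance: FAILED, FAILED, COMPLETED.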

What is a Step in Spring Batch Framework?

A Spring Batch Job can be defined as a sequence of Steps.

Thus, a Step represents one stage in that sequence.

For example:

The LoaderJob mentioned earlier can be defined in 3 Steps:

Step 1: To read the data from the flat files.

Step 2: To validate the read data against some rules.

Step 3: To write the validated data in the database.

[Figure: Step in Spring Batch]

What is a StepExecution?

Analogous to JobExecution, a StepExecution represents an attempt to run a Step.

A StepExecution object holds all the information related to that run of the Step.

It is also linked to the JobExecution of which it is a part.
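
Putting Steps and StepExecutions together, the sketch below runs the three LoaderJob steps in order and records one step run per attempted step, each linked to the id of the job execution it belongs to. It is a plain-Java model under assumptions: StepSequenceSketch, StepRun and the step names are hypothetical, and stopping at the first failed step reflects a simple sequential flow rather than every flow Spring Batch can express.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class StepSequenceSketch {

    // Hypothetical record of one attempt to run a step, linked to its job execution.
    static final class StepRun {
        final long jobExecutionId;
        final String stepName;
        final boolean succeeded;

        StepRun(long jobExecutionId, String stepName, boolean succeeded) {
            this.jobExecutionId = jobExecutionId;
            this.stepName = stepName;
            this.succeeded = succeeded;
        }
    }

    // Runs the steps in insertion order and stops at the first failure.
    static List<StepRun> runJob(long jobExecutionId, Map<String, BooleanSupplier> steps) {
        List<StepRun> runs = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            boolean ok = step.getValue().getAsBoolean();
            runs.add(new StepRun(jobExecutionId, step.getKey(), ok));
            if (!ok) {
                break; // later steps are not attempted
            }
        }
        return runs;
    }

    // The three LoaderJob steps from the text; validationPasses controls step 2.
    static List<StepRun> loaderJob(long jobExecutionId, boolean validationPasses) {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("readFlatFiles", () -> true);
        steps.put("validate", () -> validationPasses);
        steps.put("writeToDatabase", () -> true);
        return runJob(jobExecutionId, steps);
    }
}
```

When validation passes, all three steps produce a step run; when it fails, only the first two do, and the last recorded run is marked as failed.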

What are different types of process flow for Step execution?

  1. Tasklet Model
  2. Chunk Model

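Explain Chunk oriented processing.

Chunk oriented processing reads the data one item at a time and collects the items into ‘chunks’; each chunk is then written out within a transaction boundary. A reader, a processor, and a writer do the work.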

How to choose between Tasklet model and Chunk model?

Typically, when the Step’s task is simple (a single operation such as executing a query or deleting files), we choose the Tasklet model; when the Step has to read, process and write many items, we go for the Chunk model.
