Monday 22 July 2013

Spring Batch with Example

Purpose:

    A typical batch program reads a large number of records from a database, file, or queue,
processes the data in some fashion, and then writes the data back in a modified form to a database, file system, mailer etc.

Spring Batch automates this basic batch iteration, providing the capability to process similar transactions as a set, typically in an offline environment without any user interaction.

How does it work?

Spring Batch reads data from a data source in chunks of a configurable size and writes each chunk to some resource.
The data source for the reader could be a flat file [text file, XML file, CSV file etc.], a relational database [e.g. MySQL] or MongoDB.
Similarly, the writer could write the data read by the reader to flat files, a relational database, MongoDB, a mailer etc.
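
The read-a-chunk, write-a-chunk loop described above can be sketched in plain Java. This is only an illustration of the idea, not Spring Batch's own implementation: ChunkLoopSketch and readInChunks are made-up names, and the "writer" here simply collects the chunks instead of committing them to a resource.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of chunk-oriented processing; not Spring Batch classes.
class ChunkLoopSketch {

    // Reads items one at a time and groups them into chunks of at most chunkSize.
    // In Spring Batch each full chunk would be passed to the writer and committed.
    static <T> List<List<T>> readInChunks(Iterator<T> reader, int chunkSize) {
        List<List<T>> writtenChunks = new ArrayList<>();
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                writtenChunks.add(chunk);      // "write" the full chunk
                chunk = new ArrayList<>();     // start collecting the next one
            }
        }
        if (!chunk.isEmpty()) {
            writtenChunks.add(chunk);          // last chunk may be smaller
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            items.add(i);
        }
        for (List<Integer> chunk : readInChunks(items.iterator(), 5)) {
            System.out.println("writing chunk of " + chunk.size() + " items: " + chunk);
        }
    }
}
```

With 12 items and a chunk size of 5, the loop produces two full chunks and one final chunk holding the remaining 2 items.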

Reading, processing and writing together form a step, and one or more steps make up a Job.

The most important part of Spring Batch is that it keeps track of metadata for each job, its steps etc. How? We will talk about that in the next section.


Implementation and Configurations:

First of all, let's go through the job-specific configuration; we will get into the other bean configurations later.
A job can consist of more than one step, and the flow between steps can also be configured.

Inside the tasklet, the chunk element specifies the reader, the writer and the commit-interval.

  <job id="jobId">  
     <step id="stepId">  
       <tasklet>  
         <chunk reader="xmlFileReader" writer="jdbcItemWriter" commit-interval="5"></chunk>  
       </tasklet>  
     </step>  
   </job>  

Let's say you have an XML file to be read, and its data has to be written to a database.
The step reads the XML file iteratively, 5 entities at a time [the commit-interval], and inserts each chunk into a database table.

Here, you could make use of org.springframework.batch.item.xml.StaxEventItemReader for reading the XML data [here from the users.xml file placed in the classpath], and
org.springframework.batch.item.database.JdbcBatchItemWriter to write the data to the database.

<!--READER-->
 <beans:bean id="xmlFileReader" scope="step" class="org.springframework.batch.item.xml.StaxEventItemReader" p:fragmentRootElementName="user"  
     p:resource="classpath:/users.xml" p:unmarshaller-ref="unmarshaller"/>  

       
You need an unmarshaller in order to convert the XML data into Java POJOs.
   <beans:bean id="unmarshaller"  
     class="org.springframework.oxm.castor.CastorMarshaller">  
     <beans:property name="mappingLocation" value="classpath:/userMapping.xml" />  
   </beans:bean>  

userMapping.xml specifies the binding of Java classes to the XML document. The contents of userMapping.xml and users.xml are shown later.
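
To get a feel for what the reader does with fragmentRootElementName="user", here is a rough plain-Java sketch built on the JDK's StAX API. It is a simplification under stated assumptions: the real StaxEventItemReader passes each fragment to the unmarshaller, whereas this hypothetical FragmentReaderSketch just collects the text of each child element into a map, and it assumes the children contain text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustration only: splits an XML document into one item per fragment root element.
class FragmentReaderSketch {

    // Returns one map of childElementName -> text for every <fragmentRoot> element.
    static List<Map<String, String>> readFragments(String xml, String fragmentRoot) {
        List<Map<String, String>> items = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Map<String, String> current = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals(fragmentRoot)) {
                        current = new LinkedHashMap<>();            // a new item starts
                    } else if (current != null) {
                        current.put(name, reader.getElementText()); // text-only child
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals(fragmentRoot)) {
                    items.add(current);                             // item is complete
                    current = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return items;
    }

    public static void main(String[] args) {
        String xml = "<users><user><firstName>Pankaj</firstName>"
                + "<lastName>Kathiria</lastName></user></users>";
        System.out.println(readFragments(xml, "user"));
    }
}
```

Each map returned here corresponds to one item that the step would hand to the writer; in the actual configuration the unmarshaller turns the fragment into a User object instead.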


<!--WRITER-->
 <beans:bean id="jdbcItemWriter" class="org.springframework.batch.item.database.JdbcBatchItemWriter" p:assertUpdates="true"  
   p:dataSource-ref="dataSource">  
   <beans:property name="sql">  
   <beans:value>  
     <![CDATA[  
     insert into USER_(  
     FIRST_NAME, LAST_NAME, COMPANY, ADDRESS)  
     values ( :firstName, :lastName, :company, :address)  
     ]]>  
   </beans:value>  
 </beans:property>  
 <beans:property name="itemSqlParameterSourceProvider">  
 <beans:bean  
   class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />  
 </beans:property>  
 </beans:bean>  
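
The named parameters in the insert statement [:firstName, :lastName etc.] are filled from the properties of each item. The following plain-Java sketch shows the general idea with simple reflection. It is an assumption about the concept, not the actual BeanPropertyItemSqlParameterSourceProvider code; NamedParameterSketch, resolve and the nested User class are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: resolves :name placeholders against the getters of an item.
class NamedParameterSketch {

    // Hypothetical item class mirroring the columns used in the insert statement.
    public static class User {
        private final String firstName;
        private final String lastName;
        private final String company;
        private final String address;

        User(String firstName, String lastName, String company, String address) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.company = company;
            this.address = address;
        }

        public String getFirstName() { return firstName; }
        public String getLastName() { return lastName; }
        public String getCompany() { return company; }
        public String getAddress() { return address; }
    }

    // Returns the value for each :name placeholder, in the order they appear in sql.
    static List<Object> resolve(String sql, Object item) {
        List<Object> values = new ArrayList<>();
        Matcher placeholder = Pattern.compile(":(\\w+)").matcher(sql);
        while (placeholder.find()) {
            String property = placeholder.group(1);
            // firstName -> getFirstName
            String getter = "get" + Character.toUpperCase(property.charAt(0))
                    + property.substring(1);
            try {
                values.add(item.getClass().getMethod(getter).invoke(item));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                        "No readable property '" + property + "'", e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String sql = "insert into USER_(FIRST_NAME, LAST_NAME, COMPANY, ADDRESS) "
                + "values (:firstName, :lastName, :company, :address)";
        System.out.println(resolve(sql, new User("Pankaj", "Kathiria", "CD", "AHM")));
    }
}
```

This is why the placeholder names in the SQL have to match the property names of the item class.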



Before getting into how this would actually run, let's talk about how Spring Batch manages to store the metadata of a batch job.

Basically, we have to define certain beans for that, as shown below.


 <beans:bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"  
     destroy-method="close" p:driverClassName="${dataSource.driverClassName}"  
     p:username="${dataSource.username}" p:password="${dataSource.password}"  
     p:url="${dataSource.url}" />  
   <!--THIS WILL CREATE DATABASE TABLE FOR STORING METADATA OF BATCH JOB-->  
   <jdbc:initialize-database data-source="dataSource">  
     <jdbc:script location="classpath:/org/springframework/batch/core/schema-drop-mysql.sql" />  
     <jdbc:script location="classpath:/org/springframework/batch/core/schema-mysql.sql" />  
    </jdbc:initialize-database>  
   <!-- -->  
   <beans:bean id="transactionManager"  
     class="org.springframework.jdbc.datasource.DataSourceTransactionManager"  
     p:dataSource-ref="dataSource" />  
   <beans:bean id="jobRegistry"  
     class="org.springframework.batch.core.configuration.support.MapJobRegistry" />  
   <beans:bean id="jobLauncher"  
     class="org.springframework.batch.core.launch.support.SimpleJobLauncher"  
     p:jobRepository-ref="jobRepository" />  
   <beans:bean id="jobRegistryBeanPostProcessor"  
     class="org.springframework.batch.core.configuration.support.JobRegistryBeanPostProcessor"  
     p:jobRegistry-ref="jobRegistry" />  
   <beans:bean id="jobRepository"  
     class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean"  
     p:dataSource-ref="dataSource" p:transactionManager-ref="transactionManager" />  
   <!--CENTRALIZING PROPERTIES INTO A BATCH.PROPERTIES FILE FOR CONVENIENCE -->  
   <beans:bean id="batch"  
     class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"  
     p:location="classpath:/batch.properties" p:ignoreUnresolvablePlaceholders="true" />  

   
   
batch.properties would look like this:
  ##datasource properties for where batch related tables are created  
     dataSource.password=root  
     dataSource.username=root  
     dataSource.driverClassName=com.mysql.jdbc.Driver  
     dataSource.url=jdbc:mysql://localhost:3306/batch  



Now, what exactly causes the batch to run? Below is the code.

Typically such code would be run by a scheduler, which invokes the batch offline.

       
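      // Look up the launcher and the job defined in the XML configuration above
      JobLauncher jobLauncher = (JobLauncher) applicationContext.getBean("jobLauncher");  
      Job job = (Job) applicationContext.getBean("jobId");  
      // A fresh "start" date makes the job parameters different for every run
      jobLauncher.run(job, new JobParametersBuilder().addDate("start", new Date()).toJobParameters());  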

           
After executing this, you can check the job status in the batch_job_execution table.
The other tables also provide important information about the job and its steps, such as read count, commit count, cause of failure etc.
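
The "start" date parameter in the launch code matters because Spring Batch identifies a job instance by the job name together with its parameters, and it refuses to run an already completed instance again. The toy model below shows that idea in plain Java. It is a sketch under assumptions, not the real JobRepository logic: JobInstanceSketch and launch are made-up names, and a false return value stands in for the exception that Spring Batch would throw.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of job instance identity; not the real Spring Batch repository.
class JobInstanceSketch {

    private final Set<String> completedInstances = new HashSet<>();

    // Returns true if the job "ran", false if the same name + parameters already completed.
    boolean launch(String jobName, Map<String, Object> parameters) {
        // Sort the parameters so the key does not depend on insertion order
        String instanceKey = jobName + new TreeMap<>(parameters);
        return completedInstances.add(instanceKey);
    }

    public static void main(String[] args) {
        JobInstanceSketch launcher = new JobInstanceSketch();
        System.out.println(launcher.launch("jobId", Map.of("start", 1L))); // first run
        System.out.println(launcher.launch("jobId", Map.of("start", 1L))); // same parameters
        System.out.println(launcher.launch("jobId", Map.of("start", 2L))); // new parameters
    }
}
```

Passing the current date on every launch therefore gives each scheduled run its own job instance.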

Let's see what users.xml and userMapping.xml look like.
userMapping.xml

<?xml version="1.0" encoding="UTF-8"?>  
<mapping>
 <class name="com.example.Users">
  <field name="users" type="com.example.User" 
   collection="arraylist">
   <bind-xml name="user" />
  </field>
 </class>
 <class name="com.example.User">
  <map-to xml="user" />
  <field name="firstName" type="string">
   <bind-xml name="firstName" node="element" />
  </field>
  <field name="lastName" type="string">
   <bind-xml name="lastName" node="element" />
  </field>
   <field name="company" type="string">
   <bind-xml name="company" node="element" />
  </field>
   <field name="address" type="string">
   <bind-xml name="address" node="element" />
  </field>
 </class>
</mapping> 


com.example.User would be a simple POJO with getters and setters for the String properties firstName, lastName, company and address. com.example.Users would hold a list of com.example.User, also with a getter and setter.
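
A minimal sketch of the two classes could look like this. In a real project they would be public classes in the com.example package, each in its own file; here they are package-private and shown together only to keep the listing self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// One <user> element from users.xml
class User {
    private String firstName;
    private String lastName;
    private String company;
    private String address;

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getCompany() { return company; }
    public void setCompany(String company) { this.company = company; }

    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
}

// The <users> root element, holding the list of users
class Users {
    private List<User> users = new ArrayList<>();

    public List<User> getUsers() { return users; }
    public void setUsers(List<User> users) { this.users = users; }
}
```

The property names match both the field names in userMapping.xml and the named parameters in the insert statement of the writer.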

<!--DATA TO BE READ -->
users.xml
<users>
 <user>
  <firstName>Pankaj</firstName>
  <lastName>Kathiria</lastName>
  <company>CD</company>
  <address>AHM</address>
 </user>
 <user>
  <firstName>Pankaj1</firstName>
  <lastName>Kathiria1</lastName>
  <company>CD1</company>
  <address>AHM1</address>
 </user>
</users> 



Here we have not written any code for the reader or writer. If anyone wants to add extra behaviour, the classes provided by Spring can be extended and overridden.

Spring Batch comes with many other types of readers and writers [e.g. FlatFileItemReader, FlatFileItemWriter, JdbcCursorItemReader, JdbcBatchItemWriter, HibernateCursorItemReader, HibernateItemWriter, MongoItemReader, MongoItemWriter, SimpleMailMessageItemWriter etc.].

I hope this post is helpful.
Example code available at Spring Batch Example.
Happy Learning!