Kohei Nozaki's blog 

JBatch examples: bulk loading from database to CSV file


Posted on Sunday May 24, 2015 at 03:52PM in JBatch


In previous entry, we looked how to load data from CSV file to database. in this entry, we will look how to load data from database to CSV file. we’ll use JdbcItemReader to read data from database and CsvItemWriter to write data to as a CSV file.

Setup

In this setup we’ll use WildFly 9.0.0.CR1.

Assume we already have a table forex and data in H2 database that created and populated in previous entry.

For JdbcItemReader, we need an another datasource which is Non-JTA, references the same database to JTA one. for detail see this conversation.

data-source add \
 --name=MyNonJtaDS \
 --driver-name=h2 \
 --jndi-name=java:jboss/datasources/MyNonJtaDS \
 --user-name=sa \
 --password=sa \
 --connection-url=jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE \
 --jta=false

Next, create job artifacts.

/src/main/resources/META-INF/batch-jobs/save-csv.xml

Note that MyNonJtaDS is used, not MyDS. and all of classes that used in the job are supplied within jberet-support.

<job id="save-csv" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
    <step id="save">
        <chunk>
            <reader ref="jdbcItemReader">
                <properties>
                    <property name="dataSourceLookup" value="java:jboss/datasources/MyNonJtaDS"/>
                    <property name="sql"
                              value="SELECT symbol, ts, bid_open, bid_high, bid_low, bid_close, volume FROM forex ORDER BY symbol, ts"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </reader>
            <writer ref="csvItemWriter">
                <properties>
                    <property name="resource" value="#{jobParameters['resource']}"/>
                    <property name="header"
                              value="symbol, ts, bid_open, bid_high, bid_low, bid_close, volume"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

Run the job

Issue following command. this saves a CSV file into /tmp/save.csv:

curl 'http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/start/save-csv?resource=/tmp/save.csv'

After job execution is done, check the CSV file is created as expected:

symbol,ts,bid_open,bid_high,bid_low,bid_close,volume
USDJPY,2015-04-01 00:00:00.0,119.566,119.566,119.551,119.565,0
USDJPY,2015-04-01 00:01:00.0,119.566,119.581,119.565,119.579,0
USDJPY,2015-04-01 00:02:00.0,119.581,119.586,119.581,119.583,0
...

The project which used in this entry can be obtained from my GitHub repository.


JBatch examples: bulk loading from CSV file to database


Posted on Sunday May 24, 2015 at 03:07PM in JBatch


Bulk loading is a typical usecase of batch application. in this entry, I give you a example of bulk loading from a CSV file.

There is a supplemental package named jberet-support, which contains many useful classes that implemented ItemReader or ItemWriter for common usecases. in this entry, we’ll use CsvItemReader to read a CSV file and JdbcItemWriter to write data to database.

Setup

In this setup we’ll use WildFly 9.0.0.CR1.

First, we need a CSV file, whatever. so we’ll use a forex historical data which can be downloaded from http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/usdjpy/2015/4 . download it and unpack, put DAT_ASCII_USDJPY_M1_201504.csv somewhere in your environment. this file contains data like:

20150401 000000;119.566000;119.566000;119.551000;119.565000;0
20150401 000100;119.566000;119.581000;119.565000;119.579000;0
20150401 000200;119.581000;119.586000;119.581000;119.583000;0
...

Next, define a JTA datasource on WildFly. following is an example command which defines a H2 datasource using jboss-cli:

data-source add \
 --name=MyDS \
 --driver-name=h2 \
 --jndi-name=java:jboss/datasources/MyDS \
 --user-name=sa \
 --password=sa \
 --connection-url=jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE

After confirmed outcome was success, issue following command to test a connection:

/subsystem=datasources/data-source=MyDS:test-connection-in-pool

Next, create a table to store dataset. issue following command to start H2 console, in the base directory of your WildFly instance:

java -cp ./modules/system/layers/base/com/h2database/h2/main/h2*.jar org.h2.tools.Shell -url "jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE" -user sa -password sa

Execute following DDL:

CREATE TABLE forex (
    symbol VARCHAR(6) NOT NULL,
    ts TIMESTAMP NOT NULL,
    bid_open NUMERIC(10,3) NOT NULL,
    bid_high NUMERIC(10,3) NOT NULL,
    bid_low NUMERIC(10,3) NOT NULL,
    bid_close NUMERIC(10,3) NOT NULL,
    volume INTEGER NOT NULL,
    PRIMARY KEY(symbol, ts)
);

Next, create a batch application.

pom.xml

You need following dependencies in your pom.xml:

<dependencies>
    <dependency>
        <groupId>javax</groupId>
        <artifactId>javaee-api</artifactId>
        <version>7.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.batchee</groupId>
        <artifactId>batchee-servlet-embedded</artifactId>
        <version>0.2-incubating</version>
    </dependency>
    <dependency>
        <groupId>org.jberet</groupId>
        <artifactId>jberet-support</artifactId>
        <version>1.1.0.Final</version>
    </dependency>
    <dependency>
        <groupId>net.sf.supercsv</groupId>
        <artifactId>super-csv</artifactId>
        <version>2.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.5.3</version>
    </dependency>
</dependencies>

/src/main/resources/META-INF/batch-jobs/load-csv.xml

The job uses csvItemReader and jdbcItemWriter that are supplied within the jberet-support package.

<job id="load-csv" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
    <step id="load">
        <chunk>
            <reader ref="csvItemReader">
                <properties>
                    <property name="resource" value="#{jobParameters['resource']}"/>
                    <property name="headerless" value="true"/>
                    <property name="delimiterChar" value=";"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </reader>
            <processor ref="forexItemProcessor">
                <properties>
                    <property name="symbol" value="#{jobParameters['symbol']}"/>
                </properties>
            </processor>
            <writer ref="jdbcItemWriter">
                <properties>
                    <property name="dataSourceLookup" value="java:jboss/datasources/MyDS"/>
                    <property name="sql"
                              value="INSERT INTO forex (symbol, ts, bid_open, bid_high, bid_low, bid_close, volume) values (?, ?, ?, ?, ?, ?, ?)"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

/src/main/java/jbatch/ForexItemProcessor.java

@Named
@Dependent
public class ForexItemProcessor implements ItemProcessor {
    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuuMMdd HHmmss");

    @Inject
    @BatchProperty
    private String symbol;

    @Override
    public Object processItem(final Object item) throws Exception {
        final List items = (List) item;
        return Arrays.asList(symbol,
                Timestamp.valueOf(LocalDateTime.parse((String) items.get(0), FORMATTER)),
                new BigDecimal((String) items.get(1)),
                new BigDecimal((String) items.get(2)),
                new BigDecimal((String) items.get(3)),
                new BigDecimal((String) items.get(4)),
                Integer.valueOf((String) items.get(5)));
    }
}

Run the batch

Example if you put the forex CSV in /tmp/DAT_ASCII_USDJPY_M1_201504.csv:

curl 'http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/start/load-csv?symbol=USDJPY&resource=/tmp/DAT_ASCII_USDJPY_M1_201504.csv'

After the job done, check dataset within your database using the H2 CLI:

sql> select * from forex;
SYMBOL | TS                    | BID_OPEN | BID_HIGH | BID_LOW | BID_CLOSE | VOLUME
USDJPY | 2015-04-01 00:00:00.0 | 119.566  | 119.566  | 119.551 | 119.565   | 0
USDJPY | 2015-04-01 00:01:00.0 | 119.566  | 119.581  | 119.565 | 119.579   | 0
USDJPY | 2015-04-01 00:02:00.0 | 119.581  | 119.586  | 119.581 | 119.583   | 0
...
(31572 rows, 505 ms)

The project which used in this entry can be obtained from my GitHub repository.


JBatch examples: get started JBatch with WildFly and batchee-servlet-embedded


Posted on Sunday May 24, 2015 at 12:43PM in JBatch


JSR352 aka JBatch is the standardized batch processing framework for the Java EE platform. it eases tedious work on batch programming such as transaction management of bulk processing, parallel processing, flow control. and it gives well-integrated job information management mechanism, well-designed interfaces that enables us to develop common modules for frequently use. there are some convenient modules aim to be used in typical situation. in this entry, I introduce you some examples to get started.

Setup

Setup WildFly 9.0.0.CR1: download the full distribution from wildfly.org and unpack.

Next, create a war application contains following resources:

pom.xml

This contains a dependency to batchee-servlet-embedded. it brings a simple web application which enables us to control batch jobs, also it supplies simple REST style interface.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>jbatch-example</groupId>
    <artifactId>jbatch-example</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>war</packaging>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <failOnMissingWebXml>false</failOnMissingWebXml>
    </properties>

    <dependencies>
        <dependency>
            <groupId>javax</groupId>
            <artifactId>javaee-api</artifactId>
            <version>7.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.batchee</groupId>
            <artifactId>batchee-servlet-embedded</artifactId>
            <version>0.2-incubating</version>
        </dependency>
    </dependencies>

</project>

/src/main/java/jbatch/MyBatchlet.java

@Named
@Dependent
public class MyBatchlet extends AbstractBatchlet {
    @Override
    public String process() throws Exception {
        System.out.println("Hello, JBatch");
        return null;
    }
}

/src/main/resources/META-INF/batch-jobs/simple-job.xml

<job id="simple-job" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
    <step id="myStep">
        <batchlet ref="myBatchlet"/>
    </step>
</job>

Deploy and run the batch

Then deploy the war, and go to http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/ from your browser. you’ll see following page:

26c69072 c35c 4885 b8ab cb7c8445c7ac

Then click New Batch button. you’ll be transited to following page. enter simple-job to the text box, then click Set Job Name, and click Submit

41f38684 4e35 4e09 8463 28b81ae49f92

If the batch executed successfully, you’ll be transited to following page:

5913a110 1457 483b 878b d8d08f2520a2

Also you’ll see following output in your WildFly console:

12:23:02,046 INFO  [stdout] (Batch Thread - 2) Hello, JBatch

You can see job execution history from the web. click simple-job in home page and you’ll be transited following page:

e2f7d3c1 c52b 4261 9727 a3a22c1277ce

Instead of using web browser, you can launch a job with simple REST style API as follows:

curl http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/start/simple-job

For details of the REST API, you can see help with following command:

curl http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/

It shows:

Known commands are:

* start/ - start a new batch job
  Sample: http://localhost:8080/myapp/jbatch/rest/start/myjobname?param1=x&param2=y
  BatchEE will start the job and immediately return

* status/ - query the current status
  Sample: http://localhost:8080/myapp/jbatch/rest/status/23
  will return the state of executionId 23

* stop/ - stop the job with the given executionId
  Sample: http://localhost:8080/myapp/jbatch/rest/stop/23
  will stop the job with executionId 23

* restart/ - restart the job with the given executionId
  Sample: http://localhost:8080/myapp/jbatch/rest/restart/23
  will restart the job with executionId 23

The project which used in this entry can be obtained from my GitHub repository.