Kohei Nozaki's blog 

Entries tagged [jbatch]

JBatch examples: bulk loading from database to CSV file


Posted on Sunday May 24, 2015 at 03:52PM in JBatch


In previous entry, we looked how to load data from CSV file to database. in this entry, we will look how to load data from database to CSV file. we’ll use JdbcItemReader to read data from database and CsvItemWriter to write data to as a CSV file.

Setup

In this setup we’ll use WildFly 9.0.0.CR1.

Assume we already have a table forex and data in H2 database that created and populated in previous entry.

For JdbcItemReader, we need an another datasource which is Non-JTA, references the same database to JTA one. for detail see this conversation.

data-source add \
 --name=MyNonJtaDS \
 --driver-name=h2 \
 --jndi-name=java:jboss/datasources/MyNonJtaDS \
 --user-name=sa \
 --password=sa \
 --connection-url=jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE \
 --jta=false

Next, create job artifacts.

/src/main/resources/META-INF/batch-jobs/save-csv.xml

Note that MyNonJtaDS is used, not MyDS. and all of classes that used in the job are supplied within jberet-support.

<job id="save-csv" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
    <step id="save">
        <chunk>
            <reader ref="jdbcItemReader">
                <properties>
                    <property name="dataSourceLookup" value="java:jboss/datasources/MyNonJtaDS"/>
                    <property name="sql"
                              value="SELECT symbol, ts, bid_open, bid_high, bid_low, bid_close, volume FROM forex ORDER BY symbol, ts"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </reader>
            <writer ref="csvItemWriter">
                <properties>
                    <property name="resource" value="#{jobParameters['resource']}"/>
                    <property name="header"
                              value="symbol, ts, bid_open, bid_high, bid_low, bid_close, volume"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

Run the job

Issue following command. this saves a CSV file into /tmp/save.csv:

curl 'http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/start/save-csv?resource=/tmp/save.csv'

After job execution is done, check the CSV file is created as expected:

symbol,ts,bid_open,bid_high,bid_low,bid_close,volume
USDJPY,2015-04-01 00:00:00.0,119.566,119.566,119.551,119.565,0
USDJPY,2015-04-01 00:01:00.0,119.566,119.581,119.565,119.579,0
USDJPY,2015-04-01 00:02:00.0,119.581,119.586,119.581,119.583,0
...

The project which used in this entry can be obtained from my GitHub repository.


JBatch examples: bulk loading from CSV file to database


Posted on Sunday May 24, 2015 at 03:07PM in JBatch


Bulk loading is a typical usecase of batch application. in this entry, I give you a example of bulk loading from a CSV file.

There is a supplemental package named jberet-support, which contains many useful classes that implemented ItemReader or ItemWriter for common usecases. in this entry, we’ll use CsvItemReader to read a CSV file and JdbcItemWriter to write data to database.

Setup

In this setup we’ll use WildFly 9.0.0.CR1.

First, we need a CSV file, whatever. so we’ll use a forex historical data which can be downloaded from http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/usdjpy/2015/4 . download it and unpack, put DAT_ASCII_USDJPY_M1_201504.csv somewhere in your environment. this file contains data like:

20150401 000000;119.566000;119.566000;119.551000;119.565000;0
20150401 000100;119.566000;119.581000;119.565000;119.579000;0
20150401 000200;119.581000;119.586000;119.581000;119.583000;0
...

Next, define a JTA datasource on WildFly. following is an example command which defines a H2 datasource using jboss-cli:

data-source add \
 --name=MyDS \
 --driver-name=h2 \
 --jndi-name=java:jboss/datasources/MyDS \
 --user-name=sa \
 --password=sa \
 --connection-url=jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE

After confirmed outcome was success, issue following command to test a connection:

/subsystem=datasources/data-source=MyDS:test-connection-in-pool

Next, create a table to store dataset. issue following command to start H2 console, in the base directory of your WildFly instance:

java -cp ./modules/system/layers/base/com/h2database/h2/main/h2*.jar org.h2.tools.Shell -url "jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE" -user sa -password sa

Execute following DDL:

CREATE TABLE forex (
    symbol VARCHAR(6) NOT NULL,
    ts TIMESTAMP NOT NULL,
    bid_open NUMERIC(10,3) NOT NULL,
    bid_high NUMERIC(10,3) NOT NULL,
    bid_low NUMERIC(10,3) NOT NULL,
    bid_close NUMERIC(10,3) NOT NULL,
    volume INTEGER NOT NULL,
    PRIMARY KEY(symbol, ts)
);

Next, create a batch application.

pom.xml

You need following dependencies in your pom.xml:

<dependencies>
    <dependency>
        <groupId>javax</groupId>
        <artifactId>javaee-api</artifactId>
        <version>7.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.batchee</groupId>
        <artifactId>batchee-servlet-embedded</artifactId>
        <version>0.2-incubating</version>
    </dependency>
    <dependency>
        <groupId>org.jberet</groupId>
        <artifactId>jberet-support</artifactId>
        <version>1.1.0.Final</version>
    </dependency>
    <dependency>
        <groupId>net.sf.supercsv</groupId>
        <artifactId>super-csv</artifactId>
        <version>2.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.5.3</version>
    </dependency>
</dependencies>

/src/main/resources/META-INF/batch-jobs/load-csv.xml

The job uses csvItemReader and jdbcItemWriter that are supplied within the jberet-support package.

<job id="load-csv" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
    <step id="load">
        <chunk>
            <reader ref="csvItemReader">
                <properties>
                    <property name="resource" value="#{jobParameters['resource']}"/>
                    <property name="headerless" value="true"/>
                    <property name="delimiterChar" value=";"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </reader>
            <processor ref="forexItemProcessor">
                <properties>
                    <property name="symbol" value="#{jobParameters['symbol']}"/>
                </properties>
            </processor>
            <writer ref="jdbcItemWriter">
                <properties>
                    <property name="dataSourceLookup" value="java:jboss/datasources/MyDS"/>
                    <property name="sql"
                              value="INSERT INTO forex (symbol, ts, bid_open, bid_high, bid_low, bid_close, volume) values (?, ?, ?, ?, ?, ?, ?)"/>
                    <property name="beanType" value="java.util.List"/>
                </properties>
            </writer>
        </chunk>
    </step>
</job>

/src/main/java/jbatch/ForexItemProcessor.java

@Named
@Dependent
public class ForexItemProcessor implements ItemProcessor {
    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuuMMdd HHmmss");

    @Inject
    @BatchProperty
    private String symbol;

    @Override
    public Object processItem(final Object item) throws Exception {
        final List items = (List) item;
        return Arrays.asList(symbol,
                Timestamp.valueOf(LocalDateTime.parse((String) items.get(0), FORMATTER)),
                new BigDecimal((String) items.get(1)),
                new BigDecimal((String) items.get(2)),
                new BigDecimal((String) items.get(3)),
                new BigDecimal((String) items.get(4)),
                Integer.valueOf((String) items.get(5)));
    }
}

Run the batch

Example if you put the forex CSV in /tmp/DAT_ASCII_USDJPY_M1_201504.csv:

curl 'http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/start/load-csv?symbol=USDJPY&resource=/tmp/DAT_ASCII_USDJPY_M1_201504.csv'

After the job done, check dataset within your database using the H2 CLI:

sql> select * from forex;
SYMBOL | TS                    | BID_OPEN | BID_HIGH | BID_LOW | BID_CLOSE | VOLUME
USDJPY | 2015-04-01 00:00:00.0 | 119.566  | 119.566  | 119.551 | 119.565   | 0
USDJPY | 2015-04-01 00:01:00.0 | 119.566  | 119.581  | 119.565 | 119.579   | 0
USDJPY | 2015-04-01 00:02:00.0 | 119.581  | 119.586  | 119.581 | 119.583   | 0
...
(31572 rows, 505 ms)

The project which used in this entry can be obtained from my GitHub repository.


JBatch examples: get started JBatch with WildFly and batchee-servlet-embedded


Posted on Sunday May 24, 2015 at 12:43PM in JBatch


JSR352 aka JBatch is the standardized batch processing framework for the Java EE platform. it eases tedious work on batch programming such as transaction management of bulk processing, parallel processing, flow control. and it gives well-integrated job information management mechanism, well-designed interfaces that enables us to develop common modules for frequently use. there are some convenient modules aim to be used in typical situation. in this entry, I introduce you some examples to get started.

Setup

Setup WildFly 9.0.0.CR1: download the full distribution from wildfly.org and unpack.

Next, create a war application contains following resources:

pom.xml

This contains a dependency to batchee-servlet-embedded. it brings a simple web application which enables us to control batch jobs, also it supplies simple REST style interface.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>jbatch-example</groupId>
    <artifactId>jbatch-example</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>war</packaging>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <failOnMissingWebXml>false</failOnMissingWebXml>
    </properties>

    <dependencies>
        <dependency>
            <groupId>javax</groupId>
            <artifactId>javaee-api</artifactId>
            <version>7.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.batchee</groupId>
            <artifactId>batchee-servlet-embedded</artifactId>
            <version>0.2-incubating</version>
        </dependency>
    </dependencies>

</project>

/src/main/java/jbatch/MyBatchlet.java

@Named
@Dependent
public class MyBatchlet extends AbstractBatchlet {
    @Override
    public String process() throws Exception {
        System.out.println("Hello, JBatch");
        return null;
    }
}

/src/main/resources/META-INF/batch-jobs/simple-job.xml

<job id="simple-job" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
    <step id="myStep">
        <batchlet ref="myBatchlet"/>
    </step>
</job>

Deploy and run the batch

Then deploy the war, and go to http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/ from your browser. you’ll see following page:

26c69072 c35c 4885 b8ab cb7c8445c7ac

Then click New Batch button. you’ll be transited to following page. enter simple-job to the text box, then click Set Job Name, and click Submit

41f38684 4e35 4e09 8463 28b81ae49f92

If the batch executed successfully, you’ll be transited to following page:

5913a110 1457 483b 878b d8d08f2520a2

Also you’ll see following output in your WildFly console:

12:23:02,046 INFO  [stdout] (Batch Thread - 2) Hello, JBatch

You can see job execution history from the web. click simple-job in home page and you’ll be transited following page:

e2f7d3c1 c52b 4261 9727 a3a22c1277ce

Instead of using web browser, you can launch a job with simple REST style API as follows:

curl http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/start/simple-job

For details of the REST API, you can see help with following command:

curl http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/

It shows:

Known commands are:

* start/ - start a new batch job
  Sample: http://localhost:8080/myapp/jbatch/rest/start/myjobname?param1=x&param2=y
  BatchEE will start the job and immediately return

* status/ - query the current status
  Sample: http://localhost:8080/myapp/jbatch/rest/status/23
  will return the state of executionId 23

* stop/ - stop the job with the given executionId
  Sample: http://localhost:8080/myapp/jbatch/rest/stop/23
  will stop the job with executionId 23

* restart/ - restart the job with the given executionId
  Sample: http://localhost:8080/myapp/jbatch/rest/restart/23
  will restart the job with executionId 23

The project which used in this entry can be obtained from my GitHub repository.


Testing a JBatch job using Arquillian on remote WildFly


Posted on Thursday Mar 05, 2015 at 05:14PM in JBatch


I pushed entire project to https://github.com/lbtc-xxx/arquillian-jbatch .

While I prefer remote EJB way like my previous posting for JBatch testing, it works well for simple project. but little slower than remote EJB on my environment.


Using Apache BatchEE's server API with JBeret


Posted on Wednesday Mar 04, 2015 at 06:11PM in JBatch


Apache BatchEE is a fork of the JSR352 reference implementation with many additional features. it has useful REST APIs built on JAX-RS so we can manipulate (start, stop, restart, and so on) batches through REST API. fortunately, it is well modularized so we can use its REST API implementation with other JSR352 implementation such as JBeret.

curl -H 'Content-Type: application/json' \
     -d '{"entries": [ {"key": "myKey1", "value": "myValue1"}, {"key": "myKey2", "value": "myValue2"} ]}' \
     http://localhost:8080/batcheetest/jbatch/batchee/execution/start/myjob

Then you’ll see following output in WildFly console:

17:57:18,608 INFO  [stdout] (Batch Thread - 9) Hello world!
17:57:18,609 INFO  [stdout] (Batch Thread - 9) Job Parameters: {myKey2=myValue2, myKey1=myValue1}

It’s much better than create a servlet which kicks the batch.

And I haven’t tested yet, it also have useful client API for that REST API. we can use JobOperator transparently thanks to its proxy. for details of REST API and BatchEE, see following URLs:

UPDATE:

I added a test case which uses client API of BatchEE, but it doesn’t work with released version. you need to apply a patch by hand. for details refer https://issues.apache.org/jira/browse/BATCHEE-59

Also I created an example of a test class which uses Arquillian. this won’t work with 0.2-incubating but will work with future versions.

@RunWith(Arquillian.class)
public class MyJobArquillianIT {
    @ArquillianResource
    private URL url;
    private JobOperator jobOperator;

    @Deployment(testable = false)
    public static Archive<?> war() {
        final File[] files = Maven.configureResolver()
                .loadPomFromFile("pom.xml")
                .resolve("org.apache.batchee:batchee-jaxrs-server")
                .withTransitivity()
                .asFile();
        return ShrinkWrap.create(WebArchive.class)
                .addClass(HelloBatchlet.class)
                .addAsResource("META-INF/batch-jobs/myjob.xml")
                .addAsLibraries(files);
    }

    @Before
    public void before() {
        jobOperator = BatchEEJAXRSClientFactory.newClient(url.toExternalForm() + "jbatch");
    }

    @Test
    public void test() {
        Properties jobParameters = new Properties();
        jobParameters.setProperty("someKey", "someValue");
        final JobExecution jobExecution = waitForFinish(jobOperator.start("myjob", jobParameters));
        Assert.assertEquals(BatchStatus.COMPLETED, jobExecution.getBatchStatus());
    }

    private static final Collection<BatchStatus> BATCH_END_STATUSES
            = EnumSet.of(BatchStatus.COMPLETED, BatchStatus.FAILED, BatchStatus.STOPPED, BatchStatus.ABANDONED);

    private JobExecution waitForFinish(long executionId) {
        JobExecution jobExecution;
        while (!BATCH_END_STATUSES.contains((jobExecution = jobOperator.getJobExecution(executionId)).getBatchStatus())) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return jobExecution;
    }
}

Web frontend

Its web-frontend GUI (batchee-servlet-embedded) works even against JBeret runtime as follows. you can view executions, jobs and its definition, and can start jobs with custom parameters.

42afff86 2c07 4c4f 8a93 9b90f4910b43

It also exposes simple-rest API which is more useful when issue the command by hand or some cases (e.g. cron job). in case of this example, you can start a job as follows:

$ curl 'http://localhost:8080/batcheetest/jbatch-gui/rest/start/myjob?param1=x&param2=y'
5
OK

You can read its help as follows:

$ curl 'http://localhost:8080/batcheetest/jbatch-gui/rest/'
-1
FAILURE
Unknown command:
The returned response if of MIME type text/plain and contains the following information
  {jobExecutionId} (or -1 if no executionId was detected)\n
  OK (or FAILURE)\n
  followed by command specific information

Known commands are:

* start/ - start a new batch job
  Sample: http://localhost:8080/myapp/jbatch/rest/start/myjobname?param1=x&param2=y
  BatchEE will start the job and immediately return

* status/ - query the current status
  Sample: http://localhost:8080/myapp/jbatch/rest/status/23
  will return the state of executionId 23

* stop/ - stop the job with the given executionId
  Sample: http://localhost:8080/myapp/jbatch/rest/stop/23
  will stop the job with executionId 23

* restart/ - restart the job with the given executionId
  Sample: http://localhost:8080/myapp/jbatch/rest/restart/23
  will restart the job with executionId 23