20150401 000000;119.566000;119.566000;119.551000;119.565000;0 20150401 000100;119.566000;119.581000;119.565000;119.579000;0 20150401 000200;119.581000;119.586000;119.581000;119.583000;0 ...
JBatch examples: bulk loading from CSV file to database
TweetPosted on Sunday May 24, 2015 at 03:07PM in JBatch
Bulk loading is a typical usecase of batch application. in this entry, I give you a example of bulk loading from a CSV file.
There is a supplemental package named jberet-support
, which contains many useful classes that implemented ItemReader
or ItemWriter
for common usecases. in this entry, we’ll use CsvItemReader
to read a CSV file and JdbcItemWriter
to write data to database.
Setup
In this setup we’ll use WildFly 9.0.0.CR1.
First, we need a CSV file, whatever. so we’ll use a forex historical data which can be downloaded from http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/usdjpy/2015/4 . download it and unpack, put DAT_ASCII_USDJPY_M1_201504.csv
somewhere in your environment. this file contains data like:
Next, define a JTA datasource on WildFly. following is an example command which defines a H2 datasource using jboss-cli
:
data-source add \ --name=MyDS \ --driver-name=h2 \ --jndi-name=java:jboss/datasources/MyDS \ --user-name=sa \ --password=sa \ --connection-url=jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE
After confirmed outcome was success
, issue following command to test a connection:
/subsystem=datasources/data-source=MyDS:test-connection-in-pool
Next, create a table to store dataset. issue following command to start H2 console, in the base directory of your WildFly instance:
java -cp ./modules/system/layers/base/com/h2database/h2/main/h2*.jar org.h2.tools.Shell -url "jdbc:h2:/tmp/myds;AUTO_SERVER=TRUE" -user sa -password sa
Execute following DDL:
CREATE TABLE forex ( symbol VARCHAR(6) NOT NULL, ts TIMESTAMP NOT NULL, bid_open NUMERIC(10,3) NOT NULL, bid_high NUMERIC(10,3) NOT NULL, bid_low NUMERIC(10,3) NOT NULL, bid_close NUMERIC(10,3) NOT NULL, volume INTEGER NOT NULL, PRIMARY KEY(symbol, ts) );
Next, create a batch application.
pom.xml
You need following dependencies in your pom.xml
:
<dependencies> <dependency> <groupId>javax</groupId> <artifactId>javaee-api</artifactId> <version>7.0</version> <scope>provided</scope> </dependency> <dependency> <groupId>org.apache.batchee</groupId> <artifactId>batchee-servlet-embedded</artifactId> <version>0.2-incubating</version> </dependency> <dependency> <groupId>org.jberet</groupId> <artifactId>jberet-support</artifactId> <version>1.1.0.Final</version> </dependency> <dependency> <groupId>net.sf.supercsv</groupId> <artifactId>super-csv</artifactId> <version>2.3.1</version> </dependency> <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> <version>2.5.3</version> </dependency> </dependencies>
/src/main/resources/META-INF/batch-jobs/load-csv.xml
The job uses csvItemReader
and jdbcItemWriter
that are supplied within the jberet-support
package.
<job id="load-csv" version="1.0" xmlns="http://xmlns.jcp.org/xml/ns/javaee"> <step id="load"> <chunk> <reader ref="csvItemReader"> <properties> <property name="resource" value="#{jobParameters['resource']}"/> <property name="headerless" value="true"/> <property name="delimiterChar" value=";"/> <property name="beanType" value="java.util.List"/> </properties> </reader> <processor ref="forexItemProcessor"> <properties> <property name="symbol" value="#{jobParameters['symbol']}"/> </properties> </processor> <writer ref="jdbcItemWriter"> <properties> <property name="dataSourceLookup" value="java:jboss/datasources/MyDS"/> <property name="sql" value="INSERT INTO forex (symbol, ts, bid_open, bid_high, bid_low, bid_close, volume) values (?, ?, ?, ?, ?, ?, ?)"/> <property name="beanType" value="java.util.List"/> </properties> </writer> </chunk> </step> </job>
/src/main/java/jbatch/ForexItemProcessor.java
@Named @Dependent public class ForexItemProcessor implements ItemProcessor { private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuuMMdd HHmmss"); @Inject @BatchProperty private String symbol; @Override public Object processItem(final Object item) throws Exception { final List items = (List) item; return Arrays.asList(symbol, Timestamp.valueOf(LocalDateTime.parse((String) items.get(0), FORMATTER)), new BigDecimal((String) items.get(1)), new BigDecimal((String) items.get(2)), new BigDecimal((String) items.get(3)), new BigDecimal((String) items.get(4)), Integer.valueOf((String) items.get(5))); } }
Run the batch
Example if you put the forex CSV in /tmp/DAT_ASCII_USDJPY_M1_201504.csv
:
curl 'http://localhost:8080/jbatch-example-1.0-SNAPSHOT/jbatch/rest/start/load-csv?symbol=USDJPY&resource=/tmp/DAT_ASCII_USDJPY_M1_201504.csv'
After the job done, check dataset within your database using the H2 CLI:
sql> select * from forex; SYMBOL | TS | BID_OPEN | BID_HIGH | BID_LOW | BID_CLOSE | VOLUME USDJPY | 2015-04-01 00:00:00.0 | 119.566 | 119.566 | 119.551 | 119.565 | 0 USDJPY | 2015-04-01 00:01:00.0 | 119.566 | 119.581 | 119.565 | 119.579 | 0 USDJPY | 2015-04-01 00:02:00.0 | 119.581 | 119.586 | 119.581 | 119.583 | 0 ... (31572 rows, 505 ms)
The project which used in this entry can be obtained from my GitHub repository.