This is a continuation of .
Batching
I ran each job with commit sizes of 1 and 10 over a total of 100 records. I was curious what kind of improvement each batch job would see and how the increased processing would affect completion time. The idea behind batching is to save time on the most expensive part of the operation, the write. For this experiment I only wanted to measure the effect of batching. As before, I tested both Spring Batch and Akka.
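To make the commit-size arithmetic concrete, here is a minimal plain-Java sketch of how 100 records split into chunks. The class and method names (`CommitSizeSketch`, `chunk`) are made up for illustration and are not part of Spring Batch or Akka; the only assumption is that each chunk maps to one write/commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spring Batch or Akka.
final class CommitSizeSketch {

    // Splits the items into consecutive chunks of at most commitSize elements.
    static <T> List<List<T>> chunk(List<T> items, int commitSize) {
        if (commitSize < 1) {
            throw new IllegalArgumentException("commitSize must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += commitSize) {
            int end = Math.min(start + commitSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            records.add(i);
        }
        // Assuming one commit per chunk, the chunk count is the commit count.
        System.out.println(chunk(records, 1).size());   // 100
        System.out.println(chunk(records, 10).size());  // 10
    }
}
```

With a commit size of 10 the job pays the write cost 10 times instead of 100, which is where any time saving would come from.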
Spring Batch Implementation
I implemented the Spring Batch job with Spring Boot, which keeps configuration and coding simple for something this small. Spring Batch has built-in readers, processors, and writers, so I leveraged the framework as much as possible. Below is the step for a commit size of 1. The complete code is linked below.
public Step step1() throws Exception {
    return stepBuilders.get("step1")
            // chunk(1): commit after every single item
            .<Person, Person>chunk(1)
            .reader(reader())
            .processor(processor())
            .writer(writer(getDataSource()))
            .build();
}
Spring Batch Code
Akka Implementation
For a fair comparison I reused the same Spring Batch reader and processor in the Akka implementation, so the only differences are the writer and the framework. For the Akka version I implemented my own writer using a JDBC connection.
For the implementation I used an Akka dispatcher and an Akka router. These two pieces control how an Akka actor executes. The dispatcher was discussed above. The router is similar to a network router: it distributes the workload to the actors in a particular fashion. In this case I used the round-robin implementation. I also used the balancing dispatcher, which lets an actor tell the system “hey, I’m done,” at which point more work is assigned to it.
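As a rough picture of what round-robin routing means, here is a plain-JDK sketch. It is not Akka's `RoundRobinRouter`, and the `RoundRobinSketch` class is hypothetical; it only shows the selection rule of handing each new message to the next routee in turn.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK illustration of the round-robin rule; this is NOT Akka's router.
// Not thread-safe: intended to be driven from a single thread.
final class RoundRobinSketch {
    private final int routeeCount;
    private int next = 0;

    RoundRobinSketch(int routeeCount) {
        if (routeeCount < 1) {
            throw new IllegalArgumentException("routeeCount must be >= 1");
        }
        this.routeeCount = routeeCount;
    }

    // Returns the index of the routee that should receive the next message.
    int route() {
        int chosen = next;
        next = (next + 1) % routeeCount;
        return chosen;
    }

    public static void main(String[] args) {
        RoundRobinSketch router = new RoundRobinSketch(4);
        List<Integer> assignments = new ArrayList<>();
        for (int message = 0; message < 6; message++) {
            assignments.add(router.route());
        }
        System.out.println(assignments); // [0, 1, 2, 3, 0, 1]
    }
}
```

With a single routee, every message goes to the same worker, so the routing rule only starts to matter once the routee count goes above 1.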
In my implementation a read is performed and the resulting object is passed to an actor, which completes the processing. This continues until the reader has exhausted its dataset.
Below is the code sample:
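// Load the "MyDispatcherExample" section of the "dispatcher" config resource
ActorSystem _system = ActorSystem.create("balancing-dispatcher",
        ConfigFactory.load("dispatcher").getConfig("MyDispatcherExample"));
// One actor behind a round-robin router, running on the balancing dispatcher
ActorRef actor = _system.actorOf(new Props(ProcessWriteBatchActor.class)
        .withDispatcher("balancingDispatcher1")
        .withRouter(new RoundRobinRouter(1)));
connection = getDataSource().getConnection();
Long startTime = new Date().getTime();
ItemReader itemReader = new App().itemReader();
Person person;
// Read until the reader is exhausted, handing each record to the actor
while ((person = (Person) itemReader.read()) != null) {
    actor.tell(person);
}
// Block until the actor system has shut down, then release the connection
_system.awaitTermination();
connection.close();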
There are two separate modules for a commit size of 1 (AkkaBatch) and commit size of 10 (AkkaBatch10): Akka Batch
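The AkkaBatch10 source is not reproduced here, so the following is only a guess at the general shape of a commit-size-10 writer: buffer incoming items and hand them to a commit callback once the buffer is full. `BufferedWriterSketch` and its `commit` callback are hypothetical names; in a JDBC-backed version the callback would presumably run a batch insert and commit the transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer for illustration; the real AkkaBatch10 module may differ.
final class BufferedWriterSketch<T> {
    private final int commitSize;
    private final Consumer<List<T>> commit; // assumed to write and commit one batch
    private final List<T> buffer = new ArrayList<>();

    BufferedWriterSketch(int commitSize, Consumer<List<T>> commit) {
        if (commitSize < 1) {
            throw new IllegalArgumentException("commitSize must be >= 1");
        }
        this.commitSize = commitSize;
        this.commit = commit;
    }

    // Buffers one item and commits once commitSize items have accumulated.
    void write(T item) {
        buffer.add(item);
        if (buffer.size() >= commitSize) {
            flush();
        }
    }

    // Commits whatever is buffered; call once at the end for a partial last batch.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        commit.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

A caller would invoke `write` for every processed record and `flush` once after the reader is exhausted, so that a final batch smaller than the commit size is not lost.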
The Results
I ran both frameworks on similar loads and commit sizes and compared the time each took to complete the job. Every run processed 100 records. All results are in milliseconds.
Sync
| Commit Size | Spring Batch: Default | Spring Batch: Thread Pool Default | Akka: Thread Pool 1 | Akka: Fork Join 1 |
|---|---|---|---|---|
| 1 | 1345 | 1414 | 896 | 882 |
| 10 | 899 | 858 | 336 | 308 |
Concurrent
| Commit Size | Spring Batch: Thread Pool Core Size 4 | Spring Batch: Thread Pool Core Size 8 | Spring Batch: Async Threading Default | Spring Batch: Async Threading Unlimited | Akka: Thread Pool 4 | Akka: Thread Pool 8 | Akka: Thread Pool 20 | Akka: Fork Join 4 | Akka: Fork Join 8 | Akka: Fork Join 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1239 | 1437 | 1319 | 1327 | 804 | 837 | 766 | 865 | 868 | 846 |
| 10 | 800 | 842 | 816 | 865 | 294 | 387 | 330 | 293 | 341 | 319 |
Results
Conclusion
In every configuration I tested, the Akka job finished faster than the Spring Batch job at the same commit size. A Spring Batch job with the larger commit size was even comparable to an Akka job with the smaller one. The speedup from Akka also grew with the larger commit size.
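To put rough numbers on that, the speedup can be computed directly from the sync results above. The sketch below compares the Spring Batch "Default" column with the Akka "Thread Pool 1" column; picking those two columns is my own choice for illustration, and `SpeedupSketch` is a hypothetical helper.

```java
// Hypothetical helper that turns two completion times into a speedup ratio.
final class SpeedupSketch {

    // Speedup of the candidate over the baseline: values above 1 mean the candidate is faster.
    static double speedup(double baselineMs, double candidateMs) {
        return baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        // Sync results: Spring Batch (Default) vs. Akka (Thread Pool 1), in milliseconds.
        System.out.println("commit size 1:  " + speedup(1345, 896));
        System.out.println("commit size 10: " + speedup(899, 336));
    }
}
```

For these two columns the ratio comes out at about 1.5 for a commit size of 1 and about 2.7 for a commit size of 10.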
The one aspect I did not test was longer-running jobs. While the Spring Batch framework requires more configuration, it is designed for long-running work, and it is possible that the sample size I tested with was too small to show its full value.