I am working on a project where an IVR voice browser makes VXML requests over HTTP. Each request represents a phone caller, and each new caller in our application spawns 6 Akka actor requests for various account look-ups.

The issue I had was that a concurrent new-caller volume of about 20 was backing up the Akka queues and then timing out the Akka requests. This was puzzling, as the dispatcher thread pool was set to a core size of 10, which seemed to spawn about 50 Akka threads in total.

We then used JMeter to simulate new callers, and as we increased the number of callers, we noticed that Akka was not using the new threads.

The JMeter test simulated 120 new callers with a 60-second ramp-up time, looping forever for a total of 300 seconds. Here is the result of that test, showing only 1 thread used.
120 callers, 1 thread per actor (64-bit Mac, Core Duo, 2 cores):

| Sampler label | # of samples | Requests/sec | KB/sec |
| --- | --- | --- | --- |
| /start/initDivisionalIvr.vxml | 891 | 2.826507629 | 4.551670977 |
| /welcome/spanishDivisionalIvr_rc.vxml | 834 | 2.839313254 | 7.22583041 |
| /welcome/postWelcome.vxml | 822 | 2.818176208 | 16.66339157 |
| /identify/disambiguateAniOrCed.vxml | 813 | 2.801477581 | 5.709652063 |
| /identify/gatherGetServiceInfoResult.vxml | 805 | 2.697043303 | 13.15092606 |
| TOTAL | 4165 | 13.21257495 | 44.4114435 |
When I looked at the threads in use, I noticed only 1 thread was active, which is why the performance was so slow.

We would fill up each actor's queue at around 50-60 concurrent callers, so the numbers were quite small; not enough for our target traffic, which needs to exceed 300 concurrent callers per machine. We could just add many more machines, but that is not the best approach.
Here is the Akka configuration we had:

```xml
<akka:dispatcher id="appt-dispatcher" type="executor-based-event-driven-work-stealing"
                 name="dispatcher-appointments">
    <akka:thread-pool queue="unbounded-linked-blocking-queue" fairness="true"
                      core-pool-size="4" max-pool-size="4" keep-alive="60000"
                      rejection-policy="caller-runs-policy"/>
</akka:dispatcher>

<akka:typed-actor id="appointmentActor"
                  interface="com.comcast.ivr.core.actors.AppointmentActor"
                  implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                  timeout="${actor.appointment.timeout}" scope="singleton"
                  depends-on="appointmentServiceClient,applicationProperties"
                  lifecycle="permanent">
    <akka:dispatcher ref="appt-dispatcher"/>
    <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
</akka:typed-actor>
```
Maybe I missed this in the documentation, but it seems that even though we created the actor as a prototype, there was still only 1 instance of the actor created, and then only 1 thread was used, because an actor is bound to a single queue. You can see that 1 thread per actor is working while the others are waiting.

In order to use multiple threads per typed actor, we needed to use the ActorRegistry and create multiple actors per type.

Right now we create the actors individually. We are going to refactor this to be dynamic instead of static, but for our proof of concept, this is what we are going to use:
```xml
<akka:dispatcher id="appt-dispatcher" type="executor-based-event-driven-work-stealing"
                 name="dispatcher-appointments">
    <akka:thread-pool queue="unbounded-linked-blocking-queue" fairness="true"
                      core-pool-size="2" max-pool-size="12" keep-alive="60000"
                      rejection-policy="caller-runs-policy"/>
</akka:dispatcher>

<akka:typed-actor id="appointmentActor1"
                  interface="com.comcast.ivr.core.actors.AppointmentActor"
                  implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                  timeout="${actor.appointment.timeout}" scope="singleton"
                  depends-on="appointmentServiceClient,applicationProperties"
                  lifecycle="permanent">
    <akka:dispatcher ref="appt-dispatcher"/>
    <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
</akka:typed-actor>

<akka:typed-actor id="appointmentActor2"
                  interface="com.comcast.ivr.core.actors.AppointmentActor"
                  implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                  timeout="${actor.appointment.timeout}" scope="singleton"
                  depends-on="appointmentServiceClient,applicationProperties"
                  lifecycle="permanent">
    <akka:dispatcher ref="appt-dispatcher"/>
    <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
</akka:typed-actor>

<!-- ... more actors omitted ... -->
```
This is the initial actor load-balancing lookup against the registry:

```java
import java.util.Random;

import static akka.actor.Actors.registry;

public class ActorLoadBalancer {

    // TODO: Implement proper load balancing using Routing.loadBalancerActor
    // with a CyclicIterator instead of random selection.
    @SuppressWarnings("unchecked")
    public static <T> T actor(Class<T> targetClass) {
        Object[] workers = registry().typedActorsFor(targetClass);
        // Routing.loadBalancerActor(new CyclicIterator(Arrays.asList(workers)));
        int actorNumber = new Random().nextInt(workers.length);
        return (T) workers[actorNumber];
    }
}
```
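The TODO above points at round-robin selection with a CyclicIterator. As a minimal sketch of that idea in plain Java (the class name `RoundRobinBalancer` is mine, and there are no Akka types here so the selection logic can be tried in isolation):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hands out workers in strict rotation instead of at random,
// so each actor instance gets an even share of requests.
public class RoundRobinBalancer<T> {
    private final List<T> workers;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinBalancer(List<T> workers) {
        this.workers = workers;
    }

    public T next() {
        // floorMod keeps the index non-negative even after int overflow
        int i = Math.floorMod(counter.getAndIncrement(), workers.size());
        return workers.get(i);
    }
}
```

The AtomicInteger makes the rotation safe when many request threads pull workers concurrently, which is exactly the situation in the JMeter tests.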
We started with 4 actors for each type at first, then ran the same JMeter test with 120 callers, and our numbers were dramatically better (roughly 5x the throughput):
120 callers, 4 actors (32-bit Windows 2003 Server running JDK 1.6 in client mode, 2 Core Duo CPUs, 4 cores total):

| Sampler label | # of samples | Requests/sec | KB/sec |
| --- | --- | --- | --- |
| /start/initDivisionalIvr.vxml | 4692 | 14.97601348 | 24.11664671 |
| /welcome/spanishDivisionalIvr_rc.vxml | 4663 | 15.63133776 | 39.78053341 |
| /welcome/postWelcome.vxml | 4662 | 15.63275434 | 104.8341055 |
| /identify/disambiguateAniOrCed.vxml | 4607 | 15.65893633 | 31.91425794 |
| /identify/gatherGetServiceInfoResult.vxml | 4607 | 15.42448298 | 85.53283018 |
| TOTAL | 23231 | 74.14914092 | 273.2926494 |
When I looked at the thread usage, I saw better utilization of the actor threads.

But I noticed there were still some threads backing up, so I created 3 of the actor types with 10 instances each, while the others stayed at 4. I arrived at these numbers by trial and error to see how many I could use, and I was still able to get another large improvement:
120 callers, up to 10 actors (32-bit Windows 2003 Server running JDK 1.6 in client mode, 2 Core Duo CPUs, 4 cores total):

| Sampler label | # of samples | Requests/sec | KB/sec |
| --- | --- | --- | --- |
| /start/initDivisionalIvr.vxml | 6537 | 21.78194068 | 35.07560601 |
| /welcome/spanishDivisionalIvr_rc.vxml | 6532 | 21.94147819 | 55.83934783 |
| /welcome/postWelcome.vxml | 6532 | 21.94199413 | 146.6590149 |
| /identify/disambiguateAniOrCed.vxml | 6443 | 21.9398909 | 44.60519867 |
| /identify/gatherGetServiceInfoResult.vxml | 6441 | 21.69716936 | 119.9142785 |
| TOTAL | 32485 | 106.8501171 | 393.0813908 |
Here is the test I ran where I noticed this thread activity. By creating more actors, more of the threads were actually running instead of sitting in a wait state.
Final actor count:

| Actor type | Instances |
| --- | --- |
| Broadcast Messages | 10 |
| Outage | 10 |
| Identify | 4 |
| GetServiceInfo | 8 |
| Payment | 4 |
| Appointment | 10 |
This was the largest actor count, in any combination, that I could get to without one of two things happening:

1. If I increased all actors evenly, the slow actors would back up while the faster actors sat idle, so many threads were not running, just waiting for the other actors to free up.
2. I then wanted to increase the number of slow actors to get more throughput, but beyond the exact counts above, I started getting HTTP transport errors like this:
```
Caused by: com.sun.xml.internal.ws.client.ClientTransportException: HTTP transport error: java.net.BindException: Address already in use: connect
    at com.sun.xml.internal.ws.transport.http.client.HttpClientTransport.getOutput(Unknown Source)
    at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.process(Unknown Source)
    at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.processRequest(Unknown Source)
    at com.sun.xml.internal.ws.transport.DeferredTransportPipe.processRequest(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber.__doRun(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber._doRun(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber.doRun(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber.runSync(Unknown Source)
    at com.sun.xml.internal.ws.client.Stub.process(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SEIStub.doProcess(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SEIStub.invoke(Unknown Source)
    at $Proxy138.lookupApplicationConfigurationProperties(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor163.invoke(Unknown Source)
```
I found this reference to the error: http://java.net/jira/browse/JAX_WS-485

But I still need to research whether the Spring-WS implementation (the web service each of these actors is calling) has keep-alive set or not. I know Akka has its keep-alive set to 60000 ms.
Some observations:

1. When I set core-pool-size="1", each actor spawned only 1 thread, and each thread was used more.
2. When I set core-pool-size="2", each actor spawned 2 threads, and the threads were sometimes used interchangeably, but not always.
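These observations line up with how `java.util.concurrent.ThreadPoolExecutor` (the kind of pool behind the dispatcher config) treats an unbounded queue: the pool only grows past core-pool-size when the queue rejects a task, and an unbounded LinkedBlockingQueue never rejects, so max-pool-size never kicks in. A small standalone sketch of that behavior (my own demo, not our application code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    public static void main(String[] args) throws InterruptedException {
        // core=2, max=12, unbounded queue -- mirrors the dispatcher settings above
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 12, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        CountDownLatch gate = new CountDownLatch(1);
        for (int i = 0; i < 50; i++) {
            // 50 tasks that all block, far more than 2 threads can drain
            pool.execute(() -> {
                try { gate.await(); } catch (InterruptedException ignored) { }
            });
        }
        Thread.sleep(200);
        // The unbounded queue absorbs every task, so the pool stays at core size
        System.out.println("pool size = " + pool.getPoolSize()); // prints 2
        gate.countDown();
        pool.shutdown();
    }
}
```

So with core-pool-size="2" each dispatcher ends up with exactly 2 live threads no matter how deep the mailbox gets, which matches what I saw in the thread dumps.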
Conclusion
All in all, I was able to get a good amount of traffic throughput. I feel there is more I can achieve, but I might be limited by the operations team's choice of Windows 2003 running a 32-bit client-mode JVM.

I can do further research to see whether `netstat -anop tcp` shows that connections are indeed kept alive, and if not, look into Spring-WS to see why that is not the case.

I found further reading on testing keep-alive: http://forum.springsource.org/showthread.php?37961-TCPMon-setup-for-Spring-WS&s=e49f481d08f73e3a1c2fab8a11fad5fb
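Besides netstat, one way to observe keep-alive behavior is with a quick local probe. This sketch is my own (it uses the JDK's built-in `com.sun.net.httpserver.HttpServer`, not our Spring-WS endpoints): the server records the client-side port of each incoming request, so if sequential `HttpURLConnection` requests all arrive on one port, the connection was reused rather than reopened:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeepAliveProbe {
    public static void main(String[] args) throws Exception {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ping", exchange -> {
            // Each new TCP connection shows up as a new ephemeral client port
            clientPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.getResponseBody().close();
        });
        server.start();

        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/ping");
        for (int i = 0; i < 3; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) { }  // drain fully so the socket can be reused
            }
        }
        server.stop(0);
        // 1 distinct port => the connection was kept alive across requests
        System.out.println("distinct client ports: " + clientPorts.size());
    }
}
```

If a client disables keep-alive (or never drains responses), each request would show up on a fresh port, which is exactly the pattern that exhausts ephemeral ports and produces the BindException above.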
Hope this helps.