I am working on a project where an IVR voice browser makes VXML requests over HTTP. Each request represents a phone caller, and each new caller in our application spawns 6 Akka actor requests for various account look-ups.

The issue I had was that a concurrent new-caller volume of about 20 was backing up the Akka queues and then timing out the Akka requests. This was puzzling, as the dispatcher thread pool was set to a core size of 10, which seemed to spawn about 50 Akka threads in total.

We then used JMeter to simulate new callers, and as we increased the number of callers, we noticed that Akka was not using the new threads.

The JMeter test simulated 120 new callers with a 60-second ramp-up time, looping forever for a total of 300 seconds. Here is the result of that test, showing only 1 thread used.
120 callers, 1 thread per actor (64-bit Mac, Core Duo, 2 cores):

| Sampler label | # of samples | Requests/sec | KB/sec |
| --- | --- | --- | --- |
| /start/initDivisionalIvr.vxml | 891 | 2.826507629 | 4.551670977 |
| /welcome/spanishDivisionalIvr_rc.vxml | 834 | 2.839313254 | 7.22583041 |
| /welcome/postWelcome.vxml | 822 | 2.818176208 | 16.66339157 |
| /identify/disambiguateAniOrCed.vxml | 813 | 2.801477581 | 5.709652063 |
| /identify/gatherGetServiceInfoResult.vxml | 805 | 2.697043303 | 13.15092606 |
| TOTAL | 4165 | 13.21257495 | 44.4114435 |
When I looked at the threads in use, I noticed only 1 thread was active, which is why the performance was so slow.

We would fill up each actor's queue at around 50-60 concurrent callers, so the numbers were quite small; not enough for our target traffic, which needs to exceed 300 concurrent callers per machine. We could just add many more machines, but that is not the best approach.
Here is the Akka configuration we had:

```xml
<akka:dispatcher id="appt-dispatcher" type="executor-based-event-driven-work-stealing"
                 name="dispatcher-appointments">
    <akka:thread-pool queue="unbounded-linked-blocking-queue" fairness="true"
                      core-pool-size="4" max-pool-size="4" keep-alive="60000"
                      rejection-policy="caller-runs-policy"/>
</akka:dispatcher>

<akka:typed-actor id="appointmentActor"
                  interface="com.comcast.ivr.core.actors.AppointmentActor"
                  implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                  timeout="${actor.appointment.timeout}" scope="singleton"
                  depends-on="appointmentServiceClient,applicationProperties"
                  lifecycle="permanent">
    <akka:dispatcher ref="appt-dispatcher"/>
    <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
</akka:typed-actor>
```
Maybe I missed this in the documentation, but it seems that even though we created the actor as a prototype, there was still only 1 instance of the actor created, and then only 1 thread was used, because an actor is bound to a single queue. You can see that 1 thread per actor is working while the others are waiting.

In order to use multiple threads per typed actor, we needed to use the ActorRegistry and create multiple actors per type.

Right now we create the actors individually. We are going to refactor this to be dynamic instead of static, but for our proof of concept, this is what we are going to use:
```xml
<akka:dispatcher id="appt-dispatcher" type="executor-based-event-driven-work-stealing"
                 name="dispatcher-appointments">
    <akka:thread-pool queue="unbounded-linked-blocking-queue" fairness="true"
                      core-pool-size="2" max-pool-size="12" keep-alive="60000"
                      rejection-policy="caller-runs-policy"/>
</akka:dispatcher>

<akka:typed-actor id="appointmentActor1"
                  interface="com.comcast.ivr.core.actors.AppointmentActor"
                  implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                  timeout="${actor.appointment.timeout}" scope="singleton"
                  depends-on="appointmentServiceClient,applicationProperties"
                  lifecycle="permanent">
    <akka:dispatcher ref="appt-dispatcher"/>
    <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
</akka:typed-actor>

<akka:typed-actor id="appointmentActor2"
                  interface="com.comcast.ivr.core.actors.AppointmentActor"
                  implementation="com.comcast.ivr.core.actors.AppointmentActorImpl"
                  timeout="${actor.appointment.timeout}" scope="singleton"
                  depends-on="appointmentServiceClient,applicationProperties"
                  lifecycle="permanent">
    <akka:dispatcher ref="appt-dispatcher"/>
    <property name="appointmentServiceClient" ref="appointmentServiceClient"/>
</akka:typed-actor>

<!-- ... more actors omitted ... -->
```
This is the initial actor load-balancing lookup against the registry:

```java
import java.util.Random;

import static akka.actor.Actors.registry;

public class ActorLoadBalancer {

    // TODO: Implement proper load balancing using Routing.loadBalancerActor
    // with a CyclicIterator instead of random selection.
    @SuppressWarnings("unchecked")
    public static <T> T actor(Class<T> targetClass) {
        Object[] workers = registry().typedActorsFor(targetClass);
        // Routing.loadBalancerActor(new CyclicIterator(Arrays.asList(workers)));
        int actorNumber = new Random().nextInt(workers.length);
        return (T) workers[actorNumber];
    }
}
```
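The TODO above points at round-robin selection with a CyclicIterator. As a minimal sketch of that idea in plain Java (the class name `RoundRobinBalancer` is mine, and there are no Akka types here so the selection logic can be tried in isolation):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hands out workers in strict rotation instead of at random,
// so each actor instance gets an even share of requests.
public class RoundRobinBalancer<T> {
    private final List<T> workers;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinBalancer(List<T> workers) {
        this.workers = workers;
    }

    public T next() {
        // floorMod keeps the index non-negative even after int overflow
        int i = Math.floorMod(counter.getAndIncrement(), workers.size());
        return workers.get(i);
    }
}
```

The AtomicInteger makes the rotation safe when many request threads pull workers concurrently, which is exactly the situation in the JMeter tests.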
We started with 4 actors for each type at first, then ran the same JMeter test with 120 callers, and our numbers were dramatically better (roughly 5x the throughput):
120 callers, 4 actors (32-bit Windows 2003 Server running JDK 1.6 in client mode, 2 Core Duo CPUs, 4 cores total):

| Sampler label | # of samples | Requests/sec | KB/sec |
| --- | --- | --- | --- |
| /start/initDivisionalIvr.vxml | 4692 | 14.97601348 | 24.11664671 |
| /welcome/spanishDivisionalIvr_rc.vxml | 4663 | 15.63133776 | 39.78053341 |
| /welcome/postWelcome.vxml | 4662 | 15.63275434 | 104.8341055 |
| /identify/disambiguateAniOrCed.vxml | 4607 | 15.65893633 | 31.91425794 |
| /identify/gatherGetServiceInfoResult.vxml | 4607 | 15.42448298 | 85.53283018 |
| TOTAL | 23231 | 74.14914092 | 273.2926494 |
When I looked at the thread usage, I saw better utilization of the actor threads.

But I noticed there were still some threads backing up, so I created 3 of the actor types with 10 instances each, while the others stayed at 4. I arrived at these numbers by trial and error to see how many I could use, and I was still able to get another large improvement:
120 callers, up to 10 actors (32-bit Windows 2003 Server running JDK 1.6 in client mode, 2 Core Duo CPUs, 4 cores total):

| Sampler label | # of samples | Requests/sec | KB/sec |
| --- | --- | --- | --- |
| /start/initDivisionalIvr.vxml | 6537 | 21.78194068 | 35.07560601 |
| /welcome/spanishDivisionalIvr_rc.vxml | 6532 | 21.94147819 | 55.83934783 |
| /welcome/postWelcome.vxml | 6532 | 21.94199413 | 146.6590149 |
| /identify/disambiguateAniOrCed.vxml | 6443 | 21.9398909 | 44.60519867 |
| /identify/gatherGetServiceInfoResult.vxml | 6441 | 21.69716936 | 119.9142785 |
| TOTAL | 32485 | 106.8501171 | 393.0813908 |
Here is the test I ran where I noticed this thread activity. By creating more actors, more of the threads were actually running instead of sitting in a wait state.
Final actor count:

| Actor type | Instances |
| --- | --- |
| Broadcast Messages | 10 |
| Outage | 10 |
| Identify | 4 |
| GetServiceInfo | 8 |
| Payment | 4 |
| Appointment | 10 |
This was the largest actor count, in any combination, that I could get to without one of two things happening:

1. If I increased all actors evenly, the slow actors would back up while the faster actors sat idle, so many threads were not running, just waiting for the other actors to free up.
2. I then wanted to increase the number of slow actors to get more throughput, but beyond the exact counts above, I started getting HTTP transport errors like this:
```
Caused by: com.sun.xml.internal.ws.client.ClientTransportException: HTTP transport error: java.net.BindException: Address already in use: connect
    at com.sun.xml.internal.ws.transport.http.client.HttpClientTransport.getOutput(Unknown Source)
    at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.process(Unknown Source)
    at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.processRequest(Unknown Source)
    at com.sun.xml.internal.ws.transport.DeferredTransportPipe.processRequest(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber.__doRun(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber._doRun(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber.doRun(Unknown Source)
    at com.sun.xml.internal.ws.api.pipe.Fiber.runSync(Unknown Source)
    at com.sun.xml.internal.ws.client.Stub.process(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SEIStub.doProcess(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
    at com.sun.xml.internal.ws.client.sei.SEIStub.invoke(Unknown Source)
    at $Proxy138.lookupApplicationConfigurationProperties(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor163.invoke(Unknown Source)
```
I found this reference to the error: http://java.net/jira/browse/JAX_WS-485

But I still need to research whether the Spring-WS implementation (the web service each of these actors is calling) has keep-alive set or not. I know Akka has its keep-alive set to 60000 ms.
Some observations:

1. When I set core-pool-size="1", each actor spawned only 1 thread, and each thread was used more.
2. When I set core-pool-size="2", each actor spawned 2 threads, and the threads were sometimes used interchangeably, but not always.
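These observations line up with how `java.util.concurrent.ThreadPoolExecutor` (the kind of pool behind the dispatcher config) treats an unbounded queue: the pool only grows past core-pool-size when the queue rejects a task, and an unbounded LinkedBlockingQueue never rejects, so max-pool-size never kicks in. A small standalone sketch of that behavior (my own demo, not our application code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    public static void main(String[] args) throws InterruptedException {
        // core=2, max=12, unbounded queue -- mirrors the dispatcher settings above
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 12, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        CountDownLatch gate = new CountDownLatch(1);
        for (int i = 0; i < 50; i++) {
            // 50 tasks that all block, far more than 2 threads can drain
            pool.execute(() -> {
                try { gate.await(); } catch (InterruptedException ignored) { }
            });
        }
        Thread.sleep(200);
        // The unbounded queue absorbs every task, so the pool stays at core size
        System.out.println("pool size = " + pool.getPoolSize()); // prints 2
        gate.countDown();
        pool.shutdown();
    }
}
```

So with core-pool-size="2" each dispatcher ends up with exactly 2 live threads no matter how deep the mailbox gets, which matches what I saw in the thread dumps.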
Conclusion
All in all, I was able to get a good amount of traffic throughput. I feel there is more I can achieve, but I might be limited by the operations team's choice of Windows 2003 running a 32-bit client-mode JVM.

I can do further research to see whether `netstat -anop tcp` shows that connections are indeed kept alive, and if not, look into Spring-WS to see why that is not the case.

I found further reading on testing keep-alive: http://forum.springsource.org/showthread.php?37961-TCPMon-setup-for-Spring-WS&s=e49f481d08f73e3a1c2fab8a11fad5fb
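Besides netstat, one way to observe keep-alive behavior is with a quick local probe. This sketch is my own (it uses the JDK's built-in `com.sun.net.httpserver.HttpServer`, not our Spring-WS endpoints): the server records the client-side port of each incoming request, so if sequential `HttpURLConnection` requests all arrive on one port, the connection was reused rather than reopened:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeepAliveProbe {
    public static void main(String[] args) throws Exception {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ping", exchange -> {
            // Each new TCP connection shows up as a new ephemeral client port
            clientPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.getResponseBody().close();
        });
        server.start();

        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/ping");
        for (int i = 0; i < 3; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) { }  // drain fully so the socket can be reused
            }
        }
        server.stop(0);
        // 1 distinct port => the connection was kept alive across requests
        System.out.println("distinct client ports: " + clientPorts.size());
    }
}
```

If a client disables keep-alive (or never drains responses), each request would show up on a fresh port, which is exactly the pattern that exhausts ephemeral ports and produces the BindException above.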
Hope this helps.