I have been running into a serious issue with Avaya Voice Portal (AVP) over the past several months where AVP will freeze a caller, resulting in a call that is 'dead-air' to the customer. The caller then either gets hung up on, or the dead-air lasts long enough that the caller just hangs up.
While investigating this issue, I have come across several interesting findings.
On one of the Avaya MPP servers that is servicing callers, there is an Apache server with 10 httpd processes instantiated to service requests. These requests retrieve VXML content from a remote Tomcat server, and prompt resources from another remote Apache server.
The first thing I have been doing is monitoring how many sockets the Avaya application is using, with this command:
netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n; echo httpd processes: [`ps aux | grep httpd | wc -l`]
This is what I have been getting:
1 established)
1 Foreign
1 LAST_ACK
3 SYN_SENT
7 LISTEN
124 TIME_WAIT
170 CLOSE_WAIT
227 ESTABLISHED
httpd processes: [10]
The issue I have with this is the sheer number of sockets sitting in CLOSE_WAIT.
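To see whether the CLOSE_WAIT count is actually growing over time (rather than just spiking under load), a simple timestamped loop can be left running on the MPP. This is only a monitoring sketch; the 5-second interval and the log path are arbitrary choices of mine, not anything Avaya-specific:

while true; do echo "$(date '+%Y-%m-%d %H:%M:%S') $(netstat -ant | grep -c CLOSE_WAIT) CLOSE_WAIT"; sleep 5; done >> /tmp/close_wait.log

Eyeballing that log against the times of reported dead-air calls makes it much easier to line the two up.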
I was digging and found a reference to others having issues:
https://issues.apache.org/bugzilla/show_bug.cgi?id=50615
I have been tracking this for two months now, and I notice these are not hung CLOSE_WAITs like the ones I described in another blog of mine XXX, but they do seem to last for quite a while. I estimate the CLOSE_WAITs last around 30 seconds before the sockets finally close, a figure that came from a verbal discussion I had with an Avaya engineer.
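To put an actual number on how long an individual CLOSE_WAIT lingers (rather than relying on that verbal estimate), one rough approach is to log every CLOSE_WAIT socket once a second with a timestamp, then check when a given local port first and last appears. This is only a sketch; the log path and the sample port (45830, taken from the output further down) are just examples, and ephemeral ports do get reused, so treat the result as indicative:

while true; do netstat -ant | awk -v ts="$(date +%s)" '$6=="CLOSE_WAIT" {print ts, $4, $5}'; sleep 1; done >> /tmp/cw_history.log
grep ":45830 " /tmp/cw_history.log | awk 'NR==1{first=$1} {last=$1} END{print last-first, "seconds in CLOSE_WAIT"}'

Let the first command run for a few minutes, stop it, then run the second against a port of interest.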
I started looking at what communication the MPP was having with an individual Tomcat server and found:
$ netstat -an | grep -c "yyy.yyy.yyy.yyy:8080"; netstat -an | grep "yyy.yyy.yyy.yyy:8080"
23
tcp 0 0 xxx.xxx.xxx.xxx:45525 yyy.yyy.yyy.yyy:8080 TIME_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:45186 yyy.yyy.yyy.yyy:8080 TIME_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45830 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:45836 yyy.yyy.yyy.yyy:8080 TIME_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45832 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45864 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:45889 yyy.yyy.yyy.yyy:8080 TIME_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45918 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45932 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45930 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45967 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45989 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:45997 yyy.yyy.yyy.yyy:8080 TIME_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:46003 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:46022 yyy.yyy.yyy.yyy:8080 ESTABLISHED
tcp 0 0 xxx.xxx.xxx.xxx:46031 yyy.yyy.yyy.yyy:8080 ESTABLISHED
tcp 0 0 xxx.xxx.xxx.xxx:46028 yyy.yyy.yyy.yyy:8080 ESTABLISHED
tcp 0 0 xxx.xxx.xxx.xxx:46037 yyy.yyy.yyy.yyy:8080 ESTABLISHED
tcp 0 0 xxx.xxx.xxx.xxx:45587 yyy.yyy.yyy.yyy:8080 TIME_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45743 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45758 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:45766 yyy.yyy.yyy.yyy:8080 TIME_WAIT
tcp 1 0 xxx.xxx.xxx.xxx:45801 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
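The same data can be summarised per state for that one Tomcat server by reusing the awk/uniq pattern from earlier, which is a quicker way to watch the CLOSE_WAIT ratio for a single destination:

netstat -an | grep "yyy.yyy.yyy.yyy:8080" | awk '{print $6}' | sort | uniq -c | sort -n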
I then ran the inverse from the target Tomcat server and found far fewer connections, most of which were in the FIN_WAIT_2 state.
C:\Documents and Settings\MKnuts6173c>netstat -an | grep -c "yyy.yyy.yyy.yyy"
17
C:\Documents and Settings\MKnuts6173c>netstat -an | grep "yyy.yyy.yyy.yyy"
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:45801 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:45930 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:45932 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:45967 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46022 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46028 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46037 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46056 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46067 ESTABLISHED
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46071 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46107 ESTABLISHED
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46118 ESTABLISHED
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46121 ESTABLISHED
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46134 ESTABLISHED
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46137 ESTABLISHED
TCP yyy.yyy.yyy.yyy:8080 xxx.xxx.xxx.xxx:46145 ESTABLISHED
TCP yyy.yyy.yyy.yyy:8090 xxx.xxx.xxx.xxx:46560 FIN_WAIT_2
TCP yyy.yyy.yyy.yyy:8090 xxx.xxx.xxx.xxx:46941 FIN_WAIT_2
It appears that Avaya is opening sockets to remote resources, holding them open until the entire call has completed, and only then finishing the close from its side (sending its own FIN, per step 6 below).
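One way to confirm which process is actually sitting on those half-closed sockets (i.e. that it really is the Avaya/httpd side failing to close them) is lsof, assuming the lsof build on the MPP supports TCP state filtering and you have the privileges to see other users' sockets:

lsof -nP -iTCP -sTCP:CLOSE_WAIT | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -n

This counts CLOSE_WAIT sockets per command name and PID, which should point straight at whichever MPP/httpd processes are holding the connections open.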
A normal TCP connection in terms of protocol looks as follows:
1. The client sends a SYN to the server.
2. The server responds with a SYN and ACK to the client.
3. The client responds with an ACK to the server.
The connection is now established and data transfer takes place (the steps above are known as the 3-way handshake).
When the server is closing the connection, the following sequence takes place:
4. The server sends a FIN and an ACK to the client.
5. The client sends an ACK to the server.
6. The client sends its own FIN and ACK to the server.
7. The server sends an ACK to the client.
(NOTE that if it is the client that closes the connection, then you should replace server with client and client with server in steps 4 – 7 since a close on the connection can be called from both sides).
The connection is now closed; the ports are reset and available for the next connection.
So, where do the CLOSE_WAIT and FIN_WAIT_2 states come into play in the scenario above? The ports are in these states after step 5.
On the client the port will be in CLOSE_WAIT:
TCP xxx.xxx.xxx.xxx:45801 yyy.yyy.yyy.yyy:8080 CLOSE_WAIT
On the server the port will be in FIN_WAIT_2:
TCP yyy.yyy.yyy.yyy:8090 xxx.xxx.xxx.xxx:46941 FIN_WAIT_2
The client has acknowledged the server's FIN (step 5) and is now waiting for its own application to close the socket before it can send its FIN (step 6), while the server is waiting for that FIN to arrive. These states are normally very short-lived, but for some reason on our servers they are lasting quite a long time.
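A quick sanity check of this pairing on our own boxes is to count both sides at roughly the same moment; if the MPP really is the side sitting on the close, the two counts should be in the same ballpark (the server side may be a little lower, since FIN_WAIT_2 normally times out on its own). Same placeholder IPs as in the listings above:

On the MPP: netstat -an | grep "yyy.yyy.yyy.yyy:8080" | grep -c CLOSE_WAIT
On the Tomcat server: netstat -an | grep "xxx.xxx.xxx.xxx:" | grep -c FIN_WAIT_2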
Conclusion
The Avaya voice browsers seem to manage their sockets and worker threads in a very messy way. Talking with the Avaya engineers, they state there is nothing wrong with their application, yet they cannot explain why this is occurring and do not seem to actually understand how sockets work. I was not able to find an Avaya engineer who (admittedly) understood how sockets worked, nor what CLOSE_WAIT and FIN_WAIT_2 even meant.
I am very worried about the MPPs in production, and I feel this issue is directly related to several different call-related errors, including 'dead-air', 'dropped calls' and 'caller session-reset'. I am still investigating these issues even though I feel I have no hope of fixing them, only of addressing and mitigating them on Avaya's behalf. How nice of me, right 😉