Hello all. After releasing ColdBox 3.5 to production as part of a release last week, we began seeing very weird behaviors across all 4 of our production servers. They hurt performance to the point where we eventually had to roll back to ColdBox 3.1 until I can figure out the source of the issues.
I haven't ruled out the possibility of something stupid going on in my own code base, but I ran many hours of JMeter tests and can very reliably reproduce the behaviors with 3.5, and they completely go away as soon as I swap in the previous version of ColdBox. Has anyone else seen these issues at all? My environment is 64-bit Windows, CF9 Ent, IIS7.
The three main behaviors I saw were:
- Randomly hanging requests. These would pop up within 20-30 minutes of any load on the server and could affect any page on the site. They would time out after about 5 minutes. A number of different pages would hang, but in EVERY case the hung line of code was accessing a CGI-scoped variable, and the top part of the stack trace was identical. My understanding of CGI vars is that CF talks to IIS over the JRun connector and asks it for the headers on the request. The threads were all stuck at a socketRead() and could not be killed (though they would eventually time out).
Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01):
"jrpp-6" runnable
at java.net.SocketInputStream.socketRead0(Native Method)
- Higher-than-normal memory usage. My servers are normally happy with around 1.5 to 2.5 GB of heap used. Memory usage jumped to 4-6 GB of heap in use (total heap is 8 GB). I think there's a chance the high memory usage was caused by the hanging requests, but that's just a hunch.
- Null Pointer Exceptions on every request. The worst behavior of all is that after a few hours of uptime, servers would completely fall over and begin returning the same error on every single request. What was interesting was that our non-ColdBox legacy site on the same server would respond fine, but the ColdBox site would error out on every page hit. This would continue until the ColdFusion service was restarted. The error was deep in the bowels of Java and usually seemed to be related to resolving interface implementations, such as the trace thrown when Controller.cfc tried to create an instance of coldbox.system.ioc.Injector. It's interesting to note that Injector implements an interface in 3.5, but not in the previous version we rolled back to.
Let me know if any of you recognize any of these errors or if you have any ideas. I’m going to keep running tests to see if I can narrow down what causes these behaviors.
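In case it helps anyone compare notes on the hanging requests, here is a rough sketch of how thread dumps could be checked for threads whose top frame is the socket read described above. It is plain Python, not anything from ColdBox or ColdFusion, and the function names (top_frames, stuck_in) are made up for illustration. It assumes the usual HotSpot dump layout, where each thread starts with a line beginning with its quoted name and is followed by "at ..." frame lines.

```python
import re

# Thread header line in a HotSpot dump, e.g.: "jrpp-6" ... runnable
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Stack frame line, e.g.: at java.net.SocketInputStream.socketRead0(Native Method)
FRAME = re.compile(r'^\s*at\s+(?P<frame>[^\s(]+)\(')


def top_frames(dump_text):
    """Map each thread name to its topmost stack frame (None if it has no frames)."""
    result = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line.strip())
        if header:
            # A new thread section starts here; its first frame is not seen yet.
            current = header.group("name")
            result[current] = None
            continue
        frame = FRAME.match(line)
        # Only record the first frame after the header, which is the top of the stack.
        if frame and current is not None and result[current] is None:
            result[current] = frame.group("frame")
    return result


def stuck_in(dump_text, frame_name="java.net.SocketInputStream.socketRead0"):
    """Return the sorted names of threads whose top frame equals frame_name."""
    frames = top_frames(dump_text)
    return sorted(name for name, top in frames.items() if top == frame_name)
```

Running stuck_in() over the text of several dumps taken a minute or so apart (read from whatever files they were saved to) would show whether the same jrpp threads stay parked on the socket read, which is the pattern I was seeing.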