I would be interested in having you report back some of what you changed after you get things working smoothly.
I’ve been reading the thread, but was away from a full-size keyboard most of the weekend.
A few other suggestions I don't think I saw mentioned:
Were you able to get some stack traces of what the JVM was doing when memory started spiking? Even though the requests seemed to be scattered all over your site, sometimes the stacks will show that they were all doing something related (client storage access, for instance).
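If you're on a Sun JDK, grabbing those stacks is usually a one-liner. A sketch, with the pid, paths, and process name as placeholders for whatever your JRun/CF process actually is:

```shell
# Find the JVM's process id (process name is an assumption -- adjust for your install)
ps aux | grep jrun

# jstack ships with Sun JDK 5+; dumps every thread's stack to a file
jstack -l <pid> > /tmp/stacks.txt

# On older JVMs without jstack, kill -3 doesn't kill the process -- it makes
# the JVM print a full thread dump to its stdout/stderr log instead
kill -3 <pid>
```

Take a few of these a minute or so apart while memory is climbing; the threads that show up in the same spot every time are your suspects.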
Hmm, now that I mention client storage: where ARE your client vars stored? Make sure they're not in the registry.
If you think this is a JVM GC issue, have you enabled verbose GC logging for the JVM? It's invaluable information if you know how to decipher it. You add some JVM args and specify a log file location, and the file you get shows you the exact size of each of your heap spaces (new/old) as well as your perm gen (which lives outside the heap). By the way, I second the earlier comment that 1 GB is excessive for perm gen. The log will also tell you how often GCs are running AND how long they are taking. I highly recommend the IBM Workbench GC log analyzer; it turns that giant file of numbers into graphs.

From there you'll be equipped to tune your GCs via additional JVM args, such as the frequency of full GCs. (There are two types of GC: "minor" collections, which only touch your young gen I believe, and stop-the-world "full" GCs, in which the whole heap is processed and surviving objects in young gen are promoted to old gen, etc.) You can also tune the ratio of the generations in your heap based on whether you have more young objects or old.
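For what it's worth, the classic HotSpot args look something like this. This is a sketch with placeholder paths; exact flag support varies by JVM vendor and version (check your JVM's docs), and in a CF-on-JRun install they'd typically go on the java.args line in jvm.config:

```shell
# Hedged example -- verify these flags against your JVM's documentation.
# -verbose:gc             basic GC logging
# -XX:+PrintGCDetails     per-generation sizes (new/old/perm) in each entry
# -XX:+PrintGCTimeStamps  seconds-since-startup on each entry
# -Xloggc:<file>          send the log to a file instead of stdout
java -verbose:gc \
     -XX:+PrintGCDetails \
     -XX:+PrintGCTimeStamps \
     -Xloggc:/opt/jrun4/logs/gc.log \
     ...
```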
If/when memory spikes, you could try to grab a heap dump of the JVM. This creates a giant file, as large as your heap, that is a snapshot of every object in memory. It can be inspected with tools such as the Eclipse Memory Analyzer to find memory leaks, dominators, and GC roots. It's not for the faint of heart, but it's one of the gritty tools of Java debugging.
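A sketch of grabbing one with the Sun JDK tools (the pid and paths are placeholders; jmap ships with JDK 5+):

```shell
# Binary heap dump -- the file will be roughly the size of your live heap,
# so make sure the target filesystem has room first
jmap -dump:format=b,file=/tmp/heap.hprof <pid>

# Alternatively, on JVMs that support it, this startup flag writes a dump
# automatically the moment the heap is exhausted:
#   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
```

The resulting .hprof file is what you'd open in the Eclipse Memory Analyzer.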
Also, I've never used this one in a live environment, but if you are willing to switch VMs (this would be better suited to your dev or staging server), you can use Oracle's JRockit Mission Control to achieve the same thing as the heap dump, except it's a live look at your heap and will show you which objects are trending and how many instances there are. Again, it's pretty gritty, but I've used it a couple of times to prove memory leaks at the Java level. The biggest trick with these tools is "translating" all the Java objects into what CF sees; for instance, a simple array of numbers in CF is probably several dozen Java objects behind the scenes.
JRun metrics can also be enabled (assuming you are using JRun) by turning them on in an XML file. They report memory usage and hits (info you can easily get from FusionReactor), but they also tell you a few other things, such as the number of active sessions, which can be useful.
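For reference, the metrics switch lives on the LoggerService in jrun.xml. This is a sketch from memory: the attribute and format-token names here are assumptions, so cross-check them against the LoggerService block already in your server's SERVER-INF/jrun.xml before changing anything:

```xml
<!-- Sketch only: verify names against your existing LoggerService entry -->
<service class="jrunx.logger.LoggerService" name="LoggerService">
  <attribute name="metricsEnabled">true</attribute>
  <attribute name="metricsLogFrequency">60</attribute>
  <attribute name="metricsFormat">
    Sessions: {sessions} Busy threads: {busyTh} Free mem: {freeMemory}
  </attribute>
  <!-- leave the rest of the existing attributes in place -->
</service>
```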
Good Luck. As a disclaimer I haven’t used any of these tools on 64-bit machines.
Here are some links for you to read up on JVM logging, heap dumps, Mission Control, and JRun logging.
verbose JVM logging:
Reading verbose GC logs:
Analyzing verbose GC logs automatically:
Heap dump Memory Analyzer Tool and JRockit Mission Control:
Enabling and reading JRun metrics logging: