RE: [coldbox:11671] Re: Coldbox Upgrade

bdw429s · September 19, 2011, 4:09pm

I would be interested in having you report back some of what you changed after you get things working smoothly.

I’ve been reading the thread, but was away from a full-size keyboard most of the weekend.

A few other suggestions I don’t think I saw mentioned

Were you able to get some stack traces of what the JVM was doing when memory started spiking? Even though the requests seemed to be all over your site, sometimes the stack will show you they were all doing something related. (like client storage access for instance)
Hmm, now that I mention number 1, where ARE your client vars stored? Make sure they’re not in the registry.
If you are thinking this is a JVM GC issue, have you enabled verbose memory logging for the JVM? This is invaluable information if you know how to decipher it. You need to add some JVM args and specify a log file location. The file you get will show you the exact size of each of your heap spaces (new/old) as well as your perm gen (which lives outside the heap). By the way, I second the earlier comment that 1 Gb is excessive for perm gen. It will also tell you how often GCs are running, AND how long they are taking. I would highly recommend using the IBM Workbench GC log analyzer. It turns that giant file of numbers into graphs. From there you will be equipped to make changes to your GCs via additional JVM args such as frequency of full GCs. (There’s two types of GCs, “minor” which only affects your young gen I believe and stop-the-world “full” gc’s in which the full heap is processed and items in young gen are promoted to old gen etc.) You can also tune the ratio of generations in your heap based on whether you have more young objects or old.
If/when memory spikes, you could try to grab a heap dump of the JVM. This will create a giant file as large as your heap which is a snapshot of every object in memory. It can be introspected with tools such as the Eclipse Memory Analyzer to find memory leaks and dominators as well as GC roots. It’s not for the faint of heart, but it’s one of the gritty tools of Java debugging.
Also, I’ve never used this one in a live environment, but if you are willing to switch VMs (this would be better suited for your dev or staging server), you can use Oracle’s jRockit Mission Control to achieve the same thing as the heap dump-- but it’s a live look at your heap and will show you which objects are trending and how many instances there are. Again, it’s pretty gritty, but I’ve used it a couple times to prove memory leaks at the Java level. The biggest trick with these tools is “translating” all the java objects into what CF sees. For instance, a simple array of numbers in CF is probably several dozen Java Objects behind the scenes.
JRUN metrics can also be enabled (assuming you are using JRun) by turning them on in an XML file. They report memory usage and hits (which is info you can easily get from fusion-reactor), but they also tell you a few other things such as number of active sessions which can be useful.

Good Luck. As a disclaimer I haven’t used any of these tools on 64-bit machines.
Here are some links for you to read up on JVM logging, heap dumps, Mission Control, and Jrun logging.

verbose JVM logging:

http://java.sun.com/developer/technicalArticles/Programming/GCPortal/
Reading verbose GC logs:

http://www.talkingtree.com/blog/index.cfm/2010/9/9/JVM-Memory-Management-and-ColdFusion-Log-Analysis
Analyzing verbose GC logs automatically:

http://www.ibm.com/developerworks/java/library/j-ibmtools2/index.html

Heap dump Memory Analyzer Tool and JRockit Mission Control.

http://www.ghidinelli.com/2009/07/16/finding-memory-leaks-coldfusion-jvm

Enabling and reading JRun metrics logging:
http://kb2.adobe.com/cps/191/tn_19120.html
http://www.cfwhisperer.com/post.cfm/10-steps-to-a-stable-and-performant-web-application-step-2

Thanks!

~Brad

Jonathan_Perret · September 19, 2011, 4:50pm

Thanks for the info. Right now what we have done is moved another app we have to the new 64 bit server. It has about the same amount of load. It has been running very nicely for the last almost 3 hours. This app has much less code than the other. Our thought process is if this app can run then it must be a problem in our other app. If so we will start breaking it down handler by handler trying to find the problem using load testing info Mark sent us.

Some things we changed going from 2.6 to 3.1.

We implemented injection. When doing so we had some old code were some beans that were scoped to functions we were calling. I don’t know why the previous person did that. We modified the code so simply pass the bean to the function now for processing.
We implemented SES. So we also introduced buildlink.
We introduced some new features to the website that our users don’t know about yet and those portions of the site haven’t been hit during any of request history leading up to the crashes.
We rewrote most of our handlers into script due to personal coding preferences.

I’ve ran var scoper on the app and fixed some minor things it pointed out but it seems taht did not produce much change in the results.

What’s the best way for me to get a stack trace? Is this still valid? http://kb2.adobe.com/cps/183/tn_18339.html

The link about stack traces with jstack went to a page on oracle’s site that wasn’t found.
When I run jps from my jdk bin directory it doesn’t give me back the proccessid so I can try to run the other command noted on the web page.

Thanks.

Jonathan

Jonathan_Perret · September 19, 2011, 5:02pm

I try to get the stack but I get.

C:\Program Files\Java\jdk1.6.0_27\bin>jstack 4968
4968: Not enough storage is available to process this command

So maybe I need to scale back the memory to the jvm and then try to get the stack.

Mark_Mandel · September 19, 2011, 9:25pm

Glad to hear you are starting to resolve your issues!

I’m also a fan of doing heap snapshots. You can actually give your JVM args to create a heap snapshot as soon as your server OOMs. Brad is right in that you really need to know your way around the CF internals, but it can be a great way to work out if it’s a memory leak or not, and what is actually taking up all the memory.

Mark

whostheJBoss1 · September 19, 2011, 9:59pm

Do you happen to have any of the auto reload options turned on, such as configAutoReload, etc?

Jonathan_Perret · September 19, 2011, 10:14pm

Some of my settings.

handlersIndexAutoReload =
false,
configAutoReload = false,
handlerCaching = true,
eventCaching = true,
proxyReturnCollection = false,
flashURLPersistScope = “session”

lmajano · September 20, 2011, 3:58pm

Mark do you mind posting that args for the heap dumps.

Luis Majano
CEO
Ortus Solutions, Corp
Toll free phone/fax: 1-888-557-8057
Mobile: 909-248-3408
www.ortussolutions.com
www.coldbox.org

RE: [coldbox:11671] Re: Coldbox Upgrade

Latest Ortus News

GIVING BACK

HARVESTING IN SPANISH

WOODSEDGE