CB 3 LTE M5 cache reap bugs

Since we released our app on M4 a couple weeks ago we've been
experiencing performance issues where, under load, the server would
get bogged down and eventually max out on running requests and spin
until I restart CF. We use the coldbox cache for quite a few short-
lived and medium-lived objects. One of those objects is a large
result set used for security that is loaded into cache when a user
logs in (if it doen't already exist). Generating it is pretty
intensive, but since it only happens about once per login it's ok.
When hanging, SeeFusion showed me that pretty much every running
request was trying to generate this large security query over and over
again.
Stack traces showed me that the eviction policy was running over and
over and the CB cache debug panel showed my cache was way over the
limit of items (400), and a large number of of those items were
expired but hadn't been reaped.

So, long story short, the reap() method in CB 3 cache manager from
alpha to M5 has a couple bugs. In previous versions (2.6.3), a copy
of the pool metadata was obtained with
getObjectPool().getpool_metadata() and that struct was used to look up
cache items and see if they were expired and clear them. The newer
version replaced that with the lookup() method which not only ignores
expired objects (so they were never getting cleared and remaining in
the cache causing constant evictions) but also registers a hit against
them (resetting lastAccessed and inflating the hit count which throws
off your eviction policy). Secondly, the clearKey() method used the
lookup() method as well which prevented it from even clearing anything
since lookup said it didn't exist.

Now, having said all that-- from the looks of the code, M6 has cleared
all this up with the way indexer's objectExists method works. (Very
happy to see the indexer caches metadata now Luis-- I was going to
suggest that be cached!)

So basically, I put all this out here in case anyone is using
milestone 5 or less of CB 3 and having memory/cache issues like what I
described above. Until you upgrade to the latest milestone, you can
fix the problem fairly easily with some changes to the cache manager.
Ask if you need more specifics.

Hopefully I'll be able to upgrade to M6 soon, but I've got to check my
release schedule since we like to give any new version of ColdBox a
while to sit on dev while we (hopefully) find any kinks. Of course,
our lack of load testing kept us from noticing this bug until we had
production traffic hitting the site.

Thanks!

~Brad

Great stuff Brad,

As an FYI, the code in github at this point in time is code freezed (except the IOC and Wirebox integrations). Basically, if all goes well, the code in github will be the RC release and hopefully the final release. Most of the issues you spoke about have been addressed in CacheBox and its performance has both been improved and stabilized.

So please evaluate what’s on github at this point in time, as if all goes well, I will be releasing RC1 by the weekend.

Luis F. Majano
President
Ortus Solutions, Corp

ColdBox Platform: http://www.coldbox.org
Linked In: http://www.linkedin.com/pub/3/731/483
Blog: http://www.luismajano.com
IECFUG Manager: http://www.iecfug.com

Sweet. So is there a SVN tag for RC1 right now, or is it just the main M6 branch?
It sounds like I should hold off on M6 and wait to upgrade straight to RC1

Thanks!

~Brad

yes, hold off on m6. I have not tagged it yet, waiting to see if I can release wirebox 1.0 with it also.

Luis F. Majano
President
Ortus Solutions, Corp

ColdBox Platform: http://www.coldbox.org
Linked In: http://www.linkedin.com/pub/3/731/483
Blog: http://www.luismajano.com
IECFUG Manager: http://www.iecfug.com

yes, hold off on m6. I have not tagged it yet, waiting to see if I can release wirebox 1.0 with it also.

Speaking of wirebox, I hope it isn’t too late to mention a couple things on my wish list for that… :slight_smile:

Right now, the beanFactory is a major bottle neck for my app. We’ve had to cut down on pages that create a lot of transient objects-- not because of how long it takes ColdFusion to create objects, but how long it takes getModel() and autoWire() to run on them and inject their dependencies. While observing some stack traces on our production environment the other day, I made some notes:

The first operation that I noticed appearing frequently in my stack traces is fileExists() and expandPath(). This was mostly from resolving the location of the model to be created over and over again. Since it involves file system access, it is understandably slow. The location of models is a prime peice of information to cache. I haven’t tried M6 yet, but was digging around on subversion the other day was was very happy to see that it appears you already addressed this with the instance.refLocationMap struct.

The second item is all calls to getMetaData(). This function is evil incarnate because it hits the file system and I always see it in my stack traces getting the last datetime modified of the component file. (I don’t don’t quite understand this since we use trusted cache) getMetaData() is called several times for every model that is created, including in the debug logging which means it gets called regardless of whether or not you are even using debug-level logs from coldbox.system. I would like to suggest that we cache metaData for models that the beanFactory creates. Once we have aquired the metaData for a give compnent path, I can’t think of a good excuse to perform that expensive task again. I would love it if once my app has started up and created one of every model I have, getMetaData is never called again.

The last item is probably the worst offender. It is the exclusive named lock that the beanFactory wraps around model creation. I have one example in my app of a search result screen that calls getModel() to retrieve a transient object once for each result returned. When several of these requests were happening concurrently, they spent a LOT of time waiting for CF locks since only one of them could create an instance of that particular object at a given time. I think the named lock is desirable for any model object being cached or following a singleton pattern since we need to ensure that only one copy is created. However, for my transients that is not the case. Even in the case where a transient is being injected with a singleton, I can’t think of a reason why two simultaneous requests can’t have their own instance of the same transient object created at the same time. I think part of the problem with the current approach is that you don’t know if you are dealing with a singleton until you get the components meta data, which you can’t do until you have created it-- and that’s after you open the named lock. This is where the caching of meta data comes back into play. If we know that “com.myapp.security.securityBean” is not going to be cached BEFORE we create it, we can skip the named lock entirely and avoid single-threading that request as well as any other requests currently requesting an instance of that object. I would expect that the first time a model was retreived it would need to be locked for good measure until we were able to procure and cache its meta data. If it wasn’t for CF7 compatibility, you could simply use getComponentMetaData() to find out about the component PRIOR to creating it the first time. And to clarify, for components that we know are singletons or being cached, I would still expect the named lock to always be used while creating them.

Thoughts?

Thanks!

~Brad

Sorry, meant to include this example in my previous post. As an example of just how slow it is to create objects with the bean factory, here is a little test script I ran on production (with production settings, caching, etc) on CB 3.0 M4



The first example creates 150 of my user beans directly with createObject which skips the autowiring. The second example asks the beanFactory for 150 autowired user beans. Here is the output:

createObject(): 110ms
beanFactory: 7984ms

That’s 70 times slower! And 8 seconds is a pretty long time to wait for 150 objects. If I need to create more than a couple hundred transients on a page, it gets insanely slow.

In case you’re curious, the user bean I used in my example has 7 dependencies being autowired into it and they are all singletons so they already exist in the cache and do not need created.

Thanks!

~Brad

I have seen simliar behavior in development. It has never been an issue for me as most of my autowired model objects or handlers are singletons in cache, but in development I have seen a hit if I need to autowire a number of dependencies.

We will have to have a look and see if we can speed it up.

Curt Gratz

Also, just to put this out there, I would be more than happy to make some of my suggested changes to the beanFactory and submit them back, but at this point I assume it's in flux as Luis is working on it for WireBox so I'd just be stepping on toes.

~Brad

I was curious if getMetadata was that much slower then
getComponentMetadata data, so I ran a quick test. Assuming the object
is already created, the results where the same. If you needed to create
the object(which is obviously needed for getMetadata), then the
getMetadata was slower.

Not a completely scientific test I am sure.

Curt

I kind of figured both would be just as evil since they both hit the file system. I find it interesting that getMetaData() seems to re-parse the cfc file. Apparently it can't get everything it needs from the object instance. Either way, that's why I would prefer to completely stop calling the getMetaData function all together once everything is cached.

FWIW, I'm on ACF8 with JRE 1.6.0_12 on Windows.

~Brad

Brad, Duly noted!!

  1. The first issue you discussed. file system access was fixed on the refLocation maps, so discovery is only done once. This was on my list and it will still be improved in wirebox final, so only once is the object discovered via scan locations. Right now there is also a “side effect” lol, that if you do mappings, it still tries to discover it instead of making the mapping static. So that will change in wb 1.0 where the mappings will give you a static destination, thus avoiding lookup.

  2. The second item is all calls to getMetaData().
    This is also on my list of addressing. As of now, a Mapping object will represent a binding configuration for a transient, singleton, etc. Basically a signature of the CFC or other type. Then getMetadata() will only be done once and cached in the mapping. So this is definitely on the list.

  3. Named lock,
    Great idea brad, I will keep this in mind for transient or no-scope objects and only lock whenever a scope is defined.

I will be showcasing some of the wirebox development soon. Just need to iron out some stuff.

Luis F. Majano
President
Ortus Solutions, Corp

ColdBox Platform: http://www.coldbox.org
Linked In: http://www.linkedin.com/pub/3/731/483
Blog: http://www.luismajano.com
IECFUG Manager: http://www.iecfug.com

Brad,

I just committed tonight an update to the bean factory so it uses the
new mixer utility I have consolidated. Wirebox's approach will use
the new mixer utility, but I am finalized the metadata retrievals in
the other branch. However, can you please try your performance tests
now.

Thanks

Awesome, Luis. I'll give this another look.

I actually just installed CF Builder (Switching from DW) last week and we switched over (finally!) to using subversion at work so I have been learning how to check out repositories and such.

Can you help me with the URL of the coldbox repo, and are your latest changes in the trunk?

Thanks

~Brad

You will have to use Git to download from github: https://github.com/ColdBox/coldbox-platform or click on the download
Luis F. Majano
President
Ortus Solutions, Corp

ColdBox Platform: http://www.coldbox.org
Linked In: http://www.linkedin.com/pub/3/731/483
Blog: http://www.luismajano.com
IECFUG Manager: http://www.iecfug.com

Has anyone got the nightly build script to push to Subversion as yet?

For about 15 of my projects that are SVN it is a pain in the ass to merge the changes between what I have made and what is in GIT for ColdBox.

Regards,

Andrew Scott

http://www.andyscott.id.au/

OK, Luis, I had a chance this morning to revisit my test timing the
creation and wiring of objects using what I got off the nightly build
a few days ago.
I had to change my config to use a "cacheSettings" struct instead of
"cacheEngine" to keep it from running in compat mode. I couldn't find
anything about that online, so I assume it was a recent change that
will make it into the final release notes.

I re-ran my test of creating 150 objects with createObject and 150 of
the same object with getModel for a baseline using ColdBox 3 RC1:
createObject(): 62ms
beanFactory: 3656ms

Using the nightly of ColdBox, I see this:
createObject(): 63ms
beanFactory: 1487ms

That is certainly an improvement. If I turn on trusted cache and put
everything in production mode I can get the bean factory time down to
1 second. Still, about 15 times slower than a naked createObject
though. Here's what I'm still seeing in my stack traces as time hogs:

expandPath() calls in the locateModel method of the beanFactory.cfc
(File system access)
fileExists() calls in locateModel method of beanfactory.cfc (file
system access)
getMetaData() calls in getModel method of beanFactory.cfc (file system
access)
getMetaData() calls in autoWire method of beanFactory.cfc (file system
access)

Those pretty much still align with my original suggestions, which are
to wipe out calls to expandPath, fileExists, and getMetaData() after
the first time we create an object and cache that info.

Now, my test above was a single request. You'll also remember we have
some performance problems as well that only show up with concurrent
request. Above I show that it takes about 1487ms for a single request
to get a model from the beanFactory 150 times. When I run that
request 4 times simultaneously, it takes 3281 ms. That just doubled
the time! If I fire 20 simultaneous requests, it can take up to 10
seconds to create 150 objects. The waits were from our named locks in
beanFactory colliding. Again, my suggestion is to only lock the
creation of singletons if they don't already exist.

The only other thing I keep seeing in the stack traces is
java.lang.Class.getClassLoader0(Native Method) but I don't think
there's anything we can do about that. (I have Java 1.6.0_12)

Also, these tests are fairly close to the real-life scenario I'll get
when my marketing team releases an E-mail campaign to a few million
people and traffic starts flooding to certain pages on the site. If
I've got 20 people all hitting my search page and pulling back 150
results apiece and I am creating a transient for each result that
basically has the same effect as my test.

Thanks for your work, Luis!

~Brad

Thanks Brad,

The improvement is definitely a great thing to see. This is the only things I can do to the current infrastructure that will remain as compatibility mode also as it needs to be cf7 compatible for 3.0 (Unfortunately). However, I will be adding the final touches of wirebox 1.0, so we can give those a round of tests. Basically the things you mentioned are removed as location discovery is only done once, metadata retrieval is done once and locking is only done by life cycle scopes that require it. So transients/no-scope do not. Anyways, If you want to see the progress, download the wirebox branch and start digesting the code and maybe give me a code review (that would be great). There is still ways to go, but it is forming rather nicely!

Thanks again for this community effort, I think we can nail it down further to around 200ms or so.

Luis F. Majano
President
Ortus Solutions, Corp

ColdBox Platform: http://www.coldbox.org
Linked In: http://www.linkedin.com/pub/3/731/483
Blog: http://www.luismajano.com
IECFUG Manager: http://www.iecfug.com

I notice the same issues only when creating my handlers that have a lot
of autowiring happening. The afterHandlerCreation interception point
takes a long time to run and gets multiples slower if I run multiple
events per request.

Turn handler caching on and of course this issue is eliminated, but
annoying in development and could be an issue with transient objects.

Definitely worth a look.

Curt

Brad,

Is your test something you can share?

Thanks.

Aaron Greenlee
http://aarongreenlee.com/

I have been doing some tests today to see how fast a getMetadata() can be done on an instance and on my cf9 machine with 256mb heap for a 500 iteration it was below 2ms. I don’t know if that is so bad as sometimes I need to find out the unique path of the incoming instance:

getMetadata(arguments.target).name

In order to parse or do lookups. I have tried several approaches to identify the incoming CFC instance uniquely, but I cannot get faster than getMetadata().

Luis