Coldbox: Very high CPU (It's RollingFileAppender vs FileAppender)

I’m going to start here with this post to see if I can get anywhere. We are in the process of upgrading our app from a much older Coldbox to 7.3.0+16. We’re doing this mostly one… file… at… a… time. Move, refactor as needed, spin it up, test the screens (rebuild the CSS, ugh).

One of the things we noticed, and solving it was pushed to me, was very high CPU usage, even when the site is completely idle. Now, for me, it’s running about 55-65% on my older i7. Coworker has a bigger, badder (and newer) i9, and he runs about 20-25% at idle.

So, various points:

  • This is happening on two vastly different machines, two different ends of the country.
  • The old Coldbox (unsure of version; the Coldbox.cfc file in the root has a date of 10/10/2007) does NOT have this problem; only the new Coldbox does.
  • No, there are no scheduled tasks running.
  • The CPU doesn’t crank up and stay up until our first Coldbox request. Restarting CF, or even going into the CF admin, doesn’t cause the CPU to crank.
  • Code added to /config/Scheduler.cfc in both the beforeAnyTask and afterAnyTask functions doesn’t show any output to the console as directed, so I’m pretty sure nothing is firing there (see the sketch after this list).
  • The only modules are cfmessagebox, cbsecurity, and cbstorages, all unmodified from the original Commandbox installation.
  • Not running on Commandbox, but rather as a Windows service (CF 2023.0.11.330713). I’ve tried running with the service itself and with my normal ConEmu batch file setup.
  • We do have many Wirebox objects, over 600. Many are services; most are “bean” models. Certainly not all of them are instantiated; just a few basic ones are, right after a CF restart and login.
  • Disabled virus scanner completely.
  • I have tested with a stupid-simple (non-Coldbox) site comprised of 2 files: a basic Application.cfc and an index.cfm file that just does a cfdirectory of its folder and outputs the count. The CPU always returns to normal after the request is done.
  • We generated a thread dump, and there are many obj-skimmer-[all our datasources] threads. I can’t find the word “skimmer” anywhere in the Coldbox code, though. And I have no idea whether our old app has those going on, or even what they are at the moment.
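For reference, the Scheduler.cfc instrumentation looked roughly like this (a minimal sketch; the console messages are illustrative, not the exact code we used):

// config/Scheduler.cfc: minimal sketch of the lifecycle instrumentation described above
component {

	function configure(){
		// no tasks are defined here
	}

	// should fire before any task registered in this scheduler runs
	function beforeAnyTask( required task ){
		writeDump( var = "beforeAnyTask: #arguments.task.getName()#", output = "console" );
	}

	// should fire after any task registered in this scheduler runs
	function afterAnyTask( required task, result ){
		writeDump( var = "afterAnyTask: #arguments.task.getName()#", output = "console" );
	}

}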

After startup: [screenshot]
After CF admin login, a page request, and return to idle: [screenshot]
After a non-Coldbox page hit (the simple cfdirectory page described above): [screenshot]
After the last Coldbox app request, login event presented, sitting idle: [screenshot]
Note: The value (63%) fluctuates, absolutely implying something is happening.

Here’s that thread dump, if it helps at all: (Yes, I had to make it pretty!)

If anyone has any idea of where I can start, that’d be awesome.

I just used Commandbox to scaffold a new, clean 7.3.0+16 and it doesn’t have this behavior.

Still, if anyone has any ideas to look into, it would be appreciated.

Hi @CaptainPalapa, the thread dump is your indicator of what’s going on. It would seem that Adobe is doing something once a request hits and keeps things running in the background.

Connect a Java Profiler to the engine
https://www.baeldung.com/java-profilers

That would tell you what threads are running and doing what. From your analysis it seems Adobe is doing “magic” in the background.

Okay, I am NOT a Java guy, but I finally got one of those programs to connect to my local JVM and got some data. There are several of these pool-66-thread-XXX gizmos.

So, I searched first for “proxyInvoke” but didn’t find anything. But searching for “DynamicProxy” found all this:
[screenshot]

@lmajano does that help explain anything that may lead you to any suggestions? This sure looks like weird scheduler stuff? Co-worker did some more investigation and said these two things:

  • I also looked at cacheservice fairly extensively and turned on logging at debug level, which shows all interactions, and it is not constantly logging, so I don’t think that is the problem.
  • I did the same with the coldbox.system category, and it also does not continuously log new entries while the CPU for the ColdFusion service spins (rough sketch of that category config below).
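For context, the category logging he turned on looks roughly like this in the logBox struct of our config/ColdBox.cfc (a sketch; the console appender here is just a placeholder):

// config/ColdBox.cfc, logBox struct (sketch; the console appender is a placeholder)
logBox = {
	appenders = {
		console = { class = "coldbox.system.logging.appenders.ConsoleAppender" }
	},
	root       = { levelMax = "INFO", appenders = "*" },
	categories = {
		// bump the framework and cache categories to debug so all interactions are logged
		"coldbox.system"       = { levelMax = "DEBUG", appenders = "console" },
		"coldbox.system.cache" = { levelMax = "DEBUG", appenders = "console" }
	}
};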

Also, “poller” is coming up as a high CPU thing, code found here: (This is a newer Coldbox from 2 days ago…)

Also, they BOTH have the versioned file coldbox-7.3.0+16, but they are NOT the same Coldbox files. I’m seeing BoxLang references in this newer version. But unfortunately it still has the same issue.

@bdw429s do you have any insight here?

Just some thoughts:

Any thread prefixed with pool- is coming from the JVM common thread pool, which the Coldbox scheduler uses for cache eviction, etc. I believe worker- prefixed threads are Adobe’s own threads for mail queuing and log rotation - but it has been a minute since I dug into those.

I am wondering if the obj-skimmer- prefixed threads are coming from either:

A. Adobe’s attempt to enforce the maximum number of threads in a Standard license ( Enterprise has no thread limits ) or

B. The Coldfusion performance monitoring toolset - which could be locking on to those threads and attempting to monitor them - even though they are outside of the request threads and are just waiting for the next interval to run.

So two questions: Is the performance monitoring toolset enabled and is this a Standard or Enterprise license?

@jclausen Right now, these are just developer installations. We’re working to get our site onto the new Coldbox, so it isn’t running on any prod servers. The obj-skimmer threads aren’t taking up a bunch of CPU; I just saw them as being “there”. (Again – not a Java guy.) As to the perfmon, I suspect the answer is no, based on this:

(I turned off as much of the monitoring stuff as I could find while trying to get the tools @lmajano pointed me to up and running.)

I’ll be glad to provide as much information as is necessary, gang.

Once you are in “Developer” mode and no longer in “Trial”, however, Adobe does enforce the 20 thread limit.

I’m not sure that is what is going on, but it’s worth noting.

Is the 2023 installation running Adobe’s packaged version of Java 17, BTW?

Can you explain what this means? Does it mean that a vanilla app has no issues, but your app does? Is this still using Adobe?

Okay, hold off for a bit. We have it narrowed down to Logbox. I’ll have more later. @bdw429s @lmajano @jclausen

@lmajano @jclausen @bdw429s

Okay, guys… we nailed it down to a single issue: RollingFileAppender vs. FileAppender. We removed the “Rolling” and everything was back to 0% CPU at idle. Looking at the code, it’s not obvious what the problem is. First we tried setting “async” to both false and true; no change. We checked the file sizes; unless there’s some lock conflict we can’t see… (and we weren’t even writing to the files just to get to our login screen).

I will add this: we have 28 appenders, including the ColdboxTracer appender. We have separate files for all our external integrations, and several for separate sections of the system. With “Rolling” removed, we’ll have to manage our own file rotation manually. The plan at the moment is to do it by hand upon our code releases. The next step will be to create our own scheduled task and service (passing along paths and max file sizes) and run it daily. (We understand that won’t help with a DDOS.)
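Something like this is what we have in mind for that daily task (just a sketch; “LogRotationService” and the path/size values are hypothetical):

// config/Scheduler.cfc, inside configure(): rough sketch of the planned daily rotation task
task( "manual-log-rotation" )
	.call( function(){
		// hypothetical service that checks each log file and archives it when it gets too big
		getInstance( "LogRotationService" ).rotate(
			logPath   = expandPath( "/logs" ),
			maxSizeKB = 5000
		);
	} )
	.every( 1, "days" );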

When this issue is solved within Logbox, we can undo these plans. Here is a sample of our appender setups; they are mostly all the same:


and another pair:
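Roughly, each one looks like this (a representative sketch; the appender names and paths are placeholders, not our actual integrations):

// config/ColdBox.cfc, inside logBox.appenders (sketch; names and paths are placeholders)
someIntegration = {
	class      = "coldbox.system.logging.appenders.RollingFileAppender",
	properties = {
		filePath        = "/logs",
		fileName        = "someIntegration",
		fileMaxSize     = 2000, // KB
		fileMaxArchives = 5,
		async           = false // we tried true as well; no difference
	}
},
anotherSection = {
	class      = "coldbox.system.logging.appenders.RollingFileAppender",
	properties = {
		filePath        = "/logs",
		fileName        = "anotherSection",
		fileMaxSize     = 2000,
		fileMaxArchives = 5
	}
}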

Would love to see a note on this thread if y’all are able to fix it in a future update. I’m sure none of your testing has 30 appenders! (I originally thought it was Wirebox, and was going to generate 1000 services to test with on a clean install!)

Ok, technically it’s not a bug. It’s a configuration issue.

All rolling file appenders have their own task to monitor themselves, meaning that if you have 28 rolling file appenders, you have 28 monitoring tasks running, and therefore consuming CPU. The issue is that the default configuration for these rolling file monitors is to run every 1 minute.

function onRegistration(){
		super.onRegistration();

		variables.logbox
			.getTaskScheduler()
			.newTask(
				name  : "file-appender-#getName()#-log-rotation",
				task  : this,
				method: "logRotation"
			)
			.inMinutes()
			.delay( 5 ) // Don't start immediately, give it a breathing room
			.spacedDelay( 1 ) // Runs again, after this spaced delay once each reap finalizes
			.start();

		return this;
	}

What I would suggest is changing the spaced delay to 5 minutes. Let’s see if that helps. In reality, I am not sure why it was set so low at 1 minute. Try 5 or 10 minutes, and let’s see if that alleviates your symptoms. If that’s the case, then we can introduce a setting for this.

It sounds like you are homing in on the issue, but I’ll add my 2 cents on how I find high CPU usage in background threads. In FusionReactor, go to “Resources” and then “Thread Visualizer”.

[screenshot]

Then on that page, click “CPU(ms)” to sort the threads by CPU used, desc.

This is a rolling snapshot of the threads using the most CPU cycles in the last second or so, and it will allow you to see even background scheduler threads which are busy.

Once you’ve identified a thread that seems extra busy, start the profiler by clicking the little stop watch icon to the right of that thread’s data.

[screenshot]

Wait for a few seconds (maybe even up to a minute, if necessary) and then click the red Stop button again to stop the profiler.

[screenshot]

Now, to view the results of that profile, click on “Profiler” in the left nav and then “Profile History”.

[screenshot]

Open up the last profile, which should have the name of the thread you were just profiling, and see what it was doing during that time.

I have a screencast I did a while back here that shows how to interpret profiler data. Note, a background thread may spend a lot of its time parked, so if it’s doing something on a schedule, you’d have to profile long enough to catch it in the act, and ignore any park or wait portions of the profile.

I’m not aware of any other JVM tools right off that offer a feature as useful as FusionReactor’s profiler.

I understand the plug for FusionReactor, but I’m not involved in budgets, unfortunately, and have no license to spend.

Obviously, we found the issue. And while I hate to argue with @lmajano, I can’t agree it’s a “configuration issue” (if you mean on our part), because there’s no config setting for that.

Having said that, I changed it from .spacedDelay(1) to .spacedDelay(10). Instead of 60-70% and higher, my CPU now sits at 20-25%. Better than 70%, but nowhere near the 0% it should be at idle. Something is still happening that shouldn’t be. I’ll have to set it back to non-rolling.

Trying more, I went into the logRotation function and did this:
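Roughly this (a reconstruction, not the exact line I used):

// coldbox/system/logging/appenders/RollingFileAppender.cfc (reconstruction of the temporary change)
function logRotation(){
	// temporary instrumentation: dump a line to the console on every invocation
	writeDump( var = "logRotation fired for appender #getName()# at #now()#", output = "console" );
	// ... original rotation logic unchanged below ...
}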

In less than 10 seconds (mind you, with the spacedDelay(10)), I had over 1,000 entries of that console dump. I set my ConEmu buffer higher, but for some reason, all I see is a thousand. There’s a bug. Right?

If you don’t have FR, identify one of the threads and get repeated stack traces of it to catch it in the act of whatever it’s doing. You can get a thread dump of the entire JVM with jstack and there are other tools that should allow you to focus on the stack for a single thread. There’s a good chance the log rotation is running more often than expected, or is taking longer for some reason. Are these log files on a local disk, or a network drive?

Ok, try this

variables.logbox
			.getTaskScheduler()
			.newTask(
				name  : "file-appender-#getName()#-log-rotation",
				task  : this,
				method: "logRotation"
			)
			.delay( 5, "minutes" ) // Don't start immediately, give it a breathing room
			.spacedDelay( 10, "minutes" ) // Runs again, after this spaced delay once each reap finalizes
			.start();

And report back.

Setting everything back to RollingFileAppender and making that change did return my processor to 0% at idle. That IS the fix, it appears.

@lmajano Do we need to consider making that a config setting, but with a really good default like (10, “minutes”) or even less frequently for most people?

Ok I know what it is. And yes, it’s a bug :slight_smile:

The spacedDelay was using the default of 10 milliseconds :face_vomiting:

So yes, that is the fix. I will also make it configurable. Expect this to be in the BE today.


Also! GREAT JOB!!! TRACKING THIS DOWN!

You have earned a badge of mega java detective today :male_detective: