[Coldbox 5.3] CachePanel of cbdebugger module crashing Tomcat/Apache HTTPD

Setup is Apache HTTPD 2.4, Tomcat 8.5 (also v9.0.12), java 1.8.0 -191 (also on OpenJDK 11), Lucee 5.3, Windows 10 64b. Pretty much stock configs of Tomcat and Apache and Lucee.

Apache vhosts proxy setup like this:

ProxyPreserveHost On ProxyPassMatch ^/(.+\.cf[cm])(/.*)?$ ajp://127.0.0.1:8009/$1$2

When testing sites on DEV I was getting 503 errors. I’d notice Tomcat logs would have these errors:

`
06-Nov-2018 12:29:31.466 SEVERE [ajp-nio-8009-exec-10] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [CFMLServlet] in context with path [] threw exception
java.lang.IllegalThreadStateException
at java.lang.Thread.start(Unknown Source)
at lucee.runtime.spooler.SpoolerEngineImpl.start(SpoolerEngineImpl.java:165)
at lucee.runtime.spooler.SpoolerEngineImpl.add(SpoolerEngineImpl.java:154)
at lucee.runtime.CFMLFactoryImpl.releaseLuceePageContext(CFMLFactoryImpl.java:210)
at lucee.runtime.engine.CFMLEngineImpl._service(CFMLEngineImpl.java:1106)
at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:1037)
at lucee.loader.engine.CFMLEngineWrapper.serviceCFML(CFMLEngineWrapper.java:102)
at lucee.loader.servlet.CFMLServlet.service(CFMLServlet.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.ajp.AjpProcessor.service(AjpProcessor.java:479)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:806)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Unknown Source)

Exception in thread “ajp-nio-8009-exec-10” java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(Unknown Source)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(Unknown Source)
at java.util.concurrent.locks.ReentrantLock.unlock(Unknown Source)
at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Unknown Source)
`

and Apache log would simultaneously have something like this:

[Tue Nov 06 11:55:08.158259 2018] [proxy_ajp:error] [pid 8372:tid 1396] (70014)End of file found: AH01030: ajp_ilink_receive() can't receive header [Tue Nov 06 11:55:08.158259 2018] [proxy_ajp:error] [pid 8372:tid 1396] [client ::1:56450] AH00992: ajp_read_header: ajp_ilink_receive failed, referer: http://supersimple.localhost/ [Tue Nov 06 11:55:08.158259 2018] [proxy_ajp:error] [pid 8372:tid 1396] (120006)APR does not understand this error code: [client ::1:56450] AH00878: read response failed from (null) (*), referer: http://supersimple.localhost/

All sites become unresponsive. If I hit any site without Apache (i.e. over port 8080) I can still access it fine. The only fix is a complete Tomcat restart.

To debug and remove any chance it was my code, I downloaded the latest advanced script template and slowly turned on features. It seems that with debug mode on and showing only the CachePanel and a couple of F5 refreshes would produce the errors above. Turning off CachePanel and I cannot make it crash.

Those stack traces don’t seem to come from the debugger, so I am not sure how this can happend. Do you hae any stack traces coming from the debugger?

No, that’s all that is in the error log files. I never receive any Lucee errors as the first inkling of trouble is the 503 from Apache. When that happens, the Apache and Tomcat error logs have just the errors I pasted earlier.

I can only say that by enabling the CachePanel of the debugger module and hitting F5 a few times, I get the above error and eventually a Tomcat restart is needed. The above errors were from a stock Coldbox Advanced script template with the only changes being made are the coldbox mapping in Application.cfc and enabling the debug module in the Coldbox config for dev mode. Setting showCachePanel to false rectifies this.

My assumption is that it’s some sort of threading issue going on with Tomcat/Lucee/etc. and it’s just something in this module that’s fast tracking the error. There’s another thread here that discusses similar issues:

I think the error is happening at a low level in Lucee as it is trying to clean up the thread, and it causes the request to not be taken well by your AJP connector. The error message in question about illegal thread state happens when you try to start the same thread in Java twice. Here is the code in the core of Lucee:

https://github.com/lucee/Lucee/blob/5.3.1/core/src/main/java/lucee/runtime/CFMLFactoryImpl.java#L205

What is happening here is Lucee is setting up a task to run after the main thread completes that will release the page context, however the spooler thread is already started. The only cause of this that I can imagine is that two ColdFusion threads at the same time have shared the same page context and are both attempting to start the same spooler thread. I have a feeling this is a Lucee bug and probably brought about by two Ajax calls perhaps that fire simultaneously. It’s also possible that some code inside the debugging spins off a thread which is sharing a page context.

Can you do some research, with FusionReactor or similar to see what requests (and cfthreads if possible) are firing at the time of the error? Also have you enabled any of the special Lucee settings that run web requests inside of a separate thread? If I get a chance, I will ask Micha about this here at CFCamp since he is likely to know more about the Lucee code which is executing.

Thanks Brad. I agree with your assessment. Unfortunately, I do not have FusionReactor. Re: special Lucee settings - no. The only non-stock settings in Lucee are to “preserver key case” in structs, JEE sessions and enabling “smart white space management”. That’s it. I’ll be interested in what Micha has to say as I do see others having the same or similar issues with no apparent redress.

I showed Micha the error tonight in the bar on my phone. He asked to enter a Lucee ticket with the error so he can look at it.

All the best work is done at the bar… usually sketched out on cocktail napkins.

DONE!
https://luceeserver.atlassian.net/browse/LDEV-2072

Looks like somebody else has isolated what looks to be the issue:
https://luceeserver.atlassian.net/browse/LDEV-2077

This looks to have been fixed in Lucee 5.3.1.91 and 5.3.2.14. See for details
https://luceeserver.atlassian.net/browse/LDEV-2077