Error retrieving value for key [LUCEE-STORAGE:CLIENT:FF812C48-444D-4A2B-ADDB-1C436146A387:XXXX.COM] from Couchbase cache.

Is there a special setting on couchbase bucket possibly that could alleviate this intermittent problem with retrieving client vars. This usually happens after a tomcat restart. It goes away after a few minutes or subsequent restarts.

error:

Error Type: java.lang.RuntimeException 

Message: Error retrieving value for key [LUCEE-STORAGE:CLIENT:FF812C48-444D-4A2B-ADDB-1C436146A387:XXXX.COM] from Couchbase cache. 
Additional Info: 
 Cause: java.lang.RuntimeException;  

Stack Trace: 
 Error retrieving value for key [LUCEE-STORAGE:CLIENT:FF812C48-444D-4A2B-ADDB-1C436146A387:XXXX.COM] from Couchbase cache.
	at ortus.extension.cache.util.Functions.evaluate(Unknown Source):-1
	at ortus.extension.cache.util.Functions.evaluate(Unknown Source):-1
	at ortus.extension.cache.couchbase.CouchbaseCache.createCacheEntry(Unknown Source):-1
	at ortus.extension.cache.couchbase.CouchbaseCache.getCacheEntry(Unknown Source):-1
	at ortus.extension.cache.couchbase.CouchbaseCache.getValue(Unknown Source):-1
	at lucee.runtime.type.scope.storage.StorageScopeCache._loadData(StorageScopeCache.java:119):119
	at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:63):63
	at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:73):73
	at lucee.runtime.type.scope.ScopeContext.getClientScope(ScopeContext.java:236):236
	at lucee.runtime.type.scope.ScopeContext.getClientScopeEL(ScopeContext.java:263):263
	at lucee.runtime.PageContextImpl.clientScopeEL(PageContextImpl.java:1374):1374
	at lucee.runtime.type.scope.UndefinedImpl.reinitialize(UndefinedImpl.java:582):582
	at lucee.runtime.PageContextImpl.setApplicationContext(PageContextImpl.java:2819):2819
	at lucee.runtime.listener.ModernAppListener.initApplicationContext(ModernAppListener.java:434):434
	at lucee.runtime.listener.ModernAppListener._onRequest(ModernAppListener.java:115):115
	at lucee.runtime.listener.ModernAppListener.onRequest(ModernAppListener.java:101):101
	at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2265):2265
	at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2228):2228
	at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:456):456
	at lucee.loader.servlet.CFMLServlet.service(Unknown Source):-1
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:729):729
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291):291
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206):206
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52):52
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239):239
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206):206
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219):219
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106):106
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502):502
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142):142
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79):79
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88):88
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518):518
	at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:844):844
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668):668
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2463):2463
	at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2452):2452
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142):1142
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617):617
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61):61
	at java.lang.Thread.run(Thread.java:745):745
 
URL Parameters: None 

Form Fields: None 

Cookie Variables: 
  wombatid: 16D9B1CA51086BC8088DDA96A9E13D54 
  cfid: ff812c48-444d-4a2b-addb-1c436146a387 
  jsessionid: A2C87EFA7841685F8FB802FE71035166 
  emptypadding: 1 
  cftoken: 0 

Hi Marc, can you also check your catalina.out logs as well all the logs in WEB-INF/lucee/logs/ at the time that error happens and see if there are other log messages – often times from the spymemcached libraries that shed some light.

Increasing the timeout may help, but usually I see that error when a Couchbase node goes offline and can’t serve up the document. Unfortunately, the “real” issue is often time logged separately and the log file differs based on installation.

java.lang.RuntimeException: Error retrieving key [LUCEE-STORAGE:CLIENT:569E2E81-ED02-4DA7-83BB-E82C9232BADE:CP.ACTORSACCESS.COM] from Couchbase cache.
at ortus.extension.cache.couchbase.CouchbaseCache.getCacheEntry(Unknown Source)
at ortus.extension.cache.couchbase.CouchbaseCache.getValue(Unknown Source)
at lucee.runtime.type.scope.storage.StorageScopeCache._loadData(StorageScopeCache.java:119)
at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:63)
at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:73)
at lucee.runtime.type.scope.ScopeContext.getClientScope(ScopeContext.java:236)
at lucee.runtime.type.scope.ScopeContext.getClientScopeEL(ScopeContext.java:263)
at lucee.runtime.PageContextImpl.clientScopeEL(PageContextImpl.java:1374)
at lucee.runtime.type.scope.UndefinedImpl.reinitialize(UndefinedImpl.java:582)
at lucee.runtime.PageContextImpl.setApplicationContext(PageContextImpl.java:2819)
at lucee.runtime.listener.ModernAppListener.initApplicationContext(ModernAppListener.java:434)
at lucee.runtime.listener.ModernAppListener._onRequest(ModernAppListener.java:115)
at lucee.runtime.listener.ModernAppListener.onRequest(ModernAppListener.java:101)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2265)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2228)
at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:456)
at lucee.loader.servlet.CFMLServlet.service(Unknown Source)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:844)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2463)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2452)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.CancellationException: Cancelled
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:176)
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1227)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1254)
… 38 more
2016-07-28 07:55:39.362 WARN net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Operation canceled because authentication or reconnection and authentication has taken more than one second to complete.
java.util.concurrent.CancellationException: Cancelled
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:176)
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:69)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1227)
at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:1254)
at ortus.extension.cache.couchbase.CouchbaseCache.getCacheEntry(Unknown Source)
at ortus.extension.cache.couchbase.CouchbaseCache.getValue(Unknown Source)
at lucee.runtime.type.scope.storage.StorageScopeCache._loadData(StorageScopeCache.java:119)
at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:63)
at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:73)
at lucee.runtime.type.scope.ScopeContext.getClientScope(ScopeContext.java:236)
at lucee.runtime.type.scope.ScopeContext.getClientScopeEL(ScopeContext.java:263)
at lucee.runtime.PageContextImpl.clientScopeEL(PageContextImpl.java:1374)
at lucee.runtime.type.scope.UndefinedImpl.reinitialize(UndefinedImpl.java:582)
at lucee.runtime.PageContextImpl.setApplicationContext(PageContextImpl.java:2819)
at lucee.runtime.listener.ModernAppListener.initApplicationContext(ModernAppListener.java:434)
at lucee.runtime.listener.ModernAppListener._onRequest(ModernAppListener.java:115)
at lucee.runtime.listener.ModernAppListener.onRequest(ModernAppListener.java:101)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2265)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2228)
at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:456)
at lucee.loader.servlet.CFMLServlet.service(Unknown Source)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:844)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2463)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2452)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)

This seems to be the core issue:

*net.spy.memcached.protocol.binary.BinaryMemcachedNodeImpl: Operation
canceled because authentication or reconnection and authentication has
taken more than one second to complete.*
*java.util.concurrent.CancellationException: Cancelled*

How many nodes are in your cluster and do you have replicas configured?
Also is there a spike in traffic or any kind of network latency that might
be involved?

Thanks!

~Brad

*Developer Advocate*
*Ortus Solutions, Corp *

E-mail: brad@coldbox.org
ColdBox Platform: http://www.coldbox.org
Blog: http://www.codersrevolution.com

there are 3 nodes, 2 replicas (not sure I need replicas) very little traffic right now to CB via the extension. The clusters are on the same switch and are very powerful machines that do nothing else but CB.

This issue only happens after I restart tomcat. I do not get any of these errors once all runs well. Sometime restarting tomcat a couple of times fixes the issue.

I use domain names in the extension settings. do you think I should instead use ip addresses. Our internal DNS is from a domain controller and is very fast as well.

When we experienced this issue we upgraded the couchbase client jar and httpcore jars.

We replaced:

couchbase-client-1.2.0.jar -> couchbase-client-1.4.10.jar
httpcore-4.1.1.jar -> httpcore-4.3.jar
httpcore-nio.4.1.1.jar -> httpcore-nio-4.3.jar

These are all stored in the /opt/lucee/lib/lucee-server/context/lib folder.

This resolved all the issues we were seeing. We have used this with couchbase 3 and 4

Hope that helps

I replaced the 3 jars and I’m now getting:

java.lang.RuntimeException: Error retrieving key [LUCEE-STORAGE:CLIENT:6B9A1CB9-07E2-46B3-BF88-F9BAC5C7AA9B:XXXX.COM] from Couchbase cache.
at ortus.extension.cache.couchbase.CouchbaseCache.getCacheEntry(Unknown Source)
at ortus.extension.cache.couchbase.CouchbaseCache.getValue(Unknown Source)
at lucee.runtime.type.scope.storage.StorageScopeCache._loadData(StorageScopeCache.java:119)
at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:63)
at lucee.runtime.type.scope.client.ClientCache.getInstance(ClientCache.java:73)
at lucee.runtime.type.scope.ScopeContext.getClientScope(ScopeContext.java:236)
at lucee.runtime.type.scope.ScopeContext.getClientScopeEL(ScopeContext.java:263)
at lucee.runtime.PageContextImpl.clientScopeEL(PageContextImpl.java:1374)
at lucee.runtime.type.scope.UndefinedImpl.reinitialize(UndefinedImpl.java:582)
at lucee.runtime.PageContextImpl.setApplicationContext(PageContextImpl.java:2819)
at lucee.runtime.listener.ModernAppListener.initApplicationContext(ModernAppListener.java:434)
at lucee.runtime.listener.ModernAppListener._onRequest(ModernAppListener.java:115)
at lucee.runtime.listener.ModernAppListener.onRequest(ModernAppListener.java:101)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2265)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2228)
at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:456)
at lucee.loader.servlet.CFMLServlet.service(Unknown Source)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:844)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2463)
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2452)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Error creating cacheEntry for key [LUCEE-STORAGE:CLIENT:6B9A1CB9-07E2-46B3-BF88-F9BAC5C7AA9B:XXXX.COM].
at ortus.extension.cache.couchbase.CouchbaseCache.createCacheEntry(Unknown Source)
… 38 more
Caused by: lucee.runtime.exp.NativeException: java.lang.NullPointerException
at ortus.extension.cache.util.Functions.evaluate(Unknown Source)
at ortus.extension.cache.util.Functions.evaluate(Unknown Source)
… 39 more
Caused by: java.lang.NullPointerException
… 41 more

If I delete my cookies then it works fine

Now I’m getting no client cache error but tomcat drops the connection with:

java.lang.NullPointerException
at ortus.extension.cache.util.Functions.serialize(Unknown Source):-1
at ortus.extension.cache.util.Functions.serialize(Unknown Source):-1
at ortus.extension.cache.couchbase.CouchbaseCache.put(Unknown Source):-1
at lucee.runtime.type.scope.storage.StorageScopeCache.store(StorageScopeCache.java:137):137
at lucee.runtime.type.scope.storage.StorageScopeCache.touchAfterRequest(StorageScopeCache.java:101):101
at lucee.runtime.PageContextImpl.release(PageContextImpl.java:576):576
at lucee.runtime.CFMLFactoryImpl.releaseLuceePageContext(CFMLFactoryImpl.java:176):176
at lucee.runtime.engine.CFMLEngineImpl.serviceCFML(CFMLEngineImpl.java:463):463
at lucee.loader.servlet.CFMLServlet.service(Unknown Source):-1
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729):729
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291):291
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206):206
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52):52
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239):239
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206):206
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219):219
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106):106
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502):502
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142):142
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79):79
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88):88
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518):518
at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:844):844
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:668):668
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2463):2463
at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2452):2452
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142):1142
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617):617
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61):61
at java.lang.Thread.run(Thread.java:745):745

I’ve narrowed it down to the following:

If I get the error and I clear my cookies. it works.
If I restart tomcat it continues to work.
If I restart the whole machine I get “Error retrieving value for key …”
If I clear my cookies it works.

restarting the machine breaks all existing sessions.

I purchased 2 production licenses from you with the intent to purchase many more. right now I can’t use this as it would mean that all our users would need to clear their cookies everytime one of our production vms restarted. Not acceptable.

Help…

one more thing:

If i restart the server I get the error. If I then restart lucee through the administrator the error goes away.

Marc

Can you make sure your logs are also attached please. Updating the jars is not beneficial as it might breaks certain methods we use. That would void our support and warranty. So make sure you use the jars provided.

We use this extensively with many clients without the issues you are seeing except when there are network latency issues.

We need more info, Brad?

Luis Majano
CEO
Ortus Solutions, Corp
www.ortussolutions.com
P/F: 1-888-557-8057

Have you tried increasing the operation timeout in the cache settings? I’ve never seen those NPEs before on any server. Possibly related to updating your jars to unsupported versions.

Are you still seeing this error in the logs every time the error occurs?

Operation canceled because authentication or reconnection and authentication has taken more than one second to complete.

During deployments we usually have to kill the tomcat service as it sometimes takes too long to stop cleanly. Could this have an impact on the extension?

I can imagine that likely leaves hanging network connections open. Did you see the questions in my last message?

The jars are back to what the extension installed.

What do you mean by "Have you tried increasing the operation timeout in the cache settings? I’ve never seen those NPEs before on any server. "

My settings in services -cache are currently

Operation Queue Max Block Time: 10000
Operation Timeout: 2500
Timeout Exception Threshold: 998

What I meant was have you tried a number higher than 2500? I also mean that I’ve never seen those NPE (null pointer exception) errors on any server before-- neither a client nor my own test machines.

Is it possible that the way we deploy our servers has something to do with it?

I have 2 sets of git repos that include tomcat/apache/lucee etc

One is for development VMS the other is for Production VMS.

I have been creating new servers by getting the preconfigured binaries/settings/etc from the git repo. It is possible that the license is not valid when copied that way onto a new VM?