ColdBox 3.6 - Advice on Query/Caching strategy for large dataset

Hi folks,

I'm working on an AngularJS / ColdBox app and need some advice on how to handle something. The app is a real-time interface that connects to a WebSocket channel.

When the interface is loaded, events come in through the socket and are pushed to a JS array. Before the events go through the WebSocket, they are processed by CF, stored in a DB, and then funnelled to the socket. There are about six different types of events. Depending on the situation, the audience generating those events could be extremely large, i.e. 100,000 users or more, and depending on the interaction each user could generate several events. So very quickly both CF and JS will be doing some heavy lifting.

If the interface is reloaded in the browser I need to get the history of the previous events…possibly several hundred thousand records…and process/filter them for display in the JS interface. I've never built anything that needed a query of this magnitude, and I'm trying to work out a caching strategy: either append to a cached object as the events come into CF (before the socket), or build some sort of iterative loader that looks at the total number of records, divides it into pages, and then has JavaScript loop through the pages to load the full data set.
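For what it's worth, the iterative-loader idea can be sketched in a few lines of JS. This is only a sketch under assumptions: the `/events` endpoint, its `start`/`count` parameters, and the page size are all made up here.

```javascript
// Sketch of a paged history loader. The endpoint and parameter names
// below are hypothetical -- adjust to your own API.
// First, compute the page boundaries from the total record count.
function pageOffsets(totalRecords, pageSize) {
  const offsets = [];
  for (let start = 0; start < totalRecords; start += pageSize) {
    offsets.push({ start: start, count: Math.min(pageSize, totalRecords - start) });
  }
  return offsets;
}

// Example: 250,000 historical events loaded in pages of 5,000.
const pages = pageOffsets(250000, 5000);
console.log(pages.length); // 50 pages

// In the browser you would then loop over the pages, e.g.:
// const allEvents = [];
// for (const p of pages) {
//   const batch = await fetch(`/events?start=${p.start}&count=${p.count}`)
//     .then(r => r.json());
//   allEvents.push(...batch);
// }
```

Fetching sequentially like this keeps memory and server load predictable; you could also fire a few pages in parallel if the server can take it.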

In terms of architecture, I'm using an external provider for WebSockets, and the interface that displays the events is on a different server than the system that receives the events. Both systems share a JDBC JSON cache. I'm currently not using anything like Node.js, which might have some benefit in this type of situation as an added technology.

I would appreciate your thoughts on the above.

Thanks.

Nolan

Nolan,

If you're using JSON-type objects for storage and retrieval of data, I'd highly recommend Elasticsearch over a database.

I store similar data, and I've found Elasticsearch to be excellent: you just throw it a JSON object, it stores it and returns it in a pageable format, it's highly searchable/filterable, and it's incredibly fast.
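To make "pageable" concrete: Elasticsearch's search API pages with `from`/`size` in the query DSL. A minimal request body might look like the sketch below (the `eventType` and `timestamp` field names are assumptions for illustration):

```javascript
// Build an Elasticsearch query-DSL body for one page of results.
// Field names ("eventType", "timestamp") are hypothetical examples.
function buildSearchBody(eventType, page, pageSize) {
  return {
    query: { term: { eventType: eventType } }, // filter to one event type
    sort: [{ timestamp: "asc" }],              // oldest first, for replay
    from: page * pageSize,                     // offset of this page
    size: pageSize                             // records per page
  };
}

const body = buildSearchBody("vote", 2, 100);
console.log(body.from, body.size); // 200 100
// You would POST this JSON to http://yourhost:9200/yourindex/_search
```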

http://www.elasticsearch.org/

Not sure if it will fit your requirements, but I thought I’d throw that in there for you.

Tom.

Thanks Tom,

I will take a look.

Nolan

Tom,

This looks very promising, thank you. Could you shed some light on your environment configuration? i.e. dedicated machine? How much storage/RAM?

Cheers,

Nolan

I have a dedi, yes – Elasticsearch is running on an EC2 Large instance (7GB), but I've also got other things running on that server: Tomcat (Alfresco), MySQL, and it's an NFS server too. Elasticsearch is tuned to use 1GB of RAM, which I've found ample. Storage-wise I have 1TB, but Elasticsearch only uses about 500MB at the moment.

I'd guess it would run fine on an EC2 Small instance…

Hi Tom.

Thanks for the environment details.
Another question for you… Did you use it as a ColdBox cache and create a cache provider, or did you simply use the REST API? In other words, how did you interact with Elasticsearch from within your CB app?

Thanks.

Nolan

REST. I have a CFC I use to interface with it. Happy to share it, although it's really just a simple function which accepts a few parameters. I can show you an example if you're interested.

https://gist.github.com/tgmweb/6031116

There's a simple example.
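The gist above is Tom's actual CFC. Purely as an illustration of the REST round trip such a wrapper makes, the two calls boil down to a PUT to store a document and a POST to search, roughly like this (host, index, and type names here are made up):

```javascript
// Illustrative sketch of the REST calls a simple Elasticsearch wrapper
// makes. The base URL, index, and type names are assumptions.
const ES_BASE = "http://localhost:9200";

// Describe the HTTP call to store (or overwrite) one JSON event document.
function saveRequest(index, type, id, doc) {
  return {
    method: "PUT",
    url: `${ES_BASE}/${index}/${type}/${id}`,
    body: JSON.stringify(doc)
  };
}

// Describe the HTTP call to run a query-DSL search against an index.
function searchRequest(index, queryBody) {
  return {
    method: "POST",
    url: `${ES_BASE}/${index}/_search`,
    body: JSON.stringify(queryBody)
  };
}

const req = saveRequest("events", "event", "42", { type: "vote", user: 7 });
console.log(req.method, req.url);
// PUT http://localhost:9200/events/event/42
```

In CFML the same thing would be a `cfhttp` call with `method="put"` and the JSON in the request body, which is essentially what a thin wrapper CFC ends up doing.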


That was ElastiCache – AWS's version of Memcached.

Elasticsearch is different; it's built on top of Lucene, I think…


This is great. I'm going to look at writing a CB plugin for this and will post it on ForgeBox (with credit to you, of course, Tom!)

Thanks!

Nolan