RE: [coldbox:17840] Caching/searching large datasets

If you’re dealing with 1M records, I’d stay away from QofQ. They are great, but they start to slow down once you get past a few hundred thousand records, and their performance drops off drastically as the WHERE clauses you put on them get more complex.
Honestly, you’ll get better performance every time by filtering a large result set in the database. Of course, that’s not to say you can’t/shouldn’t employ some form of query caching for oft-run selects.

Another way to speed up your page load is to reduce the amount of data you’re templating. You said you’re returning 1000 records. Google filters through billions of pages, and look how many records they return: fewer than 20! Pay attention to the amount of HTML you’re sending back to the client. We had a search screen that returned about 1000 records, and it was shoving 4-5 MB of HTML down to the client, which is just wrong. (We converted it to load only the JSON (cached) and do client-side templating with Backbone.js and Underscore.js, with infinite scroll so we only load as far as the user has scrolled.) Also remember that a lot of search-result page load time is “perceived” time the client spends downloading source, executing JavaScript, and fetching assets such as images.

So, back to your database-- that’s really where you probably need to focus. Your version of ColdBox likely has nothing at all to do with this (though you should still upgrade :slight_smile: ).
Now, first of all, whether a full-text index such as Verity, SOLR, or even one built into the RDBMS (not sure if Oracle has one, but I know SQL Server does) is worth it really hinges on one question for me: is it a keyword search, and how “good” does it need to be? If all the fields are dropdowns or checkboxes, then don’t bother. If one field is something simple like title contains the keyword “x”, then you’re probably still fine. (Just remember, regular indexes can’t be used for leading-wildcard “contains” queries; see the sketch below.) If part of the search allows users to type a word or phrase and then does a fuzzy match against multiple full-text columns in the database, then I’d totally look into a full-text search engine.
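To make that concrete, here’s a rough sketch of the difference (SQL Server syntax; the table and column names are hypothetical):

    -- A leading wildcard defeats a normal index on title: the whole column gets scanned.
    SELECT id, title FROM articles WHERE title LIKE '%widget%';

    -- No leading wildcard: the same index can be seeked.
    SELECT id, title FROM articles WHERE title LIKE 'widget%';

    -- A full-text index handles word and fuzzy matching efficiently
    -- (this is SQL Server's CONTAINS predicate; other engines have equivalents).
    SELECT id, title FROM articles WHERE CONTAINS(title, 'widget');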

So, your query:

  1. Parametrize all inputs. This will allow your DB to cache the execution plan (or at least one plan per possible combination of conditional clauses in your dynamic SQL). See the first sketch after this list.
  2. Try refining the initial select to as few columns as necessary into a temp table. This will increase the chance of hitting “covering” indexes and reduce bookmark lookups. Then, once you’ve found your matching records, join in the other data you need to return at the end. (Second sketch below.)
  3. Use your Explain Plan or Show Execution Plan feature. Sorry, I’m a MS SQL guy, so I have no Oracle experience, but your DB doesn’t have to be a black box: you should be able to find the slowest pieces of that query and figure out which indexes are missing to speed it up. Look for table scans and index scans; index seeks are ideal. (Every table needs a primary key, too.) Honestly, this step is the biggest, but also the hardest. If you don’t know how to read execution plans in Oracle, talk to your DBA. If you are the DBA, learn :slight_smile: (Third sketch below.)
  4. And finally, if you have to, look at denormalizing some of your data structure, either with indexed views or manual processes, to get as many filtered columns into a single index as possible. Remember, INNER JOINs perform faster than LEFT OUTER JOINs in most cases, ESPECIALLY if you are filtering on columns from the outer-joined table. (Fourth sketch below.)
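To illustrate item 1, here’s what parametrization looks like by the time it reaches the database (hypothetical table and parameter names); in CFML, cfqueryparam is what produces the parametrized form for you:

    -- Inlined literals: the server compiles and caches a separate plan per distinct value.
    SELECT id FROM products WHERE status = 'active' AND region = 'west';

    -- Parametrized: one cached plan gets reused across all values.
    SELECT id FROM products WHERE status = @status AND region = @region;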
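For item 2, a sketch of the narrow-then-widen pattern (SQL Server temp-table syntax; Oracle would use a global temporary table or a WITH clause, and the table names are made up):

    -- Pass 1: match on as few columns as possible so a covering index can do all the work.
    SELECT p.id
    INTO   #matches
    FROM   products p
    WHERE  p.status = @status
      AND  p.created_date >= @since;

    -- Pass 2: join the wide display columns back in for just the matching rows.
    SELECT m.id, p.title, p.description, c.name AS category
    FROM   #matches m
           JOIN products   p ON p.id = m.id
           JOIN categories c ON c.id = p.category_id;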
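For item 3, since I can only speak to SQL Server, the quickest way to see where the time goes is something like this (Oracle’s EXPLAIN PLAN serves the same purpose):

    -- Report per-table logical reads and timings alongside the query results.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    -- ...run the slow query here, then read the Messages tab in SSMS.
    -- (Ctrl+M in SSMS includes the actual execution plan on the next run.)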
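And for item 4, a minimal indexed-view sketch (SQL Server requires SCHEMABINDING and a unique clustered index before the view is materialized; all names here are hypothetical):

    CREATE VIEW dbo.vw_product_search
    WITH SCHEMABINDING
    AS
    SELECT p.id, p.title, p.status, c.name AS category
    FROM   dbo.products p
           JOIN dbo.categories c ON c.id = p.category_id;
    GO

    -- Materializes the view; the optimizer can now answer searches straight from it.
    CREATE UNIQUE CLUSTERED INDEX ix_vw_product_search
        ON dbo.vw_product_search (id);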

1 million records isn’t too crazy. A well-indexed query can still perform very well, but also look at client statistics, such as how much data is being transferred from the DB to the client. Large text fields can be a culprit there. This is the “Include Client Statistics” menu item in SQL Server Management Studio (SSMS).
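If you suspect fat columns, a quick way to measure them is SQL Server’s DATALENGTH (hypothetical table and column again):

    -- Top 10 widest rows by the byte size of a suspect text column.
    SELECT TOP 10 id, DATALENGTH(description) AS bytes
    FROM   products
    ORDER  BY DATALENGTH(description) DESC;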

Good luck!

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp

E-mail: brad@coldbox.org
ColdBox Platform: http://www.coldbox.org
Blog: http://www.codersrevolution.com