More parser speed improvements in BoxLang (8x faster)

I’m excited to report the very latest snapshot of BoxLang has some nice parser performance updates. This was planned to be part of our first release candidate build, but it just wasn’t quite ready yet.

The biggest bottleneck in our ANTLR parsing was dealing with all the reserved keywords in the language, which (unlike most other languages) can also be variable or function names. That means code like this works

if = "brad"
when = "wood"
function try( else, return ) {
  return return & else;
}
while = "try";
variables[ while ]( if, when )

but it can really be a chore for the parser to figure out what the heck you mean! Even if you don’t use keywords as variables, the parser still has to figure out whether

if( true )
{
  brad = "wood";
}

is an actual if statement, or just a method call followed by a statement block. There’s a lot of lookahead that has to occur to enumerate all the possible statements and find the best/right one.
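
To make the ambiguity concrete, here is a toy sketch in Python. It is not BoxLang's actual ANTLR grammar or implementation. It decides whether the first word of a statement is acting as a keyword or as an identifier by peeking at the next token. The names `KEYWORDS`, `tokenize`, and `classify_leading_word` are made up for illustration, and the one-token lookahead rule is an assumption that is far simpler than what a real parser needs.

```python
import re

# Hypothetical subset of reserved words, taken from the examples in this post.
KEYWORDS = {"if", "when", "while", "try", "else", "return", "function"}


def tokenize(source):
    """Split source into words, double-quoted strings, and single symbols."""
    return re.findall(r'[A-Za-z_]\w*|"[^"]*"|\S', source)


def classify_leading_word(statement):
    """Guess the role of the first word using one token of lookahead."""
    tokens = tokenize(statement)
    if not tokens:
        raise ValueError("empty statement")
    word = tokens[0]
    following = tokens[1] if len(tokens) > 1 else None
    if word not in KEYWORDS:
        return "identifier"
    # A reserved word followed by "=", "." or "[" is being used as a variable.
    if following in {"=", ".", "["}:
        return "identifier"
    # Otherwise assume the reserved word starts a real statement.
    return "keyword"


print(classify_leading_word('if = "brad"'))   # identifier
print(classify_leading_word('if( true )'))    # keyword
print(classify_leading_word('while = "try";'))  # identifier
```

A real grammar has to look well past one token (for example, to tell a method call named `if` from an if statement), which is the kind of lookahead work described above.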

I’ll skip the gory details, but we’ve done a major overhaul of how the parser recognizes keywords vs identifiers (variable names, method names, etc). This is a bigger change than I’d like to toss in after our first release candidate, but the performance improvements are totally worth it. It made the parser 4-5 times faster out of the gate, plus it removed enough ambiguity in our ANTLR grammar that we were able to activate a super-fast parsing mode that sped things up even further.

Total speed depends on the size of the file, the type of the file, and whether the parser has warmed up its in-memory cache, but I’m seeing on average an 8x improvement in parsing speeds now in the latest snapshot vs the release candidate.

As a test, I used the Feature Audit tool to recursively scan the entire ForgeBox.io site’s source code, which (including modules and framework) is 1,249 CF files.

  • ‘1.0.0-rc.1’ - 32 seconds
  • ‘1.0.0-snapshot’ - 4 seconds
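
For anyone who wants to check the math, the timings above work out as follows. This is just arithmetic on the numbers reported in this post, taken at face value; the helper names `speedup` and `throughput` are hypothetical.

```python
FILES = 1249           # CF files scanned, as reported above
RC1_SECONDS = 32       # 1.0.0-rc.1 timing, as reported above
SNAPSHOT_SECONDS = 4   # 1.0.0-snapshot timing, as reported above


def speedup(before_seconds, after_seconds):
    """How many times faster the new run is than the old one."""
    return before_seconds / after_seconds


def throughput(files, seconds):
    """Files parsed per second."""
    return files / seconds


print(f"{speedup(RC1_SECONDS, SNAPSHOT_SECONDS):.1f}x faster")       # 8.0x faster
print(f"{throughput(FILES, RC1_SECONDS):.2f} files/sec on rc.1")     # 39.03
print(f"{throughput(FILES, SNAPSHOT_SECONDS):.2f} files/sec on snapshot")  # 312.25
```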

I also observed JVM memory usage cut in half now with the new parser, as it makes more efficient use of ANTLR’s internal parsing cache.

The caveat here is there were a TON of edge cases we had to go through and fix during testing. I tested the new parser on over 14,000 files ranging from the top 50 packages on ForgeBox, to every major Ortus library and framework, and even the MASA CMS and FW/1 source. I’m sure there are still a couple of edge cases we didn’t catch, so please give your BL test sites a once-over on the latest snapshot using the Feature Audit tool and report back any parsing regressions you see, along with the source file if possible. I’ll get them fixed before our 2nd release candidate goes out.

Note: these changes affect both the CF and BoxLang parsers.
