High Precision math and "Big Numbers" in BoxLang

bdw429s · August 12, 2024, 11:20pm

In our last BoxLang beta, we released some new features to help BoxLang work more naturally out-of-the-box with high precision and big numbers.

So, what does this mean?

Big Numbers

Let’s start with the easier-to-understand one. ColdFusion developers aren’t used to thinking about the underlying Java numeric data types, but Java’s

Integer
Long
Double

all have a built-in limit on how much data they can store. Java devs can choose a “smaller” type like int, which saves memory and uses hardware-accelerated math operations. Generally, this is something CF does for you behind the scenes: Adobe CF 2018+ and Lucee 5 use a Double for most numbers, which stores roughly 15 to 17 significant digits. You can store a larger number like

123123123123123123123123123

in a Double, but behind the scenes, not all of it is actually stored. All Java actually tracks is

1.2312312312312312 E 26

which means some digits of the original number are gone. So if you run the math equation

11111111111111111111 + 22222222222222222222

you get:

Windows calculator: 33333333333333333333
Lucee 5: 33333333333333330000 ← Data lost
Adobe CF 2023: 3.33333333333E+019 ← Data lost
BoxLang: 33333333333333333333
Lucee 6: 33333333333333333333

You can see these large math operations “just work” in BoxLang and Lucee 6 without the need for special handling or use of the unsafe precisionEvaluate() BIF.
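For readers curious what those limits look like on the Java side, here is a small standalone Java sketch (plain JDK code, not BoxLang internals). It shows that int and long silently wrap around when they overflow, that a double cannot hold every digit of the sum above, and that java.math.BigDecimal computes it exactly. The class name BigNumberLimits is hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not part of BoxLang
class BigNumberLimits {

    // int is fixed-width: going past the max value silently wraps around
    static boolean intWraps() {
        int max = Integer.MAX_VALUE;
        return max + 1 == Integer.MIN_VALUE;
    }

    // long has the same behavior, just with a bigger range
    static boolean longWraps() {
        long max = Long.MAX_VALUE;
        return max + 1 == Long.MIN_VALUE;
    }

    // The sum from the text done with doubles: the result is only the nearest binary value
    static double doubleSum() {
        return 11111111111111111111d + 22222222222222222222d;
    }

    // The same sum done with BigDecimal, which keeps every digit
    static BigDecimal exactSum() {
        return new BigDecimal("11111111111111111111")
                .add(new BigDecimal("22222222222222222222"));
    }

    public static void main(String[] args) {
        System.out.println(intWraps());                 // true
        System.out.println(longWraps());                // true
        System.out.println(doubleSum());                // some digits are lost
        System.out.println(exactSum().toPlainString()); // 33333333333333333333
    }
}
```

The exact answer is an odd number, and doubles of that magnitude are spaced thousands apart, so no double can represent it.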

High-precision floating point

The phrase “high precision” is probably a misnomer here, because it’s a well-known issue in many programming languages (not just Java) that certain decimal values cannot be stored accurately as a floating point number. I know this can seem confusing, since some decimals LOOK to our eyes like very simple values. It’s not that the values are too large; it’s that there’s no exact representation for them in binary.
I really don’t want to get lost in the weeds of floating point math, and there’s a great deal written about this already, but imagine for a moment how we’d represent the “simple” fraction 1/3 in base 10:

0.33333333333333333333333333333333333333333333333333...

Even though 1/3 seems like a simple/small value, its decimal representation repeats forever because there is no way to write that exact value with a finite number of decimal digits. The same is true of seemingly simple decimal values such as 0.1 when representing them in binary (base 2). It looks like this:

0.00011001100110011001100110011001100110011001100110011...

and like 1/3, it repeats forever. This means a computer cannot accurately store the decimal value of one tenth without having to eventually truncate and round!
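To see that repeating pattern for yourself, the Java sketch below (an illustration, not BoxLang code) expands an exact decimal fraction into binary digits by repeatedly doubling it with BigDecimal. It also prints the exact value a double really stores for 0.1, using the BigDecimal(double) constructor. The class and method names are hypothetical, and the expansion assumes an input in the range [0, 1).

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not part of BoxLang
class BinaryFraction {

    // Returns the first `count` binary digits after the point for a decimal in [0, 1).
    // Each doubling shifts the next binary digit into the ones place.
    static String binaryFractionDigits(String decimal, int count) {
        BigDecimal x = new BigDecimal(decimal);
        BigDecimal two = BigDecimal.valueOf(2);
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < count; i++) {
            x = x.multiply(two);
            if (x.compareTo(BigDecimal.ONE) >= 0) {
                digits.append('1');
                x = x.subtract(BigDecimal.ONE);
            } else {
                digits.append('0');
            }
            if (x.signum() == 0) {
                break; // the expansion terminated, so the value is exact in binary
            }
        }
        return digits.toString();
    }

    // The exact value of the double nearest to 0.1
    static String storedValueOfOneTenth() {
        return new BigDecimal(0.1).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println("0." + binaryFractionDigits("0.1", 20)); // 0.00011001100110011001
        System.out.println("0." + binaryFractionDigits("0.5", 20)); // 0.1 (exact in binary)
        System.out.println(storedValueOfOneTenth());
        // 0.1000000000000000055511151231257827021181583404541015625
    }
}
```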
This is why the following CF code

(.1 + .2).toString()

gives surprising results. Note that the .toString() call is there to prevent CF from “cheating” by rounding off the value to hide what it’s really storing behind the scenes.

Lucee 5: 0.30000000000000004 ← Precision error
Adobe CF 2023: 0.30000000000000004 ← Precision error
BoxLang: 0.3
Lucee 6: 0.3

Again, Adobe CF and Lucee 5 give an approximate value due to using Java Doubles behind the scenes. BoxLang and Lucee 6 don’t suffer this precision loss and correctly return 0.3.
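The same experiment in plain Java shows where the difference comes from: double arithmetic produces the nearest binary value, while BigDecimal values built from the strings "0.1" and "0.2" add exactly. This is a standalone illustration with a hypothetical class name, not BoxLang's implementation.

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not part of BoxLang
class PointOnePlusPointTwo {

    // Binary floating point: both operands are already approximations
    static String doubleResult() {
        return Double.toString(0.1 + 0.2);
    }

    // Decimal arithmetic. Use the String constructor: new BigDecimal(0.1)
    // would capture the double's representation error.
    static String bigDecimalResult() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2")).toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleResult());     // 0.30000000000000004
        System.out.println(bigDecimalResult()); // 0.3
        System.out.println(0.1 + 0.2 == 0.3);   // false
    }
}
```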

BigDecimal to the rescue!

You may not be too worried about very large numbers, but floating point math has bitten every developer who’s been around long enough, and it can wreak havoc on the simplest of math calculations. CF tries to hide it by rounding these values off when outputting numbers as strings, but the error is still there and affects equality checks, subsequent math operations, and more.
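Here is a sketch of why rounding on output is not enough (plain Java, hypothetical class name): adding 0.1 ten times as a double does not produce exactly 1.0, even though formatting to two decimal places prints "1.00". The BigDecimal version reaches 1.0 exactly. One Java-specific caveat: BigDecimal.equals also compares scale, so compareTo is the right way to test numeric equality.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hypothetical illustration class; not part of BoxLang
class AccumulatedError {

    static double sumTenthsAsDouble() {
        double total = 0;
        for (int i = 0; i < 10; i++) {
            total += 0.1; // each addition rounds to the nearest double
        }
        return total;
    }

    static BigDecimal sumTenthsAsBigDecimal() {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal tenth = new BigDecimal("0.1");
        for (int i = 0; i < 10; i++) {
            total = total.add(tenth); // exact decimal addition
        }
        return total; // 1.0 (scale 1)
    }

    public static void main(String[] args) {
        double d = sumTenthsAsDouble();
        // Rounding for display hides the error...
        System.out.println(String.format(Locale.ROOT, "%.2f", d)); // 1.00
        // ...but equality checks still see it
        System.out.println(d == 1.0); // false

        BigDecimal b = sumTenthsAsBigDecimal();
        System.out.println(b.compareTo(BigDecimal.ONE) == 0); // true
        System.out.println(b.equals(BigDecimal.ONE));          // false: 1.0 and 1 differ in scale
    }
}
```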
Java has a special class that’s capable of handling both large numbers AND floating point precision as expected. It’s called BigDecimal, and it can make your life much easier by seamlessly handling very large numbers and representing decimal values like 0.1 exactly.

We’ve followed Lucee’s lead (but not directly copied them) by introducing automatic type promotion to a BigDecimal number WHERE NECESSARY so your math operations keep cranking like you expect without you needing to worry about what’s going on under the hood.

Furthermore, Java’s BigDecimal class allows you to choose the level of precision you want to use. Java 21 defaults to “unlimited” precision, but we’ve dialed that back to the IEEE 754-2019 decimal128 format, which has 34 digits of precision, and uses a rounding mode of HALF_EVEN. You can change the amount of precision BoxLang uses for BigDecimal operations at any time like so:

import ortus.boxlang.runtime.types.util.MathUtil;
MathUtil.setPrecision( 100 );
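For comparison, here is what those precision settings mean in plain JDK terms, using java.math.MathContext directly. This does not call BoxLang's MathUtil; it only illustrates the JDK behavior underneath. MathContext.DECIMAL128 is 34 significant digits with HALF_EVEN rounding, and new MathContext(100, RoundingMode.HALF_EVEN) is assumed here to be roughly analogous to raising the precision to 100. The class name is hypothetical.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical illustration class; not part of BoxLang
class PrecisionDemo {

    // 1/3 and 2/3 can never be exact, so a precision and rounding mode are required
    static BigDecimal divide(int numerator, int denominator, MathContext mc) {
        return new BigDecimal(numerator).divide(new BigDecimal(denominator), mc);
    }

    public static void main(String[] args) {
        MathContext decimal128 = MathContext.DECIMAL128; // 34 digits, HALF_EVEN
        MathContext hundred = new MathContext(100, RoundingMode.HALF_EVEN);

        System.out.println(divide(1, 3, decimal128)); // 0.333...3 with 34 significant digits
        System.out.println(divide(2, 3, decimal128)); // 0.666...7: the last digit is rounded
        System.out.println(divide(1, 3, hundred).precision()); // 100
    }
}
```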

Lucee attempts unlimited precision first, and if that errors (which happens whenever rounding needs to occur), it falls back to DECIMAL128 precision. TBH, this is not nearly as good for performance and can result in many math operations executing twice!
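The fallback strategy described above can be sketched in plain Java as follows. This is a hypothetical illustration of the approach as described, not Lucee's actual code. BigDecimal.divide(BigDecimal) with no MathContext returns the exact quotient or throws ArithmeticException when the result doesn't terminate, and only then is the division repeated at DECIMAL128 precision. The retry counter is there just to make the double work visible.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical illustration class; not Lucee's or BoxLang's implementation
class ExactThenFallback {

    // Counts how many divisions had to be performed a second time
    static int retries = 0;

    static BigDecimal divide(BigDecimal a, BigDecimal b) {
        try {
            // Unlimited precision: exact result, or an exception if it never terminates
            return a.divide(b);
        } catch (ArithmeticException nonTerminating) {
            retries++;
            // Second attempt: round to 34 digits (HALF_EVEN).
            // Division by zero also lands here and throws again from this call.
            return a.divide(b, MathContext.DECIMAL128);
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(BigDecimal.ONE, new BigDecimal("4"))); // 0.25, first try
        System.out.println(divide(BigDecimal.ONE, new BigDecimal("3"))); // needed a retry
        System.out.println("retries: " + retries);                        // retries: 1
    }
}
```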

Only Where Necessary

This is where we break further from Lucee’s implementation. Lucee 6 uses BigDecimals across the board for ALL numbers. That’s too heavy-handed for us, since quite a lot of the numbers tracked in normal loops and result iteration are rather small integers, and it totally makes sense to still store them as integers and process them with super fast integer-based math.

That’s why BoxLang has a smart parser, which will always store a number in the smallest package possible, opting to promote the type only when actually necessary.

n = 1;  // Fewer than 10 digits: stored in an Integer
n = 11111111111111; // Fewer than 20 digits: stored in a Long
n = 111111111111111111111111111; // Anything larger: stored in a BigDecimal
n = 12.34;  // All floating point values: stored in a BigDecimal
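A rough Java sketch of the “smallest package possible” idea might look like the following. The names are hypothetical and this is not BoxLang's parser. Instead of counting digits as the comments above describe, it tries Integer, then Long, then falls back to BigDecimal, and it sends every literal with a decimal point or exponent straight to BigDecimal. One consequence: a 19-digit value like 9999999999999999999 is too large for a Long, so this sketch stores it as a BigDecimal. It assumes the input is a valid numeric literal.

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not BoxLang's actual parser
class LiteralParser {

    static Number parseLiteral(String literal) {
        // Decimal and exponent literals always become BigDecimal (exact decimal storage)
        if (literal.contains(".") || literal.contains("e") || literal.contains("E")) {
            return new BigDecimal(literal);
        }
        try {
            return Integer.parseInt(literal); // fits in 32 bits
        } catch (NumberFormatException tooBigForInt) {
            // fall through to a wider type
        }
        try {
            return Long.parseLong(literal); // fits in 64 bits
        } catch (NumberFormatException tooBigForLong) {
            // fall through to BigDecimal
        }
        return new BigDecimal(literal);
    }

    public static void main(String[] args) {
        String[] samples = {"1", "11111111111111", "111111111111111111111111111", "12.34"};
        for (String s : samples) {
            System.out.println(s + " -> " + parseLiteral(s).getClass().getSimpleName());
        }
    }
}
```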

The “bigger” types are contagious. So if you add together an Integer and a Long, we store the result in a Long. If you add a Long and a BigDecimal together, we store the result in a BigDecimal. The idea is to always keep things small and fast until we can’t any longer. All built-in math operations and BIFs in BoxLang have been updated to be smart enough to match the precision of their input. If you ask for the power of a Double, you’ll get Double math and a Double result. If you ask for the power of a BigDecimal, you’ll get BigDecimal math and a BigDecimal result. And this all happens automatically, without you really needing to know or care what’s happening under the hood.
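The “contagious” rule for addition could be sketched in Java as below. The class and method names are hypothetical, it only handles Integer, Long, and BigDecimal (no Double), and the overflow handling (using Math.addExact and promoting to the next bigger type when it throws) is an assumption about how “promote only when necessary” might work, not a description of BoxLang's code.

```java
import java.math.BigDecimal;

// Hypothetical illustration class; not BoxLang's actual math operations
class ContagiousAdd {

    static Number add(Number a, Number b) {
        requireSupported(a);
        requireSupported(b);
        // Any BigDecimal operand makes the whole operation BigDecimal
        if (a instanceof BigDecimal || b instanceof BigDecimal) {
            return toBigDecimal(a).add(toBigDecimal(b));
        }
        // Any Long operand means long math, promoting to BigDecimal on overflow
        if (a instanceof Long || b instanceof Long) {
            try {
                return Math.addExact(a.longValue(), b.longValue());
            } catch (ArithmeticException overflow) {
                return toBigDecimal(a).add(toBigDecimal(b));
            }
        }
        // Both are Integers: int math, promoting to long on overflow
        try {
            return Math.addExact(a.intValue(), b.intValue());
        } catch (ArithmeticException overflow) {
            return a.longValue() + b.longValue();
        }
    }

    private static BigDecimal toBigDecimal(Number n) {
        return (n instanceof BigDecimal) ? (BigDecimal) n : BigDecimal.valueOf(n.longValue());
    }

    private static void requireSupported(Number n) {
        if (!(n instanceof Integer || n instanceof Long || n instanceof BigDecimal)) {
            throw new IllegalArgumentException("unsupported type: " + n.getClass());
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2).getClass().getSimpleName());                 // Integer
        System.out.println(add(1, 11111111111111L).getClass().getSimpleName());   // Long
        System.out.println(add(Integer.MAX_VALUE, 1).getClass().getSimpleName()); // Long
        System.out.println(add(Long.MAX_VALUE, 1L).getClass().getSimpleName());   // BigDecimal
        System.out.println(add(1L, new BigDecimal("0.5")));                       // 1.5
    }
}
```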

What’s the cost?

Great question! BigDecimals come with the following two drawbacks:

They take a few more bytes of heap space to store than a Double
They take a few more nanoseconds to perform math operations on than a primitive type. (Yes, nanoseconds, as in billionths of a second.)

How much overhead really depends on your application, but in our testing, it’s quite negligible.

A Java Double in memory will take around 24 bytes, and a BigDecimal will take around 64 bytes. If you’re working on a low-memory device like a Raspberry Pi or a small server, you may see a difference, but it’s likely to go unnoticed with today’s heap sizes. One thousand Java Doubles stored in memory would be around 23 KB, and the same number of BigDecimals would be around 63 KB. We’re not breaking the bank here.

How about times? Multiplying two floating point numbers such as

12.34 * 56.78

takes only a few billionths of a second. That’s right, it’s measured in nanoseconds! If I perform that multiplication 1 million times in a row, the BigDecimal version runs anywhere from as fast as the Doubles to 2x slower. But remember, the absolute numbers are tiny: one million floating point operations average only 4-6 milliseconds on my machine for Doubles and 5-9 milliseconds for BigDecimals.
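If you want to try a comparison like this yourself, here is a deliberately naive Java timing sketch using System.nanoTime(). It is not how the numbers above were measured, and the class name is hypothetical. Treat anything it prints with suspicion: JIT warm-up, garbage collection, and the JVM optimizing away unused results all skew simple loops like this, and a harness such as JMH is the usual tool for trustworthy microbenchmarks. Both loops multiply and then add into an accumulator so the work is comparable and can't be discarded.

```java
import java.math.BigDecimal;

// Hypothetical, deliberately naive benchmark class; results will vary by machine and JVM
class MultiplyTiming {

    static final int ITERATIONS = 1_000_000;

    // The exact BigDecimal product of the two operands from the text
    static BigDecimal product() {
        return new BigDecimal("12.34").multiply(new BigDecimal("56.78"));
    }

    // Elapsed wall-clock time of one run of `body`, in nanoseconds
    static long timeNanos(Runnable body) {
        long start = System.nanoTime();
        body.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        double a = 12.34, b = 56.78;
        BigDecimal x = new BigDecimal("12.34"), y = new BigDecimal("56.78");
        // Results are accumulated and printed so the JIT can't drop the loops entirely
        double[] doubleSink = {0};
        BigDecimal[] bigSink = {BigDecimal.ZERO};

        // Several rounds, because the early ones include JIT warm-up
        for (int round = 1; round <= 5; round++) {
            long doubleNanos = timeNanos(() -> {
                for (int i = 0; i < ITERATIONS; i++) {
                    doubleSink[0] += a * b;
                }
            });
            long bigNanos = timeNanos(() -> {
                for (int i = 0; i < ITERATIONS; i++) {
                    bigSink[0] = bigSink[0].add(x.multiply(y));
                }
            });
            System.out.printf("round %d: double %.2f ms, BigDecimal %.2f ms%n",
                    round, doubleNanos / 1e6, bigNanos / 1e6);
        }
        System.out.println(doubleSink[0] + " / " + bigSink[0]);
        System.out.println(product()); // 700.6652
    }
}
```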

How many math operations (like plus, minus, multiply and divide) does a typical web app perform per request? This will vary based on your application, but I did some tests with a ColdBox sample app, and here is what I found.

Application reinit (runs a lot more code)

Around 27 thousand math operations performed
Only 11 of them were BigDecimal math

A “normal” ColdBox page view

14 total math operations performed
Only one of them was BigDecimal

With our strategy to only resort to BigDecimal math where needed, we think you get the best of both worlds. Super fast performance and low memory wherever possible, and all math “just works” no questions asked.

What If I still don’t want it?

After some initial feedback, we also added a flag that toggles this high-precision behavior. If you do a lot of math and are specifically OK with dealing with possible floating point rounding errors from time to time, you can opt out of BigDecimal math entirely by setting this in your boxlang.json:

   "useHighPrecisionMath" : false

In this mode, large numbers and decimals will go into a Java Double like before.

aliaspooryorik · August 13, 2024, 7:24am

I like how the Groovy docs describe why they use BigDecimal:

Support a ‘least surprising’ math model to scripting language users. This means that exact, or decimal math should be used for default calculations. This scheme assumes that by default, groovy literals with decimal points are instantiated as BigDecimal objects rather than binary floating points (Float, Double).

https://docs.groovy-lang.org/latest/html/gapi/org/codehaus/groovy/runtime/typehandling/NumberMath.html
