Java Method References and higher order functions

bdw429s · July 9, 2024, 9:55pm

We’ve added some more goodies to our BoxLang Java interop, this time around method references and higher order functions. These are things CF has never let you do, leaving Java interop a second-class citizen in the language.

CF already allows you to grab a reference to a UDF or closure as a variable, pass it around, and invoke it.

myInstance = new myClass();
myInstanceMethod = myInstance.myMethod;
myInstanceMethod();

BL also allows you to grab a reference to a static method from a Box class as well:

myStaticUDF = src.test.java.TestCases.phase3.StaticTest::sayHello;
myStaticUDF();

Now, in BoxLang, we’ve elevated Java methods, both instance and static, to also be objects you can pass around, invoke, and send into a higher order function (a function that accepts functions).
When you reference a method on a Java class without the parentheses (just like our BL examples above), you will get a special Function instance that wraps up the Java method, allowing it to be treated as a function, passed into any argument which is typed as a function, and invoked headlessly.

Here we capture the static valueOf() method from the Java String class and place it into a variable, where we invoke it.

import java:java.lang.String;
javaStaticMethod = java.lang.String::valueOf;
result = javaStaticMethod( "test" ) // New string of "test"
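
For readers coming from the Java side, the capture above corresponds roughly to a Java static method reference. This is only a minimal sketch, assuming the `valueOf(Object)` overload is the one wanted; the class name `StaticMethodRefDemo` and the helper `convert` are made up for illustration and are not part of BoxLang.

```java
import java.util.function.Function;

class StaticMethodRefDemo {
    // Capture the static String.valueOf(Object) overload as a function value.
    static final Function<Object, String> VALUE_OF = String::valueOf;

    // Invoke the captured reference later, like any other function.
    static String convert(Object input) {
        return VALUE_OF.apply(input);
    }

    public static void main(String[] args) {
        System.out.println(convert("test")); // test
        System.out.println(convert(42));     // 42
    }
}
```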

This example captures the toUpperCase method from a String instance. Note that the method is still bound to the original String instance and, when invoked, will be invoked against that original instance.

javaInstanceMethod = "my string".toUpperCase
result = javaInstanceMethod() // "MY STRING"
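
Java has a similar notion of a bound instance method reference, which may help in picturing what "still bound to the original instance" means. A small sketch; `BoundMethodRefDemo` and `upperCaseOf` are hypothetical names used only here.

```java
import java.util.function.Supplier;

class BoundMethodRefDemo {
    // The receiver is captured when the reference is created,
    // so the returned Supplier needs no arguments when it is called.
    static Supplier<String> upperCaseOf(String receiver) {
        return receiver::toUpperCase;
    }

    public static void main(String[] args) {
        Supplier<String> bound = upperCaseOf("abc");
        System.out.println(bound.get()); // ABC
    }
}
```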

And finally, here we use a Java method to pass directly in place of a UDF or Closure to a higher order function.

import java.util.Collections;
// Use the compare method from the Java reverse order comparator to sort a BL array
[ 1, 7, 3, 99, 0 ].sort( Collections.reverseOrder().compare  ) // [ 99, 7, 3, 1, 0 ]

We grab the compare method from Java’s reverse order comparator and pass it directly into the array sort method in BoxLang, reversing our array!
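
The same idea written in plain Java, as a rough comparison only: the comparator's `compare` method is passed as a bound method reference to `List.sort`. The class and method names below are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ReverseSortDemo {
    static List<Integer> sortDescending(List<Integer> input) {
        Comparator<Integer> reverse = Collections.reverseOrder();
        List<Integer> copy = new ArrayList<>(input);
        // Hand over the comparator's compare method as a function value.
        copy.sort(reverse::compare);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortDescending(Arrays.asList(1, 7, 3, 99, 0))); // [99, 7, 3, 1, 0]
    }
}
```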

Sadly, you can’t use many Java methods with CF/BL’s reduce, each, and map higher order BIFs. Those BIFs pass additional arguments to the function, and Java is strict about argument counts, so examples like this do not work

import java:java.lang.Math;
[ 1, 2.4, 3.9, 4.5 ].map( Math::floor ) // Error because floor only accepts a single arg!

since Math.floor() accepts a single argument but arrayMap() passes 3 arguments to the map function.
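
To see the strictness in isolation, here is a small Java sketch that calls `Math.floor` reflectively with one argument and then with three. It only illustrates how Java itself rejects the extra arguments; how BoxLang dispatches internally is not shown in this thread, and `StrictArityDemo` and `callFloor` are hypothetical names.

```java
import java.lang.reflect.Method;

class StrictArityDemo {
    // Invoke Math.floor(double) reflectively with whatever arguments are supplied.
    static String callFloor(Object... args) {
        try {
            Method floor = Math.class.getMethod("floor", double.class);
            return String.valueOf(floor.invoke(null, args));
        } catch (IllegalArgumentException e) {
            // Thrown by Method.invoke when the argument count or types do not match.
            return "rejected";
        } catch (ReflectiveOperationException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(callFloor(2.4));             // 2.0
        System.out.println(callFloor(2.4, 1, "array")); // rejected
    }
}
```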

This new functionality mixed with our new ability to pass BoxLang closures into java methods that accept functional interfaces/lambdas gives great two-way compatibility to pass methods and functions in both directions between Java and BoxLang!

In this example, we get a Java Stream from our BoxLang array and then call the Java filter() method on the stream, passing a BoxLang Lambda in, which is automatically converted to the correct Java functional interface!

fruits = [ "apple", "banana", "cherry", "ananas", "elderberry" ];
result = fruits.stream()
  .filter(  fruit -> fruit.startsWith( "a" ) )
  .toList();
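
One way a dynamic runtime could perform this kind of conversion is with `java.lang.reflect.Proxy`. The sketch below is an assumption for illustration and not a description of BoxLang's actual mechanism; `coerce` is a hypothetical helper, and the "closure" is modelled as a plain `Function<Object[], Object>`.

```java
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class LambdaBridgeDemo {
    // Hypothetical helper: wrap a loosely typed callable in a proxy that
    // implements the requested functional interface.
    @SuppressWarnings("unchecked")
    static <T> T coerce(Function<Object[], Object> callable, Class<T> functionalInterface) {
        return (T) Proxy.newProxyInstance(
            LambdaBridgeDemo.class.getClassLoader(),
            new Class<?>[] { functionalInterface },
            (proxy, method, args) -> {
                // Only the interface's abstract method is bridged in this sketch.
                if (method.getDeclaringClass() == Object.class || method.isDefault()) {
                    throw new UnsupportedOperationException(method.getName());
                }
                return callable.apply(args);
            });
    }

    static List<String> startingWithA(List<String> fruits) {
        // Stand-in for a dynamically typed closure: fruit -> fruit.startsWith( "a" )
        Function<Object[], Object> closure = args -> ((String) args[0]).startsWith("a");
        @SuppressWarnings("unchecked")
        Predicate<String> predicate = coerce(closure, Predicate.class);
        return fruits.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(startingWithA(
            Arrays.asList("apple", "banana", "cherry", "ananas", "elderberry"))); // [apple, ananas]
    }
}
```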

seancorfield · July 10, 2024, 1:28am

This is all awesome work, and a big improvement over what CFML allows.

I do think it’s a mistake for member functions like map/reduce/filter/etc to invoke functions with extra arguments. I think it was a mistake for CF and I still think it’s a mistake for BL. Your docs do not explain what arguments are passed to the function argument of these functions, BTW:

(the BIF docs are no clearer)

I feel very strongly that:

map should always take a unary function and be called with each element (only)
filter – same as map: unary function returning a Boolean (or maybe truthy/falsey)
reduce should always take a function of two arguments: an accumulator and each element of the collection in turn
each – same as map although I care much less about it since it’s more imperative than functional

I’m not sure of the best way to “fix” this at a language level (given the constraint of bx-compat at least needing to make things CF-compatible).

Worst case, I guess I could live with map1, filter1, etc that explicitly call the function with just one argument (and, I guess, reduce2 although it’s passed three args for structs – which in Clojure would be called reduceKV).

Better would be to make the default in BL be as above (unary for map/filter/etc and strictly binary for reduce/trinary for reduce struct) and have mapx, filterx, reducex or similar for the CF-compat versions to transpile to…
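
A rough Java sketch of the two shapes being contrasted here: a strict `map` whose callback sees only the element, and a CF-style `mapx` whose callback also receives the index and the whole list. The names `map`, `mapx`, and `ItemIndexList` are hypothetical and mirror the suggestion above rather than any existing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class UnaryMapDemo {
    // Hypothetical three-argument callback shape: (item, index, whole list).
    interface ItemIndexList<T, R> {
        R apply(T item, int index, List<T> all);
    }

    // Strict variant: the callback only ever sees the element.
    static <T, R> List<R> map(List<T> items, Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(fn.apply(item));
        }
        return out;
    }

    // CF-compatible variant: the callback also gets the 1-based index and the list.
    static <T, R> List<R> mapx(List<T> items, ItemIndexList<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            out.add(fn.apply(items.get(i), i + 1, items));
        }
        return out;
    }

    // A unary Java method fits the strict variant without any wrapping.
    static List<Double> floors() {
        return map(Arrays.asList(1.0, 2.4, 3.9, 4.5), Math::floor);
    }

    static List<String> labelled() {
        return mapx(Arrays.asList("a", "b"), (item, index, all) -> index + ":" + item);
    }

    public static void main(String[] args) {
        System.out.println(floors());   // [1.0, 2.0, 3.0, 4.0]
        System.out.println(labelled()); // [1:a, 2:b]
    }
}
```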

aliaspooryorik · July 10, 2024, 8:13am

bdw429s:

examples like this do not work
import java:java.lang.Math;
[ 1, 2.4, 3.9, 4.5 ].map( Math::floor ) // Error because floor only accepts a single arg!
since Math.floor() accepts a single argument but arrayMap() passes 3 arguments to the map function.

Just a +1 from me and subscribing.

lmajano · July 10, 2024, 8:32am

I would definitely not be opposed to it @seancorfield
I think it was just an easy (cop-out) way to do it, and we have matched it for compat.

However, I would rather go into a fluent design like Streams do. So how about potentially removing the extra arguments which deal with parallel computations and moving them to a fluent design:

Collection.inParallel().each()
Collection.inParallel().map()
Collection.inParallel().filter()
Collection.inParallel().reduce()

We could also add extra arguments to the inParallel()

inParallel( numeric maxThreads = 20, customExecutor )
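
To make the fluent idea concrete, here is a minimal Java sketch in which the threading options live on a wrapper object and the callback stays unary. Everything here is an assumption for illustration: `ParallelView`, its `inParallel` factory, and the fixed thread pool are not part of BoxLang.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelView<T> {
    private final List<T> items;
    private final int maxThreads;

    private ParallelView(List<T> items, int maxThreads) {
        this.items = items;
        this.maxThreads = maxThreads;
    }

    // Hypothetical entry point mirroring the proposed inParallel( maxThreads ).
    static <T> ParallelView<T> inParallel(List<T> items, int maxThreads) {
        return new ParallelView<>(items, maxThreads);
    }

    // The callback takes only the element; results keep the input order.
    <R> List<R> map(Function<? super T, ? extends R> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> fn.apply(item);
                pending.add(pool.submit(task));
            }
            List<R> out = new ArrayList<>();
            for (Future<R> future : pending) {
                out.add(future.get());
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(inParallel(Arrays.asList(1, 2, 3), 2).map(n -> n * 10)); // [10, 20, 30]
    }
}
```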

Thoughts?

bdw429s · July 10, 2024, 6:36pm

I get what you’re saying about those being unary, but I’ve never minded it in CF since the functions allow extra args and it never hurt anything while providing more data to the function. Now, there’s prolly an argument for losing some purity to the functions by giving them access to the entire array, etc which encourages the function to do things like mutate the data structure.

Removing the additional arguments would pose an issue with CF compat. Another approach would be for the higher order BIFs to detect if the function they’ve been passed is a Java method (there is a dedicated subclass of ortus.boxlang.runtime.types.Function for this) and only pass the single argument in that case.

The downside would be you couldn’t write a Java method taking advantage of all the additional arguments and use it, but that’s prolly a pretty edge case. So basically, the higher order BIFs would

pass all args for Closures, UDFs, BL Lambdas
only pass the minimum args for Java methods or functional interfaces

This could allow BoxLang higher order functions calling BoxLang functions to be more dynamic, but tighten the screws when calling java methods instead.
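
The dispatch rule described above could look something like the following in Java. This is a sketch under stated assumptions: `DynamicFunction` stands in for a runtime function value that tolerates extra arguments, and `JavaMethodFunction` stands in for the dedicated subclass mentioned earlier; neither is the real BoxLang type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ArityAwareMapDemo {
    // Hypothetical runtime function value that accepts any number of arguments.
    interface DynamicFunction {
        Object call(Object... args);
    }

    // Hypothetical wrapper for a strict, unary Java method.
    static final class JavaMethodFunction implements DynamicFunction {
        private final Function<Object, Object> target;

        JavaMethodFunction(Function<Object, Object> target) {
            this.target = target;
        }

        @Override
        public Object call(Object... args) {
            if (args.length != 1) {
                throw new IllegalArgumentException("expected 1 argument, got " + args.length);
            }
            return target.apply(args[0]);
        }
    }

    // Pass only the element to wrapped Java methods; pass everything otherwise.
    static List<Object> map(List<?> items, DynamicFunction fn) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            Object item = items.get(i);
            out.add(fn instanceof JavaMethodFunction
                ? fn.call(item)
                : fn.call(item, i + 1, items));
        }
        return out;
    }

    static List<Object> floors() {
        return map(Arrays.asList(1.0, 2.4, 3.9, 4.5),
            new JavaMethodFunction(x -> Math.floor((Double) x)));
    }

    static List<Object> labelled() {
        return map(Arrays.asList("a", "b"), args -> args[1] + ":" + args[0]);
    }

    public static void main(String[] args) {
        System.out.println(floors());   // [1.0, 2.0, 3.0, 4.0]
        System.out.println(labelled()); // [1:a, 2:b]
    }
}
```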

seancorfield · July 10, 2024, 8:17pm

I think you’re answering a different question?

I’m not talking about the arguments to map, reduce, etc – I’m talking about the arguments to the function passed into those (where map is calling the function with multiple arguments instead of just the “expected” one).

seancorfield · July 10, 2024, 8:25pm

I definitely think passing in the whole collection as an extra arg to the invoked function is a bad idea. If folks really need that, they could use a closure anyway:

var coll = ...;
var added = coll.map( e => (e + coll[1]) );

I was also referring to things like (array) map which is passed the element (good), the index (bad), and the array (very bad):

someArray.map(function(item [,index, array]){} [, parallel] [, maxThreads])

I guess you could special case the type of function passed in to determine whether to pass those extra args or not but that seems a bit “magic”. I’d really rather lean in the direction of having additional member functions, so you have an explicit guarantee of “just the item” (by default) and the *x versions to pass extra args (CF-compat).

bdw429s · July 11, 2024, 6:12am

Yeah, so this sort of feels like figuring out the lesser of two evils:

Add some “magic” to the higher order BIFs to not pass the “extra” args to Java methods and FI/SAMs
Add a second set of higher order BIFs with a variant name that don’t pass the extra args

TBH, right now I hate the first bullet the least as I think it fragments the language the least and makes the BIFs we have “just work” like people expect. I’m still open to discussion on the matter, but for now I put in a version of bullet one to all the list, array, and struct higher order BIFs (excluding sort, which has no “extra args”)

This means this example DOES now work!

import java:java.lang.Math;
[ 1, 2.4, 3.9, 4.5 ].map( Math::floor ) // [ 1.0, 2.0, 3.0, 4.0 ]

One thing I’m not sure about is the struct ones-- since CF has no concept of a Map.Entry like Java to encapsulate the key and value in a single object, we’re sort of stuck with a key/value pair of args to really be useful. So the struct BIFs won’t pass the struct in the 3rd argument right now when calling a Java method or FI/SAM, instead just passing the two key/value args. You can still get the underlying Java Map out of a struct

{}.getWrapped()

or drop into a Java stream of entry sets

{}.getWrapped().entrySet().stream()

… which is quite easy with our BL Closure/lambda bridge.

// This is BoxLang code!!
{ foo:"bar" }
  .getWrapped()
  .entrySet()
  .stream()
  .parallel()
  .forEach( e -> println( e.getKey() & " : " & e.getValue() ) )

I could prolly add a helper method for getting a stream of entry sets to help with LoD

In other news, I took this main effort a step further. In addition to getting and passing around a reference to a Java method, you can now ALSO pass Java Lambdas, or really any instance implementing a FI (functional interface) or SAM (single abstract method).

Ticket: [BL-338]

This change is basically in the function caster and any UDF or BIF with an argument typed as function will cast (wrap up) a SAM so it can be invoked as a BoxLang function just like we’re doing with methods. Note, this doesn’t change the behavior of the BIF isCustomFunction() which is still checking specifically for the BL UDF/Closure/Lambda types.

It’s a little difficult to come up with a simple stand-alone example purely in BL, so the example in the ticket uses some Java code to define a functional interface instance (lambda) and then passes that into my BoxLang context to use it as an array filter function.

// Create Java lambda
Predicate<String> myJavaPredicate = ( t ) -> t.equals( "brad" );
// Put it in the variables scope to use
variables.put( "myJavaPredicate", myJavaPredicate );

// Now use the lambda above in BoxLang code directly as a function
instance.executeSource(
    """
        myArry = [ "brad", "luis" ];
        result = myArry.filter( myJavaPredicate );
    """,
    context );

Here is a better example that uses a functional interface not declared as a Lambda. It’s nearly identical to the one in my first post, but instead of directly grabbing the compare method, we just pass the comparator directly, which is recognized as a SAM, and the compare() method is auto-discovered as the single abstract method in our implemented interface.

import java.util.Collections;
[ 1, 7, 3, 99, 0 ].sort( Collections.reverseOrder()  )

Checking if a class implements a functional interface is pretty straightforward, and I can even optimize it as an instanceof check for the known functional interfaces in the JDK like Predicate, Consumer, or Supplier.

Detecting a SAM is a little more sketchy of an operation, as basically ANY class implementing ANY interface that has ANY singular abstract method can be a SAM. Furthermore a class may implement more than one SAM interface and with BL’s loose typing, it’s not clear which one you intend to call so we just have to pick the first one and hope.

Additionally, I found (when some tests started failing) that there are several unexpected JDK classes which fit the definition of a SAM including java.lang.Double and java.lang.String. Both of those implement Comparable which has a single abstract method, so that means you can literally just pass a string to a UDF expecting a function, and BL will go, “oh look, a SAM” and wrap it up.
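
A reflection-based sketch of what such SAM detection might look like, to show why String and Double get caught. It is deliberately simplified (it only looks at directly implemented interfaces of each class in the hierarchy and does not handle every generic-override corner case), and the names `SamFinderDemo`, `isSam`, and `samInterfacesOf` are hypothetical. On recent JDKs the list for String may contain more than one interface, which is the "which one did you mean" ambiguity described above.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SamFinderDemo {
    // True when the method merely redeclares a public method of java.lang.Object
    // (for example Comparator.equals), which does not count toward the total.
    static boolean redeclaresObjectMethod(Method m) {
        try {
            Object.class.getMethod(m.getName(), m.getParameterTypes());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // An interface counts as a SAM here when exactly one abstract method remains.
    static boolean isSam(Class<?> iface) {
        if (!iface.isInterface()) {
            return false;
        }
        int abstractCount = 0;
        for (Method m : iface.getMethods()) {
            if (Modifier.isAbstract(m.getModifiers()) && !redeclaresObjectMethod(m)) {
                abstractCount++;
            }
        }
        return abstractCount == 1;
    }

    // Names of the SAM interfaces a class implements, in discovery order.
    static List<String> samInterfacesOf(Class<?> type) {
        Set<String> found = new LinkedHashSet<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (isSam(iface)) {
                    found.add(iface.getName());
                }
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        // The exact lists depend on the JDK version, so no output is shown here.
        System.out.println(samInterfacesOf(String.class));
        System.out.println(samInterfacesOf(Double.class));
    }
}
```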

I’m not sure how I feel about that and I’m tempted to limit it to SAMs specifically bearing the functional interface annotation, but even if I do that, it’s still too wide of a net. The Comparable interface does already use the functional annotation, yet is implemented by a lot of classes. Open to suggestions here. I love supporting stuff, but this one almost feels like it may bite us given the ambiguity in BoxLang. In Java, the strict type of the receiving argument would remove this ambiguity.

seancorfield · July 11, 2024, 4:29pm

This is a reasonable compromise and there is certainly precedent for this (as noted earlier, Clojure specifically has reduce with reduce-kv for the struct key/value approach). Interesting to learn about .getWrapped() – and as you say, with the greatly improved Java interop in BL, that makes the Map.Entry approach easy to reach for if you need it.

Yeah, there are definitely some ambiguities around SAMs and without some sort of Java interop type hinting syntax, you’re kind of at the mercy of “whatever matches”. Clojure chose to only implement auto-coercion to FIs but then it also has a type hint syntax to disambiguate calls.

Limiting to just FIs isn’t too bad a restriction – but you’d need to note it in the docs. There are a few SAMs that are really convenient to have auto-coercion for (that are not FIs). I’d have to go digging in the recent discussions around this in Clojure because I ran across a couple that we used heavily at work where auto-coercion could have simplified our code.

seancorfield · July 11, 2024, 5:49pm

One thing that Clojure introduced recently as part of this whole related method references etc stuff is “method values”.

In BL, you can get a Java static method as a reference via Math::floor for example. In Clojure you can also get a member function generically, and it wraps it in a function for you. The equivalent in BL would be something like:

import java.lang.String;
["hello", "world"].map( String.toUpperCase ) // ["HELLO", "WORLD"]

I don’t know whether . or :: would be appropriate here or whether you’d want a new syntax (the closest equivalent to what Clojure provides would effectively be String::.toUpperCase – like a static reference but with a . to indicate an instance method instead of a static method).

Behind the scenes, this becomes:

import java.lang.String;
["hello", "world"].map( s -> s.toUpperCase() ) // ["HELLO", "WORLD"]

Thoughts?

It’s not much “sugar” over the inline lambda…

var f = s -> s.toUpperCase();
// vs.
var f = String::.toUpperCase;

but it just feels more “symmetrical” to me.
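
For reference, Java itself already has this unbound form: `String::toUpperCase` assigned to a `Function<String, String>` takes the receiver as its first argument. A minimal sketch, with `UnboundMethodRefDemo` and `upperAll` as made-up names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class UnboundMethodRefDemo {
    static List<String> upperAll(List<String> words) {
        // Unbound reference: equivalent to s -> s.toUpperCase()
        Function<String, String> upper = String::toUpperCase;
        return words.stream().map(upper).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperAll(Arrays.asList("hello", "world"))); // [HELLO, WORLD]
    }
}
```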

bdw429s · July 11, 2024, 6:55pm

Ironically, in my example above of the Comparable interface, Comparable actually has the functional interface annotation, which is a little odd since you’d never really use a Lambda to represent that SAM as it’s more of an instance method and doesn’t seem very “functional” to me.

Anyway, even if I were to limit to just FIs, basic types like String and Double would still fit that. This has already caused issues in my arraySort() tests, where the first argument can be a string sort type or a callback, but when I pass a string to the Function caster and ask, “Can you turn this into a function?”, the caster says “yep, I can!” I’m realizing that just because a type implements a FI/SAM doesn’t necessarily mean it’s usable as a function in any arbitrary scenario.

In the case of our higher order BIFs, I know what specific FIs I actually care about. For example, arrayEach()/array.each() specifically can use a java.util.function.Consumer, not just ANY FI. So the better question to ask, I think, is “Can this class be converted to a function that is a Consumer?”. That’s a much more specific question that yields a more useful answer, and I’ll have a think about modifying the function caster to be able to limit that. I’m imagining some sort of BL syntax where a UDF could type its arguments like this

function myCustomFunc( Function-Consumer callback, Array data ) {
  for( var e in data ) {
     callback( e )
  }
}

or perhaps this (to avoid parsing issues with special chars in the type)

@callback.type function:Consumer
function myCustomFunc( callback, Array data ) {
  for( var e in data ) {
     callback( e )
  }
}

so we could advise the function caster on the specific FIs that were valid to wrap.
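
In Java terms, the narrower question ("is this a Consumer?" rather than "is this any FI/SAM?") might be sketched as below. `castTo` is a hypothetical helper and not the BoxLang function caster; it only shows how a required interface rules out a String even though String is Comparable.

```java
import java.util.Optional;
import java.util.function.Consumer;

class TargetedCasterDemo {
    // Hypothetical caster: accept the candidate only when it implements the one
    // interface the caller can actually use.
    static <T> Optional<T> castTo(Object candidate, Class<T> required) {
        return required.isInstance(candidate)
            ? Optional.of(required.cast(candidate))
            : Optional.empty();
    }

    public static void main(String[] args) {
        Consumer<String> printer = s -> System.out.println("got " + s);
        System.out.println(castTo("brad", Consumer.class).isPresent());   // false
        System.out.println(castTo("brad", Comparable.class).isPresent()); // true
        System.out.println(castTo(printer, Consumer.class).isPresent());  // true
    }
}
```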

Such a limit would only apply to a FI most likely as I assume most “known” SAMs are specifically FI’s (consumer, predicate, producer, etc). I’m not exactly sure how I would provide a hint to the function caster on what sort of SAMs to allow unless I knew the specific interface name ahead of time. I don’t have any good examples of this right off so it’s hard to say.

I’m very interested in what Clojure is doing here. I’ll do some Googles, but any docs you may have would be great.

Ooh, this is very interesting indeed. Right now BL does allow you to get a method reference of a static method (which has no associated instance) and a method reference to an instance method (which retains the original instance), but I hadn’t considered a method reference that really only tracked the name of the method and called that method name on whatever instance was passed in. I’m curious what Clojure would do in this case:

[ "hello", 42 ].map( String::.toUpperCase )

where the value passed to the map may not be a string, but perhaps Clojure’s array typing is strong enough to never allow that scenario.

seancorfield · July 11, 2024, 8:19pm

Clojure only tries to implement FIs (and SAMs if it decides to do it) for functions – since arbitrary data types don’t make sense: so it would never try to wrap a String or Double. Perhaps that’s your better approach here?

user=> (map String/.toUpperCase ["hello", 42])
Error printing return value (ClassCastException) at user/eval144$invoke (NO_SOURCE_FILE:1).
class java.lang.Long cannot be cast to class java.lang.String (java.lang.Long and java.lang.String are in module java.base of loader 'bootstrap')

You asked for String’s .toUpperCase() method so it knows the arguments must be cast to String and 42 is a Long.

bdw429s · July 11, 2024, 9:20pm

Got it. That’s a perfectly reasonable error, I was just curious. Our current dynamic object (Java proxy basically) will call any method on any object without any particular regard to what type the object is. BoxLang wouldn’t necessarily need to know we wanted the toUpperCase method from java.lang.String. We could simply say call the method of name X on the target object and our plumbing would look the same. Storing the type would be more for validation purposes.
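
The "call the method of name X on whatever object arrives" idea can be sketched in Java with reflection. This is an illustration only, not BoxLang's dynamic object; `methodNamed` and `tryCall` are hypothetical helpers, and only zero-argument public methods are handled.

```java
import java.lang.reflect.Method;
import java.util.function.Function;

class MethodByNameDemo {
    // Hypothetical helper: look up a zero-argument public method by name on
    // whatever object is passed in at call time. No type is stored up front.
    static Function<Object, Object> methodNamed(String name) {
        return target -> {
            try {
                Method m = target.getClass().getMethod(name);
                return m.invoke(target);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                    target.getClass().getName() + " has no usable method " + name + "()", e);
            }
        };
    }

    // Convenience wrapper that reports failures as a plain string.
    static String tryCall(String name, Object target) {
        try {
            return String.valueOf(methodNamed(name).apply(target));
        } catch (IllegalArgumentException e) {
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCall("toUpperCase", "hello")); // HELLO
        System.out.println(tryCall("toUpperCase", 42));      // error
    }
}
```

Storing the declaring type alongside the name, as Clojure does, would let the failure for 42 be reported before any lookup is attempted.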

I did some research (using ChatGPT, which I only somewhat trust). There’s been a couple different use cases discussed here

passing invokable functions declared in BoxLang into a Java class and allowing it to be used directly as a FI/SAM.
passing a method declared in a Java class or a class implementing a FI/SAM that was declared in Java to BoxLang and having BoxLang treat it as an invocable function.

I found documentation on the first bullet in Closure, including the typing hints that allow Clojure to optimize the bytecode to avoid reflection. I recall asking Railo/Lucee years ago about introducing hints to optimize bytecode to avoid reflection, but I couldn’t get them interested. I still have that idea in my back pocket if the invoke dynamic approach still doesn’t provide enough performance gain.

Regarding the second bullet, ChatGPT tells me that Clojure doesn’t do auto-coercion when going that direction. Instead, it seems all the cases of wrapping up an existing instance of a FI/SAM just involve wrapping it in a function like so (BL code)

invokableBLFunction = i => existingJavaSAMInstance.accept( i )
// now I can call my consumer as a BL function
invokableBLFunction( "foo" )

This has me questioning if I’m trying too hard to make this “just work”. Is it acceptable to force devs to manually wrap FI/SAM instances to make them a first class function, or is there benefit (and perhaps Clojure features ChatGPT doesn’t know about) to doing this automatically?

Basically, is there value in this idea I’ve been noodling on:

// In this case, the "accept" method is not declared explicitly, but inferred from introspection of the SAM interface
invokableBLFunction = existingJavaSAMInstance castas "function:java.util.function.Consumer"
// now I can call my consumer as a BL function
invokableBLFunction( "foo" )

I kept finding docs for reify in Clojure, but that appears to be the basic equivalent to createDynamicProxy() for implementing an interface on the fly, but with the ability to pass an anonymous function instead of an entire class (thus bullet #1).

This is all great discussion! I want to make sure we do this right.

seancorfield · July 11, 2024, 10:00pm

Right, the features I’m talking about are new in Clojure 1.12 (not “Closure” BTW) and it is still in Beta releases (we’re running it in production already), so ChatGPT wouldn’t know about that yet.

Clojure - Clojure 1.12.0-alpha10 talks about it a bit.

The auto-coercion to FIs is talked about briefly here: Clojure - Clojure 1.12.0-alpha12

Clojure - Clojure 1.12.0-beta1 summarizes all the changes at a very high level for this release.

The main features for what we’re talking about:

SomeClass::staticMethod – Java static method reference treated as a BL function (Clojure has had this for years)
SomeClass::.instanceMethod – Java instance method reference treated as x -> x.instanceMethod() (new in Clojure 1.12)
someBLfunction passed when Java expects FI or SAM – BL function/closure/lambda is auto-proxied to the FI/SAM-implementing class with appropriate method call (new in Clojure 1.12 – and for now they are only handling FIs)

I don’t think this example plays into any of that invokableBLFunction = i => existingJavaSAMInstance.accept( i ) since that’s already doable and doesn’t need any new functionality to work?

bdw429s · July 12, 2024, 4:41am

Right, that’s why I thought Clojure didn’t have an equivalent feature, and it seems it doesn’t.

That’s what I’m asking. I mean, we can make an argument that there are also more verbose ways of doing the other features that don’t involve auto-coercion. Your second bullet, for example, is just saving a few characters of boilerplate. So the question is, is there value in a language providing an automatic way to take a FI/SAM declared in Java and wrap it up as a first-class function? And if so, do any of the proposed approaches I’ve put forward (basically making it part of the casting system but with hints) make any sense?

seancorfield · July 12, 2024, 6:03am

I’m confused. I’m not even sure what you think this “feature” is.

This… just doesn’t make sense to me.

I’ll try to clarify what I think are the four cases we were discussing – three relate to method references, only one relates to FI/SAM:

SomeClass::staticMethod – treating such Java static method references as if they were full-blown BL functions (Clojure has supported this for years)
someObject.instanceMethod – treating such bound Java instance method references as full-blown BL functions (Clojure does not support this)
SomeClass::.instanceMethod – treating such unbound Java instance method references as functions that can later be called on a specific object, with arguments (Clojure is adding this in 1.12; this is the new functionality I was discussing above)
passing a BL function (closure/lambda) to a Java method that expects a FI/SAM implementing class instance – auto-coercing the BL function to an instance of a proxy class that implements that FI/SAM (Clojure previously required reify to do this but in 1.12 is natively supporting this auto-coercion for FIs only)

The fourth point was where your tests broke because strings are Comparable and that’s an FI – so my suggestion was for the caster to only allow functions to be cast and treat non-functions as an error.

Your last post seemed to suggest a fifth case which… did I miss that?

spillsthrills · July 12, 2024, 8:26pm

I think it is worth looking at how Kotlin does this too:

seancorfield · July 12, 2024, 9:11pm

That works well in a language with static types – Kotlin is requiring you to specify the interface type as a “hint” for converting the lambda into an instance of a class that implements the SAM.

If you don’t have static types, you either need new syntax added to support that, or you need to be able to do it implicitly, based on the target of the usage (which is what Clojure does and what we’re talking about BoxLang doing).

bdw429s · July 12, 2024, 11:24pm

Yep, and those all make sense to me.

Yep, there is 100% a 5th case I’ve been discussing for the past 4 messages of mine. I thought I had described it well, provided examples, linked to the JIRA ticket for it, etc., but I guess we’re talking about too many things at once here.

This is not the same as taking first class functions of BoxLang and passing them into Java-land so they can be used by the JDK or 3rd party Java libs as a FI/SAM. This is the opposite, where BoxLang can take an existing instance of a Java FI/SAM and use it as a first class BoxLang Function. So you see, the other way around!

Here is one of the examples again from above to demonstrate a use case for this:

import java.util.Collections;
// ArraySort auto-casts the FI/SAM to a first-class BoxLang function
[ 1, 7, 3, 99, 0 ].sort( Collections.reverseOrder()  )

So, if we look at the Collections class, we see the reverseOrder() method returns a pre-built Comparator instance, which is a FI/SAM interface.

Now, I COULD have done this additional boilerplate

import java.util.Collections;
myJavaComparator = Collections.reverseOrder();
[ 1, 7, 3, 99, 0 ].sort( (a,b) => myJavaComparator.compare( a, b ) )

but as you can see in my first example, auto-casting the Comparator SAM to a first-class BoxLang function allows me to use this java comparator just directly with the BoxLang higher order functions.

Another potentially-valid use case I haven’t taken the time to whip up would be passing a Java Lambda to a BoxLang method invocation requiring a function via JSR-223.

seancorfield · July 12, 2024, 11:41pm

Got it! Now that makes sense. Hadn’t thought of that before but it would be quite nice too!
