Date Handling in BoxLang

Dates as strings are a PITA. For example 1/2/29. That could be read as:

  • 2nd of January 1929
  • 2nd of January 2029
  • 1st of February 1929
  • 1st of February 2029
  • 29th lunar cycle on the 2nd moon of the 1st planet from the sun
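To make the ambiguity concrete, here is a minimal Java sketch using java.time (an illustration on the JVM, not BoxLang code). The class name AmbiguousDate and its labels are hypothetical. It parses the same string with both field orders and with two different century bases for the two-digit year:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: one string, four equally "valid" dates.
final class AmbiguousDate {

    // Day/month order is chosen by the caller; the two-digit year is
    // resolved into the century that starts at centuryBase (1900 or 2000).
    static DateTimeFormatter mask(boolean dayFirst, int centuryBase) {
        return new DateTimeFormatterBuilder()
                .appendPattern(dayFirst ? "d/M/" : "M/d/")
                .appendValueReduced(ChronoField.YEAR, 2, 2, centuryBase)
                .toFormatter();
    }

    static Map<String, LocalDate> readings(String text) {
        Map<String, LocalDate> out = new LinkedHashMap<>();
        out.put("M/d/yy, 1900s", LocalDate.parse(text, mask(false, 1900)));
        out.put("M/d/yy, 2000s", LocalDate.parse(text, mask(false, 2000)));
        out.put("d/M/yy, 1900s", LocalDate.parse(text, mask(true, 1900)));
        out.put("d/M/yy, 2000s", LocalDate.parse(text, mask(true, 2000)));
        return out;
    }

    public static void main(String[] args) {
        // Prints 1929-01-02, 2029-01-02, 1929-02-01 and 2029-02-01 with their labels.
        readings("1/2/29").forEach((label, date) -> System.out.println(label + " -> " + date));
    }
}
```

Nothing in the string itself says which of the four is meant; only the mask (and the century rule) does.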

CFML allows dates to be used as strings in various functions. For example:
dateAdd('d', 30, '8/3/2014')

Is it worth being strict in BoxLang, not allowing strings at all and only accepting a datetime object instead?

I agree. Dealing with string dates is painful, terrible and just a PITA. They can be read in many different forms. I would be open to this, and to adding compat functions for CFML to allow strings.

@bdw429s @jclausen thoughts?

Maybe supporting an unambiguous and standard format like ISO-8601 variants would be sensible in a loosely-/dynamically-typed language.

Equally I’d be completely fine having date functions just take dates, and having a method like myDate = Date.fromString("1/2/29", "d/m/yy") to quickly, easily, and unambiguously convert from strings.

The format string could conceivably default to ISO-8601 formats.
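A minimal Java sketch of that idea, assuming a hypothetical DateFromString.fromString helper (not an existing BoxLang or JDK API) built on java.time. With no pattern it accepts only ISO-8601 yyyy-MM-dd; with a pattern, the caller spells out the field order. Note that in java.time pattern syntax lowercase m means minute-of-hour, so the month in "d/m/yy" has to be written as M:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper in the spirit of the proposed Date.fromString(); not a real BoxLang API.
final class DateFromString {

    // No pattern given: only ISO-8601 (yyyy-MM-dd) is accepted, nothing is guessed.
    static LocalDate fromString(String text) {
        return LocalDate.parse(text, DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Pattern given: the caller states the intended field order explicitly.
    static LocalDate fromString(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(fromString("2029-02-01"));
        System.out.println(fromString("1/2/29", "d/M/yy")); // "yy" maps into 2000-2099
        // fromString("1/2/29") throws DateTimeParseException: not ISO-8601, no pattern.
    }
}
```

The design choice is that a missing pattern never triggers guessing: anything that is not ISO-8601 fails loudly.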

It def should not default to USAn format. Because a) it’s a minority format; b) honestly, what are you lot like? :wink:

Yes! Maybe use parse as the naming convention?

:100: :grin:

I was discussing this a bit with @lmajano yesterday. I prefer to support as much of the lenient parsing that Lucee and ACF do in casting dates, but that parsing needs to be locale aware ( as well as zone aware ). Both Lucee and ACF are way too US-centric ( Lucee defaults to a hard-coding of the US locale for date parsing, if a locale is not specified in the arguments )

Of the BIFs, lsParseDateTime takes the locale into account with every parse attempt, and we have tests for Chinese formats, Russian, etc.

Right now our DateTimeCaster is not aware of the timezone or the locale ( provided by either the request or runtime ). As such, there are hard-coded parsing masks. Note in the link above that the mask dd/MM/yyyy precedes MM/dd/yyyy. This means that 1/2/2024 will always get parsed first as Feb 1 2024. Only if it fails to parse the second position ( e.g. the number is greater than 12 ) will it ever fall back to MM/dd/yyyy. This is obviously problematic.
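The first-match-wins behaviour described above can be modelled in a few lines of Java. This is a simplified sketch, not BoxLang's actual DateTimeCaster: the class name and the two-mask list are assumptions, and zero-padded input is used because java.time's dd and MM expect exactly two digits.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

// Simplified model of a caster that tries hard-coded masks in a fixed order,
// ignoring the request/runtime locale entirely.
final class FirstMatchCaster {

    static final List<DateTimeFormatter> MASKS = List.of(
            DateTimeFormatter.ofPattern("dd/MM/yyyy"),   // tried first
            DateTimeFormatter.ofPattern("MM/dd/yyyy"));  // only reached if the first mask fails

    static LocalDate cast(String text) {
        for (DateTimeFormatter mask : MASKS) {
            try {
                return LocalDate.parse(text, mask); // first mask that parses wins
            } catch (DateTimeParseException e) {
                // e.g. a "month" greater than 12: fall through to the next mask
            }
        }
        throw new IllegalArgumentException("Unparseable date: " + text);
    }

    public static void main(String[] args) {
        System.out.println(cast("01/02/2024")); // 2024-02-01, whatever the user's locale
        System.out.println(cast("01/13/2024")); // 2024-01-13, via the fallback mask
    }
}
```

The same string shape can therefore land in two different months depending only on whether the first number happens to exceed 12.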

My preference would be to keep the lenient parsing, but make all string to date casting locale ( and tz ) aware from the context. Right now the plan is to move this to an interception, which receives the context. That will mostly eliminate the need for hard-coded masks and the result would be:

  • If your locale is en_UK, the short form date mask will be dd/MM/yyyy
  • If your locale is en_US, the short form will be MM/dd/yyyy

The same would apply in other locales with different short/medium/long form formats, which, beyond the JVM locale, are currently only influenced by the LS... datetime functions.
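A minimal sketch of locale-driven casting in Java, assuming a hypothetical LocaleAwareCaster that receives the context locale. The hard-coded mask map only mirrors the two bullets above; a real implementation would more likely derive the pattern from JDK locale data (for example via DateTimeFormatterBuilder.getLocalizedDateTimePattern), whose exact output varies between JDK versions. Note that the JDK's UK locale is en_GB (Locale.UK), not en_UK:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Map;

// Hypothetical caster: the mask comes from the context locale, so there is
// exactly one interpretation per locale instead of a fixed fallback list.
final class LocaleAwareCaster {

    // Illustrative short-form masks; Locale.UK is en_GB in the JDK.
    static final Map<Locale, String> SHORT_MASKS = Map.of(
            Locale.UK, "dd/MM/yyyy",
            Locale.US, "MM/dd/yyyy");

    static LocalDate cast(String text, Locale contextLocale) {
        String mask = SHORT_MASKS.get(contextLocale);
        if (mask == null) {
            throw new IllegalArgumentException("No short date mask for locale " + contextLocale);
        }
        // Throws DateTimeParseException rather than trying another locale's mask.
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(mask, contextLocale));
    }

    public static void main(String[] args) {
        System.out.println(cast("01/02/2024", Locale.UK)); // 2024-02-01
        System.out.println(cast("01/02/2024", Locale.US)); // 2024-01-02
    }
}
```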

That said, you already have the ability to operate on date objects, via the member functions, so using the short format mask, you can do:

firstOfJune = lsParseDateTime( "31/5/2024", "en_UK"  ).add( "d", 1 );
ISOString = firstOfJune.format( "iso" );
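For comparison, the same operation expressed directly with java.time in Java (a sketch; the class name is hypothetical, and Locale.UK stands in for the thread's en_UK). The mask is explicit, so 31/5/2024 can only be 31 May:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class FirstOfJune {

    static String firstOfJuneIso() {
        // Explicit day/month mask, then add one day, then format as ISO-8601.
        LocalDate lastOfMay = LocalDate.parse("31/5/2024", DateTimeFormatter.ofPattern("d/M/yyyy", Locale.UK));
        LocalDate firstOfJune = lastOfMay.plusDays(1);
        return firstOfJune.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        System.out.println(firstOfJuneIso()); // 2024-06-01
    }
}
```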

We initially discussed global type namespaces with static methods, but moved away from implementing them in the initial BL release - primarily because they were mostly going to end up duplicating existing BIFs ( e.g. myDate = parseDateTime("1/2/29", "d/M/yy") is the same as the above, and is fewer characters )

Existing CFML BIFs. I think in its language design, one should decouple BoxLang from CFML.

I recommend you put the CFML into the CFML compat module, and otherwise just don’t refer to it at all when designing the “lang” part of “BoxLang”.

There is no point making BoxLang the same as CFML except spelt differently. Not least of all cos CFML is a bit of a mess, and I don’t think the world needs another CFML. If yer gonna do that then just… don’t bother. Set yer sights lower and just make it CfmlBox which is a competitor to Lucee.

I think there are places where we are doing that - or we’ve modified the BIFs, removed support for others, or have migrated specific implementations to the compat module. We will continue to do so.

The discussion around BIFs and implementation was an ongoing one that took months. In the end, the ones we chose to retain as core BoxLang BIFs were the ones we felt made the language better.

We will continue to add BoxLang-only functionality and language constructs. That doesn’t mean an aptly named BIF like parseDateTime isn’t a perfectly valid BIF for BoxLang, as well as CFML.

If you are feeling like that BIF needs to be moved to compat, could you clarify? I already touched on the namespace choices in this first revision. @lmajano can weigh in on those, as well.

It’s not the same though, as the 2nd argument of parseDateTime is optional in CFML - so you end up being open to ambiguity. I think the mask should be required so that code is clear to read 100% of the time.

Yeah - there is a good opportunity in BoxLang (not BoxLang Compat) to not only keep what’s good about CFML but also get rid of (not just flag as deprecated) the stuff that isn’t.

I’m going to have to disagree on this one. I think keeping the mask optional is a good thing. Even Javascript’s Date constructor will accept ambiguous string input and attempt to parse it. I think continuing to support a certain amount of ambiguity ( and support “best attempt” parsing ) is a good feature for the language.

…and Javascript’s date parsing is horrible - hence there being a million JS libraries to solve the problem :slight_smile:

I don’t want “best attempt” in code - I want to know exactly what it’ll do when I deploy on any server with any locale. Maybe that’s just me though.

Oh, I more mean it from the perspective of how to position one’s thinking. Don’t do the “What Would Jesus^h^h^h^h^hCFML Do?” thing when designing BoxLang.

CFML is not a well-designed language. It’s a mess. If you couple yer thinking to CFML too much when you design BoxLang, the world will just end up with another mess.

I think yer code example above is a good example of the mess:

firstOfJune = lsParseDateTime( "31/5/2024", "en_UK"  ).add( "d", 1 )

OK so first we use a global function. Then we call a method. Why? Why aren’t both operations calling methods? Or functions?

And like naming… that looks like “is parse date time” (I know it’s not). But what’s “LS”? Probably “Locale Specific” (I have to say I don’t actually know!! I’ve only been doing CFML for 25yrs after all ;-)). Why would one even need a special function for handling locale-aware dates? As opposed to what? non-locale-aware? Why would a language even have those?! Why is it not just DateTime.parse()? And on one hand we’ve got lsParseDateTime (DateTime at the end), but dateAdd (Date at the beginning). And… Date… not DateTime. And why is the dateAdd function noun-verb not verb-noun like one would generally expect, and recommend in one’s code?

CFML is a mess.

I understand its history and partly how it got to where it is. But a big part of that history, and of why it’s a mess, is a lack of coherent planning. Plus, really, a lack of vision from the beginning as to what sort of language it would be.

Avoid this in BoxLang.

This is why I asked as soon as I heard about it “what sort of thing are you aiming for here?” (in the language). Luis pointed me to the Zoom meeting we had a coupla weeks back, but I really don’t think that answered the question (it was a sales / PR meeting really, not a technical one. Fine, but not what I was after).

I think that is a pretty compelling argument right there for how CFML has, in the past, tried to be helpful but just ended up creating something of a mess at times.

I’m learning a lot from this public discourse. If you do a lot of this back and forth publicly, you’re going to train a lot of BoxLang users without much effort. I’m not at a place where I can dive into this now, but when I do, having seen conversations like this will be a huge help.

Yeah pretty gobsmacked to have JS cited as an example of date handling. In a positive way, anyhow.

Absolutely.

No loosey goosey.

There are ppl raising issues in the CFML community every month or so, having been caught out by CFML’s “best attempt” parsing. It’s an example of “how not to”.

TBH though, I’d be OK with the pattern param being optional, provided the default was ISO-8601. Standards exist for a reason.

I could see this possibly working in like a reverse pattern inference approach. If the passed-in string was \d{4}-\d{2}-\d{2}, then it can be inferred it’s yyyy-mm-dd. If it’s \d{4}-\d{2}, it can be inferred it’s yyyy-mm. But if it’s \d{2}-\d{2}-\d{2} (which is not an ISO-8601 format), then the function call should throw a FormatNotSupportedException (etc). There should be no “ah they might mean yy-mm-dd, I’ll just assume that”. It’s not a standard format, so it should not work. Garbage in: garbage out. It’s OK to require the dev to be precise in their code, so as to remove the chance of being caught out by unexpected shit later on.

If they wanna pass in a string which is \d{2}-\d{2}-\d{2}, then they can provide the pattern as well, to demonstrate “yes, I am meaning to do this”.
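A minimal Java sketch of that reverse inference, under stated assumptions: the IsoInference class and its nested FormatNotSupportedException are hypothetical names (the exception name is taken from the suggestion above), and only the two ISO-8601 shapes mentioned are recognised. Anything else must come with an explicit pattern:

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.regex.Pattern;

final class IsoInference {

    // Hypothetical exception for input that matches no supported ISO-8601 shape.
    static final class FormatNotSupportedException extends RuntimeException {
        FormatNotSupportedException(String message) {
            super(message);
        }
    }

    private static final Pattern FULL_DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");
    private static final Pattern YEAR_MONTH = Pattern.compile("\\d{4}-\\d{2}");

    // No pattern: infer the ISO-8601 shape from the whole input, or refuse.
    static TemporalAccessor parse(String text) {
        if (FULL_DATE.matcher(text).matches()) {
            return LocalDate.parse(text);   // yyyy-MM-dd
        }
        if (YEAR_MONTH.matcher(text).matches()) {
            return YearMonth.parse(text);   // yyyy-MM
        }
        throw new FormatNotSupportedException("Not an ISO-8601 date, supply a pattern: " + text);
    }

    // Explicit pattern: the caller has declared what the digits mean.
    static LocalDate parse(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }
}
```

Digits that fit the shape but not the calendar (say 2024-13-45) still fail, with the JDK's own DateTimeParseException.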

The Date class could also expose consts that map to familiar formats, eg:

Date.parse("01/02/03", Date::US_SHORT) // where Date::US_SHORT = mm/dd/yy

(and don’t make up those consts… go check if there’s a standard / idiom for those too, in existing JVM languages)
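On the JVM side, java.time's DateTimeFormatter already ships named constants for standard formats, such as ISO_LOCAL_DATE, BASIC_ISO_DATE and RFC_1123_DATE_TIME. It does not define a US short-date constant; the nearest idiom is DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(Locale.US), whose exact pattern comes from the JDK's locale data. A small sketch (the class and method names are hypothetical):

```java
import java.time.LocalDate;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Parsing with the standard formatter constants built into java.time.
final class StandardFormats {

    static LocalDate isoExtended(String text) {
        return LocalDate.parse(text, DateTimeFormatter.ISO_LOCAL_DATE);          // e.g. "2024-02-01"
    }

    static LocalDate isoBasic(String text) {
        return LocalDate.parse(text, DateTimeFormatter.BASIC_ISO_DATE);          // e.g. "20240201"
    }

    static ZonedDateTime rfc1123(String text) {
        return ZonedDateTime.parse(text, DateTimeFormatter.RFC_1123_DATE_TIME);  // HTTP-style dates
    }

    public static void main(String[] args) {
        System.out.println(isoExtended("2024-02-01"));
        System.out.println(isoBasic("20240201"));
        System.out.println(rfc1123("Thu, 1 Feb 2024 10:15:30 GMT"));
    }
}
```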

Yup. I was just looking at Kotlin / Java, and it will parse a string if it’s in ISO format; anything which is ambiguous it errors on.

import java.time.LocalDate
fun main() {
    /*
     * these throw DateTimeParseException, as they are not ISO-8601 (yyyy-MM-dd)
     * val localDate = LocalDate.parse("01-02-2024")
     * val localDate = LocalDate.parse("06-02-01")
     */
    // this works as it is yyyy-MM-dd
    val localDate = LocalDate.parse("2024-02-01")
    println(localDate)
}

I agree with this. The function is probably more aptly named parseLocalizedDateTime ( even if it is more verbose ). That’s actually a trivial change to make. I think I may do that and ask Brad to add a transpiler rule for CFML.

Right now the LSParse... methods flow through to the main Parse... BIF implementations. The only difference is that they pass the locale argument explicitly. BIFs are locale aware by default. For example:

setLocale(  "zh_Hant" );
writeOutput( dateFormat( now(), "long" ) );

will give you the same result as:

lsDateFormat( now(), "long", "zh_Hant"  );
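A Java sketch of that delegation, assuming the request/runtime context holds the locale. LocaleContext, dateFormat and lsDateFormat here are hypothetical stand-ins for the BIFs, not BoxLang's source; the only point is that the LS variant and the context-driven variant share one implementation. The thread's "zh_Hant" corresponds to the Java language tag zh-Hant:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical model: the plain function reads the locale from context,
// the LS variant takes it as an argument, and both share one implementation.
final class LocaleContext {

    private Locale contextLocale = Locale.getDefault();

    void setLocale(Locale locale) {
        contextLocale = locale;
    }

    String dateFormat(LocalDate date, FormatStyle style) {
        return lsDateFormat(date, style, contextLocale);
    }

    String lsDateFormat(LocalDate date, FormatStyle style, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(style).withLocale(locale).format(date);
    }

    public static void main(String[] args) {
        LocaleContext ctx = new LocaleContext();
        Locale traditionalChinese = Locale.forLanguageTag("zh-Hant");
        LocalDate today = LocalDate.now();
        ctx.setLocale(traditionalChinese);
        System.out.println(ctx.dateFormat(today, FormatStyle.LONG));
        System.out.println(ctx.lsDateFormat(today, FormatStyle.LONG, traditionalChinese));
    }
}
```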

Lol. :slight_smile: Yep. CFML is a mess, but we are trying to be careful that we keep the good and put the rest to bed in the compat module. Much of CFML’s loosey-goosey date parsing comes from all of the different legacy formats which may come out of different DBMS platforms, etc. Personally, I use ISO now in all DateTime operations when I develop.

I’m not dismissing your concerns or criticisms here. I just think there’s benefit in being flexible - and the java.time package actually makes this pretty easy, because we can grab the locale and construct a chain of optional parsers that will recognize different localized formats that are commonly used.
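A sketch of that approach, under assumptions: the class name and the mask list below are illustrative, not BoxLang's implementation. java.time's DateTimeFormatterBuilder.appendOptional lets one formatter try several sections in order; a section that does not match is skipped without consuming input, and input that matches none of them still fails:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Illustrative only: ISO-8601 first, then a few masks common in the given locale.
final class OptionalParserChain {

    static DateTimeFormatter forLocale(Locale locale, String... localizedMasks) {
        DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder()
                .appendOptional(DateTimeFormatter.ISO_LOCAL_DATE);
        for (String mask : localizedMasks) {
            // Each optional section is tried at the current position; on failure
            // it is skipped without consuming input or keeping parsed fields.
            builder.appendOptional(DateTimeFormatter.ofPattern(mask, locale));
        }
        return builder.toFormatter(locale);
    }

    public static void main(String[] args) {
        DateTimeFormatter uk = forLocale(Locale.UK, "dd/MM/yyyy", "d MMM yyyy");
        for (String text : new String[] {"2024-02-01", "01/02/2024", "1 Feb 2024"}) {
            System.out.println(text + " -> " + LocalDate.parse(text, uk)); // each is 2024-02-01
        }
        // "Feb 1 2024" matches none of the sections, so LocalDate.parse throws.
    }
}
```

The chain stays strict per locale: it accepts several spellings of a UK date, but never reinterprets one as a US date.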

We’re not going to get crazy and support edge cases, and I hear your concerns. We don’t want to go off into the weeds either.

I’ve been playing with the LSDate functions to see if I was worrying about something trivial, but it turns out that CFML date parsing really is terrible: it just tries masks until it can parse the string, regardless of what the locale is.

Some simple test cases

<cfscript>

	function assert( actual, expected ) {
		var wanted = isSimpleValue( expected) ? expected : serializeJSON( expected );
		var got = isSimpleValue( actual ) ? actual : serializeJSON( actual );
		if ( wanted == got ) {
			writeOutput( "✔" )
		} else {
			writeOutput( "✗" )
		}
		writeOutput( " expected `#wanted#`, got `#got#`<br>" );
		if ( wanted != got ) {
			// writeDump( arguments )
		}
	}
	
	writeoutput("<hr>Passing English (UK)<hr>");
	setLocale("English (UK)");
	assert( getLocale(), "English (UK)" );
	assert( lsIsDate("01/02/2024" ), true );
	assert( lsIsDate("02/01/2024" ), true );
	assert( lsIsDate("02/31/2024" ), false );
	
	writeoutput("<hr>Passing English (US)<hr>");
	setLocale("English (US)");
	assert( getLocale(), "English (US)" );
	assert( lsIsDate("01/02/2024" ), true );
	assert( lsIsDate("02/01/2024" ), true );
	
	// but then things go bad...
	writeoutput("<hr>Incorrect English (UK)<hr>");
	setLocale("English (UK)");
	assert( getLocale(), "English (UK)" ); // correct
	// not a valid dd/mm/yyyy UK date
	assert( lsIsDate("01/31/2024" ), false ); // correct
	writeDump( lsDateFormat("01/31/2024", "full" ) ); // WRONG! Wednesday, 31 January 2024 
	
	writeoutput("<hr>Incorrect English (US)<hr>");
	setLocale("English (US)");
	assert( getLocale(), "English (US)" ); // correct
	assert( lsIsDate("31/01/2024" ), false ); // not a valid US date
	
	// not a valid US mm/dd/yyyy date so should ERROR but doesn't
	writedump( lsDateFormat("31/01/2024", "full" ) ); // WRONG! Wednesday, 31 January 2024 

</cfscript>
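For contrast, java.time can be told to reject such input outright. A minimal Java sketch (the class name is hypothetical) using ResolverStyle.STRICT; uuuu is used instead of yyyy because strict resolution of a year-of-era also expects an era field. It also shows that the default SMART style quietly adjusts 31/02/2024 to 29 February, which is the same kind of "best attempt" behaviour being objected to here:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Locale;

final class StrictDates {

    // dd/MM/uuuu for the UK, MM/dd/uuuu for the US; STRICT rejects impossible dates.
    static final DateTimeFormatter UK_STRICT =
            DateTimeFormatter.ofPattern("dd/MM/uuuu", Locale.UK).withResolverStyle(ResolverStyle.STRICT);
    static final DateTimeFormatter US_STRICT =
            DateTimeFormatter.ofPattern("MM/dd/uuuu", Locale.US).withResolverStyle(ResolverStyle.STRICT);

    // The default (SMART) resolver, for comparison.
    static final DateTimeFormatter UK_SMART = DateTimeFormatter.ofPattern("dd/MM/uuuu", Locale.UK);

    static boolean isValid(String text, DateTimeFormatter formatter) {
        try {
            LocalDate.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("01/31/2024", UK_STRICT)); // false: there is no month 31
        System.out.println(isValid("31/01/2024", US_STRICT)); // false: same problem in US order
        System.out.println(isValid("31/02/2024", UK_STRICT)); // false: 31 February does not exist
        System.out.println(LocalDate.parse("31/02/2024", UK_SMART)); // 2024-02-29, silently adjusted
    }
}
```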

I really don’t understand why any language would want to allow incorrect data to be processed; these are supposed to be locale-aware functions. So going back to my original thought about BoxLang (note: not CFML), I think being strict, not allowing strings at all and only accepting a datetime object, makes sense.