SES courses and URL encoded question marks

I ran across a little bug today in the SES interceptor (ColdBox 2.6.3),
thanks to an error that popped up in my logs.
If you have an SES URL that ends in a URL-encoded question mark with
NOTHING after it, like so (%3F = ?):

http://www.example.com/index.cfm/%3F

You receive the following error: "The 3 parameter of the Mid function,
which is now -1, must be a non-negative int".

If we open up \system\interceptors\ses.cfc and scroll down to line 259
we will see a check for a ? in the path_info:
<cfif requestString CONTAINS "?">

If we find it, we then run a regular expression that looks for a
question mark followed by 0 or more characters and then an equals sign
(=) on line 261:
<cfset varMatch = REFind("\?.*=", requestString, 1, "TRUE") />

The varMatch struct is used down on line 268 in a Mid function that
assumes the regular expression found something:
<cfset requestString = Mid(requestString, 1, (varMatch.pos[1]-1)) />

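The problem here is that the regex isn't necessarily going to find a
match just because the "contains" check did, since the regex is more
specific: it needs an equals sign somewhere after the question mark.
When it finds nothing, varMatch.pos[1] is 0 and Mid gets a length of -1.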
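
To make the mismatch concrete, here is a minimal sketch that can be dropped into a scratch .cfm page. The requestString value is made up for illustration, and it assumes the %3F has already been decoded to a literal ? by the time the interceptor sees it, which is what the error suggests:

```cfml
<!--- Hypothetical value: a path ending in a decoded question mark with nothing after it --->
<cfset requestString = "/ContactManager/ehCompanyAdmin/search/?" />

<!--- The loose check: true for any question mark at all --->
<cfset hasQuestionMark = (requestString CONTAINS "?") />

<!--- The strict check: needs a question mark followed later by an equals sign --->
<cfset varMatch = REFind("\?.*=", requestString, 1, "TRUE") />

<cfoutput>
CONTAINS check: #hasQuestionMark#<br />
REFind position: #varMatch.pos[1]#<br />
Length that would be passed to Mid: #varMatch.pos[1] - 1#
</cfoutput>
```

The CONTAINS check passes, but REFind reports position 0 (no match), so the length handed to Mid works out to -1.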

I know this is basically more of an anomaly, but one of my users
managed to find the error and it could be easily circumvented by
running the regex up front and using the results of that instead of
the contains statement.

~Brad

Brad, can you post your solution?

Luis

Solution would be interesting for Unicode content too.

We still have problems with Unicode content in SES URLs and need to spend some time checking the details.

Sure Luis,

I simply replaced lines 258 - 269 of ses.cfc which used to read
thusly:
<!--- fix URL variables (IIS only) --->
<cfif requestString CONTAINS "?">
  <!--- Match the positioning of the ? --->
  <cfset varMatch = REFind("\?.*=", requestString, 1, "TRUE") />
  <!--- Now copy values to the RC. --->
  <cfset qsValues = REreplacenocase(requestString,"^.*\?","","all")>
  <cfloop list="#qsValues#" index="qsVal" delimiters="&">
    <cfset rc[listFirst(qsVal,"=")] = listLast(qsVal,"=")>
  </cfloop>
  <!--- Clean the request string. --->
  <cfset requestString = Mid(requestString, 1, (varMatch.pos[1]-1)) />
</cfif>

...to read like this:
<!--- fix URL variables (IIS only) --->
<!--- Match the positioning of the ? --->
<cfset varMatch = REFind("\?.*=", requestString, 1, "TRUE") />
<cfif varMatch.pos[1]>
  <!--- Now copy values to the RC. --->
  <cfset qsValues = REreplacenocase(requestString,"^.*\?","","all")>
  <cfloop list="#qsValues#" index="qsVal" delimiters="&">
    <cfset rc[listFirst(qsVal,"=")] = listLast(qsVal,"=")>
  </cfloop>
  <!--- Clean the request string. --->
  <cfset requestString = Mid(requestString, 1, (varMatch.pos[1]-1)) />
</cfif>

The only real difference is that I removed the contains statement and
consistently used the same regex search to decide whether the CFIF
should be entered. Performance is the only reason I can imagine that
contains would have been used, but that's just a guess. The problem
was that the contains statement is a more liberal search than the
regex, which looks for additional characters along with the question
mark. Just because the CFIF got entered didn't necessarily mean that
the regex was going to find anything.
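
For anyone who wants to try the guarded version outside of ses.cfc, here is a rough sketch. cleanRequestString is a hypothetical helper name, not something in ColdBox, and it only covers the string cleanup, not the copying of values into the RC:

```cfml
<!--- Hypothetical helper (not part of ColdBox): the guarded cleanup on its own --->
<cffunction name="cleanRequestString" returntype="string" output="false">
  <cfargument name="requestString" type="string" required="true" />
  <!--- Same regex as the interceptor: a ? followed later by an = --->
  <cfset var varMatch = REFind("\?.*=", arguments.requestString, 1, "TRUE") />
  <!--- pos[1] is 0 when nothing matched, so Mid only runs with a real position --->
  <cfif varMatch.pos[1]>
    <cfreturn Mid(arguments.requestString, 1, varMatch.pos[1] - 1) />
  </cfif>
  <!--- No query-string-like portion found: leave the path alone --->
  <cfreturn arguments.requestString />
</cffunction>

<cfoutput>
#cleanRequestString("/search/?")#<br />
#cleanRequestString("/search/?id=5")#
</cfoutput>
```

The first call should hand back the path untouched instead of throwing, and the second should come back trimmed to /search/.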

Also, here is a more realistic URL from our application which errored
before my change:
http://devadmin.example.com/index.cfm/ContactManager/ehCompanyAdmin/search/%3F

@Oguz: I don't think my code change above really deals with encoding
so much as consistency in the token being searched for since the mid
function at the bottom of the CFIF block assumes the regular
expression WILL find a match if the CFIF has been entered.

~Brad

Bug logged; it will be fixed soon.