All special characters showing up as à (A tilde)

Using ColdBox, ColdFusion 9.

I have tested this with a form-post and a url parameter. In both cases, I submit the string:
“à á Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß”,
and in both cases I immediately dump the input to the browser, which now looks like:
“à á Ã Ä à Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ô.

The meta-tag has: “charset=utf-8” and I have also tried “charset=iso-8859-1”. It makes no difference.

GetLocale() = en_IE
GetEncoding("url") = UTF-8
GetEncoding("form") = UTF-8

Now, what’s interesting is that I built a simple CF page on the same server but outside of the ColdBox framework, and the characters display correctly after the form/url post.

Of course, in ColdBox, the form and url values are transferred to the RequestCollection (RC). If I dump the RC immediately after the form/url post I see the wrong characters.

Therefore it is starting to look like ColdBox is taking the ‘good’ characters out of the native url/form scope and putting the ‘bad’ characters in their place in the RC.

Can anyone suggest where I can look next? Is there a ColdBox setting I should look for? Might it be something else entirely?

Just to clarify, because the characters I entered in my question are not showing up correctly (:-)) every special character shows up as a à (A-tilde).

So, I just had success using ColdBox 4.1.0 and CF902 and Chrome.

What I did was create a method in my Main.cfc handler called weirdChars:

function weirdChars(event, rc, prc) { WriteDump(rc.chars);abort; }

Then just accessed my program at: index.cfm/Main/weirdChars?chars=à á Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß

And the output to the browser was as you’d expect, with all the characters looking correct.

Of course the browser then showed the URL encoded as: index.cfm/Main/weirdChars?chars=à%20á%20Ã%20Ä%20Å%20Æ%20Ç%20È%20É%20Ê%20Ë%20Ì%20Í%20Î%20Ï%20Ð%20Ñ%20Ò%20Ó%20Ô%20Õ%20Ö%20Ø%20Ù%20Ú%20Û%20Ü%20ß
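As a side note on what actually travels over the wire: the address bar shows the accented characters for readability, but a UTF-8 browser normally sends each one as percent-encoded bytes. A minimal sketch of that, in Python rather than CFML purely for illustration (the three-character sample string is an assumption, shortened from the test string above):

```python
from urllib.parse import quote, unquote

# Shortened sample of the test string used in this thread (assumption).
chars = "à á Ã"

# quote() percent-encodes using UTF-8 by default, so each accented
# character becomes two bytes and each space becomes %20.
encoded = quote(chars)
print(encoded)

# Decoding the same bytes as UTF-8 round-trips cleanly.
decoded = unquote(encoded)
print(decoded == chars)
```

Whether the server sees the right characters then depends on which charset it uses to decode those bytes.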

Wish I could be of more assistance but I wanted to at least report success with a similar setup as yours to give you hope.

Wes

Also, just wanted to say your first post displayed just fine in my browser (Chrome). The characters in your first line were all unique. The second line you typed had a lot of A’s with ~ over them. I’m only saying this because you said: “Just to clarify, because the characters I entered in my question are not showing up correctly (:-)) every special character shows up as a à (A-tilde).” It got me wondering if maybe your first post didn’t even look correct to you in your browser… maybe a browser oddity.

Hi Wesley

What do you get if you reconfigure the url like this?
index.cfm/Main/weirdChars/chars/à á Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß

I get the same correct looking characters. I just tried on a new CB4 app created with CommandBox, but served via the CF9 built-in server.

I also tried with the Lucee server that is included with CommandBox and it was successful as well.

That’s annoying.

I just set up exactly the same function in my Main.cfc

function weirdChars(event, rc, prc) { WriteDump(rc.chars);abort; }

With this url: /index.cfm/main/weirdChars?chars=à á Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß
All displays correctly

With this url: /index.cfm/main/weirdChars/chars/à á Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß
All I get is A-tildes (!)

If that is not the case for you, then there must be something different in the way our SES routes are configured?? It doesn’t seem logical though…

That is annoying, but it does seem to work on my end. I am using a completely plain CB4 install from CommandBox. Have you already tried that? Just curious how it responds in your environment right out of the box. Is the server running on your localhost or through a corporate proxy/filter of some sort?

I am on a common development server that serves many applications, all sharing a common installation of ColdBox. The version is “ColdBox SEEK 3.6.0 1 John 5:12-13”.
Could it be version 3 vs 4 do you think?

I didn’t arrive on the ColdBox scene until 3.8.1 so my knowledge of CB history is rather limited. I just tested it on 3.8.1 and it looks correct as well.

If you download and use CommandBox, it includes a built-in version of the open source CF server called Lucee, so a web server can be started from within a single directory and run locally on your machine. That would take any corporate network equipment out of the picture.

I’ll try to test CB3.6.0 as well.

Maybe try this:

  • Download and install CommandBox

  • Run CommandBox and cd C:\to\your\project\root

  • Run the command: start

  • When you’re done you can shut it down with command: stop

Tested it with ColdBox 3.6.0 with both URL formats and all characters still looked fine on CF902.

After reading all the messages so far I’m a little confused on specifically which setups are working and which ones aren’t. Chances are much more likely that the characters are getting messed with upon output to the browser and not in the internal handling of the request collection. Check out the “requestCapture()” method in the coldbox/system/web/services/RequestService.cfc component. You can see this is all ColdBox is doing:

if( isDefined( "FORM" ) ){ structAppend( rc, FORM ); }

if( isDefined( "URL" ) ){ structAppend( rc, URL ); }

I would recommend adding a dump/abort before those lines to inspect the URL and FORM scope and see if the characters look correct then. Also, dump out the rc after those lines to see what the characters look like. Keep ?fwreinit=1 in the URL while you’re testing so your changes to this file will get picked up.
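To illustrate why a plain merge like this is unlikely to corrupt anything, here is a rough Python analogue (not ColdBox code; the function name and the dict-based scopes are assumptions for the sketch). The values are copied as-is, so whatever is in the FORM and URL scopes is what ends up in the rc:

```python
def request_capture(form_scope, url_scope):
    """Merge FORM, then URL, into a fresh request collection.

    Mirrors the two structAppend() calls quoted above: later keys
    overwrite earlier ones, and values are copied without re-encoding.
    """
    rc = {}
    rc.update(form_scope)
    rc.update(url_scope)
    return rc


rc = request_capture({"chars": "à á Ã"}, {"page": "1"})
print(rc)
```

If the characters are already wrong in the source scopes, they will be wrong in the rc too, which is why dumping both sides of the merge narrows things down.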

Thanks!

~Brad

Hi Brad

The current situation is that Wesley’s setups are all working fine. But mine isn’t, using CB 3.6 and CF 9.

I ran the test you suggested.

With this url (traditional): /index.cfm/main/weirdChars?chars=à á Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß
URL scope before the conversion = fine.
RC scope after the conversion = fine.

With this url (SES): /index.cfm/main/weirdChars/chars/à á Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß

URL scope before the conversion = empty(!).
RC scope after the conversion = empty.

So the ColdBox process that deals with converting the URL scope when SES is being used is being affected before we even reach here.
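For readers unfamiliar with SES URLs, the idea is that the path after index.cfm is split into a handler, an action, and then name/value pairs. A simplified Python sketch of that idea follows; it is an illustration under that assumption only, not ColdBox's actual routing code, and the function name is hypothetical:

```python
def parse_ses_path(path_info):
    """Split '/handler/action/name/value/...' into its parts.

    Simplified illustration only; real SES routing is configurable
    and considerably more involved.
    """
    segments = [s for s in path_info.split("/") if s]
    handler = segments[0] if len(segments) > 0 else None
    action = segments[1] if len(segments) > 1 else None
    rest = segments[2:]
    # Pair up the remaining segments as name/value.
    params = dict(zip(rest[0::2], rest[1::2]))
    return handler, action, params


print(parse_ses_path("/main/weirdChars/chars/à á Ã"))
```

The point is that with SES the parameter value is taken from the path rather than the query string, so anything that garbles the path also garbles the value.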

I’ll keep on looking, but all further advice gratefully received…

I started messing around in \coldbox\system\interceptors\SES.cfc

I tracked the event back to what I think is the starting point where ColdBox takes what’s in the url and starts to work with it, which I think is the getCGIElement() function.

In the function getCGIElement(), I dumped out the CGI scope and CGI.PATH_INFO = /main/weirdChars/chars/à á Ã Ä à Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü ß
(Just to clarify, if those characters show up wrong again in this post, they are ALL A-tildes, except the second one, which is A-tilde followed by a “¡”)

So now I am wondering, how come the CGI scope is misinterpreting the special characters??
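The "A-tilde followed by ¡" detail is a strong hint about what is going on: it is what you get when UTF-8 bytes are read as ISO-8859-1. A minimal sketch in Python (used here only for illustration; the function name is hypothetical, and that the web server or connector decodes the path this way is an assumption, not something confirmed in this thread):

```python
def misdecode(text):
    """Encode text as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


garbled = misdecode("à á Ã Ä")

# Every accented character in this range is two bytes in UTF-8, and the
# first byte is 0xC3, which is "Ã" in ISO-8859-1. The second byte of "á"
# is 0xA1, which is "¡". The second byte of "à" is 0xA0, a non-breaking
# space, so it appears to be an A-tilde on its own.
for chunk in garbled.split(" "):
    print(repr(chunk))
```

That matches the symptom: every character starts with Ã, and the trailing byte is often invisible or non-printable.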

So can you reproduce the issue outside of ColdBox then, using CGI[ 'path_info' ]?

I believe the entry point to the SES interceptor is the onRequestCapture() method. Start there and follow its execution through until you find the point that the characters are incorrect.

Thanks!

~Brad

Just out of curiosity, do you get the same results if you pass that via the request headers rather than via the url?

@Brad. Yeah, I can reproduce the error outside of ColdBox. What we are down to now is simply the question of why Apache is loading the wrong symbols into cgi.path_info. I’m not a server admin and am waiting to find one so we can investigate further. I’ll post back here what we find.

@Andrew. Errr… how would I do that? Via the console??

It looks like sending it via a header in the request may work. I googled your issue and came across this article, which may provide more of a solution to your problem.

Take special note of the rewrite rule.

http://stackoverflow.com/questions/2764446/problem-using-unicode-in-urls-with-cgi-path-info-in-coldfusion

So, just to round this off for everyone who might read this thread in the future…

We checked everything we could with Apache and came up with nothing. My solution therefore was to find a workaround. This I did by modifying the application so that, at the crucial time, it avoided SES routing and just sent my search string using old school ?foo=bar syntax. It’s a bit of a cop-out I grant you, but it seemed the only way.
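For anyone wanting to picture the workaround, here is a small Python sketch of building the query-string form of the URL instead of the SES form. The helper name is hypothetical, and the base path and the `chars` parameter are just the ones used in this thread's test:

```python
from urllib.parse import urlencode


def build_search_url(base, term):
    """Put the search term in the query string instead of the SES path."""
    # urlencode() percent-encodes as UTF-8 and turns spaces into "+".
    return base + "?" + urlencode({"chars": term})


url = build_search_url("/index.cfm/main/weirdChars", "à á")
print(url)
```

The handler and action still come from the path; only the value with special characters moves to the query string.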

The lesson I learned is this: SES routing doesn’t always handle special characters correctly, because the webserver (Apache/IIS) may not populate cgi.path_info correctly. If this happens to you, try to use ?foo=bar syntax, which puts foo into cgi.query_string, where the special characters will be interpreted correctly.
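If switching to a query string is not an option, another approach sometimes suggested for this kind of problem is to reverse the mis-decoding on the path value. This was not tried in this thread, so treat the following Python sketch as an untested idea under the assumption that the path was UTF-8 read as ISO-8859-1; the function name is hypothetical:

```python
def repair_path_info(path_info):
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up, if that is what happened.

    Returns the input unchanged when the round trip is not valid,
    for example when the text was decoded correctly in the first place.
    """
    try:
        return path_info.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return path_info


print(repair_path_info("/main/weirdChars/chars/Ã\xa0 Ã¡"))
print(repair_path_info("/main/weirdChars/chars/à á"))
```

It only helps if the server's decoding kept every byte intact; if bytes were dropped or replaced along the way, the original characters cannot be recovered.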