[coldbox:26532] SES URLs with extra stuff at the end still parsing after valid Route has been found

I believe you may be overthinking this a bit. Here’s why:

  1. You’re never going to be viewing orders without some kind of authentication, correct?

  2. /order/order/view would throw a 404 with the route /:handler/:action? unless you have an action named order in the Order handler.

  3. As for appending routes, let’s use pagination as an example: /products/category/1/page/2/sort/name is a valid use of the name/value matching. For SEO, each variable changes the content of the page, which helps search engines link directly to the right category and list pages.

  4. Search engines will only follow the links you provide them – via site map or in your browser. Only penetration tests and bad bots try to “guess” routes, endpoints, and query strings. You should protect against the former, and securely ignore the latter.

My personal preference has always been to keep the name/value matching. There are too many use cases where it’s an accurate reflection of a change in the page content.
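To make the name/value matching concrete, here is a minimal Python sketch of the general idea: whatever follows the matched route is read two segments at a time as name, then value. This is an illustration only, not ColdBox's actual routing code; the function name `parse_name_value_pairs` and the choice to give a trailing unpaired name an empty value are assumptions made for the example.

```python
def parse_name_value_pairs(path, matched_prefix):
    """Read the segments after the matched route as name/value pairs.

    Hypothetical helper for illustration; not part of ColdBox.
    """
    if not path.startswith(matched_prefix):
        raise ValueError("path does not start with the matched route")

    # Everything after the matched route, without empty segments.
    remainder = path[len(matched_prefix):]
    segments = [s for s in remainder.split("/") if s]

    params = {}
    # Walk the segments two at a time: name first, then value.
    for i in range(0, len(segments), 2):
        name = segments[i]
        # Assumption: a trailing name with no value maps to an empty string.
        value = segments[i + 1] if i + 1 < len(segments) else ""
        params[name] = value
    return params


pagination = parse_name_value_pairs(
    "/products/category/1/page/2/sort/name", "/products/category/1"
)
junk = parse_name_value_pairs(
    "/blog/category/some-post/i/am/not/a/valid/route", "/blog/category/some-post"
)
```

Under these assumptions, `pagination` holds `page` and `sort`, which the handler uses, while `junk` holds pairs such as `i`/`am` that the handler never reads, so the page content does not change.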

Jon

Jon,

Thanks for the insight. I’ll definitely check to see how the odd URLs were generated (either through bad links or self generated).

However, the question is this: how do I prevent duplicate content from being generated? SEO practice already normalizes things such as lowercase URLs and trailing slashes to remove duplicate content, so this seems to be a natural progression.

As an example, I’ll be using both the ColdBox blog and the WordPress blog.

ColdBox has the following URL: https://www.ortussolutions.com/blog/category/12-tips-of-commandbox-christmas/
If I were to go to https://www.ortussolutions.com/blog/category/12-tips-of-commandbox-christmas/i/am/not/a/valid/route, it would still resolve to the same route, and therefore the same content.

In WordPress, I can go to the following URL: https://wordpress.org/news/2017/08/the-month-in-wordpress-july-2017/
If I were to go to https://wordpress.org/news/2017/08/the-month-in-wordpress-july-2017/i/am/not/a/valid/route, I would get a “Sorry, no posts matched your criteria”.

Thanks for your help in this matter.

-Chester

Like Jon, I think you’re inventing a problem where there isn’t one. If a silly user or a malicious bot appends random stuff to your URL, who cares? Google isn’t going to see or index those URLs unless you present them in your site map or in links on your pages. And that shouldn’t be the case, so just don’t worry about random junk someone might add on. In my opinion, it’s no different than adding ?foo=bar onto someone’s URL. They aren’t going to give me a 404; they’ll just ignore my extra parameters that they didn’t ask for.

Thanks!

~Brad

You do that by providing a canonical tag in the <head> of your document. If the canonical link is the same, then search engines will index the canonical link, not the appended URL. See https://moz.com/learn/seo/canonicalization. You can remove trailing slashes and normalize casing there as well.

So, in my pagination example, the canonical link would include the paging parameters, whereas an appended URL that doesn’t change the content of the page would display the canonical URL without the appended name/value pairs. This requires a little foresight, but it can easily be handled dynamically or through your handlers.
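As a rough sketch of handling this dynamically, the Python below builds a canonical URL that keeps only the name/value pairs known to change the page content, lowercases the result, and drops the trailing slash. The helper names (`canonical_url`, `canonical_link_tag`), the `meaningful` whitelist, and the example.com URLs are all hypothetical; in a ColdBox application the equivalent logic would live in your handler or layout.

```python
import html


def canonical_url(base_url, params, meaningful=("page", "sort")):
    """Build a canonical URL from the matched route and its name/value pairs.

    Only names listed in `meaningful` are kept, in that fixed order, so
    appended junk never changes the canonical URL. Hypothetical helper.
    """
    base = base_url.rstrip("/")  # drop the trailing slash
    suffix = "".join(
        f"/{name}/{params[name]}" for name in meaningful if name in params
    )
    # Assumption: the whole URL, including values, is case-insensitive.
    return (base + suffix).lower()


def canonical_link_tag(href):
    """Render the <link> element that belongs in the document <head>."""
    return f'<link rel="canonical" href="{html.escape(href, quote=True)}">'


paged = canonical_url(
    "https://example.com/Products/Category/1/",
    {"page": "2", "sort": "name", "foo": "bar"},
)
junk_only = canonical_url(
    "https://example.com/Products/Category/1/",
    {"i": "am", "not": "a", "valid": "route"},
)
tag = canonical_link_tag(junk_only)
```

Here `paged` keeps `page` and `sort` because they change the content, while `junk_only` collapses to the bare category URL, so both the clean URL and the junk-appended one advertise the same canonical link.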