Tag Archives: http

CMS Redirects to 404 Pages for Dead Links

I have an individual CMS running for a customer, who can edit his events, news etc. in a typical admin area. Each of the items had an expiry date. On any call to a deep URL to one of the expired items (e.g. from Google search results) it would make a redirect to the custom 404-page like this:

header("Location: http://www.mycusweb.de/errors/404.php");

The 404-Page itself would then respond with its own 404 header and display a beautiful error page.

As I found out, this would NOT have the effect that I expected it to have, namely that the next Google bot crawling these URLs would kill the URL from the index since it ended up on a proper 404 page via the redirect. The headers in sequence with the above redirect look like this:

GET /pages/events.php?id=123 HTTP/1.1
HTTP/1.x 302 Found
GET /errors/404.php
HTTP/1.1 HTTP/1.x 404 Not Found

But wait, there is 302. To verify this for your own redirects, you can use the Firefox add-on Live HTTP Headers.

The problem here is the use of a default redirect, which uses a status code ’302 Found’. This equals the old ’307 Temporary Redirect’ and has the meaning for the bot, something hast just temporarily moved. In order to ‘tell’ the bot that this page should no longer be indexed and deleted, you must use a ’301 Moved Permanently’ and you do it like this:

header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.mycusweb.de/errors/404.php");

If you’re interested in the HTTP status codes, you can find the RFCs here.

After all of your redirects that should ‘tell’ bots to remove URLs from the search index use the 301 variant, you can then use the Google Remove Tool to speed things up and call the bot.