The standard ‘SEO Friendly’ way to change a URL is with a 301 ‘moved permanently’ redirect. Search engines then attribute value to the destination page. This value is nearly as much as the original (assume 85-95%), if we believe redirects are lossy.
If we want optimal squeezing-every-last-drop-out SEO, we’re better off updating a resource on the same URL instead of redirecting that URL to a new location.
Stay with me.
But what if the resources are fundamentally different? Say I’ve enthusiastically converted a PDF to html. The filetypes are different. I’ve got to move from /resources/my-guide.pdf to /resources/my-guide, right?
Not so.
- Someone requests a .pdf file we have painstakingly converted into html.
- We serve them the .html version on the original (.pdf) URL.
- We retain all historical ranking benefit that URL possesses.
- URL may rank better due to the format change (additional semantic markup possible).
- URL will likely rank worse for ‘ {query} .filetype’ queries.
Sounds great, but how do we “serve them the html version on the original filetype URL”?
Server Response Headers
We can rename file.html to file.pdf, but we’re just going to throw errors if we do that. First, let’s examine the server response headers for legitimate files using curl (this can also be done in the browser). This is an actual pdf:
curl -I http://URL1.pdf
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 16:35:49 GMT
Content-Type: application/pdf
Content-Length: 51500
And this is an HTML page being an HTML page:
curl -I http://URL2
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 16:35:43 GMT
Content-Type: text/html
This is an HTML page masquerading as a pdf on a pdf URL. Note the filesize:
curl -I http://URL3.pdf
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 16:35:13 GMT
Content-Type: application/pdf
Content-Length: 4680325
Note the difference? The impostor is being served as a pdf, and the browser is attempting to interpret it accordingly. Given an html file can’t be opened in a pdf viewer, the browser fails to display it.
So we need to inform the browser that the pdf is not a pdf.
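When there are more than a handful of URLs to check, the same header inspection can be scripted. The following is a minimal sketch using only Python's standard library; the function names `head_content_type` and `is_served_as_html` are illustrative and not part of any tool mentioned here.

```python
import urllib.request


def head_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request (as `curl -I` does) and return the Content-Type header.

    Note: urlopen follows redirects, so this reports the header of the
    final response rather than of any intermediate 301/302.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")


def is_served_as_html(url: str) -> bool:
    """True when the server labels the response text/html, whatever the URL ends in."""
    # Drop any parameters such as "; charset=utf-8" before comparing.
    media_type = head_content_type(url).split(";", 1)[0].strip().lower()
    return media_type == "text/html"
```

Calling `is_served_as_html()` on a .pdf URL then tells us whether the server is presenting that URL as a pdf or as an html page, which is the distinction the rest of this post relies on.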
Overwriting the Server Header
Using the folder’s local .htaccess, we override the content type the server sends for any .pdf request. Create a new .htaccess file in that folder and add the following:
AddType text/html .pdf
“If it ends in .pdf, that means it’s a text-based html file. Not a pdf.”
curl -I https://ohgm.co.uk/test/chicken2/potato.pdf
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 17:40:32 GMT
Content-Type: text/html
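To experiment with this without touching a live Apache config, the effect of the AddType rule can be approximated locally. The sketch below is not Apache: it uses Python's built-in `http.server` and overrides its extension-to-MIME-type table so that anything ending in .pdf is reported as text/html. The names `PdfAsHtmlHandler` and `start_server` are hypothetical, chosen for this example.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class PdfAsHtmlHandler(SimpleHTTPRequestHandler):
    # Rough local stand-in for `AddType text/html .pdf`:
    # any file ending in .pdf is labelled text/html in the response headers.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".pdf": "text/html"}


def start_server(directory: str, port: int = 0) -> ThreadingHTTPServer:
    """Serve `directory` on localhost in a background thread.

    Port 0 asks the operating system for any free port; read the chosen
    port back from `server.server_address[1]`.
    """
    handler = functools.partial(PdfAsHtmlHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With an html file saved as potato.pdf in the served directory, `start_server(".", port=8000)` followed by `curl -I http://127.0.0.1:8000/potato.pdf` should show the overridden content type. Call `server.shutdown()` when finished.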
Remember, it’s not actually a pdf file, it’s ‘potato.html’ renamed ‘potato.pdf’. This renders just fine in browser:
Also works perfectly in a text-based browser:
Here’s an animated gif on a .PNG URL (Update: CloudFlare seems to ‘protect’ against this).
We can do pretty much whatever we want here. This is perfect if we’re on Apache and great if all the resources we might want to do this for reside in the same location. Typically they won’t, and the htaccess trick won’t be possible.
File Aliasing
We’ll have to start aliasing. You probably have some version of this running already. Think of it as an alternative to URL rewrites – “When someone asks for this file, give them this file instead. Don’t redirect them. Give them this file.”
Read Apache’s ‘Mapping URLs to Filesystem Locations’ documentation.
The basic syntax is as follows:
Alias /location/of/filea/ /location/of/fileb/
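If you request ‘/location/of/filea/potato.pdf’ you’ll receive ‘/location/of/fileb/potato.pdf’, without the URL changing. If we’ve successfully added the “AddType text/html .pdf” htaccess rule in the fileb directory, then we’ve hopefully served a file of our choosing in the format of our choosing. Note that Alias belongs in the main server or virtual host configuration; it isn’t allowed in .htaccess.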
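To see how the alias and the type override combine, both can be mimicked locally. As before, this is a sketch built on Python's `http.server` rather than Apache itself. `AliasHandler`, its `aliases` table and `start_aliased_server` are assumptions made for illustration, and here the type override applies server-wide rather than to one directory.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class AliasHandler(SimpleHTTPRequestHandler):
    # URL prefix -> filesystem directory, loosely mirroring Apache's Alias.
    aliases: dict = {}
    # Stand-in for `AddType text/html .pdf` (applies to every .pdf here).
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".pdf": "text/html"}

    def translate_path(self, path: str) -> str:
        for url_prefix, target_dir in self.aliases.items():
            if path.startswith(url_prefix):
                # Resolve the remainder of the URL inside the aliased directory.
                # No redirect is sent, so the client keeps the original URL.
                original_root = self.directory
                self.directory = target_dir
                try:
                    return super().translate_path("/" + path[len(url_prefix):])
                finally:
                    self.directory = original_root
        return super().translate_path(path)


def start_aliased_server(root: str, aliases: dict, port: int = 0) -> ThreadingHTTPServer:
    """Serve `root`, answering aliased URL prefixes from other directories."""
    handler_class = type("ConfiguredAliasHandler", (AliasHandler,), {"aliases": dict(aliases)})
    handler = functools.partial(handler_class, directory=root)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `start_aliased_server("/var/www", {"/filea/": "/location/of/fileb"})` (paths hypothetical) would answer a request for /filea/potato.pdf with the file stored in the fileb directory, labelled text/html, while the requested URL stays the same.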
Other Uses
There are so many evil uses for this.