Preserve Link Equity With File Aliasing

The standard ‘SEO friendly’ way to change a URL is a 301 ‘Moved Permanently’ redirect. Search engines then attribute the old URL’s value to the destination page. If we believe redirects are lossy, the destination receives nearly as much value as the original (assume 85-95%), but not all of it.

If we want optimal squeezing-every-last-drop-out SEO, we’re better off updating a resource on the same URL instead of redirecting that URL to a new location.

Stay with me.

But what if the resources are fundamentally different? Say I’ve enthusiastically converted a PDF to html. The filetypes are different. I’ve got to move from /resources/my-guide.pdf to /resources/my-guide, right?

Not so.

  • Someone requests a .pdf file we have painstakingly converted into html.
  • We serve them the .html version on the original (.pdf) URL.
  • We retain all historical ranking benefit that URL possesses.
  • URL may rank better due to the format change (additional semantic markup possible).
  • URL will likely rank worse for ‘{query} .filetype’ queries.

Sounds great, but how do we “serve them the html version on the original filetype URL”?

Server Response Headers

We can rename file.html to file.pdf, but on its own that will just throw errors. First, let’s look at the server response headers for legitimate files using curl (this can also be done in the browser). This is an actual pdf:

curl -I http://URL1.pdf
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 16:35:49 GMT
Content-Type: application/pdf
Content-Length: 51500

And this is an HTML page being an HTML page:

curl -I http://URL2
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 16:35:43 GMT
Content-Type: text/html

This is an HTML page masquerading as a pdf on a pdf URL. Note the filesize:

curl -I http://URL3.pdf
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 16:35:13 GMT
Content-Type: application/pdf
Content-Length: 4680325

Note the difference? The impostor is being viewed as a pdf, and the browser is attempting to interpret it accordingly. Given an html file can’t be opened in a pdf viewer, we get the following:

[Screenshot: the browser’s pdf viewer failing to open the file]

So we need to inform the browser that the pdf is not a pdf.
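To make the mismatch concrete, here is a minimal Python sketch that compares the declared Content-Type with what the body actually looks like. The function names are hypothetical, and the sniffing is deliberately crude: it assumes a pdf starts with `%PDF-` and an HTML page starts with a doctype or an `<html>` tag. Real browsers use far more elaborate rules.

```python
def sniff_format(body: bytes) -> str:
    """Guess the real format from the first bytes of the body (crude heuristic)."""
    head = body.lstrip()[:64].lower()
    if head.startswith(b"%pdf-"):
        return "application/pdf"
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return "unknown"


def header_matches_body(content_type: str, body: bytes) -> bool:
    """True when the declared Content-Type agrees with the sniffed format."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    declared = content_type.split(";")[0].strip().lower()
    return declared == sniff_format(body)


# An HTML file renamed to .pdf and served with the pdf header: the impostor.
impostor = b"<!DOCTYPE html><html><body>potato</body></html>"
print(header_matches_body("application/pdf", impostor))  # False
print(header_matches_body("text/html", impostor))        # True
```

The first call is the broken situation from the curl output above; the second is where we want to end up.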

Overwriting the Server Header

Using the folder’s local .htaccess, we override the content type sent in the header for any .pdf request. Create a new .htaccess file and type the following:

AddType text/html .pdf

If it ends in .pdf, that means it’s a text-based html file. Not a pdf.

curl -I http://ohgm.co.uk/test/chicken2/potato.pdf
HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 17:40:32 GMT
Content-Type: text/html
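If it helps to see the idea outside Apache, the same extension-to-type override can be sketched with Python’s standard `mimetypes` module. This is only an analogy for what AddType does, not how Apache implements it, and `content_type_for` is a hypothetical helper name.

```python
import mimetypes

# A private registry, so the process-wide mimetypes table is left untouched.
registry = mimetypes.MimeTypes()


def content_type_for(path: str) -> str:
    """Look up the Content-Type the registry would assign to a path."""
    guessed, _encoding = registry.guess_type(path)
    return guessed or "application/octet-stream"


before = content_type_for("potato.pdf")

# Rough analogue of `AddType text/html .pdf`: the extension now maps to HTML.
registry.add_type("text/html", ".pdf")
after = content_type_for("potato.pdf")

print(before, "->", after)
```

The file on disk never changes. Only the label attached to its extension does, which is all the browser needed.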

Remember, it’s not actually a pdf file, it’s ‘potato.html’ renamed ‘potato.pdf’. This renders just fine in browser:

[Screenshot: potato.pdf rendering as an HTML page in the browser]

Also works perfectly in a text-based browser:

[Screenshot: the same page rendering in Lynx]

Here’s an animated gif on a .PNG URL (Update: CloudFlare seems to ‘protect’ against this).

We can do pretty much whatever we want here. This is perfect if we’re on Apache and great if all the resources we might want to do this for reside in the same location. Typically they won’t, and the htaccess trick won’t be possible.

File Aliasing

We’ll have to start aliasing. You probably have some version of this running already. Think of it as an alternative to URL rewrites – “When someone asks for this file, give them this file instead. Don’t redirect them. Give them this file.”

Read ‘Mapping URLs to Filesystem Locations’ in the Apache documentation.

The basic syntax is as follows:

Alias /location/of/filea/ /location/of/fileb/

If you request ‘/filea/potato.pdf’ you’ll receive ‘/fileb/potato.pdf’, without the URL changing. If we’ve successfully added the “AddType text/html .pdf” .htaccess rule in /fileb/, then we’ve hopefully served a file of our choosing in the format of our choosing.
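For a feel of what the Alias line does to an incoming request, here is a small Python sketch of prefix mapping. It is an illustration under assumptions, not Apache’s implementation: `resolve_alias` is a hypothetical name, the paths are the placeholder ones from the syntax example, and it simply takes the first alias whose prefix matches.

```python
from typing import Dict, Optional


def resolve_alias(url_path: str, aliases: Dict[str, str]) -> Optional[str]:
    """Map a requested URL path to a filesystem path; None if no alias applies."""
    for prefix, target in aliases.items():
        if url_path.startswith(prefix):
            # Swap the URL prefix for the target directory, keep the rest.
            return target + url_path[len(prefix):]
    return None


# Mirrors: Alias /location/of/filea/ /location/of/fileb/
aliases = {"/location/of/filea/": "/location/of/fileb/"}

print(resolve_alias("/location/of/filea/potato.pdf", aliases))
# /location/of/fileb/potato.pdf
print(resolve_alias("/somewhere/else/potato.pdf", aliases))
# None
```

The visitor’s address bar still shows the filea URL. Only the file read from disk has moved, which is the whole point of aliasing rather than redirecting.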

Other Uses

There are so many evil uses for this.
