
Disallow Shopify URLs via Robots.txt

As an SEO, you might have to assist a site in moving from one CMS to another. In this instance it was a Magento-to-Shopify replatform. Going from one faceted navigation mess to another led to this post.

You Cannot Edit the Shopify Robots.txt


Update: now you can.

This is great news, but understandably Shopify won’t be supporting your implementation of this particular footgun.

Read the documentation on this new feature here.


Archive post content on how things used to be in the olden days, you kids have it so easy now

I’ve seen the inability to edit the robots.txt file on Shopify upset SEO practitioners and developers alike, and understandably so.


Update: I’ve found a much better (worse) way to use a custom robots.txt on Shopify.

As far as I’m aware, Shopify have done this to cut down on support tickets. It’s easy to mess up a site’s organic performance using robots.txt, and understandably they want to be a ‘good’ platform for SEO, so they’re enforcing a one-size-fits-all file. Like it or not, you just have to work with this:

# we use Shopify as our ecommerce platform

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /1234567/checkouts
Disallow: /carts
Disallow: /account
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*design_theme_id*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /apple-app-site-association
Sitemap: http://shopify-domain.com/sitemap.xml

# Google adsbot ignores robots.txt unless specifically named!
User-agent: adsbot-google
Disallow: /checkout
Disallow: /carts
Disallow: /orders
Disallow: /1234567/checkouts
Disallow: /*design_theme_id*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*

User-agent: Nutch
Disallow: /

User-agent: MJ12bot
Crawl-Delay: 10

Pretty neat, but it doesn’t help us.

The ‘Solution’

Or does it? To preface this section, this won’t let you edit the file. I have literally lied to you in the URL. It’s a kludge like the rest of the tips on this blog. But we can work with it. Concentrate on the following:

Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
  • “%2B” is URL encoding for the plus sign ‘+’
  • “%2b” is also URL encoding for the plus sign ‘+’

The plus sign is how Shopify currently keeps crawl of the faceted navigation under control: when facets are stacked (more than one is selected), the generated URLs use the plus symbol as a separator, so they match the Disallow rules above.

"https://shopify.ohgm.co.uk/collections/white-lamps/white+off-white" BLOCKED

So as you might have guessed already, the approach here is to rewrite the non-stacked Shopify facet URLs we don’t want crawled so that they match the robots.txt rules above by containing either a plus ‘+’ or a space ‘ ’. The easiest method is to append one of these characters to any URL you want blocked:

"https://shopify.ohgm.co.uk/collections/white-lamps/white+" BLOCKED
"https://shopify.ohgm.co.uk/collections/terrible/idea+" BLOCKED
"https://shopify.ohgm.co.uk/collections/seriously/do-not-do-this " BLOCKED

Mucking around like this is probably something that’ll raise the hackles of a few of you (‘why can’t you *just do it properly*‘), so as always:

Don’t just do things because you can.

Blocking Faceted URLs in Shopify

Shopify uses a Ruby-based templating language called Liquid (read more here). We can edit the URLs we’re generating with something like the snippet below:

{% comment %} Assumes 'link' comes from a surrounding loop over a navigation linklist {% endcomment %}
{% if current_tags %}
  {% comment %} Facets already stacked: link_to_add_tag builds a URL containing '+', which robots.txt already blocks {% endcomment %}
  <li>{{ link.title | link_to_add_tag: link.title }}</li>
{% else %}
  {% comment %} Single facet: append a trailing '+' so the URL matches Disallow: /collections/*+* {% endcomment %}
  <li><a href="{{ collection.url }}/{{ link.title | handleize }}+">{{ link.title }}</a></li>
{% endif %}

That’s all there is to this. You append a plus or a space to the category page URLs you don’t want crawled. I wouldn’t recommend doing this in >95% of cases. This was done as part of a migration: we launched on this monstrous URL format rather than redirecting existing Shopify URLs to it. In this case it seems to have worked out OK:

[Screenshot: an organic traffic graph with a red line marking the change.]
This might be something I worked on as a one-off project, or I might just be putting a red line on a graph that looks nice.

P.S. The above code snippet is no doubt hugely flawed and will not apply in your case. It’s the notes I’ve dug out from nearly a year ago. Just remember the idea.
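If your faceted navigation is driven by product tags rather than a linklist, the same idea might look something like the sketch below. Treat it as a hypothetical rework rather than the original implementation: the loop over collection.all_tags and the surrounding markup are assumptions.

{% comment %} Hypothetical sketch: a tag-based filter list using the same trailing '+' trick {% endcomment %}
<ul>
  {% for tag in collection.all_tags %}
    {% if current_tags %}
      {% comment %} Stacked URLs from link_to_add_tag already contain '+' and are blocked {% endcomment %}
      <li>{{ tag | link_to_add_tag: tag }}</li>
    {% else %}
      {% comment %} Single-tag URL: the trailing '+' makes it match Disallow: /collections/*+* {% endcomment %}
      <li><a href="{{ collection.url }}/{{ tag | handleize }}+">{{ tag }}</a></li>
    {% endif %}
  {% endfor %}
</ul>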

You should not implement the things you read on this blog.

I cannot emphasise this enough.

I’ve found a better (possibly even worse) solution.

3 thoughts on “Disallow Shopify URLs via Robots.txt”

  1. This popped up in Twitter today, so here is a little correction for your fellow readers:

    > “%2B” is url encoding for the plus sign ‘+’
    > “%2b” is URL encoding for the space sign ‘ ‘

    Both are equal and translate to the plus sign. Escaped special chars in URLs are hex encoded and therefore (usually) case-insensitive.

    The URL encoding for the space sign is “%20”.
