X-Google-Crawl-Date

I spoke to a few SEOs I like and respect, and most of them had not seen this before, so I’m sharing it here in the hopes that:

  1. Someone else can come up with something useful.
  2. Someone else will do the work here.
  3. Google will remove this from the interface and end my suffering.

What I’m talking about is the X-Google-Crawl-Date header (and siblings), which you can find in Search Console by inspecting a URL, clicking View Crawled Page, and clicking on the HTTP Response below MORE INFO:

This is an internal HTTP header (meta header?) that is appended to Google’s records about a particular request. It should probably not be surfaced in Search Console, but is.

Since it’s internal, if you test a live URL, you won’t be shown this header, because that particular fetch didn’t have anything to do with indexing:

X-Google-Crawl-Date header is absent.

I suspect most of us already believe that the results of “Live Tests” do not make it into Google’s index. I believe this header lets us confirm this (more or less).

So what is this header doing?

The date and time that appears here matches the time displayed to you in the interface in Search Console:

Googlebot doesn’t have time for Daylight Savings.

It frequently aligns with the last cache date, but only accidentally. More interestingly, it doesn’t always line up neatly with the last crawl date in the server access logs.

I think the explanation is relatively straightforward – just because something is “Googlebot” and comes from a genuine Google IP, it doesn’t mean it’s the same Googlebot that gets used for indexing. For example: Google Shopping can cause you to think products are being frequently crawled by Googlebot, though the index is sluggish to update. You wouldn’t want this junk data populating Search Console.

I think we can possibly use this header from within Search Console to tell genuine indexing crawls by Googlebot apart from requests that merely come from Mountain View using the Googlebot UA.
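If you want to try this against your own logs, here is a rough sketch of the comparison. Everything in it is an assumption on my part: the header value is treated as a standard HTTP-date string (the format is not documented anywhere I know of), the sample timestamps are invented, and `parse_crawl_date` and `find_matching_hits` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_crawl_date(header_value: str) -> datetime:
    """Parse an HTTP-date style string into an aware UTC datetime.

    Assumption: the header uses the usual HTTP-date format.
    """
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # No usable zone information: treat the value as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def find_matching_hits(crawl_date, log_hits, tolerance=timedelta(seconds=5)):
    """Return the log timestamps that fall within `tolerance` of the crawl date."""
    return [hit for hit in log_hits if abs(hit - crawl_date) <= tolerance]


# Hypothetical header value copied out of Search Console.
crawl_date = parse_crawl_date("Tue, 01 Oct 2019 09:15:30 GMT")

# Hypothetical Googlebot-UA hits for the same URL, taken from the access logs.
log_hits = [
    datetime(2019, 10, 1, 3, 2, 11, tzinfo=timezone.utc),
    datetime(2019, 10, 1, 9, 15, 31, tzinfo=timezone.utc),
    datetime(2019, 10, 1, 17, 40, 5, tzinfo=timezone.utc),
]

matching_hits = find_matching_hits(crawl_date, log_hits)
```

Hits that land in `matching_hits` are candidates for the crawl that Search Console is reporting. The rest may be the other Googlebots described above, though this sketch cannot prove that.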

Other Headers

This is not the only ‘internal’ header being exposed in Search Console – we also have:

  • X-Google-Not-Modified: which returns the last crawl date
  • X-Google-Reused: which shows the “new” crawl date

I interpret this combination as:

  • A URL comes up in scheduling and is requested. This event is then tagged with:
      • A declaration that the previous crawl has been reused (X-Google-Reused).
      • The previous crawl chosen for reuse (X-Google-Not-Modified: {date}).
      • The reason the crawl was reused (“It wasn’t modified”).

The Not-Modified time reported matches the interface time given.
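To make that interpretation concrete, here it is as a tiny classifier. This is only my reading of the headers written down as code, not anything Google has documented: `classify_crawl_record` is a hypothetical name, and the header values in the examples are made up.

```python
def classify_crawl_record(headers: dict) -> str:
    """Label a Search Console HTTP response using the interpretation above."""
    names = {name.lower() for name in headers}
    if "x-google-reused" in names and "x-google-not-modified" in names:
        # A previous crawl was reused because the page was judged unmodified.
        return "reused previous crawl"
    if "x-google-crawl-date" in names:
        # A crawl recorded for indexing.
        return "indexing crawl"
    # None of the internal headers: what a live test shows.
    return "live test or unknown"


# Hypothetical header sets; the date values are invented for illustration.
indexed = {"X-Google-Crawl-Date": "Tue, 01 Oct 2019 09:15:30 GMT"}
reused = {
    "X-Google-Reused": "Tue, 01 Oct 2019 09:15:30 GMT",
    "X-Google-Not-Modified": "Mon, 23 Sep 2019 14:02:10 GMT",
}
live_test = {"Content-Type": "text/html; charset=UTF-8"}

labels = [classify_crawl_record(h) for h in (indexed, reused, live_test)]
```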

Returning 304s to Googlebot will cause this behaviour, but I do not believe that is the only way to get these responses. If I’m right, then maybe some interesting conclusions around ‘render budget’ might be reached (this won’t be that hard to do!)
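For anyone who wants to test the 304 route, this is a minimal sketch of the standard conditional GET decision. It assumes the request carries an If-Modified-Since header and that you know when the page last changed. `conditional_status` is a hypothetical helper, not part of any framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the page is unchanged since the client's copy, else 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200  # No conditional header: send the full response.
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # Unparseable header: ignore it and send the full response.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


# Hypothetical last-modified time for a page.
last_modified = datetime(2019, 9, 1, 12, 0, 0, tzinfo=timezone.utc)

unchanged = conditional_status(format_datetime(last_modified, usegmt=True), last_modified)
stale_copy = conditional_status("Sat, 31 Aug 2019 12:00:00 GMT", last_modified)
no_header = conditional_status(None, last_modified)
```

Serve a 304 this way, wait for the next crawl, and then check whether the Reused and Not-Modified headers appear in Search Console.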

If this interpretation is right, there may be other headers available, giving other reasons for reuse (otherwise the Not-Modified header appears to be redundant). It’s just that I haven’t encountered any others yet.

Gabor Papp suggested using the ‘inspect a redirect to another domain via Search Console’ trick to try and find other headers. The redirect ‘exploit’ doesn’t surface these headers, since it only works if you do a live inspect of the URL. It does not show the version currently in the index:

No headers

Conclusion:

Please don’t take my word for any of this; there is plenty more testing to be done here. I just want it out of my head.

Free Idea:

You can return Google’s internal headers to Googlebot. A few of us have tried it with no success but perhaps you are special. I remember some people talking about doing this with 304s as a really poor method of cloaking. This is about as bad an idea.

This is nothing new (I am very smart)

Before I dug into this I wanted to see if anyone had picked this up before I unloaded this onto the basement audience:

People had picked this up in 2005, before I was born.

More recently – Valentin Pletzer spotted this back in January.

There we are. Tweets as journalism. What a time to be alive. Proof these two have a free pass.

Update:

https://twitter.com/JohnMu/status/1178977727979896832

If you get anywhere with this, please do let me know.

Sorry.
