X-Google-Crawl-Date

I spoke to a few SEOs I like and respect, and most of them had not seen this before, so I’m sharing it here in the hopes that:

Someone else can come up with something useful.
Someone else will do the work here.
Google will remove this from the interface and end my suffering.

What I’m talking about is the X-Google-Crawl-Date header (and siblings), which you can find in Search Console by inspecting a URL, clicking View Crawled Page, and clicking on the HTTP Response below MORE INFO:

This is an internal HTTP header (meta header?) that is appended to Google’s records about a particular request. It should probably not be surfaced in Search Console, but is.

Since it’s internal, if you test a live URL, you won’t be shown this header, because that particular fetch didn’t have anything to do with indexing:

[Image: The X-Google-Crawl-Date header is absent.]

I suspect most of us already believe that the results of “Live Tests” do not make it into Google’s index. I believe this header lets us confirm this (more or less).

So what is this header doing?

The date and time that appears here matches the time displayed to you in the interface in Search Console:

[Image: Googlebot doesn’t have time for Daylight Savings.]

It frequently aligns with the last cache date, but only accidentally. More interestingly, it doesn’t always align neatly with the last crawled date in the server access logs.

I think the explanation is relatively straightforward – just because something is “Googlebot” and comes from a genuine Google IP, it doesn’t mean it’s the same Googlebot that gets used for indexing. For example: Google Shopping fetches can make it look as though products are being crawled frequently by Googlebot, even though the index is sluggish to update. You wouldn’t want this junk data populating Search Console.

Not every crawl results in an update in the index, which we can determine from the cache date.

I think we can possibly use this header from within Search Console to distinguish genuine indexing crawls by Googlebot from requests that merely come from Mountain View using the Googlebot UA.
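One way to test this idea is to line up the crawl time shown in Search Console against the Googlebot hits in your own access logs. The sketch below is only that, a sketch: it assumes logs in the common "combined" format, it assumes you have already converted the Search Console time into a timezone-aware datetime yourself (I am not assuming anything about the header's own date format), and the function names, sample IPs and one-minute tolerance are all made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Combined Log Format: ip - - [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_hits(log_lines, path):
    """Yield (timestamp, ip) for requests to `path` whose UA mentions Googlebot."""
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m or m["path"] != path or "Googlebot" not in m["ua"]:
            continue
        # Example timestamp: 16/Jan/2019:10:15:30 +0000
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield ts, m["ip"]


def hits_near_crawl_date(log_lines, path, crawl_date,
                         tolerance=timedelta(minutes=1)):
    """Return the Googlebot hits within `tolerance` of the Search Console crawl time."""
    return [
        (ts, ip)
        for ts, ip in googlebot_hits(log_lines, path)
        if abs(ts - crawl_date) <= tolerance
    ]


# Hypothetical log lines, purely for illustration.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'66.249.66.1 - - [16/Jan/2019:10:15:30 +0000] "GET /product HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.2 - - [16/Jan/2019:14:02:11 +0000] "GET /product HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [16/Jan/2019:10:15:31 +0000] "GET /product HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

# The crawl time as read from Search Console, converted by hand to UTC.
crawl_date = datetime(2019, 1, 16, 10, 15, 30, tzinfo=timezone.utc)
matches = hits_near_crawl_date(SAMPLE_LOG, "/product", crawl_date)
```

Here `matches` would contain only the first log line. Googlebot hits that never line up with a crawl time in Search Console are the candidates for "Googlebot, but not the indexing one". Mind the timezone when converting the Search Console time, since the interface time and your server logs may not agree on it.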

Other Headers

This is not the only ‘internal’ header being exposed in Search Console – we also have:

X-Google-Not-Modified: which returns the date of the previous crawl
X-Google-Reused: which shows the “new” crawl date

I interpret this combination as:

A URL comes up in scheduling and is requested. This event is then tagged with:
A declaration that the previous crawl has been reused (X-Google-Reused).
The previous crawl chosen for reuse (X-Google-Not-Modified: {date}).
The reason the crawl was reused (“It wasn’t modified“).

[Image: The Not-Modified time reported matches the interface time given.]
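To make the interpretation above concrete, here is a minimal sketch that labels a stored HTTP response according to which of these headers it carries. It encodes my reading only, not anything Google has documented. The function name and the labels are invented, and header values are passed through as opaque strings because I don't want to assume their format.

```python
def interpret_crawl_headers(headers):
    """Classify a stored HTTP response by the internal headers it carries.

    `headers` is a dict of header name -> value, as copied from Search Console.
    Values are treated as opaque strings; their format is not assumed.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    crawl_date = lowered.get("x-google-crawl-date")
    reused = lowered.get("x-google-reused")
    not_modified = lowered.get("x-google-not-modified")

    if reused is not None or not_modified is not None:
        # The scheduled request was answered by reusing an earlier crawl.
        return {
            "kind": "reused-crawl",
            "reused_marker": reused,
            "previous_crawl": not_modified,
            "reason": "not modified" if not_modified is not None else "unknown",
        }
    if crawl_date is not None:
        # A crawl that was recorded against the index.
        return {"kind": "fresh-crawl", "crawl_date": crawl_date}
    # What a live test looks like: none of the internal headers are present.
    return {"kind": "no-internal-headers"}


# Placeholder values; paste in whatever Search Console shows you.
example = interpret_crawl_headers({
    "Content-Type": "text/html",
    "X-Google-Reused": "<value shown in Search Console>",
    "X-Google-Not-Modified": "<date shown in Search Console>",
})
```

With those placeholder values, `example["kind"]` is `"reused-crawl"`. If other reuse reasons exist, they would show up as the `"unknown"` case, which is the one worth collecting.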

Returning 304s to Googlebot will cause this behaviour, but I do not believe that is the only way to get these responses. If I’m right, then maybe some interesting conclusions around ‘render budget’ might be reached (this won’t be that hard to do!)

On this interpretation there may be other headers available (since if not, the not-modified header appears to be redundant). It’s just that I haven’t encountered any others yet.

Gabor Papp suggested using the ‘inspect a redirect to another domain via Search Console‘ trick to try and find other headers. The redirect ‘exploit’ doesn’t surface this, since it only works if you do a live inspect of the URL. It does not show the version currently in the index:

Conclusion:

Please don’t take my word for any of this, there is plenty more testing available here. I just want it out of my head.

Free Idea:

You can return Google’s internal headers to Googlebot. A few of us have tried it with no success but perhaps you are special. I remember some people talking about doing this with 304s as a really poor method of cloaking. This is about as bad an idea.

This is nothing new (I am very smart)

Before I dug into this I wanted to see if anyone had picked this up before I unloaded this onto the basement audience:

[Image: People had picked this up in 2005, before I was born.]

More recently – Valentin Pletzer spotted this back in January.

this is interesting as well: There is an additional HTTP-header (X-Google-Crawl-Date) shown in the "Google Index"-HTTP response of the crawled page. This additional header is neither part of "Live Test" nor is it the time of the cached version shown in the serp. @JohnMu ? pic.twitter.com/9zL4GDp8c4
— Valentin Pletzer (@VorticonCmdr) January 16, 2019

Hey @VincentCourson! In the new GSC, under URL inspection / More info / HTTP response, the tool seems to add lines like X-Google-Crawl-Date, X-Google-Not-Modified, …. Do you have any info on this? pic.twitter.com/7Jrr2CSnx6
— Julien Crenn (@julien_crenn) February 28, 2019

There we are. Tweets as journalism. What a time to be alive. Proof these two have a free pass. Update:

https://twitter.com/JohnMu/status/1178977727979896832

If you get anywhere with this, please do let me know.

Sorry.
