Server logs have a few major drawbacks, one of which I hope to address today. It's not an elegant solution, but it (more or less) works. Firstly, please read this post for an overview of server logfile analysis for SEO and you'll hopefully see where I'm coming from. I think access logs are probably the best source of information available for diagnosing onsite SEO issues.
A Problem
If you have a little experience with server logs, you’ve probably encountered the following:
188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
188.65.114.122 - - [30/Sep/2013:08:07:06 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 301 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Huh.
Server logfiles provide us with the URI Stem (the portion of the URL after the host and any port number), rather than the full URL. For this blog post, that would be:
/server-logs-subdomains-cctlds/
rather than:
https://ohgm.co.uk/server-logs-subdomains-cctlds/
Because logs give us URI references rather than full URLs, you're essentially getting everything from the third forward slash onwards.
One of my clients has all of their ccTLDs configured so that they share server logs, and the logs themselves don't show which domain serviced the request. If you're dealing with a site with a blog.domain.com setup, you won't be able to tell from the URI reference alone whether the main site or the blog serviced the request. The same goes for the http:// and http://www. versions.
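To make this concrete, here's a hypothetical entry (the IP address, path, and hostnames are made up for illustration); nothing in it tells you whether example.co.uk, example.de, or blog.example.com handled the request:
66.249.66.1 - - [30/Sep/2013:08:10:12 -0400] "GET /pricing/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"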
Solution
I use the following method to gain some insight.
Firstly, cut your server logs down to size using whatever tools you're comfortable with. I like grep or the filters in Gamut Log Parser.
grep "Googlebot" filename.log > filteredoutput.log
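From there you can pull just the URI stems out of the filtered file for later cross-referencing. This is a rough sketch rather than the exact command I use; it assumes a combined-style log like the entries above, where the request path is the seventh whitespace-separated field, and the filenames are placeholders:
# extract the requested paths from the Googlebot-only log, then count how often each was hit
awk '{print $7}' filteredoutput.log | sort | uniq -c | sort -rn > uri-stems.txt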