Crawl > Indexation

One of the most pervasive SEO beliefs I encounter is roughly "if your links aren't getting indexed, they might as well not exist". This causes people to try to get their links indexed, either by using a link indexing service or by blasting links at their links. Although this behaviour can be useful, I think the idea that motivates it is wrong for a few reasons. If you believe your links count more for being indexed, please consider the following: look in your webmaster tools (sorry, "Search Console") and download all the 'links to your site'. Check the indexation status of all of these links. You now have a list of links that Google knows about, but which aren't indexed. If you have an unsuccessful reconsideration request to hand, look at the example links and check whether they're indexed. Think about deindexed [...]
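If you want to run that exercise yourself, here's a minimal sketch. It assumes links.txt is the 'links to your site' export and indexed.txt holds whichever of those URLs you've confirmed as indexed, by whatever means; the file names are stand-ins, not anything from Search Console itself:

# links.txt: every linking URL Google reports knowing about.
# indexed.txt: the subset you've confirmed as indexed.
# comm -23 keeps lines only in the first file: known, but not indexed.
sort -u links.txt > known.txt
sort -u indexed.txt > confirmed.txt
comm -23 known.txt confirmed.txt > known-not-indexed.txt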

Blocking and Verifying Applebot

Earlier today Apple confirmed the existence of their web crawler, Applebot. This means we'll be seeing it crop up a little more in server log analysis. Filtering Server Logs to Applebot: as anyone crawling the web can spoof their useragent to Applebot, we can use the IP range Apple have given us to separate legitimate visits from rogue ones. Currently, legitimate Applebot visits will come from an IP between 17.0.0.0 and 17.255.255.255 (the actual range is probably substantially smaller than this). We can pull the lines we need from our server logs using the following {linux|mac|cygwin} commands in our bash terminal. First, filter to everyone claiming to be Applebot:

grep 'Applebot' access.log > apple.log

Then, filter to the 17.[0-255].[0-255].[0-255] IP range:

grep -E '17\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' [...]
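For reference, the same filtering can be done in one pass. This is a minimal sketch, assuming a combined-format access.log where the client IP is the first field; since the whole 17.0.0.0/8 block belongs to Apple, anchoring on a leading "17." is enough:

# Keep only lines claiming to be Applebot AND originating from a 17.x.x.x address.
# Assumes combined log format (client IP as the first field of each line).
grep 'Applebot' access.log | grep -E '^17\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} ' > applebot-verified.log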

Server Logs, Subdomains, ccTLDs

Server logs have a few major drawbacks, one of which I hope to address today. It's not an elegant solution, but it (more or less) works. Firstly, please read this post for an overview of server logfile analysis for SEO and you'll hopefully see where I'm coming from. I think access logs are probably the best source of information available for diagnosing onsite SEO issues. A Problem: if you have a little experience with server logs, you've probably encountered the following:

188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
188.65.114.122 - - [30/Sep/2013:08:07:06 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 301 "-" "Mozilla/5.0 (compatible; [...]
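Part of what makes lines like these ambiguous is that the default combined log format doesn't record which hostname was requested, so hits to different subdomains or ccTLDs served from the same box look identical. One workaround (not necessarily the one the post lands on) is to extend the log format to include the requested host, for example Apache's %{Host}i or nginx's $host, and then split the log back out per host. A minimal awk sketch, treating the field layout as an assumption:

# Assumes the requested hostname has been added as the first field of each
# log line; writes one output file per host, e.g. www.example.com.log
awk '{ print > ($1 ".log") }' access.log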

Does WMT ‘Crawl Representative URL’ Transfer Link Equity?

This is a case I encountered recently, and one I struggled with for a while. To keep things quick: I was working with a fairly baroque faceted navigation which had attracted a substantial amount of external links to category URLs containing tracking parameters. The canonical tag was holding everything together from a link equity perspective, but the crawl inefficiency was staggeringly bad. While we could use robots.txt directives, this would likely kill the site's organic performance - no crawl, no canonical, no value passing from external links. One last-resort option (and I mean last resort) is Google's Webmaster Tools parameter configuration. You make Googlebot explicitly aware of what each parameter does (or doesn't do), so that it can crawl your domain more efficiently. As these were tracking parameters, the [...]
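As a side note on quantifying crawl inefficiency like this, a rough sketch against the access log; 'tracking_param' is a hypothetical stand-in, not the parameter from the original site:

# Count Googlebot requests to parameterised URLs vs. clean ones.
grep 'Googlebot' access.log | grep -c 'tracking_param='
grep 'Googlebot' access.log | grep -vc 'tracking_param='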