Automate LinkedIn Stalking Your Own Employees So You Can Have Awkward Conversations About Their Profile Updates

Amazing insight: If your employees are updating their LinkedIn profiles, it's one of the better indicators that they've started looking for other work. Now, your employees might be savvy enough to change this setting in their profile first. If this is the case, you won't just see their passive-aggressive profile updating in your news feed whilst you're checking out other roles. You could take some time to stalk their profiles daily in incognito mode, but this could be time consuming. You might be in charge of a lot of employees! You could automate this.

LinkedIn obviously aren't too fond of being crawled or scraped. Check out their robots.txt, or their scraping policy. If you were going to go ahead with this, here's how it might work: Cron job to download each profile page at regular intervals [...]

Faster Google Penalty Removal

Your website is only as good as Google's picture of it. So, if you're working on a website under penalty and are actively trying to get it out of penalty (or just trying to preempt future updates), you should do everything you can to make sure Google is up to date with the link profile so that their image reflects reality.

I've used the following method for just over two years, though I haven't seen it getting any serious coverage (though I'm sure it's quite widely used). In short - get Googlebot to crawl the links that you've removed or disavowed.

Penalties and Disavow

The problem is that terrible links aren't crawled as often as you'd think. Google have been hinting at this in most Webmaster communications since the disavow tool was introduced: "For a disavowed link to be 'counted' [...]

Block Googlebot Crawl by Folder Depth

Some sites have deep, deep, duplicative architecture. Usually this is the result of a faceted navigation. This is especially true for enterprise platforms. And like any healthy relationship, you can't go in expecting them to change. Sometimes you'll need to admit defeat and use an appallingly ugly but kind of elegant band-aid.

In short - picking the appropriate robots.txt disallow rule from the following can work:

/
/*/
/*/*/
/*/*/*/
/*/*/*/*/
/*/*/*/*/*/
/*/*/*/*/*/*/
/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/
/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/*/
etc...

This blocks [...]
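To see how one of the rules above behaves, here is a minimal Python sketch. It assumes Google-style robots.txt matching, where `*` matches any run of characters and a rule is compared against the start of the URL path. The function names `disallow_rule` and `is_blocked` are made up for this illustration and are not part of any library.

```python
import re


def disallow_rule(depth):
    """Build the rule for a folder depth: 0 -> "/", 1 -> "/*/", 2 -> "/*/*/"."""
    return "/" + "*/" * depth


def is_blocked(path, rule):
    """Return True if `path` matches `rule`.

    Assumption: "*" matches any sequence of characters (including none)
    and the rule is matched as a prefix of the path.
    """
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    rule = disallow_rule(3)
    print("Disallow:", rule)                        # Disallow: /*/*/*/
    print(is_blocked("/shoes/red/size-9/", rule))   # True
    print(is_blocked("/shoes/red/", rule))          # False
```

Under these assumptions a rule built for depth n catches any path containing at least n + 1 slashes, so everything deeper than the chosen folder level is caught by a single line.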

Crawl > Indexation

One of the most pervasive SEO beliefs I encounter is roughly "if your links aren't getting indexed, they might as well not exist". This causes people to try and get their links indexed, either by using a link indexing service or by blasting links to their links. Although this behaviour can be useful, I think the idea that motivates this is wrong for a few reasons. If you believe your links count more for being indexed, please consider the following:

Look in your webmaster tools (sorry, "Search Console"). Download all the 'links to your site'. Check the indexation status of all of these links. You now have a list of links that Google knows about, but that aren't indexed.

Have an unsuccessful reconsideration request. Look at the example links. Check to see if they're indexed. Think about deindexed [...]

Blocking and Verifying Applebot

Earlier today Apple confirmed the existence of their web crawler Applebot. This means that we'll be seeing it crop up a little more in server log analysis.

Filtering Server Logs to Applebot

As anyone can spoof their useragent to Applebot while crawling the web, we can use the IP range Apple have given us to weed out rogue visits. Currently legitimate Applebot visits will come from any IP between 17.0.0.0 and 17.255.255.255. The actual range is probably substantially smaller than this. We can pull the lines we need from our server logs using the following {linux|mac|cygwin} commands in our bash terminal:

First, filter to everyone claiming to be Applebot:

grep 'Applebot' access.log > apple.log

Then, filter to the 17.[0-255].[0-255].[0-255] IP range:

grep -E '17\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' [...]
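The same two-step filter can also be sketched in Python using the standard `ipaddress` module. This is only an illustration under two assumptions: the client IP is the first whitespace-separated field on each log line (as in the common Apache/nginx formats), and the whole 17.0.0.0/8 block is treated as Apple's. The function names are hypothetical.

```python
import ipaddress

# Assumption: the full 17.0.0.0/8 block, matching the 17.x.x.x range above.
APPLE_NET = ipaddress.ip_network("17.0.0.0/8")


def is_verified_applebot(log_line):
    """True if the line claims to be Applebot AND its client IP is in 17.0.0.0/8.

    Assumption: the client IP is the first whitespace-separated field.
    """
    if "Applebot" not in log_line:
        return False
    fields = log_line.split(maxsplit=1)
    if not fields:
        return False
    try:
        client_ip = ipaddress.ip_address(fields[0])
    except ValueError:
        # First field is not an IP address (e.g. a hostname or malformed line).
        return False
    return client_ip in APPLE_NET


def filter_applebot(lines):
    """Keep only the log lines that pass the check above."""
    return [line for line in lines if is_verified_applebot(line)]
```

Checking only the first field means a 17.x.x.x string appearing elsewhere in the line, such as in a referrer, is not mistaken for the client address. Usage might look like `filter_applebot(open("access.log"))`.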