Geert's Place

Archive for January, 2015

Fake Google Bots

by on Jan.15, 2015, under Linux

In a previous post, I mentioned an enormous load on my Apache web server. I have now found the source and cause of this very annoying problem: a not-so-subtle mix of DDoS attacks and an arsenal of fake Google bots. In the logs (typically /var/log/httpd/access_log) the entries look like this:


94.23.6.88 - - [15/Jan/2015:16:35:40 +0100] "POST /wp-login.php HTTP/1.1" 403 214 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.20.80.243 - - [15/Jan/2015:16:36:26 +0100] "POST /wp-login.php HTTP/1.1" 403 214 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
199.59.56.6 - - [15/Jan/2015:16:36:43 +0100] "POST /wp-login.php HTTP/1.1" 403 214 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
195.154.75.101 - - [15/Jan/2015:16:36:49 +0100] "POST /wp-login.php HTTP/1.1" 403 214 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Now, they may LOOK like Google bots (which are normally quite harmless), but take a closer look. Why would a Google bot use so many different IP addresses and networks? And why would it be trying to log in to my blog? 🙂
Looking up those IP addresses gives the hostnames ns204288.ovh.net, mail.jenruno.com, synconlinemedia.com and 195-154-75-101.rev.poneytelecom.eu.
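To see how many different addresses hide behind the Googlebot user agent, you can count them in the access log. Below is a minimal Python sketch, assuming Apache's standard "combined" log format with plain ASCII quotes (as in the lines above); the function names googlebot_claims_per_ip and report are hypothetical, not part of any tool:

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# host ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_claims_per_ip(lines):
    """Hypothetical helper: count requests per client IP whose
    user agent claims to be Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m.group("ua"):
            counts[m.group("ip")] += 1
    return counts


def report(path="/var/log/httpd/access_log", top=20):
    """Hypothetical helper: print the busiest 'Googlebot' clients, most requests first."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for ip, n in googlebot_claims_per_ip(f).most_common(top):
            print(f"{n:6d}  {ip}")
```

Running report() as a user who may read the log prints one line per client address, so a long list of unrelated networks stands out immediately.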

They look a lot more like hacked machines, if you ask me. The real Google bots, on the other hand, typically connect from a known range: 66.249.*
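The 66.249.* prefix is a convenient shortcut. Google's own documented way to verify a crawler is a two-step DNS check: a reverse lookup of the client IP must give a host name under googlebot.com or google.com, and a forward lookup of that name must return the same IP. Here is a minimal sketch of that check using only Python's socket module; the function name is hypothetical, it handles IPv4 only (gethostbyname_ex returns IPv4 addresses), and the resolvers are parameters so the logic can be tried without network access:

```python
import socket

# Domains Google uses for its crawlers' reverse DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Hypothetical helper: reverse-then-forward DNS check (IPv4 only).

    1. Reverse lookup: ip -> host name, which must end in googlebot.com or google.com.
    2. Forward lookup: that host name must resolve back to the same ip.
    """
    try:
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror / socket.gaierror, e.g. no PTR record
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]  # (name, aliases, ipv4_addresses)
    except OSError:
        return False
    return ip in addresses
```

Two DNS lookups per request would add noticeable latency, so a check like this is better suited to vetting suspicious addresses from the logs than to running on every hit.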

So there are a few ways to tackle these fakers. First, there is a nice WordPress plugin called Wordfence. It adds some security, but you basically need to know what you're doing. A second option is to block the IPs at firewall level, but that quickly turns into a nightmare, or a full-time job.
The most effective way to block them, while still letting the REAL Google bots through, is to simply answer the fakers with Apache's default 403 Forbidden error page. Because Apache refuses these requests before WordPress and PHP ever get involved, the load on the web server drops dramatically.

In the document root of your server (or virtual host), usually /var/www/html/, put an .htaccess file containing the lines below. If WordPress has already created an .htaccess file there, add these lines above its own rules:

RewriteEngine on
RewriteOptions inherit
Options +FollowSymLinks
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]
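The two RewriteCond lines are combined with a logical AND (consecutive conditions are ANDed unless you add the [OR] flag), and the leading ! negates the address test. As a rough model of the decision Apache makes for each request, here is a small Python function; it is only an illustration (the name rewrite_decision is hypothetical), not something Apache runs:

```python
import re

# Same patterns as the two RewriteCond lines. RewriteCond matching is
# case-sensitive unless the [NC] flag is given, so no re.IGNORECASE here.
UA_PATTERN = re.compile(r"Googlebot")
GOOGLE_ADDR = re.compile(r"^66\.249\.")


def rewrite_decision(remote_addr, user_agent):
    """Hypothetical model: return 403 when both conditions match, else None."""
    claims_googlebot = UA_PATTERN.search(user_agent) is not None
    outside_google_range = GOOGLE_ADDR.search(remote_addr) is None  # the "!" negation
    if claims_googlebot and outside_google_range:
        return 403  # RewriteRule .* - [F] answers with 403 Forbidden
    return None  # conditions not met: the request is served normally
```

Note that a faker coming from inside 66.249.* would slip through, which is why the DNS check above it is the stricter test.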

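For this to work, Apache needs the rewrite module (mod_rewrite), and the directory must allow these directives in .htaccess (AllowOverride FileInfo and Options, or simply AllowOverride All). To check the module, look in your config file (/etc/httpd/conf/httpd.conf) for a line that reads: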
LoadModule rewrite_module modules/mod_rewrite.so

If it's commented out, simply remove the comment (the leading #) and restart Apache by running the command "apachectl restart" as root.
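If you prefer to check this from a script, here is a minimal Python sketch (the function name rewrite_module_status is hypothetical) that scans the text of a config file for that LoadModule line. It only looks at the one file you give it: on some distributions the LoadModule lines live in separate files included from httpd.conf, so "missing" is not conclusive there, and "apachectl -M", which lists the modules Apache actually loaded, is the more reliable check.

```python
import re

# An active (uncommented) LoadModule line for mod_rewrite.
ACTIVE = re.compile(r"^\s*LoadModule\s+rewrite_module\s+\S+", re.IGNORECASE)
# The same line, disabled with a leading '#'.
COMMENTED = re.compile(r"^\s*#\s*LoadModule\s+rewrite_module\s+\S+", re.IGNORECASE)


def rewrite_module_status(conf_text):
    """Hypothetical helper: 'enabled', 'commented out' or 'missing'."""
    lines = conf_text.splitlines()
    if any(ACTIVE.match(line) for line in lines):
        return "enabled"
    if any(COMMENTED.match(line) for line in lines):
        return "commented out"
    return "missing"
```

For example, rewrite_module_status(open("/etc/httpd/conf/httpd.conf").read()) tells you whether you need to remove the comment before restarting Apache.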

Have fun killing off fake bots 😉
