The Problem
Recently, I’ve been seeing a lot of 404 errors on a site that I work on. All of them were caused by a single bot: WeSee.
For example, a valid URL on this site looks like this:
www.example.com/item/123/item-name/
WeSee would also try to access both of the following:
www.example.com/item/123/
www.example.com/item/
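It looks like the bot strips one path segment at a time from a valid URL. As a rough sketch, here is a Python function that reproduces that pattern so you can list the "parent" paths you should expect to see as 404s in your access logs. The `parent_paths` helper is hypothetical. It only mimics the behaviour described above and is not anything WeSee documents.

```python
from urllib.parse import urlsplit


def parent_paths(url: str) -> list[str]:
    """Return each shorter path prefix of a URL, longest first.

    Hypothetical helper: it mimics the segment-stripping pattern seen in
    the 404s, e.g. /item/123/item-name/ -> /item/123/ -> /item/.
    """
    # urlsplit only recognises the host when the URL has a scheme or a
    # leading "//", so add one if it is missing.
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    prefixes = []
    # Drop the last segment repeatedly, stopping before the bare root "/".
    for end in range(len(segments) - 1, 0, -1):
        prefixes.append(parts.netloc + "/" + "/".join(segments[:end]) + "/")
    return prefixes


if __name__ == "__main__":
    print(parent_paths("www.example.com/item/123/item-name/"))
```

For the example URL this prints the same two paths listed above, which makes it easy to grep a log for every derived path of a page.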
What Is WeSee?
Judging from the WeSee website, they appear to be crawling for images so that they can sell the resulting data to their customers:
“Our software is used so that visual content can be turned into machine-readable data so that the content can for the first time play a significant role in Digital Advertising, Content Verification, Ecommerce and Visual Search. Our software holds the key in turning visual content in to lucrative advertising friendly targetable real estate…”
Blocking WeSee Bot
This isn’t as easy as it should be. Most well-behaved bots and crawlers let you block them with a robots.txt file, but WeSee ignores robots.txt (trust me, I tried).
Instead, I had to block WeSee’s IP addresses manually on the server. This isn’t an ideal solution, since WeSee could start using new IPs at any time. Below is a list of all the IPs I’ve seen WeSee use:
199.115.116.97
199.115.116.88
199.115.115.144
178.162.199.101
178.162.199.98
178.162.199.86
178.162.199.77
178.162.199.69
178.162.199.35
95.211.156.228
95.211.159.93
95.211.159.68
95.211.159.66
If I see any more, I’ll be sure to add them to the list.
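If you keep a list like this, it can help to turn it into server rules with a small script rather than by hand. The sketch below assumes an nginx server (the post doesn’t say which server was used) and emits one `deny` directive per address, the format used by nginx’s access module. `WESEE_IPS` and `deny_rules` are hypothetical names. For Apache or a firewall such as iptables you would change the output format.

```python
import ipaddress

# IPs observed from WeSee, one per line (copied from the list above).
WESEE_IPS = """
199.115.116.97
199.115.116.88
199.115.115.144
178.162.199.101
178.162.199.98
178.162.199.86
178.162.199.77
178.162.199.69
178.162.199.35
95.211.156.228
95.211.159.93
95.211.159.68
95.211.159.66
"""


def deny_rules(raw: str) -> list[str]:
    """Validate, de-duplicate and sort the IPs, then build nginx rules.

    Assumes every entry is IPv4: mixing IPv4 and IPv6 addresses would make
    sorted() raise a TypeError.
    """
    addresses = set()
    for entry in raw.split():
        # ip_address() raises ValueError on a mistyped address.
        addresses.add(ipaddress.ip_address(entry))
    # Sorting the address objects gives numeric (not string) order.
    return [f"deny {ip};" for ip in sorted(addresses)]


if __name__ == "__main__":
    for rule in deny_rules(WESEE_IPS):
        print(rule)
```

The output can be pasted into an nginx `server` or `location` block. Because duplicates are collapsed, you can append newly spotted IPs without worrying about repeats.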