Blocking WeSee Bot

The Problem

Recently I’ve been receiving lots of 404 errors on a site that I work on. All of these errors were caused by one bot: WeSee.

As an example, a valid URL on this site would look like this:

www.example.com/item/123/item-name/

WeSee would also try to access both of the following, neither of which exists:

www.example.com/item/123/
www.example.com/item/

What Is WeSee

From the WeSee website, it looks like they are merely crawling for images so they can sell data to their customers.

“Our software is used so that visual content can be turned into machine-readable data so that the content can for the first time play a significant role in Digital Advertising, Content Verification, Ecommerce and Visual Search. Our software holds the key in turning visual content in to lucrative advertising friendly targetable real estate…”

Blocking WeSee Bot

This isn’t as easy as it should be. Most well-behaved bots and crawlers allow you to block them with a robots.txt file. WeSee ignores robots.txt (trust me, I tried).

What I had to do was block the IP addresses on the server manually. This isn’t an ideal solution, as WeSee could start using new IPs at any time. Below is a list of all the IPs I’ve seen WeSee crawl from:

199.115.116.97
199.115.116.88
199.115.115.144
178.162.199.101
178.162.199.98
178.162.199.86
178.162.199.77
178.162.199.69
178.162.199.35
95.211.156.228
95.211.159.93
95.211.159.68
95.211.159.66

If I see any more, I’ll be sure to add them to the list.
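The post doesn't say how the addresses were blocked, so the following is only a sketch of one way to do it. It assumes Apache 2.4 or later (the `Require` directive from mod_authz_core) and an .htaccess file that is allowed to override authorization settings. Older Apache versions and other web servers use different syntax.

```apache
# Sketch only: assumes Apache 2.4+ and AllowOverride AuthConfig.
# Everyone is allowed in except the listed WeSee addresses.
<RequireAll>
    Require all granted
    Require not ip 199.115.116.97
    Require not ip 199.115.116.88
    Require not ip 199.115.115.144
    Require not ip 178.162.199.101
    Require not ip 178.162.199.98
    Require not ip 178.162.199.86
    Require not ip 178.162.199.77
    Require not ip 178.162.199.69
    Require not ip 178.162.199.35
    Require not ip 95.211.156.228
    Require not ip 95.211.159.93
    Require not ip 95.211.159.68
    Require not ip 95.211.159.66
</RequireAll>
```

Blocked requests receive a 403 response instead of reaching the application, so they no longer show up as 404 errors.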

The beauty of gzip compression

I have just discovered the magic of gzip compression on web pages. I knew it would be good, but I was blown away by the saving.

Without gzip compression, the homepage of the Oz Broadband Speed Test was 32.37 KB (33142 bytes). With gzip turned on, the same page was 7.61 KB (7789 bytes).

That’s a massive saving of roughly 76% in data. While it might not seem like much, on a high-traffic site it really starts to add up.
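The saving can be worked out directly from the two byte counts quoted above. A minimal sketch (the variable names are just for illustration):

```php
<?php
// Page sizes in bytes, taken from the figures quoted above.
$uncompressed = 33142;
$compressed   = 7789;

// Fraction of the original page that no longer has to be sent.
$saving = ($uncompressed - $compressed) / $uncompressed;

printf("Saving: %.1f%%\n", $saving * 100);
// Saving: 76.5%
```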

Turning gzip on is also easy. On an Apache web server running PHP as a module, it takes a single line in an .htaccess file:

php_value output_handler ob_gzhandler

I’m not sure how much extra load this will add to the server, but I am hoping it is minimal. I’ll keep an eye on this over the next few days.

The best thing about this is that if a browser doesn’t support any compression methods (highly unlikely in today’s browsers), the server simply sends the page back without compression.
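To illustrate the idea behind that fallback, here is a rough sketch of the decision: compress only when the browser's Accept-Encoding header mentions gzip. `maybe_gzip` is a hypothetical helper written for this example, not how `ob_gzhandler` is actually implemented, and it assumes PHP's zlib extension is available.

```php
<?php
/**
 * Hypothetical helper: gzip $body only if the client advertises gzip support.
 * Simplified on purpose: it ignores quality values such as "gzip;q=0".
 */
function maybe_gzip($body, $accept_encoding) {
    // Look for "gzip" anywhere in the Accept-Encoding header value.
    if (stripos($accept_encoding, 'gzip') !== false) {
        return array('encoding' => 'gzip', 'body' => gzencode($body));
    }
    // No gzip support advertised: send the page unchanged.
    return array('encoding' => null, 'body' => $body);
}

// A made-up, repetitive page so the compression has something to work with.
$page = str_repeat('<p>Hello, world!</p>', 100);

$modern = maybe_gzip($page, 'gzip, deflate'); // typical browser header
$legacy = maybe_gzip($page, '');              // no compression support

echo strlen($page), ' bytes uncompressed, ';
echo strlen($modern['body']), " bytes gzipped\n";
```

In a real response, the gzipped body would also need a `Content-Encoding: gzip` header so the browser knows to decompress it.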

It’s really a win-win.

I should also point out that this only compresses PHP files, and not CSS, JavaScript or images.
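For static files, Apache can do the compression itself. The fragment below is a sketch of that approach, not something tested on the site above. It assumes mod_deflate is installed and enabled (on Apache 2.4, `AddOutputFilterByType` also needs mod_filter). Images are left out because common formats such as JPEG and PNG are already compressed.

```apache
# Sketch only: assumes mod_deflate (and, on Apache 2.4, mod_filter) is enabled.
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/css text/javascript application/javascript
</IfModule>
```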

Google Charts API Extended Encode Scaling

I’ve recently been playing with the Google Charts API for a project, and ran into a problem with graph scaling when using extended encoding.

The problem was that extended encoding doesn’t scale the graph: values are plotted against a y-axis with a fixed maximum value of 4095.

This means that smaller values look insignificant on the graph, not to mention all the wasted space. The solution I came up with was to scale the data so it fills the chart appropriately.
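To make the arithmetic concrete, here is a small sketch using the same sample data as the example charts in this post. Unscaled, the largest value (350) reaches less than 9% of the way up a 4095-high axis. Multiplying every value by 4095 / 350 stretches the data so the largest value sits right at the top. Rounding to whole numbers is a choice made in this sketch, since extended encoding can only represent integers.

```php
<?php
// Sample data (the same values used for the example charts in this post).
$data = array(200, 300, 200, 250, 350, 150, 100);

// Extended encoding can represent whole numbers from 0 to 4095.
$ceiling = 4095;
$max = max($data);

$scaled = array();
foreach ($data as $value) {
    // Stretch each value so that the largest one lands exactly on 4095.
    $scaled[] = (int) round($value / $max * $ceiling);
}

echo implode(', ', $scaled), "\n";
// 2340, 3510, 2340, 2925, 4095, 1755, 1170
```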

The PHP code I came up with is below. It builds on a function written by Ben Dodson.

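<?php
/**
 * Returns a scaled value.
 *
 * @param    value     Int to scale
 * @param    max       Maximum int in array to calculate scale value
 * @param    scale     The int to scale value to
 * @return             Scaled value, rounded to the nearest int
 * @author             Alex McKenzie
 */
function scale_value($value, $max, $scale = 4095) {
    // Avoid dividing by zero when the largest value in the array is 0.
    if ($max <= 0) {
        return 0;
    }
    // Round so that float error (e.g. 2339.9999) can't truncate to the wrong int.
    return (int) round(($value / $max) * $scale);
}

/**
 * Returns an extended encoded string for use with Google Charts API.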
 *
 * Modified function - original by Ben Dodson (http://bendodson.com/blog/2008/02/28/google-extended-encoding-made-easy/).
 *
 * @param    array     Array of values to encode
 * @param    scale     Whether to scale the values
 * @return             Extended encoded string
 * @author             Alex McKenzie [alex [at] alexmckenzie [dot] info]
 */
function array_to_extended_encoding($array, $scale = 'yes') {
    $characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-.';
 
    // Scale values before encoding if required.
    if ($scale == 'yes') {
        $max = max($array);
        $scaled_array = array();
        foreach($array as $value) {
            array_push($scaled_array, scale_value($value, $max));
        }
        $array = $scaled_array;
    }
 
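    // Encode values in array. Each value becomes two characters, so it
    // must be a whole number between 0 and 4095 (64 * 64 - 1).
    $encoding = '';
    foreach ($array as $value) {
        $value = max(0, min(4095, (int) round($value)));
        $first = (int) floor($value / 64);
        $second = $value % 64;
        $encoding .= $characters[$first] . $characters[$second];
    }
    return $encoding;
}
?>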

Now using some sample data, we can see the difference scaling makes in the following two examples:

[Two example charts: "No scaling" and "Scaling"]

The graphs above were generated with the following code:

<?php
$graph = array(200,300,200,250,350,150,100);
?>

<img src="http://chart.apis.google.com/chart?cht=bvs&amp;chs=200x150&amp;chd=e:<?=array_to_extended_encoding($graph, 'no')?>" alt="No scaling" />
<img src="http://chart.apis.google.com/chart?cht=bvs&amp;chs=200x150&amp;chd=e:<?=array_to_extended_encoding($graph)?>" alt="Scaling" />
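A handy way to sanity-check an encoded string is to decode it again and see which numbers the chart will actually receive. The decoder below is a hypothetical helper written for this purpose and is not part of the original code. It assumes a well-formed string: an even number of characters, all taken from the 64-character alphabet used above.

```php
<?php
/**
 * Hypothetical helper: turn an extended encoded string back into numbers.
 * Assumes well-formed input (even length, only characters from the alphabet).
 */
function extended_encoding_to_array($encoding) {
    $characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-.';
    $values = array();
    // Each value is stored as a pair of characters: (first * 64) + second.
    foreach (str_split($encoding, 2) as $pair) {
        $first  = strpos($characters, $pair[0]);
        $second = strpos($characters, $pair[1]);
        $values[] = $first * 64 + $second;
    }
    return $values;
}

// The sample data encoded by hand without scaling.
echo implode(', ', extended_encoding_to_array('DIEsDID6FeCWBk')), "\n";
// 200, 300, 200, 250, 350, 150, 100

// ".." is the largest pair the alphabet allows.
echo implode(', ', extended_encoding_to_array('..')), "\n";
// 4095
```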