Tag Archives: Software development

Invasive browser sniffing

Take the a:visited pseudo-class and use it to determine if the client’s browser has visited a site. Do this en masse with a subset of Alexa’s top 1m websites and you can build up a fairly detailed profile of a visiting brower’s history.

var sites = [
  'http://google.com',
  'http://facebook.com'
];

$(function(){
  for (var i in sites) {
    $("#test").html("
      <li><a href='"+sites[i]+"'>"+sites[i]+"</a> </li>
    ");
    if ($("#test a").css('color') == "rgb(0, 204, 0)")
      $("#test li").appendTo("#visited");
    else
      $("#test li").appendTo("#not-visited");
    }
});

The ethics of this technique are questionable, especially if you retain this data and associate it with an email address or a registered user account. The only way I can see to protect against it is to disable JavaScript on all but the few sites you trust.

You could augment it to only send a sampling of 50 sites, respond with an ajax post, then determine which 50 to check next (e.g. if a user has visited a site specific to tech news, poll your next 50 on the most popular tech news sites). This concept is used at What The Internet Knows About You.

Implementing Schulze STV

Thus far, my implementation of Schulze STV has worked fairly well. The downside I’ve recently run into is in simulating real world multiple winner elections. Take Vancouver’s 2008 municipal election as an example, in which 32 candidates ran for 10 seats. The problem-space we’re looking at consists of 32 choose 10 nodes and (32 choose 10) x 11 edges. For those unfamiliar with binomial coefficients, that’s roughly 64 million nodes and 709 million edges. Exponential runtime is a bitch.

For the purposes of heuristic development, let’s look at Tideman’s A03. It’s a set of 989 ballots to elect 7 winners from 15 candidates. Schulze STV computes the winning set as B/D/E/F/H/K/N by looking at 6435 nodes and 51480 edges. How do we best prune the number of candidates down so as to look at a smaller subset, yet still produce the same winning set in most cases? Well, what’s quickly available? We can easily calculate the sum of all votes, standard deviation, and whatnot.

If we wanted to prune off the least preferred by sum (indicated by larger sums), we’d be cutting off one of a winner Schulze STV would want to elect. Alternatively, let’s look at the number of voters that mark each candidate in first place. If a voter marks two candidates in first place, split their vote between the two.

That heuristic seems great at first as it didn’t accidentally eliminate winning candidates and it reduced the number of nodes to 120, a 98% reduction in problem space size. However, the winning set would now be computed by SchulzeSTV as A/B/D/E/G/L/N, which is significantly different. Furthermore, the heuristic is going to incentivize tactical voting. Sometimes there aren’t any easy shortcuts and the best thing to do is just optimize the implementation.

Vancouver Open Data

Some months ago, there was talk in Vancouver about opening up the city’s data. As of yesterday, that data is now widely accessible in open formats at data.vancouver.ca. One such data layer is the bike routes in the city. There’s a lot more and it’ll be fun to see where it leads.

Short codes

With the advent of Twitter, shortened URLs have become all the rage: http://bit.ly/5dk33b, http://imgur.com/9G2rL, http://www.reddit.com/9dfo1. This kind of thing really isn’t all that hard to do.

$short_code = substr(preg_replace(
  array('/\//', '/\+/'),
  array('-', '_'),
  base64_encode(sha1(serialize($data, $salt), true))
), 0, 7);

Your $salt might be something as simple as a manually generated random string. If your data is time sensitive, it could be microtime(). Fancier two-way conversions are doable, in which you convert your primary key into a base-62 number, but I’ve yet to find a need for that.

Motivated hobbies

Start a project around a subject you feel passionate about. It’s surprising how quickly you can get things done when there’s internal motivation. My most recent project went from prototype to useful application in about 10 days. I have to say, there’s a bit of pride that goes along with reaching a presentable milestone.

Modern Ballots

One of the difficulties I’ve run into with my recent project is explaining what it is a back-end web service does. In response, I’ve put a small web app up at modernballots.com. This front-end lets you create elections, but has no direct knowledge of voting systems. The winner is calculated remotely through the election web service, which is publicly available for other programmers to use.

At the moment, the front-end only requests the winner as calculated by Schulze STV, but that’s just to make the interface simpler.

Election Web Service

First draft of the Election Web Service is up and running. I’ve implemented Plurality, IRV, Ranked Pairs, the Schulze Method, Plurality at Large, and STV.

The source code is up on GitHub for anyone interested in implementing additional voting systems or using them as Python libraries. For those that just want to send requests to a server, I have one responding to posts at vote.cognitivesandbox.com (note that the server only response to HTTP POSTs and works solely on JSON encoded requests).

Hopefully this takes some of the burden out of implementing polls and whatnot. You take care of storing the ballots, I’ll show you who won (and why).

Example Request

{
  "votingSystem": "stv",
  "ballots": [
    {"count": 4, "ballot": ["orange"]},
    {"count": 2, "ballot": ["pear", "orange"]},
    {"count": 8, "ballot": ["chocolate", "strawberry"]},
    {"count": 4, "ballot": ["chocolate", "sweets"]},
    {"count": 1, "ballot": ["strawberry"]},
    {"count": 1, "ballot": ["sweets"]}
  ],
  "winners": 3
}

Example Response

{
  "rounds": [
    {
      "tallies": {
        "orange": 4.0,
        "strawberry": 1.0,
        "chocolate": 12.0,
        "pear": 2.0,
        "sweets": 1.0
      },
      "quota": 6,
      "winners": ["chocolate"],
    },
    {
      "tallies": {
        "orange": 4.0,
        "strawberry": 5.0,
        "pear": 2.0,
        "sweets": 3.0
      },
      "quota": 5,
      "winners": ["strawberry"],
    },
    {
      "tallies": {
        "orange": 4.0,
        "sweets": 3.0,
        "pear": 2.0
      },
      "quota": 5,
      "loser": "pear"
    },
    {
      "tallies": {
        "orange": 6.0,
        "sweets": 3.0
      },
      "quota": 5,
      "winners": ["orange"],
    }
  ],
  "winners": ["orange", "strawberry", "chocolate"]
}

Election API prototype

The project sputtered a few times before it really got going, but it seems well on its way. My first elections calculator was built to demonstrate the flaws of voting systems. This new one, however, is intended for the development community.

Most voting systems are plain Plurality (aka first past the post) and is prone to a great many flaws. Instead of implementing better voting systems libraries in each language, why not provide a general web service API?

The tool speaks JSON and provides and is completely stateless. You send your full request (ballots, desired voting system, etc) and it returns the winner (or winners) and an explanation as to how it reached that conclusion. All data is stored locally in cookies; The only server communication is for the result calculation.

I’m only about half-way done: four more algorithms to go (Schulze, STV, CPO-STV and Schulze STV) and a fair bit of polishing on the UI. Late December? Early January? Something like that.

Watch Steam for price drops

This could probably be done a little better, but the core idea is here: Watch a set of specified games on Steam and, if the price drops, send out a notification email. Set this up on a daily cron job and you’ll be notified when the game you’re looking for drops in price. Might be months, but it’ll likely happen.


$games = array(
    "Batman: Arkham Asylum" => array(
        "id" => 35010,
        "known" => "$49.99 USD"
    )
);

foreach ($games as $name => $values) {
    $doc = new DOMDocument();
    @$doc->loadHTMLFile(
        "http://store.steampowered.com/app/{$values['id']}/"
    );
    $current = $doc->getElementById("game_area_purchase") \
        ->getElementsByTagName("div")->item(1)->nodeValue;
    if ($current != $attributes['known']) {
        mail(
            'youraddress@example.com',
            'Price drop on steam',
            "$name dropped to $current"
        );
    }
}

Maybe better would be to collect the price from each day over a set of games and graph this data over time. It’s not necessary, but I’d be curious to see that extension of this idea.

Open Data in Vancouver

Some months ago, a co-worker pointed me towards VanMap, an online tool that provides fine detail on the City of Vancouver: property lines, zoning information, sewer mains, etc. The down-side to this site is that it only functions in Windows and that the underlying data isn’t directly available.

Fortunately, there’s talk of opening the city’s data in a permissive licensing model like the Creative Commons. Andrea Reimer’s talk this evening about the progress towards that goal was informative, considering I wasn’t aware the problem was being worked on.

Brought up during this talk was the September 16th Hackathon. I’m not 100% clear yet on what the focus will be, but it’s there for the curious. One of their previous projects is VanTrash, which is a neat tool that can remind you when your trash pickup is. Pretty sweet, eh?

vantrash

Update

Looks like Vancouver’s Open Data Catalogue is up in beta form. Go take a look for it is super.