Tag Archives: Software development

PHP’s array_walk_recursive

The description says “If funcname needs to be working with the actual values of the array, specify the first parameter of funcname as a reference.” This isn’t necessarily helpful as the function you’re calling might be built in (e.g. trim or strip_tags). One option would be to create a version of these like so.


function trim_by_reference(&$string) {
    $string = trim($string);
}

The downside to this approach is that you need to create a wrapper function for each function you might want to call. Instead, we can use PHP 5.3′s inline function syntax to create a new version of array_walk_recursive.


/**
 * This function acts exactly like array_walk_recursive, except
 * that it pretends that the function its calling replaces the
 * value with its result.
 *
 * @param $array The first value of the array will be passed
 *               into $function as the primary argument
 * @param $function The function to be called on each element
 *               in the array, recursively
 * @param $parameters An optional array of the additional
 *                parameters to be appeneded to the function
 *
 * Example to alter $array to get a short slice of each value
 *    array_walk_recursive_by_reference(
 *        $array, "substr", array("1","3")
 *    );
 */
function array_walk_recursive_by_reference(
        &$array, $function, $parameters = array()) {
    $reference_function = function(&$value, $key, $data) {
        $parameters = array_merge(array($value), $data[1]);
        $value = call_user_func_array($data[0], $parameters);
    };
    array_walk_recursive(
        $array,
        $reference_function,
        array($function, $parameters)
    );
}

The advantage here is that we only explicitly define one wrapper function instead of potentially dozens.

Finding columns in a database

If you need to determine which columns in a database contain a certain type of string, the pseudocode is fairly simple: list all tables in a database, list their columns that are text-oriented, filter those that match some given format. For large tables, this script won’t bother looking at too much.


############################################
# Imports
import MySQLdb
import sys
############################################
# Initialize the database connection
database = MySQLdb.connect(
  host = "localhost",
  user = "username",
  passwd = "password",
  db = "database")
cursor = database.cursor();
############################################
# Find all matching columns
cursor.execute("SHOW TABLES")
tables = [table[0] for table in cursor.fetchall()]
for table in tables:
  cursor.execute("DESCRIBE %s" % table)
  columns = [column for column in cursor.fetchall()]
  for column in columns:
    if column[1].count("text") or column[1].count("char"):
      cursor.execute("SELECT `%s` FROM \
        (SELECT `%s` FROM `%s` LIMIT 20000) AS subtable \
        WHERE `%s` LIKE %s LIMIT 1" \
        % (column[0], column[0], table, column[0], '"%foo%"'))
      results = cursor.fetchall()
      if results != ():
        print "%s.%s -- %s" % (table, column[0], results)
############################################
# Close the database connection
database.close()

Invasive browser sniffing

Take the a:visited pseudo-class and use it to determine if the client’s browser has visited a site. Do this en masse with a subset of Alexa’s top 1m websites and you can build up a fairly detailed profile of a visiting brower’s history.

var sites = [
  'http://google.com',
  'http://facebook.com'
];
$(function(){
  for (var i in sites) {
    $("#test").html("
      <li><a href='"+sites[i]+"'>"+sites[i]+"</a> </li>
    ");
    if ($("#test a").css('color') == "rgb(0, 204, 0)")
      $("#test li").appendTo("#visited");
    else
      $("#test li").appendTo("#not-visited");
    }
});

The ethics of this technique are questionable, especially if you retain this data and associate it with an email address or a registered user account. The only way I can see to protect against it is to disable JavaScript on all but the few sites you trust.

You could augment it to only send a sampling of 50 sites, respond with an ajax post, then determine which 50 to check next (e.g. if a user has visited a site specific to tech news, poll your next 50 on the most popular tech news sites). This concept is used at What The Internet Knows About You.

Implementing Schulze STV

Thus far, my implementation of Schulze STV has worked fairly well. The downside I’ve recently run into is in simulating real world multiple winner elections. Take Vancouver’s 2008 municipal election as an example, in which 32 candidates ran for 10 seats. The problem-space we’re looking at consists of 32 choose 10 nodes and (32 choose 10) x 11 edges. For those unfamiliar with binomial coefficients, that’s roughly 64 million nodes and 709 million edges. Exponential runtime is a bitch.

For the purposes of heuristic development, let’s look at Tideman’s A03. It’s a set of 989 ballots to elect 7 winners from 15 candidates. Schulze STV computes the winning set as B/D/E/F/H/K/N by looking at 6435 nodes and 51480 edges. How do we best prune the number of candidates down so as to look at a smaller subset, yet still produce the same winning set in most cases? Well, what’s quickly available? We can easily calculate the sum of all votes, standard deviation, and whatnot.

If we wanted to prune off the least preferred by sum (indicated by larger sums), we’d be cutting off one of a winner Schulze STV would want to elect. Alternatively, let’s look at the number of voters that mark each candidate in first place. If a voter marks two candidates in first place, split their vote between the two.

That heuristic seems great at first as it didn’t accidentally eliminate winning candidates and it reduced the number of nodes to 120, a 98% reduction in problem space size. However, the winning set would now be computed by SchulzeSTV as A/B/D/E/G/L/N, which is significantly different. Furthermore, the heuristic is going to incentivize tactical voting. Sometimes there aren’t any easy shortcuts and the best thing to do is just optimize the implementation.

Vancouver Open Data

Some months ago, there was talk in Vancouver about opening up the city’s data. As of yesterday, that data is now widely accessible in open formats at data.vancouver.ca. One such data layer is the bike routes in the city. There’s a lot more and it’ll be fun to see where it leads.

Short codes

With the advent of Twitter, shortened URLs have become all the rage: http://bit.ly/5dk33b, http://imgur.com/9G2rL, http://www.reddit.com/9dfo1. This kind of thing really isn’t all that hard to do.

$short_code = substr(preg_replace(
  array('/\//', '/\+/'),
  array('-', '_'),
  base64_encode(sha1(serialize($data, $salt), true))
), 0, 7);

Your $salt might be something as simple as a manually generated random string. If your data is time sensitive, it could be microtime(). Fancier two-way conversions are doable, in which you convert your primary key into a base-62 number, but I’ve yet to find a need for that.

Motivated hobbies

Start a project around a subject you feel passionate about. It’s surprising how quickly you can get things done when there’s internal motivation. My most recent project went from prototype to useful application in about 10 days. I have to say, there’s a bit of pride that goes along with reaching a presentable milestone.

Modern Ballots

One of the difficulties I’ve run into with my recent project is explaining what it is a back-end web service does. In response, I’ve put a small web app up at modernballots.com. This front-end lets you create elections, but has no direct knowledge of voting systems. The winner is calculated remotely through the election web service, which is publicly available for other programmers to use.

At the moment, the front-end only requests the winner as calculated by Schulze STV, but that’s just to make the interface simpler.

Election Web Service

First draft of the Election Web Service is up and running. I’ve implemented Plurality, IRV, Ranked Pairs, the Schulze Method, Plurality at Large, and STV.

The source code is up on GitHub for anyone interested in implementing additional voting systems or using them as Python libraries. For those that just want to send requests to a server, I have one responding to posts at vote.cognitivesandbox.com (note that the server only response to HTTP POSTs and works solely on JSON encoded requests).

Hopefully this takes some of the burden out of implementing polls and whatnot. You take care of storing the ballots, I’ll show you who won (and why).

Example Request

{
  "votingSystem": "stv",
  "ballots": [
    {"count": 4, "ballot": ["orange"]},
    {"count": 2, "ballot": ["pear", "orange"]},
    {"count": 8, "ballot": ["chocolate", "strawberry"]},
    {"count": 4, "ballot": ["chocolate", "sweets"]},
    {"count": 1, "ballot": ["strawberry"]},
    {"count": 1, "ballot": ["sweets"]}
  ],
  "winners": 3
}

Example Response

{
  "rounds": [
    {
      "tallies": {
        "orange": 4.0,
        "strawberry": 1.0,
        "chocolate": 12.0,
        "pear": 2.0,
        "sweets": 1.0
      },
      "quota": 6,
      "winners": ["chocolate"],
    },
    {
      "tallies": {
        "orange": 4.0,
        "strawberry": 5.0,
        "pear": 2.0,
        "sweets": 3.0
      },
      "quota": 5,
      "winners": ["strawberry"],
    },
    {
      "tallies": {
        "orange": 4.0,
        "sweets": 3.0,
        "pear": 2.0
      },
      "quota": 5,
      "loser": "pear"
    },
    {
      "tallies": {
        "orange": 6.0,
        "sweets": 3.0
      },
      "quota": 5,
      "winners": ["orange"],
    }
  ],
  "winners": ["orange", "strawberry", "chocolate"]
}

Election API prototype

The project sputtered a few times before it really got going, but it seems well on its way. My first elections calculator was built to demonstrate the flaws of voting systems. This new one, however, is intended for the development community.

Most voting system implementations are just plain Plurality (aka first past the post), which is prone to a great many flaws. Instead of implementing better voting systems libraries in each language, why not provide a general web service API?

The tool speaks JSON and provides and is completely stateless. You send your full request (ballots, desired voting system, etc) and it returns the winner (or winners) and an explanation as to how it reached that conclusion. All data is stored locally in cookies; The only server communication is for the result calculation.

I’m only about half-way done: four more algorithms to go (Schulze, STV, CPO-STV and Schulze STV) and a fair bit of polishing on the UI. Late December? Early January? Something like that.

Update: It’s up and available.