Elections where the outcome results in no-shows

  • Consider a scheduled movie night where 30 people vote on 10 different films to decide what they’ll watch.
  • Presume for a moment that all movies can be placed on an axis with comedy at one end and tragedy at the other.
  • Assume that each voter’s preference falls somewhere on that axis.
  • Lastly, assume that the collective preference is bimodal, with some clustering around an 80/20 comedy/tragedy mix, the others a 20/80 mix.

The election occurs, the results come in and a compromise 50/50 film is declared the winner. Well, by some peculiarity of preferences, this just doesn’t sit well with the comedy-lovers, so come movie night, most of them don’t show up.

Should the votes of those that didn’t even show up count? If we reuse the ballots of those that show up, we’re likely to reach a far more tragedy-heavy winner. Further, if this movie night is held monthly, do we favourably weigh the votes of those that didn’t show up?

Party and geographic proportional representation

I’ve been trying to find a way to combine both geographic and proportional representation. First, consider allowing MPs to vote with varying strengths (e.g. if an MP represents 45k people, they vote in parliament with a strength of 45k). The only trouble with this approach is that it isn’t proportional to the popular vote.

Instead, consider something akin to a federal MMP. If your chosen candidate doesn’t win, your vote is transferred to the party leader of your choice: a two-part ballot. In our current system, you aren’t represented unless you voted for the winning candidate of your riding.

To estimate the outcome of this new system, let’s assume that vast majority voting for a candidate support their party too. Each geographic seat would be the same as it is now. For example, Vancouver Centre’s Hedy Fry still represents the riding, but only at a power of 18k. The remaining 41k ballots are transferred to the appropriate parties.

Nationally, 2.17m ballots would be transferred to the Liberals from voters in ridings where they didn’t win, 2.13m to the NDP, 1.44m to the Conservatives, 818k to the Bloc, and 539k to the Green Party. The end result has the geographic representatives holding 7.42m geographically proportional votes and the party representatives holding 7.10m, resulting in a party proportional parliament.

Gerrymandering in this system would be moot, although I gather voting fraud might rise. There’s also the question of who would hold the party seats. The party leaders seems like the most direct answer, but I’m not sure how I feel about any one representative wielding upwards to 15% of the total voting power in parliament.

Estimated Condorcet winners

With the Canadian 2011 election at a close, I thought I’d take a minute to estimate the Condorcet winners per riding. The data’s available in preliminary form through Elections Canada. Let’s look at Scarborough Centre as an example for how FPTP voting fails to elect a representative candidate

Riding results for Scarborough Centre
Liberal12075 (32.0%)
NDP11273 (29.9%)
Conservative13401 (35.5%)
Green994 (2.6%)

Using the CBC vote compass, we can approximate how these voters might have ranked each candidate if given the option for a more expressive ballot.

Estimated preference ballots for Scarborough Centre
3945NDP, Green, Liberal
3945NDP, Liberal / Green
3381NDP, Liberal, Green
994Green, NDP, Liberal
10867Liberal, NDP / Green
1207Liberal, Conservative
13401Conservative, Liberal

Given these preferences (which I admit are a wild estimate; people vote tactically concealing their true preferences), the Condorcet winner would be the Liberal Party’s candidate. Indeed, the national breakdown would have produced results like so (full estimate attached):

Seats won per party under the Schulze Method
Liberal111 (up 8 seats from 2008)
Conservative110 (down 33 seats)
NDP51 (up 22 seats)
Bloc35 (down 14 seats)
Green1 (up 1 seat)

For the curious, the code used to generate the above is attached.

The problem of group allocation

An example scenario

Let’s say you run a summer camp. At the start of the summer, 100 campers arrive and learn that you offer 5 unique courses. These courses can’t have more than 25 campers each. The campers all submit ballots expressing their preferences for the courses. How do we maximize course allocation satisfaction?

The general case

  • V voters that each submit a ballot of preferences
  • G groups, each with a maximum capacity (e.g. G1cap = 25)
  • Maximize satisfaction (open to interpretation)

Algorithm sketches

A horrible algorithm

for group-x in groups:
    while group-x has remaining capacity:
        allocate any unallocated voter to group-x

Clearly this isn’t going to cut it. Preferences aren’t being taken into account at all. We’re just dumping voters into groups.

Linear optimization

convert each ballot into an allocation of points where
    all group preferences are in [0, 1]
    sum of group preferences = 1
system satisfaction = sum of voter satisfaction for all voters
linear optimization to maximize system satisfaction

As an illustration of the principle, consider an example with voter V1 that prefers G1 and voter V2 that prefers G2. Assigning the voters to their preferred groups optimizes the system satisfaction at 2, whereas switching them around produces a system satisfaction of 0. Okay, this is a step forward.

Possible problem case:

5 groups, each with a capacity of 1
voter 1:  0.4  0.2  0.2  0.2  0.0
voter 2:  0.7  0.2  0.0  0.0  0.1
voter 3:  0.0  0.7  0.2  0.0  0.1
voter 4:  0.0  0.0  0.7  0.2  0.1
voter 5:  0.2  0.0  0.0  0.7  0.1
system optimizes with voter 1 in group 5
voter 1 allocated to their least preferred group

This algorithm provides a measure of preference debt that we can use in the iterative versions of this problem. Also, I’m not sure how much of an issue tactical misrepresentation is here.

Additional problems

Consider the additional problem where voters vote once per day for new sets of courses. This can vary on fixed or variable set of voters, on a fixed or variable set of groups. Consider further the constraint where groups have both a minimum and a maximum capacity. Or perhaps where C has preferences for G and G has preferences for C.

I don’t quite have the solutions to any of this just yet, but it’s an interesting problem space. Someone has to have studied this already. Pointers in the right direction?

Three types of voting problems

Consider an electorate divided solely on some linear issue X. Assume the breakdown is as follows:

Wants more XWants same XWants less X
45%20%35%

Assume also that each position has only two representatives. Pretend you’re a voting system and imagine the ballots these people submit to you. How do you aggregate these votes to best represent the electorate? I’d argue the following is fair:

MoreSameLess
1 winner1 winner
2 winners1 winner1 winner
3 winners1 winner1 winner1 winner
4 winners2 winners1 winner1 winner
5 winners2 winners1 winner2 winners
6 winners2 winners2 winners2 winners

As an aside, note that the result at 6 winners isn’t really proportional to the popular vote, but that we were constrained by the number of running candidates. Had More X another candidate to run, More/More/More/Same/Less/Less would have been a fairer result.

More to the point though, notice the oscillation between the 1 and 2 winner scenarios. Although the electorate is best represented with a single Same X winner, More/Less better represents the electorate than Same/Same. While this holds true when the number of winners is known and fixed, our notion of fair changes when the number of winners is variable.

Consider the scenario where you have one representative, but occasionally need a second. Since the first representative is fixed, you choose your second proportional to the electorate. We’re looking at generating an ordered list of candidates.

Proportional Ranking
First placeSame
Second placeMore
Third placeLess
Fourth placeMore
Fifth placeLess
Sixth placeSame

Note the difference here between a fixed two-winner election (where we chose More/Less) and the proportional ranking (which starts off Same/More). In this different problem space, a fair outcome produces different results.

Lastly, consider the scenario where you only need one winner, but you need a runner up in case the first winner is unable to take the position.

Non-proportional Ranking
First placeSame
Second placeSame
Third placeMore
Fourth placeMore
Fifth placeLess
Sixth placeLess

Where in the first example we provided More/Less as a fairer alternative to Same/Same, the problem space for this third example would dictate Same/Same over More/Less. Again, different results for the same notion of fair.

These three scenarios have produced significantly different results, illustrating fixed-winner allocation, proportional ranking and non-proportional ranking. Hopefully this highlights the need to determine in advance the context in which you’re bringing things to a vote.

The one question I have in all of this is how to distinguish between these two problems:

  • One winner every couple of years (e.g. federal elections) is likely best suited with fixed-winner allocation
  • One winner every hour (e.g. a board game marathon) is likely best suited with proportional ranking

Is the salient difference here that preferences can change over time? That the voters themselves change over time? Or is it that you can opt to take a break during the next board game if you don’t like it, but you can’t quite opt out of being a citizen (at least, not with as much ease). I’m really not sure here.

Quickly voting using the Schulze Method

When you’re in a group trying to reach a collective decision by voting, chances are it sounds a little like this.

All those in favour of W?2.
All those in favour of X?7.
All those in favour of Y?5.
All those in favour of Z?6.
X has the most votes. X wins.

The problem here is that the majority might not want X, which only got 35% of the vote.

The Alternative

Next time, try the following. It takes a little longer, but the results will be much fairer.

W vs X.Who prefers W?13.Who prefers X?7.
W vs Y.Who prefers W?8.Who prefers Y?5.
W vs Z.Who prefers W?12.Who prefers Z?3.
W never loses a pairing. W wins.

Bam. Done.

As an aside, you could also infer this information from a set of collected ballots.


The Edge Cases

Disclaimer: Edge cases are exceedingly rare in large elections. What follows is simply for completeness.

In the case where two or more candidates never lose a pairing, that’s a tie. Randomly choose between these candidates.

In the case that all candidates lose a pairing, that’s a Condorcet paradox. The chances of this occurring are asymptotically rare as the number of voters increases, but here’s how to handle it if it comes up.

W vs X.Who prefers W?16.Who prefers X?14.
W vs Y.Who prefers W?17.Who prefers Y?13.
W vs Z.Who prefers W?12.Who prefers Z?18.
X vs Y.Who prefers X?19.Who prefers Y?11.
X vs Z.Who prefers X?9.Who prefers Z?21.
Y vs Z.Who prefers Y?20.Who prefers Z?10.
Each candidate loses a pairing. Let’s draw it out.

The first image shows the tallies. The second shows the cycles that prevented us from finding a simple winner. All subsequent steps are simply removing the weakest edge and any loose candidates.

For more information, see Wikipedia’s criteria analysis. To see it in action, try this mock election out.

Misinterpreting problem spaces

Polls and elections aren’t the same type of thing. The former asks for descriptive data, the latter for prescriptive preferences. To illustrate the hazards of this common misunderstanding, let’s go over some other ways data can partitioned and the problems that arise from confusing them.

Ordered data vs unordered data

Ordered data has an innate linear ordering, whereas unordered data doesn’t.

Ordered
How many apples can you eat in one sitting? (0, 1, 2…)
Unordered
What was the last type of apple you ate? (Granny smith, red delicious…)

When unordered data is confused as ordered data, a relationship is erroneously implied between the items.

unordered-as-ordered

When ordered data is confused as unordered, data trends are buried.

ordered-as-unordered

Ordered discrete vs ordered continuous

With discrete data, there exist two values for which there is no valid value in between, whereas continuous data can always be broken down into more detail.

Discrete
How many thermostats do you have in your house? (0, 1, 2…)
Continuous
What is the average temperature in your house? (22.1°C, 20.3°C…)

When discrete data is confused as continuous, inappropriate conclusions can be reached (chances are you have 2.3 thermostats in your house).

discrete-as-continuous

When continuous data is confused as discrete, granularity is lost. We can’t tell from the graph below whether there’s a large portion of 0.0km entries, if only odd decimal entries exist (0.1, 0.3, 0.5…), or any other detail hidden by these brackets.

continuous-as-discrete

Simple data vs compound data

Compound data have multiple values that bear some relation to each other.

Simple
How big is your television diagonally (24″, 32″, 34″…)
Compound
What are the dimensions of your television (20″x18″, 24″x20″…)

Compound data can be misrepresented as multiple simple data axes by ignoring the coupling between the two values, whereby the conditional probabilities are lost. Worse, it can be restricted down to a single simple data axis, whereby relevant data is entirely discarded (e.g. do sampled televisions only exist at certain aspect ratios?).

compound-as-simple

Descriptive data vs prescriptive preferences

Descriptive
What did you have for dinner last night?
Prescriptive
What should we have for dinner tonight?

The difference is subtle, but it’s there. With descriptive data, respondents have no incentive to tactically misrepresent their opinions. With prescriptive preferences, respondents (aka voters) are aware that their response will influence a future that likely affects them and will vote accordingly.

When prescriptive preferences are confused as descriptive data, a few things can happen:

  • Continuous data can be filtered down into discrete data. Instead of allowing degrees of preference, ballots are restricted to binary expressions of support.
  • Compound data can be filtered down into simple data. Although multiple options exist, ballots restrict the voter to supporting a single candidate.

Both confusions are present when Plurality voting is used in lieu of a more expressive ballot, where in voters could express more than just a single binary preference for a single option. Maybe this comes from ignorance of alternative voting methods, technical restrictions from the online voting application used, or seeing everything as a nail when you’re holding a hammer; Take your pick.

When Plurality voting is used to obtain prescriptive preferences, the same type of analytical fallacy is being made as is detailed in the above examples, all of which lead to a poor understanding of the underlying reality the data represents.

Implementing Schulze STV

Thus far, my implementation of Schulze STV has worked fairly well. The downside I’ve recently run into is in simulating real world multiple winner elections. Take Vancouver’s 2008 municipal election as an example, in which 32 candidates ran for 10 seats. The problem-space we’re looking at consists of 32 choose 10 nodes and (32 choose 10) x 11 edges. For those unfamiliar with binomial coefficients, that’s roughly 64 million nodes and 709 million edges. Exponential runtime is a bitch.

For the purposes of heuristic development, let’s look at Tideman’s A03. It’s a set of 989 ballots to elect 7 winners from 15 candidates. Schulze STV computes the winning set as B/D/E/F/H/K/N by looking at 6435 nodes and 51480 edges. How do we best prune the number of candidates down so as to look at a smaller subset, yet still produce the same winning set in most cases? Well, what’s quickly available? We can easily calculate the sum of all votes, standard deviation, and whatnot.

If we wanted to prune off the least preferred by sum (indicated by larger sums), we’d be cutting off one of a winner Schulze STV would want to elect. Alternatively, let’s look at the number of voters that mark each candidate in first place. If a voter marks two candidates in first place, split their vote between the two.

That heuristic seems great at first as it didn’t accidentally eliminate winning candidates and it reduced the number of nodes to 120, a 98% reduction in problem space size. However, the winning set would now be computed by SchulzeSTV as A/B/D/E/G/L/N, which is significantly different. Furthermore, the heuristic is going to incentivize tactical voting. Sometimes there aren’t any easy shortcuts and the best thing to do is just optimize the implementation.