All posts by druescas

Reddit-style filtering for e-democracy

When voting, people select among several options that have been predefined prior to the election. In more open scenarios of citizen participation, people are asked not only to vote for predefined options, but to contribute their own ideas before the final vote. This introduces another activity into the democratic process, which we can call filtering.

The aim of filtering is to select, out of a very large number of ideas, the very few best ones either to directly implement them or to put them to a formal vote. Filtering has to address the problem that voting cannot: How do we select out of potentially hundreds of ideas without having each voter evaluate them all? The solution provided by filtering is a common theme in many proposals to augment democracy: division of cognitive labour.

A single voter cannot rank hundreds of ideas, but the cognitive load of selecting the best ones can be spread across the many citizens that are contributing them. Filtering can scale because as the number of ideas grows with the number of people suggesting them, so too does the cognitive effort available to evaluate them.

However, a key distinction must be noted between voting and filtering. The process of distributing cognitive labour allows filtering many ideas, but once we accept that not all people are evaluating everything, the information available is necessarily incomplete. In other words, the process of ranking ideas has an element of uncertainty; it is statistical.

Reddit-style filtering

When looking for tools to implement internet mediated filtering, sooner or later one runs into systems like reddit, slashdot, digg and the like. At an abstract level, we can pick out the following features which are relevant to the problem of filtering:

  1. Many users can submit a large number of items as well as evaluate them
  2. Users evaluate items with a binary signal, e.g. voting up/down
  3. A ranking method exists that sorts items for some definition of better => worse.

At this level of description, these three properties fit filtering well. We have ideas that we wish to rank, and we cope with the large volume by having the users evaluate them, possibly using a binary signal such as approval (or disapproval). The meat of the matter lies in the third property, specifically in the phrase “for some definition of better => worse”.

Quality of ideas

Above we have used the abstract term “items”. In reddit, items can be stories or comments. The distinction is important because reddit uses different ranking algorithms for each. The key way in which filtering departs from the reddit model is that reddit and similar sites are news aggregation tools. This is why stories in reddit include a novelty component in their ranking. Ceteris paribus, new items are better than old items.

Filtering on the other hand is not about news, but about ideas/proposals. Although novelty may still be a factor, it is not a central one. Idea quality must be determined mostly by voters’ evaluations through a binary signal, and not very much by submission date. Ranking for filtering is more like reddit’s ranking of comments than of stories.

The problem is simple then: we must rank ideas according to users’ binary evaluations in a way that deals with incomplete information. That is, we will not have a +/- judgement for every idea from every user, only a subset. Fortunately the problem as posed is directly an instance of binomial proportion estimation, an urn model.

We have, for every idea, an unknown quantity which represents the fraction of all users who would evaluate it positively. This quantity can be statistically inferred using the number of voters that have in fact evaluated the idea and approved it. To do this we use the beta-binomial model, see here for an introduction. Briefly, we choose a uniform uninformative prior distribution with parameters alpha = beta = 1. Because the beta is conjugate to the binomial, the posterior distribution is also a beta, given by

P(p | k, n) = Beta(alpha + k, beta + n − k)

where k is the number of approvals and n the total number of evaluations. The mean of a beta distribution is given by

mean = alpha / (alpha + beta)

If we plug the posterior into this we see that the mean is (k + 1) / (n + 2), which for all but the smallest counts is essentially approvals / total, which makes sense. So a first approximation would be to rank ideas by the mean of the (posterior) probability distribution, which is roughly the fraction of approvals out of total evaluations.

What about uncertainty?

The problem with the above is that it ignores how confident we are about the estimation of an idea’s quality. For example, an idea evaluated 1000 times with 500 approvals would rank equally to an idea evaluated twice and approved once. In technical terms, using the mean throws away information about the probability distribution. This is what the author of this post realized:

suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings). WRONG.

The author then goes on to suggest using the lower bound of the Wilson confidence interval; this is how reddit currently ranks comments. Another option along these lines is to use a lower bound on our beta posterior via its cumulative distribution function (I won’t go into more details here). In either case, the spirit of this measure is to rank ideas that we are confident are good higher than ideas which have not been evaluated enough.
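For reference, here is a minimal Python sketch of that lower bound, using the standard Wilson score formula with z = 1.96 (95% confidence); the function name is ours:

from math import sqrt

def wilson_lower_bound(approvals, total, z=1.96):
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if total == 0:
        return 0.0
    p = float(approvals) / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)

# the quoted example: many positive ratings now rank above very few
# wilson_lower_bound(2, 2) ~ 0.34, wilson_lower_bound(100, 101) ~ 0.95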

What about equality/winner takes all/sunk ideas/information gain?

This is where filtering departs strongly from systems like reddit. Consider:

  • Ideas should have an equal opportunity of being selected.
  • Once ideas are ranked highly there is a snowball effect as they are more visible and can accumulate approvals.
  • New good ideas will be lost and sunk as they cannot compete with the volume of older ones that have accumulated approvals.
  • By penalizing uncertainty you concentrate evaluation away from where it’s most needed; information gain is minimized.

All these objections follow a common theme, the theme of balancing two competing objectives:

  • We want to rank good ideas highly
  • We want to allow for the discovery of good yet unproven ideas

Thompson Sampling and multi-armed bandits

The above dilemma is another form of the exploration-exploitation tradeoff that appears in reinforcement learning, an area of machine learning. There is one specific problem in machine learning whose structure is very similar to the problem of rating items and filtering: multi-armed bandits. In this post James Neufeld first makes the connection between rating comments on reddit and multi-armed bandits.

The comment scoring problem on reddit is slightly different from the basic setting described above. This is because we are not actually choosing a single comment to present to the user (pulling one arm) but, instead, producing a ranking of comments. There are some interesting research papers on modelling this problem precisely, for example [Radlinski, et al., 2008], but, it turns out to be a combinatorial optimization (hard). However, rather than going down this complex modelling path, one could simply rank all of the μ̄_i samples instead of taking just the max, this gives us a full ranking and, since the max is still at the top, is unlikely to adversely affect the convergence of the algorithm.

He then goes on to propose adapting Thompson sampling, a solution applied to multi-armed bandits in reinforcement learning, to the case of rating comments on reddit. The method of Thompson sampling (or probability matching) is simple given what we’ve seen above regarding the beta posterior. Instead of using the beta posterior mean, or a lower bound on its cumulative distribution, we simply sample from it. The procedure is:

  1. For each idea, sample from its posterior beta probability distribution
  2.  Construct a ranking according to the sampled value
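As a minimal sketch (using numpy, with illustrative idea names and counts), the procedure looks like this:

import numpy as np

def thompson_rank(counts):
    """counts: dict mapping idea -> (approvals, rejections).
    Sample once from each idea's beta posterior (uniform prior, so
    alpha = approvals + 1, beta = rejections + 1), then sort by the samples."""
    samples = {idea: np.random.beta(a + 1, b + 1)
               for idea, (a, b) in counts.items()}
    return sorted(samples, key=samples.get, reverse=True)

# each call may return a different ranking; unproven ideas sometimes
# rank high, which is the exploration we are after
print(thompson_rank({"proven": (500, 500), "unproven": (1, 0), "good": (90, 10)}))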

Here’s how Thompson sampling relates to the two objectives stated above:

  • We want to rank good ideas highly

Sampling from a high quality posterior will tend to produce larger values.

  • We want to allow for the discovery of good yet unproven ideas

Because we are sampling there is a chance that unproven ideas will be ranked highly. Furthermore, this possibility is greater for better ideas and for ideas with high uncertainty. There is more to be investigated here about the relationship between Thompson sampling and maximizing information gain (e.g. Kullback-Leibler divergence).

Extensions

The base Thompson sampling model can be extended in several ways. For example:

  • Time: we can incorporate time factors by having decays on the beta distribution parameters (a sketch follows below).
  • Collaborative filtering: we can apply weight factors on beta parameters depending on affinity with the user.
  • Boosting newly submitted ideas: we can choose non-uniform priors to ensure high exposure to new ideas.
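As an illustration of the first extension, one possible (purely hypothetical) decay scheme discounts old evidence by a constant factor at each time step, so an idea’s posterior drifts back towards the prior as it ages:

def decay(alpha, beta, factor=0.95, prior_alpha=1.0, prior_beta=1.0):
    """Shrink the accumulated evidence towards the prior by a constant factor."""
    return (prior_alpha + factor * (alpha - prior_alpha),
            prior_beta + factor * (beta - prior_beta))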

The pairwise-bradleyterry model for pairwise voting

In a previous post we discussed pairwise voting and the pairwise-beta model as a way to obtain a global ranking over candidates using bayesian inference with the beta distribution. In that post we remarked in the final paragraph that the pairwise-beta model is not perfect:

In particular, it does not exploit information about the strength of the opposing item in a pairwise comparison.

In this post we will look at a better model which addresses this particular problem, albeit at a computational cost. To begin we present a pathological case which exhibits the problem when using the pairwise-beta.

Consider the following set of pairwise ballots, where A, B, C, D, E and F are options, and A > B indicates that A is preferred to B. There are 5 ballots:

A > B

B > C

C > D

D > E

F > E

Applying the pairwise-beta method to this set of ballots yields the following output (options A-F are referred to as numbers 0-5):

[(0, {'wins': 1, 'losses': 0, u'score': 0.6666666666666666}), 
(5, {'wins': 1, 'losses': 0, u'score': 0.6666666666666666}), 
(1, {'wins': 1, 'losses': 1, u'score': 0.5}), 
(2, {'wins': 1, 'losses': 1, u'score': 0.5}), 
(3, {'wins': 1, 'losses': 1, u'score': 0.5}), 
(4, {'wins': 0, 'losses': 2, u'score': 0.25})]

which is equivalent to the following ranking:

  1. A, F
  2. B, C, D
  3. E

A and F share the first position. B, C and D share the second position. E is last.

Hopefully the problem in this ranking is apparent: the strength of the opposing option in a pairwise comparison is not affecting the global ranking. This is why option F, which only beats the last option, is ranked at the same position as A, which “transitively” beats every other option. Similarly, options B, C and D are ranked at the same level, even though presumably option B should be stronger as it beats option C which beats option D.

In other words, beating a strong option should indicate more strength than beating a weak option. Similarly, being beaten by a strong option should indicate less weakness than being beaten by a weak option.

We can resort to the Bradley-Terry [1] model to address these shortcomings. Bradley-Terry is a probabilistic model that can be used to predict the outcome of pairwise comparisons, as well as to obtain a global ranking from them. It has the following form:

P(i beats j) = p_i / (p_i + p_j)

and in logit form[2]:

logit(P(i beats j)) = log(p_i / p_j) = λ_i − λ_j, with λ_i = log p_i

The parameters (p’s and lambdas) can be fit using maximum likelihood estimation. One can consider these to represent the relative strength of options and therefore give a global ranking, although strictly speaking their interpretation is rooted in probabilities of outcomes of comparisons.
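This is not the method used below (we delegate to an R package for that), but as a sketch of what fitting involves, the standard minorization-maximization updates of Hunter (2004) fit in a few lines of Python. The small pseudo-count is our addition: without it the MLE does not exist for options that never lose (or never win), as happens in the pathological ballots above.

import numpy as np

def bradley_terry(wins, iters=200, pseudo=0.1):
    """Fit Bradley-Terry strengths via MM updates.
    wins[i][j] = number of comparisons in which i beat j."""
    W = np.asarray(wins, dtype=float) + pseudo  # regularize with pseudo-counts
    np.fill_diagonal(W, 0.0)                    # no self-comparisons
    N = W + W.T                                 # comparisons between each pair
    p = np.ones(W.shape[0])
    for _ in range(iters):
        # p_i <- (wins of i) / sum_j N_ij / (p_i + p_j)
        p = W.sum(axis=1) / (N / np.add.outer(p, p)).sum(axis=1)
        p = p / p.sum()                         # strengths defined up to scale
    return p

# the pathological ballots: A>B, B>C, C>D, D>E, F>E
wins = np.zeros((6, 6))
for w, l in [(0, 1), (1, 2), (2, 3), (3, 4), (5, 4)]:
    wins[w, l] = 1
print(bradley_terry(wins))  # A should come out strongest, E weakest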

In order to apply this model we can use the BradleyTerry2 R package by Turner and Firth[2], which fits the model using tabular input data. Armed with this package, all we need is some extra plumbing in our Agora Voting tallying component and we’re ready to go. Running it against the same ballots as above, we get:

[(0, {'score': 73.69821}),
 (1, {'score': 49.13214}),
 (2, {'score': 24.56607}),
 (3, {'score': 1.186101e-06}),
 (5, {'score': 0.0}),
 (4, {'score': -24.56607})]

which is equivalent to the following ranking:

  1. A
  2. B
  3. C
  4. D
  5. F
  6. E

Notice how this ranking does away with all the problems we mentioned with the pairwise-beta result. In particular, note how option F, which above was ranked joint first, is in this case ranked fifth. This is because it beat option E, which is last, and therefore not much strength can be inferred from that comparison.

Before concluding that the pairwise-beta model is terrible, remember that the results we got here correspond to a handpicked pathological set of ballots. In general it seems reasonable to expect results from both models to converge as more data accumulates and the strength of opponents is evened out. This hypothesis seems to match the one stated in work by Salganik[3], where the pairwise-beta and a more robust model are compared:

In the cases considered in the paper, the two estimates of the score were very similar; there was a correlation of about 0.95 in both cases.

In summary, in this and the previous post we have described two models that can be used for pairwise elections, where voters are asked to compare options in pairs. We have seen how one of the models works well and is easy to calculate, but can potentially give unrealistic rankings when data is sparse. We then considered a second, more robust model which addresses this problem, but is computationally more expensive. Further work is required to determine exactly how computationally demanding our pairwise-bradleyterry implementation is.

 


[1] BRADLEY, R. A. and TERRY, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons.  – http://www.jstor.org/stable/2334029

[2] Bradley-Terry Models in R: The BradleyTerry2 Package – http://www.jstatsoft.org/v48/i09/paper

[3] Wiki surveys: Open and quantifiable social data collection – http://arxiv.org/pdf/1202.0500v2.pdf

The pairwise-beta model for pairwise voting

In a pairwise vote, voters are asked to repeatedly pick between pairs of options, selecting the one they favor of the two. The procedure then combines all these pairwise choices in order to establish a global ranking for all options. A pairwise vote is statistical in nature; it must infer preference data that voters have not explicitly stated in order to obtain a result.

This statistical property allows obtaining an approximate preference ordering over a large list of options without overwhelming the voter with too much work. For example, if each voter was asked to establish a preference over 50 items, they would be exhausted and participation would suffer.

The pairwise-beta is a simple bayesian method used to rank items in pairwise votes. It is based on the beta-binomial model [1], which is composed of a beta prior and a binomial likelihood. This model is very tractable: because the beta distribution is conjugate to the binomial, the posterior also has beta form and is easily obtained:

P(p | k, n) = Beta(alpha + k, beta + n − k)

I will not present a formal justification of the pairwise-beta model for pairwise comparisons, rather I will present some intuitions that should convey how and why the model works.

The key bridge to interpret pairwise comparisons in terms of the beta-binomial model is to realise that the better-worse binary relation between options maps directly onto the success/failure outcomes of a bernoulli trial. We can thus establish a correspondence[4] between pairwise comparisons and bernoulli trials:

  • each item i corresponds to a sequence of bernoulli trials, Bi
  • a comparison in which i wins corresponds to a success in Bi
  • a comparison in which i loses corresponds to a failure in Bi

The question we are trying to answer is

Given the proportion of comparisons in which i wins, what is the proportion of items that are globally better than i?

which, reformulated in terms of our correspondences, becomes

Given a sequence of bernoulli trials Bi, what is the proportion of successes/losses for i?

which is a case of standard binomial proportion estimation[2]. As we noted before, the posterior of the beta-binomial is also a beta distribution, given by

P(p | k, n) = Beta(alpha + k, beta + n − k)

If we want a point estimate we just use the mean of this distribution, which is

mean = (alpha + k) / (alpha + beta + n)

This gives us, for each item i, an estimate of the proportion of items that are better/worse than it. This leads directly to a global ranking: the best ranked items will be those which are estimated to be better than most other items.

 In summary, the procedure is

  1. For each item i, obtain the corresponding sequence of bernoulli trials Bi
  2. For each item i, calculate the posterior beta distribution mean given the data from 1)
  3. Create a global ranking based on the proportions for each item, as calculated in 2)
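A minimal Python sketch of this procedure, assuming the uniform Beta(1, 1) prior so that the posterior mean reduces to (wins + 1) / (wins + losses + 2):

from collections import defaultdict

def pairwise_beta(ballots):
    """ballots: list of (winner, loser) pairs. Returns the posterior
    mean (wins + 1) / (wins + losses + 2) for each option."""
    wins, losses = defaultdict(int), defaultdict(int)
    for winner, loser in ballots:
        wins[winner] += 1
        losses[loser] += 1
    options = set(wins) | set(losses)
    return dict((o, (wins[o] + 1.0) / (wins[o] + losses[o] + 2.0))
                for o in options)

# an option with 1 win and 0 losses scores (1 + 1) / (1 + 2) = 2/3
print(pairwise_beta([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("F", "E")]))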

The pairwise-beta model is simple but not perfect. In particular, it does not exploit information about the strength of the opposing item in a pairwise comparison. However, despite this drawback it performs well in practice. Please refer to [3] for details.

 


[1] http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf

[2] http://homepage.ntu.edu.tw/~ntucbsc/%A5%CD%AA%AB%C2%E5%BE%C7%B2%CE%ADp[email protected]/[3]%20Chapter%208.pdf

[3] http://arxiv.org/pdf/1202.0500v2.pdf

[4] The correspondence is two to one, as each comparison yields two bernoulli trials

Integer encoding of multiple-choice ballots (2)

In the last post we saw how simple arithmetic with the right choice of base can encode integer lists for multiple ballot choices in an elegant and compact way. A couple of points were mentioned in the notes section, one of them was

our scheme allows encoding repeated choices, which are usually not legal ballots. For example, one can encode the choice 1,1,1 but that would be an illegal vote under all the voting systems we mentioned as examples. The fact that legal ballots are a subset of all possible encodings means that the encoding is necessarily suboptimal with respect to those requirements. We will look at an alternative scheme in the next post.

The alternative scheme we show below attempts to optimize the encoding by using a variable base as digits are encoded. The main idea is that as chosen integers in the list are encoded, the remaining ones are constrained since they cannot be repeated. Thus the base can be reduced as fewer choices remain possible. In practice the process is complicated by the need to keep track of what digits correspond to what, as gaps form in the remaining integers and the corresponding meaning of digits changes. Here is a Python implementation:

from collections import deque

def encode2(digits, choices = 9):
  radix = choices + 1
  present = range(0, radix)
  digits2 = deque()
  radix = radix - len(digits) + 2
  for d in digits:
    index = present.index(int(d))
    present.remove(int(d))
    digits2.appendleft(index)

  number = 0
  print(digits2)
  for d in digits2:
    number += d
    number = number * radix
    # print(d, radix, number)
    radix = radix + 1

  return number / (radix - 1)

def decode2(number, choices = 9):
  radix = choices + 1
  print("decode %d" % number)
  digits = deque()
  present = range(0, radix)
  tmp = number
  while(number > 0):
    # print(number, radix, number % radix)
    digits.append(number % radix)
    number = number / radix
    radix = radix - 1

  print(digits)
  ret = deque()
  for d in digits:
    # print(d, present, ret)
    val = present[d - 0]
    ret.append(val)
    present.remove(val)

  return ret

and a sample session using both encoders

>>> execfile("encoder.py")
>>> decode(encode([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 21), 21)
decode 14606467545964956303452810
deque([1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L])
>>> decode2(encode2([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 21), 21)
decode 36697695360790800022
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

Note the shorter value produced by the second encoder

encode: 14606467545964956303452810

encode2: 36697695360790800022

Despite the reduction, the second encoder is not optimal (the first encoder is optimal given repeatable choices); the range of output numbers is larger than that of legal ballots. It would be interesting to see how to obtain the most compact solution; a detailed analysis could compare these schemes systematically to get quantitative measures of space efficiency.


 

Integer encoding of multiple-choice ballots

Secure voting systems supporting privacy through encryption must encode ballot contents into integers before they can be encrypted[1]. This encoding step is mostly trivial. For example, imagine a yes-no-abstain ballot. One can simply apply the following mapping to yield integer plaintexts for corresponding ballots:

Yes => 1
No => 2
Abstain => 3

But things can get a bit more involved when dealing with multiple-selection ballots. These are ballots where the voter makes more than one choice. They can be either ranked ballots, where the voter specifies a preference relation over the selections, or unranked ballots where no such preference exists. Examples of voting systems using the former are single transferable vote or instant runoff voting. Systems like approval voting or plurality at large are examples of the second type, using unranked ballots.

Imagine we are using one of these systems to elect a candidate out of a field of four: Alice, Bob, Charlie, and Donna. We first apply the trivial mapping:

Alice => 1
Bob => 2
Charlie => 3
Donna => 4

But how do we encode a complete ballot, for example, a ballot with (X corresponds to marked choices)

Alice X
Bob X
Charlie O
Donna O

Unlike the yes-no-abstain ballot above, the content of the ballot corresponds to a list of integers: 1 and 2. We could use the following mapping

encode: Alice, Bob => 1, 2 => 12

The ballot is encoded as the number 12, resulting from the concatenation of the string representations of each of the integers. But what if there are more candidates, and we need to encode something like:

encode: 11, 3 => 113

That won’t work, because 113 could represent either 11 and 3, or 1 and 13.

decode: 113 => ?

We can remedy this by adding some padding such that each choice is well separated:

encode: 11, 3 => “1103” => 1103

Then when decoding, we convert the integer to a string, split it every 2 characters, and finally obtain integers for each of the candidates:

decode: 1103 => “1103” => “11”, “03” => 11, 03

But there’s still a problem, what about this choice:

encode: 1, 13 => “0113” => 113

We run into trouble here, because the string “0113” corresponds to the integer 113; there is no mathematical difference between “0113” and “113”. To fix this, when decoding we can first check that the string length is a multiple of 2 (since we are using 2 chars per candidate integer), if it is not we prepend the required zeros. The encode-decode process would be

encode: 1, 13 => “0113” => 113
decode: 113 => “113” (prepend zero) => “0113”  => “01”, “13” => 1, 13

I hear you complain that all this concatenation, padding, and prepending looks a bit hackish, can we do better?

Let’s go back to our first example, when we simply wanted to do

encode: Alice, Bob => 1, 2 => 12

This looked very nice and simple. Can’t we do something like this in general, without any string hackery? The first step is to go back to the definition of decimal numbers.

[Figure: Decimal numbers (cplusplus.com)]

In these terms, the encoding 1, 2 => 12 corresponds to

(10^1) * 1 + (10^0) * 2 = 12

Here we have expressed the encoding of 1, 2 using arithmetic, no string operations involved. The ballot choices are interpreted as digits according to the mathematical definition of decimal numbers. (In fact, this is what goes on under the covers when you convert a string like “12” into the number 12.) This gives us a purely arithmetical description of the simple mapping we started with. Things then got complicated when we considered the possibility of more choices (candidates) in the ballot. Let’s apply our mapping to that problematic ballot:

encode: 11, 3 => (10^1) * 11 + (10^0) * 3 = 113

Our new procedure fails the same way: the simple scheme where each digit represents one choice cannot be accommodated by the decimal digit representation of choices, and the result 113 is ambiguous. But wait, who says we have to encode according to a decimal representation? What if we were to map choices to hexadecimal digits:

encode: 11, 3 => (10^1) * B + (10^0) * 3 = B3

And we’ve restored simplicity and correctness to our scheme. B3 encodes the choice 11, 3 with one choice per digit and no ambiguity! If the B3 looks like cheating, just remember, B3 is a representation of a number that in decimal format turns out to be 179. The encode-decode process could just as well be written as

encode: 11, 3 => 179
decode: 179 => 11, 3

The bottom line is we can encode lists of integers into an integer provided we use the optimal base, which is equal to the number of possible choices in the ballot plus one.

Let’s revisit our original example, with Alice, Bob, Charlie and Donna. Since we have four candidates, our base is 4 + 1 = 5. The encoding is thus:

encode: Alice, Bob => 1, 2 => (10^1) * 1 + (10^0) * 2 = 12 (base 5) = 7 (decimal)

in short:

encode: 1, 2 => 7
decode: 7 => 1, 2

Note that not only is this method simpler with no string operations or padding, but the encoded values are smaller. Compare:

encode: 1, 2 => 12
encode: 11, 3 => 1103

with

encode: 1, 2 => 7
encode: 11, 3 => 179

This should not come as a surprise: encoding with the specified base is the most compact[2] encoding possible (proof left as an exercise for the reader). A larger base wastes space encoding ballot contents that are not possible, whereas a smaller base is insufficient to encode all possible ballots.

Finally, here is a Python implementation of the encoder we have proposed:

from collections import deque

def encode(choices, max_choices):
    radix = max_choices + 1
    number = 0
    for d in choices:
        number += int(d)
        number = number * radix

    return number / radix

def decode(number, max_choices):
    radix = max_choices + 1
    choices = deque()
    while(number > 0):
       choices.appendleft(number % radix)
       number = number / radix

    return choices
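As a quick check, the four-candidate example from above round-trips correctly (Python 2, where / is integer division):

>>> encode([1, 2], 4)
7
>>> decode(7, 4)
deque([1, 2])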

In the next post we will further discuss details as to the compactness of the encoding mentioned in [2].


[1] In the case of ElGamal encryption used in Agora Voting, the plaintext must be encoded into an element of the multiplicative subgroup G of order q of the ring Zp, where p and q are suitably chosen prime numbers. In order to do this, the plaintext must be first encoded into an integer, after which it is mapped to a member of G, and subsequently encrypted.

[2] A few caveats must be mentioned. First, we are using 1-based indices to represent choices, which means some values are unused. Second, our scheme allows encoding repeated choices, which are usually not legal ballots. For example, one can encode the choice 1,1,1 but that would be an illegal vote under all the voting systems we mentioned as examples. The fact that legal ballots are a subset of all possible encodings means that the encoding is necessarily suboptimal with respect to those requirements. We will look at an alternative scheme in the next post.

Voter fraud and bayesian inference – part 3

Here’s part1 and part2.

Welcome back. In the previous posts we saw how to do inference using the beta-binomial to get probabilities for the proportion of fake ballots in an election, as well as an upper bound on the probability that the election result is incorrect. We briefly mentioned the hypergeometric distribution but did not discuss it further nor use it.

Like the binomial (and beta-binomial), the hypergeometric distribution can be used to model the number of successes in a series of sampling events with a binary outcome. The distinction is that the binomial models sampling with replacement, whereas the hypergeometric models sampling without replacement. In other words, if we are sampling from a box, the binomial applies when the sample is returned to the box before drawing more samples. The hypergeometric applies when the sample is not returned. But wait, doesn’t that mean that we’ve been doing it wrong?

When auditing ballots we keep track of those already checked, a ballot is never audited twice. Shouldn’t we then be using the hypergeometric distribution? It turns out that the binomial distribution approaches the hypergeometric distribution in the limit of a large total number of items compared to the number sampled. This fits our case, as we can only audit a limited number of ballots compared to all those cast.

[Plots: the hypergeometric posterior for increasing values of N; the bottom right is the corresponding beta-binomial.]

As we saw in the previous post, the beta distribution is a conjugate prior for the binomial, which makes inference very easy. Unfortunately this is not the case for the hypergeometric. But because of the converging behaviour seen above, we can stick to the beta-binomial’s easy calculations without sacrificing accuracy. For the sake of completeness we will quickly show the posterior for the hypergeometric, following [1]. Incidentally this “manual calculation” is what allowed us to obtain the images above, through the javascript implementation in the jsfiddle.

P(m | k) = C(m, k) C(N − m, n − k) / Σ_m′ C(m′, k) C(N − m′, n − k)

where N is the total number of ballots, m the hypothesized number of fake ballots, n the number audited and k the number of fakes found.

Again, this is just Bayes theorem with the hypergeometric likelihood function and a uniform prior. In [1] it is also pointed out that the normalization factor can be computed directly with this expression

Σ_{m=0}^{N} C(m, k) C(N − m, n − k) = C(N + 1, n + 1)

We use this in the jsfiddle implementation to normalize. Another thing to note is that the hypergeometric posterior is 0 at positions that are inconsistent with the evidence. One cannot have fewer successes than have been observed, nor more than are possible given the evidence. These conditions are checked explicitly in the implementation. Finally, the jsfiddle does not contain an implementation for obtaining the upper bound on the probability of election error, only code for the posterior is present. How about forking it and adding that yourself?
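The jsfiddle is javascript, but an equivalent sketch of the posterior in Python (using scipy, with our own function name) is short:

from scipy.special import comb

def hypergeom_posterior(N, n, k):
    """Posterior over m, the number of fake ballots among N total, after
    finding k fakes in an audit of n ballots, under a uniform prior."""
    Z = comb(N + 1, n + 1)  # the closed-form normalization from [1]
    post = []
    for m in range(N + 1):
        # the posterior is 0 where m is inconsistent with the evidence
        if k <= m and n - k <= N - m:
            post.append(comb(m, k) * comb(N - m, n - k) / Z)
        else:
            post.append(0.0)
    return post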

In these three posts we have used bayesian inference to calculate probabilities over proportion of fake ballots, and from there to calculate probabilities that an election result was incorrect. These probabilities could be used to achieve trust from stakeholders in that everything went well, or conversely to detect a possible fraud and invalidate an election.

I’ll finish the post by mentioning that the techniques we have seen here can be generalized beyond the special case of detecting fake ballots for plurality votes. For example, one could use bayesian inference to conduct ballot audits for the sake of checking tally correctness, not just for failures in authentication, but for errors in counting. See [2] for this kind of more general treatment.

 


[1] http://www.wmbriggs.com/public/HGDAmstat4.pdf

[2] https://www.usenix.org/system/files/conference/evtwote12/rivest_bayes_rev_073112.pdf

In this work, because the results of audits are not just binary, and because tallies are not only plurality, the authors use Dirichlet distributions and sampling from posteriors to project possible alternative tallies.

Voter fraud and bayesian inference – part 2

We left off the discussion with

We want to calculate the proportion of fake ballots in an election based on the results of limited audits. We have seen how the binomial and hypergeometric distributions give probabilities for the results of an audit given an assumption about the proportion of fake ballots. Bayes theorem can be used to calculate the inverse probability that we are after, once we have specified a prior.

Bayesian inference is a process that takes prior information, and adds evidence to obtain a posterior distribution. In our case this posterior distribution will be over the possible proportion of fake ballots in the set of all ballots. Let’s begin with the binomial case. What prior should we use? One answer is that, since we know nothing about the proportion of fake ballots we should be indifferent about each possibility. This translates into a uniform prior, where all proportions are equally likely. For example

P(proportion = fake ballots/ total ballots) = 1 / (total ballots + 1)

Since there are n + 1 possibilities for the number of fake ballots, we give each of them the same weight, which is 1 / (n + 1).

Beta + Binomial = Beta-Binomial

Before plugging this into Bayes, a small technical detour. Notice how the prior is itself a probability distribution, defined over the 0.0 – 1.0 interval. That is, the minimum proportion (0.0) is no fake ballots and the maximum (1.0) is all fake ballots. It turns out there is a parametric probability distribution one can use for this interval, it’s called the Beta distribution. The Beta distribution has two parameters, alpha and beta. The neutral prior we defined above is equivalent to the Beta distribution with parameters (1, 1)

P(proportion ) = 1 / (n + 1) = Beta(1, 1)

We could express other knowledge with different choices of alpha and beta. But what’s the point of using the Beta, besides having a convenient way to specify priors? The point is that the Beta distribution is a conjugate prior of the binomial distribution. This means that the posterior distribution, once having taken into account available evidence, is also a Beta distribution. Meaning that the calculation of the posterior is much easier, as inference is just a matter of mapping the values of parameters of the Beta to some other values. Here is the posterior of the Beta distribution when it is used as the prior of the binomial (this is called the beta-binomial model).

P(p | k, n) = P(k | n, p) P(p) / P(k | n)
∝ p^k (1 − p)^(n − k) · p^(alpha − 1) (1 − p)^(beta − 1)
∝ p^(alpha + k − 1) (1 − p)^(beta + n − k − 1)

Equations as in [1]. The first line is just Bayes theorem, and the payoff is that the last line corresponds to a beta distribution with different parameters. In summary

P(p | k, n) = Beta(alpha + k, beta + n − k)

With a beta prior, bayesian inference reduces to remapping the initial parameters alpha and beta to alpha + k and beta + n – k, where k is the number of successes and n is the number of trials. Conjugate priors are an algebraic convenience that allow obtaining analytic expressions for posteriors easily. End of detour; please refer to [1] for further details.

Armed with the beta-binomial model, obtaining the posterior given some audit results is simple. If we audited 10 ballots and 3 of them were fake, our posterior would simply be

P(proportion = p | fake audit count = 3 out of 10)

= Beta(1 + 3, 1 + 10 – 3)

= Beta(4, 8)

Here’s what Beta(4, 8) looks like:

[Plot: the Beta(4, 8) density]

Note how the peak of the distribution is at 0.3; this makes sense since in the sample 3 out of 10 ballots were invalid. Evidence has transformed our initial uniform prior into the distribution seen above. This meets our original objective, a way to judge how many ballots are fake in the entire set of ballots based on limited audits. But it’s not the end of the story. What we would like also is to have an estimate as to whether or not the election result is correct. As we said previously, this estimation can be used either as a guarantee that all went well or in the opposite case to detect a problem and even invalidate the results.


The probability that an election result was correct, given uncertainty about fake ballots, depends on two things. One is the proportion of ballots that are fake, this is what we already have a calculation for. The other thing is the actual election outcome; specifically a measure of how close the result was. The reason is simple, if the election was close, a small number of invalid ballots could cast doubts on its correctness. Conversely, if the result was a landslide, the presence of fake votes has no bearing on its correctness. For our purposes we will stick with a simple example in which the election decides between two options via simple plurality.

Call the difference between the winning and losing option d

d = winner votes – loser votes

In order for the election to be wrong, there must be a minimum of d fake votes. The existence of d fake votes does not imply that the result was wrong, but d fake votes are a necessary condition. Thus the probability that the number of fake votes is greater than or equal to d is an upper bound on the probability that the election was incorrect. Call this E (for error):

P(proportion of fake votes >= d / total votes) = E

(upper limit on the probability that the election was wrong)

We have P(proportion); it is the posterior we got above. How do we get P(proportion >= some constant)? Through the beta distribution’s cumulative distribution function, which is defined in general as

CDF(x) = P(X <= x)

In order to reverse the inequality, we just need to subtract it from 1 (this gives us the tail distribution). We finally have

Probability of incorrect result

= P(proportion >= d / total ballots)

= 1 – Cumulative Distribution Function at d / total ballots

 One final correction. Because we have sampled a number of votes with known results, we must apply our calculations to the remaining ballots.

P(E) = 1 – CDF((d – fake ballots found in sample) / (total ballots – sampled ballots))

Let’s try an example, an election between option A and option B with the following numbers.

Votes for A = 550

Votes for B = 450

Total ballots = 1000

Audited ballots = 100

Audited fake ballots = 4

which gives

Posterior = Beta(5, 97)

d = 100

Minimum fraction of fake votes required to change result = (100 – 4) / (1000 – 100) = 0.1066

Upper bound on probability of error

= 1 – CDF(Beta(5, 97), 0.1066)

= 0.01352

In conclusion, the probability of error due to fake ballots in this election is less than or equal to 1.35%.
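This final number is easy to check in Python with scipy:

from scipy.stats import beta

# posterior Beta(5, 97); minimum remaining fake fraction (100 - 4) / (1000 - 100)
print(1 - beta.cdf(96 / 900.0, 5, 97))  # ~ 0.0135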


You can find a javascript implementation for everything we’ve seen until now in this jsfiddle. Calculations for the binomial, beta, hypergeometric and cumulative distribution function are done with the jStat javascript library. Wait, you say, what about the hypergeometric? We’ll leave that for the next post, which should be pretty short.

 


[1] http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf

Voter fraud and bayesian inference

Wikipedia says

Personation is a term used in law for the specific kind of voter fraud where an individual votes in an election, whilst pretending to be a different elector.

When someone practices personation multiple times to cast multiple votes we are talking about ballot stuffing. In this post we will consider an election in which authentication is not 100% secure, where personation is difficult but not impossible. Furthermore, we will assume there is some available but costly method by which ballots can be audited to determine whether they were cast via personation or were in fact valid.

What makes the problem non-trivial is that ballot auditing is costly and cannot in principle be performed for the entirety of the ballots cast. Hence we would like to estimate, from a limited number of audited ballots, how severe ballot stuffing was for an election. This estimation can be used either as a guarantee that all went well or, in the opposite case, to detect a problem and even invalidate the results.

What we need is a mathematical model that given some information about the results of an auditing processes allows us to estimate the proportion of “fake” ballots in the set of all those cast. In other words, we are talking about statistical inference; in this post will use a bayesian approach. Let’s get to work.

Imagine we have a box with all the ballots for an election, and the auditing process consists in randomly taking one out and determining whether it is valid or not, recording the result, and then repeating a limited number of times. After we have recorded the results of all the audits, we would like to know how many of the ballots in the entire box are fake. Two things come to mind. First, that the result of each audit is binary, we either get FAKE or VALID. Second, that if the proportion of fake ballots in the box is p, then the probability that a randomly chosen ballot is fake is p; the probability that it is valid is 1 – p.

The auditing process as a whole yields a count of fake ballots and a count of valid ballots. If we have probability p for the result of a single audit, can we assign a probability to the count resulting from the complete audit process? Yes, the binomial distribution and its brother the hypergeometric distribution do just that[1]. Here they are:

Binomial: P(X = k) = C(n, k) p^k (1 − p)^(n − k)

Hypergeometric: P(X = k) = C(K, k) C(N − K, n − k) / C(N, n)

where n is the number of audits, k the count of fakes found, N the total number of ballots, and K the number of fakes among them.

In our example, k above corresponds to the count of fake ballots. So these distributions give us a way to calculate the probability that a specific number of fake ballots is obtained in the audit assuming a certain proportion of fake ballots in the entire election. For example, let’s say we have 100 ballots total and we know that 10 of them are fake. What is the probability that if we audited 10 ballots we would obtain a fake count of 3?

P(X = 3) = 0.057395628 (binomial)

P(X = 3) = 0.0517937053324283 (hypergeometric)

Only about 5%; it is unlikely we’d find 3 fake ballots with only 10 audits, given that there are only 10 fakes out of 100 ballots in total.
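These two numbers can be reproduced with scipy, for example:

from scipy.stats import binom, hypergeom

print(binom.pmf(3, 10, 0.1))          # 0.05739562... (10 audits, p = 10/100)
print(hypergeom.pmf(3, 100, 10, 10))  # 0.05179370... (10 fakes among 100, 10 audits)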

We have a way to calculate the probability of some outcome given some assumption about the proportion of fake ballots. But remember, what we want is exactly the opposite: given a certain result for an audit, we’d like to estimate the proportion of fake ballots in the entire set. It is this inversion that makes the problem a case of bayesian inference, and our friend Bayes theorem gives us the relationship between what we have and what we want:

P(A | B) = P(B | A) P(A) / P(B)

which in our case translates to

P(Proportion of fake ballots | Audit Result) = P(Audit Result | Proportion of fake ballots) * P(Proportion of fake ballots) / P(Audit Result)

What we were calculating before is P(Audit Result | Proportion of fake ballots), which we can plug into the formula, together with the other terms, to get what we want: P(Proportion of fake ballots | Audit Result). The other terms are

P(Audit Result) =

The unconditional probability that a certain audit result occurs. It can be calculated by summing over all possible proportions, as this version of Bayes theorem shows:

P(A_i | B) = P(B | A_i) P(A_i) / Σ_j P(B | A_j) P(A_j)

as seen in the bottom term. Because the bottom term is common to all values of A_i, it can be interpreted as a normalizing constant that ensures that probabilities sum to 1.

P(Proportion of fake ballots) =

The prior probability that some proportion of fake ballots occurs. This ingredient is crucial for Bayesian inference and the Bayesian approach in general. It is an estimate of the quantity we want to calculate that is prior to any of the evidence we obtain. It can be used to encode prior knowledge about what we are calculating. If we have no knowledge, we can try to encode that in a “neutral prior”. This last point is a very deep problem in Bayesian inference, as is the general problem of choosing priors. We won’t go into it in detail here.

Recap. We want to calculate the proportion of fake ballots in an election based on the results of limited audits. We have seen how the binomial and hypergeometric distributions give probabilities for the results of an audit given an assumption about the proportion of fake ballots. Bayes theorem can be used to calculate the inverse probability that we are after, once we have specified a prior. See it in action in the next post.


[1] There is an important difference, the binomial distribution models sampling with replacement, whereas the hypergeometric models sampling without replacement. In the next post we will consider this difference and its significance for our problem.

Liquid democracy and spectral theory

In this post we will show some practical examples of how liquid democracy can be understood in terms of, and make use of results from spectral graph theory. For more background please see [Vigna2009]. Wikipedia says:

In mathematics, spectral graph theory is the study of properties of a graph in relationship to the characteristic polynomial, eigenvalues, and eigenvectors of matrices associated to the graph, such as its adjacency matrix or Laplacian matrix.

What does this have to do with liquid democracy? To answer this, let’s remember what defines liquid democracy: a system of transitive proxy voting. Proxy in that I can delegate my vote to some entity. Transitive because that entity can itself delegate the vote, and so on. Imagine a simple case with three voters, Alice, Bob, and Charlie. Alice delegates to Bob, and Bob delegates to Charlie. It is very natural to represent this as a graph; this is what we call the delegation graph:

[Delegation graph: Alice chooses Bob, Bob chooses Charlie]

Assuming each voter starts with one vote, this would give us the following weights for each voter:

Alice = 1, Bob = 2, Charlie = 3

Because Alice and Bob have delegated their votes, Charlie will end up casting three votes, one for himself, one for Bob and one for Alice. What determines these weights is the structure of the graph; who is voting for whom is encoded in the connections between vertices. In graph theory, the object that encodes this information is the adjacency matrix. Here’s the adjacency matrix for our example:

    A  B  C
A   0  1  0
B   0  0  1
C   0  0  0

Where each row shows who each voter delegated to. Alice (A) delegated to Bob (B), hence the 1 in the second position of the first row. Similarly, Bob (B) delegated to Charlie, (C) as can be seen in the second row. Because Charlie did not delegate, the third row is all zeroes.

We can express the liquid tally above with these equations (the 1’s represent voters’ initial weights):

0*A + 0*B + 0*C + 1 = A

1*A + 0*B + 0*C + 1 = B

0*A + 1*B + 0*C  + 1 = C

Note how the 0’s and 1’s above correspond to the columns of the adjacency matrix. The above can be represented[1] in matrix form:

(A B C) * AdjacencyMatrix = (A B C)

This is an eigenvalue equation, whose eigenvector (the (A B C) row vector) corresponds to the result of the liquid tally. Note how the equation is recursive, which fits the recursive nature of transitive delegation. Vote weights are calculated in terms of vote weights themselves, the same way each delegate transmits votes that result from previous delegations.
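We can check this numerically with numpy. Following footnote [1], we add the initial weights back in and solve v = v·M + 1, that is v = 1·(I − M)^-1:

import numpy as np

# adjacency matrix: Alice -> Bob, Bob -> Charlie
M = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])

ones = np.ones(3)  # each voter starts with one vote
v = np.dot(ones, np.linalg.inv(np.eye(3) - M))
print(v)  # [ 1.  2.  3.] -> Alice = 1, Bob = 2, Charlie = 3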

When the adjacency matrix is used to evaluate the importance or influence of nodes in a graph in this way we are speaking of eigenvector centrality. Our example shows that calculating centrality is basically equivalent to tallying in liquid democracy. This is what makes the connection between spectral theory and liquid democracy.

Eigenvector centrality, Katz and Pagerank

Yes, that’s PageRank as in google, in case you thought this whole talk of centrality was a bit abstract, it’s what made google what it is today. Eigenvector centrality, Katz centrality, and PageRank are related methods to measure the importance of a node in a network. We won’t go into all the differences between each measure, besides noting that both Katz and PageRank include an attenuation factor that decreases contributions from distant nodes, whereas eigenvector centrality does not.

In order to run some examples we will use the networkX library, which includes several functions for calculating centrality as well as many other useful features. If you want to play along with this code you will need to install the library as well as its dependencies, including matplotlib and numpy. If you’re on windows I suggest downloading the comprehensive WinPython package which includes everything you need here.

Let’s first create the graph corresponding to our example with Alice, Bob and Charlie. Here’s the code that does that

import networkx as nx
import matplotlib.pyplot as plt

# construct graph
G = nx.DiGraph()
G.add_edges_from([("Alice", "Bob"), ("Bob", "Charlie")])
# draw it
nx.draw_spring(G)
plt.savefig("graph.png")

This generated the image shown earlier. Now to calculate the tally, we will run both Katz and PageRank.

# katz
centrality = nx.katz_centrality(G.reverse(), alpha=1.0, beta=1.0, normalized=False)
print("katz %s" % centrality)

# pagerank
centrality = nx.pagerank(G, alpha=1.0)
print("pagerank %s" % dict(zip(centrality.keys(), denormalize(centrality.values()))))

which gives us

katz {'Charlie': 3.0, 'Bob': 2.0, 'Alice': 1.0}
pagerank {'Charlie': 3.0, 'Bob': 2.0, 'Alice': 1.0}

Both results match the tally we showed before. A couple of minor points above. First, the PageRank result was rescaled to make it match Katz. Second, the adjacency matrix for Katz was reversed as the networkx 1.8.1 Katz implementation is using a right eigenvector (this has been changed to left eigenvector in master).

More importantly, the alpha parameter is a damping factor. In the language of liquid democracy it modulates just how transitive delegation is by reducing contributions the further away they originate. For example, let’s change the above to alpha = 0.5:

katz {'Charlie': 1.75, 'Bob': 1.5, 'Alice': 1.0}
pagerank {'Charlie': 1.75, 'Bob': 1.5, 'Alice': 1.0}

Now Charlie receives 25% of Alice’s vote and 50% of Bob’s vote. So alpha quantifies the fraction of the vote that is effectively delegated. We can then interpret a liquid democracy tally as a special case of Katz centrality and PageRank. In fact, liquid democracy is the limiting case of Katz and PageRank when alpha = 1.0, i.e. no damping (which is why you get viscous democracy in [Boldi2011]).

What about cycles?

One of the first things you have to deal with if you’ve implemented a liquid tallying algorithm is the possibility of cycles in the delegation graph, otherwise the procedure will blow up. Having detected a cycle at tally time the standard treatment is to consider votes that enter it as lost. In order to prevent that undesirable situation you can do cycle detection at vote time to warn the user that his/her vote may produce such a cycle.

What happens if we add a cycle to our example? Let’s try it

G = nx.DiGraph()
G.add_edges_from([("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Donna"), ("Donna", "Bob")])

[Delegation graph with the cycle Bob → Charlie → Donna → Bob]

and we get

File "katz.py", line 70, in <module>
  centrality = nx.pagerank(G, alpha=1.0, max_iter=1000)
File "C:\tools\WinPython-64bit-2.7.6.4\python-2.7.6.amd64\lib\site-packages\networkx\algorithms\link_analysis\pagerank_alg.py", line 143, in pagerank
  'in %d iterations.'%(i-1))
networkx.exception.NetworkXError: pagerank: power iteration failed to converge in 1000 iterations.

The reason this happens has to do with the details of the algorithm that calculates eigenvectors; in particular the relationship between its convergence and the attenuation factor alpha[2]. The short story is this: using an attenuation factor of 1.0 on a graph with cycles may cause problems.

Just as liquid tally algorithms have to deal with cycles, so do we in order to make centrality work correctly. Fortunately there are fast algorithms to detect cycles in graphs. NetworkX offers an implementation of an improved version of Tarjan’s strongly connected components algorithm; we will use it to define a function that removes cycles in a graph:

def de_cycle(G):
    def r(source, target):
        print("remove %s -> %s" % (source, target))        
        G.remove_edge(source, target)
        return target

    cycles = list(nx.simple_cycles(G))
    if(len(cycles) > 0):
        for cycle in cycles:
            print ("cycle %s" % cycle)
            # quick and dirty side-effecting use of reduce, see proper 2-batch iteration: http://stackoverflow.com/questions/5764782/iterate-through-pairs-of-items-in-python-list            
            reduce(r, cycle)

    return G

Using this function we can obtain liquid tallies for any delegation graph correctly, using either Katz or PageRank. See the bottom of this post for the full python script demonstrating this.

Liquid democracy and degree (or branching factor)

Before we said that liquid democracy is the limiting case of Katz centrality and PageRank when alpha = 1.0. In the last section we saw another requirement besides that of alpha = 1.0: that the delegation graph must be acyclic, in other words a DAG. There is one more property that we can consider, degree.

A node’s degree is the number of (in our case, outward) connections with other nodes. In terms of delegation, it is the number of delegates that a voter chooses. Standard liquid democracy uses degree = 1, but such a requirement could in theory be relaxed. How does this fit in with Katz and PageRank? Let’s construct a graph where voters may choose one or two delegates.

G = nx.DiGraph()
G.add_edges_from([("Alice", "Bob"), ("Bob", "Charlie")])
G.add_edges_from([("Bob", "Florence"), ("Charlie", "Donna"), ("Florence", "Edward")])

which gives

[Delegation graph: Alice → Bob; Bob → Charlie and Florence; Charlie → Donna; Florence → Edward]

resulting values

pagerank {'Charlie': 2.0, 'Alice': 1.0, 'Edward': 3.0, 'Donna': 3.0, 'Bob': 2.0, 'Florence': 2.0}
katz {'Charlie': 3.0, 'Bob': 2.0, 'Edward': 4.0, 'Florence': 3.0, 'Alice': 1.0, 'Donna': 4.0}

We see how Katz centrality does not yield a correct tally, as it does not divide outgoing weights for voters who split their delegation among two delegates; instead we get inflated weights. But the PageRank result does work: Bob’s two votes are split correctly, and the delegation proceeds normally from then on.

In summary

  • Liquid democracy is a special case of Katz centrality given
    • a damping factor alpha = 1.0
    • a directed acyclic graph of degree d = 1
  • Liquid democracy is a special case of PageRank given
    • a damping factor alpha = 1.0
    • a directed acyclic graph of degree d >= 1

That’s it for our quick tour of the relationship between liquid democracy and spectral theory. We have also seen how liquid democracy could be extended to include damping (as in [Boldi2011]), or to allow “multi delegation”.


Notes/References

[Vigna2009] Sebastiano Vigna – Spectral Ranking  http://arxiv.org/abs/0912.0238

[Page1998] The PageRank Citation Ranking: Bringing Order to the Web http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

[Boldi2011] Viscous Democracy for Social Networks http://chato.cl/papers/boldi_bonchi_castillo_vigna_2011_viscous_democracy_social_networks.pdf

[1] For simplicity, I have ignored the initial weights associated with each voter in the matrix equation. These initial weights are what makes liquid tallying equivalent to undamped Katz centrality rather than eigenvector centrality.

[2] For details as to alpha and PageRank See http://vigna.di.unimi.it/ftp/papers/PageRankFunctional.pdf section 5 Limit behaviour.

In the case of the networkX implementation of Katz centrality an alpha of 1.0 is guaranteed to converge as all eigenvalues of an acyclic graph are 0 (see http://www.emis.de/journals/JIS/VOL7/Sloane/sloane15.pdf and http://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.centrality.katz_centrality.html#networkx.algorithms.centrality.katz_centrality)

Python Script: Random liquid Tallying with Katz/PageRank

import networkx as nx 
import matplotlib.pyplot as plt
from random import randrange

def simple_graph():
	G = nx.DiGraph()
	G.add_edges_from([("Alice", "Bob"), ("Bob", "Charlie")])
	# with cycle
	G.add_edges_from([("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Donna"), ("Donna", "Bob")])

	return G

def random_graph(n):
    G = nx.DiGraph()
    G.add_nodes_from(range(0,n - 1))
    for node in G.nodes():
        target = randrange(n - 2)
        if(target == node):
            target = n - 1
        G.add_edge(node, target)

    return G  

# denormalizes page rank scores 
def denormalize(array):
    m = min(array)
    factor = 1 / m
    return map(lambda x: round(x*factor, 5), array)

def de_cycle(G):
    def r(source, target):
        print("remove %s -> %s" % (source, target))        
        G.remove_edge(source, target)
        return target

    cycles = list(nx.simple_cycles(G))
    if(len(cycles) > 0):
        for cycle in cycles:
            print ("cycle %s" % cycle)
            # proper 2-batch iteration: http://stackoverflow.com/questions/5764782/iterate-through-pairs-of-items-in-python-list            
            reduce(r, cycle)

    return G

import sys
if(len(sys.argv) != 2):
	nodes = 10
else:
	nodes = int(sys.argv[1])

# G = simple_graph()
G = random_graph(nodes)
nx.draw_spring(G)
plt.savefig("graph.png")
de_cycle(G)

centrality = nx.pagerank(G, alpha=1.0, max_iter=1000)
print("pagerank %s" % dict(zip(centrality.keys(), denormalize(centrality.values()))))

cycles = list(nx.simple_cycles(G))
print("cycles %s" % cycles)
centrality = nx.katz_centrality(G.reverse(), alpha=1.0, beta=1.0, normalized=False)
print("katz %s" % centrality)

# draw graph without cycles
plt.clf()
nx.draw_spring(G)
plt.savefig("graph_decycled.png")

Experimenting with liquid filtering

Previously we introduced the idea of liquid filtering and its motivation:

We have a situation where we have to collectively choose among many options. In the presentation this was referred to as scaling the solution space. It is infeasible to apply voting as is, because voters cannot consider all these options to make a judgement. So what we do is distribute the cognitive load in a way that reflects user delegation. The problem of liquid filtering is the assignment of voters to questions according to delegation choices, in an optimal way.

and we spoke of optimality which we defined as

minimize the number of questions that each voter has to answer while ensuring that every voter votes on each question

and its math equivalent

minimize |A| subject to ∃ (v, q) ∈ T ∀ q ∈ Q, v ∈ V

The first step in implementing a liquid filtering algorithm is to convert this optimality criterion into an objective function. For simplicity we will modify our criterion slightly to

given a fixed number of assignments, maximise the effective number of questions answered

which is saying basically the same thing but from the reverse point of view. With ‘effective’ we refer to the fact that although voters may not answer a question directly, they can answer it by virtue of having one of their delegates answer it for them. The effective number of questions answered is thus the sum of questions answered directly or via delegation. Optimality is thus defined as

given some |A|, maximize |{(v, q)}| such that (v, q) ∈ T, where q ∈ Q, v ∈ V

We will also assume for simplicity that each voter answers the same number of questions. We referred to this possibility before

even if the sum total of questions is minimized, one does not want to overload specific voters with too many questions

One way to define an objective function could simply be the sum of questions effectively answered by each voter, over all voters:

f(A) = Σ t(v, q), where

t(v, q) = 1 if (v, q) ∈ T, and 0 otherwise

The maximum value of this function is attained when every voter has effectively answered every question:

max f(A) = |V| x |Q|

Using this quantity we can define a more intuitive function, coverage, which corresponds to the fraction of all the question-voter pairs that have been effectively answered:

coverage(A) = Σ t(v, q) / (|V| x |Q|)

This ranges from 0 (no coverage) to 1 (full coverage).
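A minimal sketch of this coverage function using networkx; the names assignments and D are illustrative:

import networkx as nx

def coverage(assignments, D, voters, questions):
    """Fraction of (voter, question) pairs effectively answered.
    assignments: dict voter -> set of questions answered directly.
    D: directed delegation graph, edge (v, v') meaning v delegates to v'.
    A pair (v, q) counts if v or any transitive delegate of v answered q."""
    answered = 0
    for v in voters:
        delegates = nx.descendants(D, v) if v in D else set()
        effective = set(assignments.get(v, set()))
        for d in delegates:
            effective |= assignments.get(d, set())
        answered += len(effective & set(questions))
    return answered / float(len(voters) * len(questions))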

What about delegation chain length?

The coverage function we just defined is the simplest case. One variation could address something we said in the previous post

one goal could be to reduce transitive chains lengths such that user delegates are in fact aligned with the user’s preferences

The intent here is to accommodate the fact that long delegation chains may potentially lead to results that are not representative of voters’ choices. This possibility is also considered in viscous democracy[1], and more generally in systems that penalize edge chain lengths with damping, such as PageRank[2]. So we can define a non-transitive coverage function

non-transitive coverage(A) = Σ t1(v, q) / (|V| x |Q|), where

t1(v, q) = 1 if (v, q) ∈ T1, and 0 otherwise

where we define T1 as equal to T but with a delegate chain cutoff of 1. A cutoff of 1 means we consider only direct delegates. Delegates of delegates, and so on, do not count in calculating a voter’s effective response. Thus T1 is the same as T but replacing D+ with its 1-cutoff equivalent:

T1 = { (v, q) | ∃ (v’, q) ∈ A ∧ ∃ (v, v’) ∈ D1+ ∧ v, v’ ∈ V, q ∈ Q }

where D1+ is the 1-cutoff equivalent of D+ (recall that D+ is the transitive closure of D, the binary relation encoding delegation)

Generalizing, we could define a weighted coverage function which penalizes delegation lengths with values other than 0 or 1, and uses arbitrary cutoffs, but we won’t go into it in detail now.

Some experiments

Below are some results we’ve had running simulations on random graphs to get a feel for what the solution space looks like. The key parameters that define an instance of a random filtering problem are

voters: the number of voters that participate in the decisions

questions: the number of questions that must be decided on

average delegates: the average number of delegations each voter makes

questions asked: the number of questions each voter is presented

coverage function: a choice of objective function, such as those just described above

non-transitive coverage

The following plots were made with questions asked fixed to 1

transitive coverage function

The following plots were made adding an additional axis for the questions asked parameter, varying from 1 to 5. The average delegates was the other horizontal axis, whilst the vertical axis was the coverage function, using a delegate chain cutoff of 3. Additionally we show results of simple hill climbing optimizations (with the ‘X’ symbol) on the assignments.

Here we show the same runs but using a weighted coverage function, again with a cutoff of 3. The coverage function used here penalized delegations according to their length.

tentative conclusions

Although these are early results in liquid filtering, some conclusions are available.

  • participation is most sensitive to average degree and questions asked
  • population size has no effect provided it is much larger than questions
  • participation becomes saturated at high average degree due to graph topology
  • graph topology dominates optimization (at least using hill climbing)

Other observations to make

  • Secure voting precludes the possibility of optimization; in these cases it is not possible to improve assignments beyond uniform distribution
  • Without transitivity, average degree must have a magnitude roughly comparable to questions in order to achieve high participation

The biggest flaw in these experiments is probably the purely random nature of the delegation graphs. In order to obtain more realistic results, the generator should produce graphs that, although still random, exhibit properties of those emerging in a practical liquid filtering system. For example, degree variance could be constrained in such systems. Another possibility is that these graphs could feature small world effects or power law degree distributions, as has been observed for social networks.


[1] http://dl.acm.org/citation.cfm?doid=1953122.1953154

[2] http://en.wikipedia.org/wiki/PageRank#Damping_factor