Integer encoding of multiple-choice ballots (2)

In the last post we saw how simple arithmetic with the right choice of base can encode integer lists for multiple ballot choices in an elegant and compact way. A couple of points were mentioned in the notes section, one of them was

our scheme allows encoding repeated choices, which are usually not legal ballots. For example, one can encode the choice 1,1,1 but that would be an illegal vote under all the voting systems we mentioned as examples. The fact that legal ballots are a subset of all possible encodings means that the encoding is necessarily suboptimal with respect to those requirements. We will look at an alternative scheme in the next post.

The alternative scheme we show below attempts to optimize the encoding by using a variable base as digits are encoded. The main idea is that as chosen integers in the list are encoded, the remaining ones are constrained since they cannot be repeated. Thus the base can be reduced as less choices are possible. In practice the process is complicated by the need to keep track of what digits correspond to what, as gaps form in the remaining integers and the corresponding meaning of digits changes. Here is a python implementation

def encode2(digits, choices = 9):
  radix = choices + 1
  present = range(0, radix)
  digits2 = deque()
  radix = radix - len(digits) + 2
  for d in digits:
    index = present.index(int(d))
    present.remove(int(d))
    digits2.appendleft(index)

  number = 0
  print(digits2)
  for d in digits2:
    number += d
    number = number * radix
    # print(d, radix, number)
    radix = radix + 1

  return number / (radix - 1)

def decode2(number, choices = 9):
  radix = choices + 1
  print("decode %d" % number)
  digits = deque()
  present = range(0, radix)
  tmp = number
  while(number > 0):
    # print(number, radix, number % radix)
    digits.append(number % radix)
    number = number / radix
    radix = radix - 1

  print(digits)
  ret = deque()
  for d in digits:
    # print(d, present, ret)
    val = present[d - 0]
    ret.append(val)
    present.remove(val)

  return ret

and a sample session using both encoders

>>> execfile("encoder.py")
>>> decode(encode([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 21), 21)
decode 14606467545964956303452810
deque([1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L])
>>> decode2(encode2([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 21), 21)
decode 36697695360790800022
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

Note the shorter value produced by the second encoder

encode: 14606467545964956303452810

encode2: 36697695360790800022

Despite the reduction, the second encoder is not optimal (the first encoder is optimal given repeatable choices); the range of output numbers is larger than that of legal ballots. It would be interesting to see how to obtain the most compact solution, a detailed analysis could compare these schemes systematically to get quantitive measures of space efficiency.


 

Integer encoding of multiple-choice ballots

Secure voting systems supporting privacy through encryption must encode ballot contents into integers before they can be encrypted[1]. This encoding step is mostly trivial. For example, imagine a yes-no-abstain ballot. One can simply apply the following mapping to yield integer plaintexts for corresponding ballots:

Yes => 1
No => 2
Abstain => 3

But things can get a bit more involved when dealing with multiple-selection ballots. These are ballots where the voter makes more than one choice. They can be either ranked ballots, where the voter specifies a preference relation over the selections, or unranked ballots where no such a preference exists. Examples of voting systems using the former are single transferable vote or instant runoff voting.  Systems like approval voting or plurality at large are examples of the second type, using unranked ballots.

Imagine we are using one of these systems to elect a candidate out of a field four: Alice, Bob, Charlie, and Donna. We first apply the trivial mapping:

Alice => 1
Bob => 2
Charlie => 3
Donna => 4

But how do we encode a complete ballot, for example, a ballot with (X corresponds to marked choices)

Alice X
Bob X
Charlie O
Donna O

Unlike the yes-no-abstain ballot above, the content of the ballot corresponds to a list of integers: 1 and 2. We could use the following mapping

encode: Alice, Bob => 1, 2 => 12

The ballot is encoded as the number 12, resulting from the concatenation of the string representations of each of the integers. But what if there are more candidates, and we need to encode something like:

encode: 11, 3 => 113

That won’t work, because 113 could represent either 11 and 3, or 1 and 13.

decode: 113 => ?

We can remedy this by adding some padding such that each choice is well separated:

encode: 11, 3 => “1103” => 1103

Then when decoding, we convert the integer to a string, split it every 2 characters, and finally obtain integers for each of the candidates:

decode: 1103 => “1103” => “11”, “03” => 11, 03

But there’s still a problem, what about this choice:

encode: 1, 13 => “0113” => 113

We run into trouble here, because the string “0113” corresponds to the integer 113; there is no mathematical difference between “0113” and “113”. To fix this, when decoding we can first check that the string length is a multiple of 2 (since we are using 2 chars per candidate integer), if it is not we prepend the required zeros. The encode-decode process would be

encode: 1, 13 => “0113” => 113
decode: 113 => “113” (prepend zero) => “0113”  => “01”, “13” => 1, 13

I hear you complain that all this concatenation, padding, and prepending looks a bit hackish, can we do better?

Let’s go back to our first example, when we simply wanted to do

encode: Alice, Bob => 1, 2 => 12

This looked very nice and simple. Can’t we do something like this in general, without any string hackery? The first step is to go back to the definition of decimal numbers.

Decimal numbers (cplusplus.com)

In these terms, the encoding 1, 2 => 12 corresponds to

(10^1) * 1 + (10^0) * 2 = 12

Here we have expressed the encoding of 1, 2 using arithmetic, no string operations involved. The ballot choices are interpreted as digits according to the mathematical definition of decimal numbers. (In fact, this is what goes on under the covers when you convert a string like “12” into the number 12.) This gives us a a purely arithmetical description of the simple mapping we started with. Things then got complicated when we considered the possiblity of more choices (candidates) in the ballot. Let’s apply our mapping to that problematic ballot:

encode: 11, 3 => (10^1) * 11 + (10^0) * 3 = 113

Our new procedure fails the same way: the simple scheme where each digit represents one choice cannot be accommodated by the decimal digit representation of choices, and the result 113 is ambiguous. But wait, who says we have to encode according to a decimal representation? What if we were to map choices to hexadecimal digits,:

encode: 11, 3 => (10^1) * B + (10^0) * 3 = B3

And we’ve restored simplicity and correctness to our scheme. B3 encodes the choice 11, 3 with one choice per digit and no ambiguity! If the B3 looks like cheating, just remember, B3 is a representation of a number that in decimal format turns out to be 179. The encode-decode process could just as well be written as

encode: 11, 3 => 179
decode: 179 => 11, 3

The bottom line is we can encode lists of integers into an integer provided we use the optimal base, which is equal to the number of possible choices in the ballot plus one.

Let’s revisit our original example, with Alice, Bob, Charlie and Donna. Since we have four candidates, our base is 4 + 1 = 5. The encoding is thus:

encode: Alice, Bob => 1, 2 => (10^1) * 1 + (10^0) * 2 = 12 (base 5) = 7 (decimal)

in short:

encode: 1, 2 => 7
decode: 7 => 1, 2

Note that not only is this method simpler with no string operations or padding, but the encoded values are smaller. Compare:

encode: 1, 2 => 12
encode: 11, 3 => 1103

with

encode: 1, 2 => 7
encode: 11, 3 => 179

Which should not come as a surprise, encoding with the specified base is the most compact[2] encoding possible (proof left as excercise for the reader). A larger base wastes space encoding ballot contents that are not possible, whereas a smaller base is insufficient to encode all possible ballots.

Finally, here is a python implementation of the encoder we have proposed

def encode(choices, max_choices):
    radix = max_choices + 1
    number = 0
    for d in choices:
        number += int(d)
        number = number * radix

    return number / radix

def decode(number, max_choices):
    radix = max_choices + 1
    choices = deque()
    while(number > 0):
       choices.appendleft(number % radix)
       number = number / radix

    return choices

In the next post we will further discuss details as to the compactness of the encoding mentioned in [2].


[1] In the case of ElGamal encryption used in Agora Voting, the plaintext must be encoded into an element of the multiplicative subgroup G of order q of the ring Zp, where p and q are suitably chosen prime numbers. In order to do this, the plaintext must be first encoded into an integer, after which it is mapped to a member of G, and subsequently encrypted.

[2] A few caveats must be mentioned. First, we are using 1-based indices to represent choices, which means some values are unused. Second, our scheme allows encoding repeated choices, which are usually not legal ballots. For example, one can encode the choice 1,1,1 but that would be an illegal vote under all the voting systems we mentioned as examples. The fact that legal ballots are a subset of all possible encodings means that the encoding is necessarily suboptimal with respect to those requirements. We will look at an alternative scheme in the next post.

Hackatón con Podemos TIC

El próximo sábado 12 de julio celebraremos un hackatón en Madrid sobre el proyecto Ágora Voting. ¿Tienes ganas de colaborar? este es el momento. Nuestra intención con estos hackatones es fomentar la comunidad de este proyecto de software libre.

BARBacoa

El hackatón se realizará en colaboración con el círculo TIC de Podemos que nos cede el lugar donde celebrarlo y también dará publicidad al evento. Invitamos a todas las personas con los siguientes perfiles a participar en el desarrollo de la herramienta:

Se busca

  • Programadores con experiencia en java/networking/multithreading
  • Programadores con experiencia en python/django/flask/threads/unit-testing
  • Administradores de sistemas con experiencia en debian/puppet/nginx/proxmox

Objetivos

  • recoger feedback sobre agora-ciudadana (el software que se ve en agoravoting.com) y que los desarrolladores de python que vengan se lo instalen localmente como entorno de desarrollo y puedan trabajar en junior-jobs en ese código en base a las incidencias reportadas.
  • trabajar en election-orchestra, que es la parte de backend que ejecutan las autoridades para hacer recuentos seguros.
  • trabajar en junior jobs de election-orchestra, para mejorar su documentación, el script de despliegue, incidencias.
  • hacer algunas pruebas y mejorar verificatum, que es el software escrito en java que lanza election-orchestra para hacer el recuento.
  • establecer una política de backups y de administración de la máquina que Podemos gestiona para votaciones.
  • montar una autoridad en los servidores de Podemos administrada por gente del círculo de Podemos TIC y hacer pruebas
  • diseñador que se encargue de hacer una github-pages sencilla para la comunidad de Agora Voting

 

Recompensas

  • Comida grasienta y bebidas estimulantes
  • Socializar con (otros) frikis
  • Formar parte de la comunidad de un proyecto de software libre que está revolucionando la democracia

Dónde, cuándo

El hackathon comenzará a las 11:00 de la mañana el sábado 12 de julio y se planea que dure hasta la noche. Se realizará en la siguiente dirección: Calle Nanclares de Oca, 3 (el metro más cercano es Canillejas).

El lugar está un poco apartado, te recomendamos venir en coche. Para cualquier duda, contáctanos en nuestra lista de correo: https://groups.google.com/group/agora-ciudadana-devel

¿Qué me llevo?

Tu portátil, el cable de alimentación, y muchas ganas de hackear.

Antecedentes

La colaboración entre Agoravoting y Podemos está siendo muy fructífera, además de los crowdfunding de Podemos para la aplicación de mejoras, el apoyo mostrado por el movimiento a favor de esta herramienta https://podemos.info/participa/herramientas/ ha propiciado:

  • innumerables pruebas de círculos de podemos en agora-ciudadana, (la versión web de la aplicación), que han sacado a relucir las vergüenzas del programa y que pretendemos abordar en el hackatón
  • recursos humanos y materiales para continuar con el desarrollo, principalmente la colaboración con el círculo TIC (técnología de la información y la comunicación) que nos produce una sana envidia por su gran número de participantes y los conocimientos de los mismos
  • la cesión desinteresada de un excelente espacio de coworking donde poder trabajar en el desarrollo de esta herramienta y otras.

El 15 de junio, después de la primera gran asamblea de podemos, donde se realizó un taller TIC, se celebró una barbacoa en este espacio con el fin de conocer el lugar, romper el hielo, discutir ideas y comer salchicas (recomendación: no llevar camisa blanca a una barbacoa).


También se aprovechó la ocasión para celebrar la primera asamblea del círculo TIC en la misma mesa de pingpong en la que intentamos jugar infructuosamente, por el viento. En este hackatón nos centraremos más en el código y menos en la comida grasienta y otros divertimentos, que por otra parte no pueden faltar en un evento como este ¿quieres venir?