Integer encoding of multiple-choice ballots (2)

In the last post we saw how simple arithmetic with the right choice of base can encode integer lists for multiple ballot choices in an elegant and compact way. A couple of points were mentioned in the notes section, one of them was

our scheme allows encoding repeated choices, which are usually not legal ballots. For example, one can encode the choice 1,1,1 but that would be an illegal vote under all the voting systems we mentioned as examples. The fact that legal ballots are a subset of all possible encodings means that the encoding is necessarily suboptimal with respect to those requirements. We will look at an alternative scheme in the next post.

The alternative scheme we show below attempts to optimize the encoding by using a variable base as digits are encoded. The main idea is that as chosen integers in the list are encoded, the remaining ones are constrained since they cannot be repeated. Thus the base can be reduced as less choices are possible. In practice the process is complicated by the need to keep track of what digits correspond to what, as gaps form in the remaining integers and the corresponding meaning of digits changes. Here is a python implementation

def encode2(digits, choices = 9):
  radix = choices + 1
  present = range(0, radix)
  digits2 = deque()
  radix = radix - len(digits) + 2
  for d in digits:
    index = present.index(int(d))

  number = 0
  for d in digits2:
    number += d
    number = number * radix
    # print(d, radix, number)
    radix = radix + 1

  return number / (radix - 1)

def decode2(number, choices = 9):
  radix = choices + 1
  print("decode %d" % number)
  digits = deque()
  present = range(0, radix)
  tmp = number
  while(number > 0):
    # print(number, radix, number % radix)
    digits.append(number % radix)
    number = number / radix
    radix = radix - 1

  ret = deque()
  for d in digits:
    # print(d, present, ret)
    val = present[d - 0]

  return ret

and a sample session using both encoders

>>> execfile("")
>>> decode(encode([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 21), 21)
decode 14606467545964956303452810
deque([1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L])
>>> decode2(encode2([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 21), 21)
decode 36697695360790800022
deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

Note the shorter value produced by the second encoder

encode: 14606467545964956303452810

encode2: 36697695360790800022

Despite the reduction, the second encoder is not optimal (the first encoder is optimal given repeatable choices); the range of output numbers is larger than that of legal ballots. It would be interesting to see how to obtain the most compact solution, a detailed analysis could compare these schemes systematically to get quantitive measures of space efficiency.