Random Permutations and Sorting

Introduction

The standard algorithm for shuffling a deck of cards, or generating a random permutation of a finite set with n elements in general, is the Fisher-Yates shuffle. Conventionally, the elements are stored in a mutable array and then randomly swapped a few times. More specifically, they are swapped n times, so the algorithm runs in O(n) time. Of course, the point is that these swaps are chosen carefully such that each permutation is equally likely.

In a message to the Haskell beginners mailing list, Jan Snajder presented a short program that implements exactly this, with mutable arrays and all. His post made me wonder whether there is a purely functional way to do this, one without mutable arrays and indexes but with similar time complexity?

Yes, there is! The main idea is that shuffling a list is essentially the same as sorting a list; the minor difference being that the former chooses a permutation at random while the latter chooses a very particular permutation, namely the one that sorts the input. In other words, the idea is to take a generic sorting algorithm like merge sort and turn it into a shuffle algorithm by simply replacing each comparison with a random choice.

Thus, in the following article, we are going to transform merge sort into a random permutation generator.

Random Numbers

Before we begin, a few words about random number generation. Jan used the IO monad to obtain random numbers, giving the shuffle algorithm the type

permute :: [a] -> IO [a]

This is both pretty and ugly at the same time; ugly in that IO is used as ragbag for side effects that don’t seem to fit elsewhere, but pretty in that the usual procedure of carrying around and updating a random seed is hidden behind an abstraction. We will drop the ugly part by using a new monad R

permute :: [a] -> R [a]

which works like IO but allows only random number creation. In other words, the only side effects available in R are functions like

getRandomR :: (Int,Int) -> R Int

which generates a random integer in the given range.

Note that the restriction to random numbers gives R a a meaning on its own; we can interpret R a as “a collection of values a of which one will be picked at random when the program is run” or for the mathematically inclined as “a random variable of type a”.

You can implement it as type R a = IO a if you like, but we will use the MonadRandom package and set

import Control.Monad.Random
type R a = Rand StdGen a

Samples can be generated with the functions evalRand and evalRandIO.

Merge Sort

Recall how merge sort works: the list is divided in half, both halves are sorted individually and then merged to give the result. Let us capture the second phase of the algorithm with a designated data type SortedList a that represents sorted lists. Of course, we can simply implement it as ordinary list, with the convention that the elements appear in order:

type SortedList a = [a]

empty :: SortedList a
empty = []

singleton :: a -> SortedList a
singleton x = [x]

merge :: Ord a => SortedList a -> SortedList a -> SortedList a
merge []     ys     = ys
merge xs     []     = xs
merge (x:xs) (y:ys) =
    if x <= y
        then  x : ( xs  `merge` y:ys)
        else  y : (x:xs `merge`  ys )

With these preliminaries, the whole algorithm reads:

mergesort :: Ord a => [a] -> SortedList a
mergesort = fromList
    where
    fromList []  = empty
    fromList [x] = singleton x
    fromList xs  = (fromList l) `merge` (fromList r)
        where (l,r) = splitAt (length xs `div` 2) xs

Random Lists

To generate random permutations, we now replace sorted lists with “random lists”, i.e. some kind of list structure that presents its elements in random order. The straightforward choice

type RandomList a = R [a]

will do the trick. Trivially, we have

empty :: RandomList a
empty = return []

singleton :: a -> RandomList a
singleton x = return [x]

Merge

Now, we want to merge random lists. This process is entirely analogous to merging sorted lists: we select the first element and then combine the tails with merge. But instead of always putting the smallest element in front, we toss a coin and select an element at random. In Haskell pseudo code, this would look something like this:

merge (x:xs) (y:ys) = do
    k <- getRandomR (1,2)   -- toss a coin
    if k == 1
        then  x : ( xs  `merge` y:ys)
        else  y : (x:xs `merge`  ys )

That being said, giving each list an equal probability of 1/2 is unfair when the lists have different sizes. For instance, if the first list has only 1 element and the second list is a random permutations of 5 elements, then the single element would have a probability of 1/2 to appear as first element in the result. In a fair permutation, this probability should be only 1/6.

The selection can be made fair if we weight the lists by their sizes. In our example, this means giving the first list a probability of 1/(5+1) and the second one a probability of 5/(5+1). By keeping track of the list sizes and adjusting the random selection accordingly, we obtain the following implementation:

merge :: RandomList a -> RandomList a -> RandomList a
merge rxs rys = do
        xs <- rxs
        ys <- rys
        merge' (length xs, xs) (length ys, ys)
    where
    merge' (0 , [])   (_ , ys)   = return ys
    merge' (_ , xs)   (0 , [])   = return xs
    merge' (nx, x:xs) (ny, y:ys) = do
        k <- getRandomR (1,nx+ny)
        if k <= nx
            then (x:) `liftM` ((nx-1, xs) `merge'` (ny, y:ys))
            else (y:) `liftM` ((nx, x:xs) `merge'` (ny-1, ys))

Clearly, this function needs O(n) time to merge two random lists.

Permute

The final algorithm to generate a random permutation has exactly the same structure as merge sort:

permute :: [a] -> RandomList a
permute = fromList
    where
    fromList []  = empty
    fromList [x] = singleton x
    fromList xs  = (fromList l) `merge` (fromList r)
        where (l,r) = splitAt (length xs `div` 2) xs

And just like merge sort, this implementation will take O(n log n) time to generate a random permutation. After all, the new merge has the same time complexity and the old analysis applies.

The time complexity may seem like a regress compared to O(n) for the Fisher-Yates shuffle, but the connection to sorting shows that without arrays, O(n log n) is optimal. After all, there are n! permutations in total, so we need at least O(n log n) bits to encode them and we will need equally many steps to pick one if we may only decide one bit at a time.

Where to go from here

I hope you have enjoyed this excursion into sorting and functional programming. For your convenience, I have packaged the implementation of random lists into a tiny module.

I would like remark that it is Haskell’s first class treatment of computations like R a that makes the RandomList abstraction possible at all. Also, purity is a boon; I don’t think that an impure language treating random numbers as a side effect can convey the abstraction of random variables R a this convincingly.

Quicksort

Merge sort is not the only sorting algorithm that can be turned into a random permutation generator. How about trying the same with quicksort?

Fairness

The merge function looks fair, but can you actually prove that? Probabilistic Functional Programming by Erwig and Kollmansberger to the rescue! Available on Hackage.

Incremental Generation

I’ll end with another challenging problem: how about an incremental algorithm that generates random elements on demand? For instance, we want

head `liftM` permute

to return a random element in O(n) time and

take k `liftM` permute

to return elements in say O(n + k log n) time.

For sorting, this question is explored in Quicksort and k-th Smallest Elements. Note that the merge sort implementation presented here is too inefficient at constructing the call tree for merge.

Also, the RandomList implementation has to be changed unless the random monad R is sufficiently lazy. In the spirit of Inductive Reasoning about Effectful Data Types by Filinski and Støvring, try a variant of

data RandomList a = Nil | Cons (R (a, RandomList a))

to achieve the desired behavior.