In this article, we implement a purely functional algorithm for selecting a permutation of n elements at random. The main idea is that shuffling a list is *essentially the same* as sorting a list!
Table of contents
The standard algorithm for shuffling a deck of cards, or generating a random permutation of a finite set with n elements in general, is the Fisher-Yates shuffle. Conventionally, the elements are stored in a mutable array and then randomly swapped a few times. More specifically, they are swapped n times, so the algorithm runs in O(n) time. Of course, the point is that these swaps are chosen carefully such that each permutation is equally likely.
In a message to the Haskell beginners mailing list, Jan Snajder presented a short program that implements exactly this, with mutable arrays and all. His post made me wonder whether there is a purely functional way to do this, one without mutable arrays and indexes but with similar time complexity?
Yes, there is! The main idea is that shuffling a list is essentially the same as sorting a list; the minor difference being that the former chooses a permutation at random while the latter chooses a very particular permutation, namely the one that sorts the input. In other words, the idea is to take a generic sorting algorithm like merge sort and turn it into a shuffle algorithm by simply replacing each comparison with a random choice.
Thus, in the following article, we are going to transform merge sort into a random permutation generator.
Before we begin, a few words about random number generation. Jan used the IO monad to obtain random numbers, giving the shuffle algorithm the type
permute :: [a] -> IO [a]
This is both pretty and ugly at the same time; ugly in that IO
is used as ragbag for side effects that don’t seem to fit elsewhere, but pretty in that the usual procedure of carrying around and updating a random seed is hidden behind an abstraction. We will drop the ugly part by using a new monad R
permute :: [a] -> R [a]
which works like IO
but allows only random number creation. In other words, the only side effects available in R
are functions like
getRandomR :: (Int,Int) -> R Int
which generates a random integer in the given range.
Note that the restriction to random numbers gives R a
a meaning on its own; we can interpret R a
as “a collection of values a
of which one will be picked at random when the program is run” or for the mathematically inclined as “a random variable of type a
”.
You can implement it as type R a = IO a
if you like, but we will use the MonadRandom package and set
import Control.Monad.Random
type R a = Rand StdGen a
Samples can be generated with the functions evalRand
and evalRandIO
.
Recall how merge sort works: the list is divided in half, both halves are sorted individually and then merged to give the result. Let us capture the second phase of the algorithm with a designated data type SortedList a
that represents sorted lists. Of course, we can simply implement it as ordinary list, with the convention that the elements appear in order:
type SortedList a = [a]
empty :: SortedList a
empty = []
singleton :: a -> SortedList a
singleton x = [x]
merge :: Ord a => SortedList a -> SortedList a -> SortedList a
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) =
if x <= y
then x : ( xs `merge` y:ys)
else y : (x:xs `merge` ys )
With these preliminaries, the whole algorithm reads:
mergesort :: Ord a => [a] -> SortedList a
mergesort = fromList
where
fromList [] = empty
fromList [x] = singleton x
fromList xs = (fromList l) `merge` (fromList r)
where (l,r) = splitAt (length xs `div` 2) xs
To generate random permutations, we now replace sorted lists with “random lists”, i.e. some kind of list structure that presents its elements in random order. The straightforward choice
type RandomList a = R [a]
will do the trick. Trivially, we have
empty :: RandomList a
empty = return []
singleton :: a -> RandomList a
singleton x = return [x]
Now, we want to merge random lists. This process is entirely analogous to merging sorted lists: we select the first element and then combine the tails with merge
. But instead of always putting the smallest element in front, we toss a coin and select an element at random. In Haskell pseudo code, this would look something like this:
merge (x:xs) (y:ys) = do
k <- getRandomR (1,2) -- toss a coin
if k == 1
then x : ( xs `merge` y:ys)
else y : (x:xs `merge` ys )
That being said, giving each list an equal probability of 1/2 is unfair when the lists have different sizes. For instance, if the first list has only 1 element and the second list is a random permutations of 5 elements, then the single element would have a probability of 1/2 to appear as first element in the result. In a fair permutation, this probability should be only 1/6.
The selection can be made fair if we weight the lists by their sizes. In our example, this means giving the first list a probability of 1/(5+1) and the second one a probability of 5/(5+1). By keeping track of the list sizes and adjusting the random selection accordingly, we obtain the following implementation:
merge :: RandomList a -> RandomList a -> RandomList a
merge rxs rys = do
xs <- rxs
ys <- rys
merge' (length xs, xs) (length ys, ys)
where
merge' (0 , []) (_ , ys) = return ys
merge' (_ , xs) (0 , []) = return xs
merge' (nx, x:xs) (ny, y:ys) = do
k <- getRandomR (1,nx+ny)
if k <= nx
then (x:) `liftM` ((nx-1, xs) `merge'` (ny, y:ys))
else (y:) `liftM` ((nx, x:xs) `merge'` (ny-1, ys))
Clearly, this function needs O(n) time to merge two random lists.
The final algorithm to generate a random permutation has exactly the same structure as merge sort:
permute :: [a] -> RandomList a
permute = fromList
where
fromList [] = empty
fromList [x] = singleton x
fromList xs = (fromList l) `merge` (fromList r)
where (l,r) = splitAt (length xs `div` 2) xs
And just like merge sort, this implementation will take O(n log n) time to generate a random permutation. After all, the new merge
has the same time complexity and the old analysis applies.
The time complexity may seem like a regress compared to O(n) for the Fisher-Yates shuffle, but the connection to sorting shows that without arrays, O(n log n) is optimal. After all, there are n! permutations in total, so we need at least O(n log n) bits to encode them and we will need equally many steps to pick one if we may only decide one bit at a time.
I hope you have enjoyed this excursion into sorting and functional programming. For your convenience, I have packaged the implementation of random lists into a tiny module.
I would like remark that it is Haskell’s first class treatment of computations like R a
that makes the RandomList
abstraction possible at all. Also, purity is a boon; I don’t think that an impure language treating random numbers as a side effect can convey the abstraction of random variables R a
this convincingly.
Merge sort is not the only sorting algorithm that can be turned into a random permutation generator. How about trying the same with quicksort?
The merge
function looks fair, but can you actually prove that? Probabilistic Functional Programming by Erwig and Kollmansberger to the rescue! Available on Hackage.
I’ll end with another challenging problem: how about an incremental algorithm that generates random elements on demand? For instance, we want
head `liftM` permute
to return a random element in O(n) time and
take k `liftM` permute
to return elements in say O(n + k log n) time.
For sorting, this question is explored in Quicksort and k-th Smallest Elements. Note that the merge sort implementation presented here is too inefficient at constructing the call tree for merge
.
Also, the RandomList
implementation has to be changed unless the random monad R
is sufficiently lazy. In the spirit of Inductive Reasoning about Effectful Data Types by Filinski and Støvring, try a variant of
data RandomList a = Nil | Cons (R (a, RandomList a))
to achieve the desired behavior.