An explanation of how merge sort is actually a kind of heap sort. With additional laziness, this can be used to efficiently calculate prime numbers by merging an infinite list of lists.


*This article is a bunch of mailing list posts for now.*

Thanks to lazy evaluation, the `mergesort` function is actually a kind of heap sort, as shown in Quicksort and k-th minimum. With additional laziness, this can be used to efficiently create prime numbers.

The discussion was initiated by Dave Bayer, and he also came up with the clever data structure that made `merge` work properly for infinite lists. It has an amusing explanation in terms of “Very Important Persons”.

Here is a laziness-friendly mergesort:

```
mergesort [] = []
mergesort xs = foldtree1 merge $ map return xs

foldtree1 f [x] = x
foldtree1 f xs  = foldtree1 f $ pairs xs
    where
    pairs []        = []
    pairs [x]       = [x]
    pairs (x:x':xs) = f x x' : pairs xs

merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) =
    if x <= y then x : merge xs (y:ys) else y : merge (x:xs) ys
```

Why does this work? The function `foldtree1` folds the elements of a list as if they were in a binary tree:

```
foldtree1 f [1,2,3,4,5,6,7,8]
==>
((1 `f` 2) `f` (3 `f` 4)) `f` ((5 `f` 6) `f` (7 `f` 8))
```

The important point is that this expression is constructed in O(n + n/2 + n/4 + n/8 + ..) = O(n) time. Now, given such a binary tree of `merge` operations, getting the next list element of the result takes O(log n) time. Why? Because the `merge` in the middle gets its next element either from its left or its right subexpression, which in turn gets its element from its left or its right subexpression, and so on. Clearly, we only need logarithmically many steps to hit the bottom. Putting both together, we see that we have to pay O(n + k*log n) steps to build the expression and to fetch the first k elements.

Making this argument rigorous with a suitable debit invariant is left to the attentive reader :).
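The cost claim can be exercised directly: taking only the first k elements of `mergesort` forces roughly O(n + k*log n) work, which is the lazy selection trick from Quicksort and k-th minimum. Below is a minimal, runnable transcription of the code above; the `kSmallest` helper and the `main` demonstration are my own additions, not part of the original post:

```haskell
-- Lazy mergesort: foldtree1 builds a balanced tree of suspended merges.
mergesort :: Ord a => [a] -> [a]
mergesort [] = []
mergesort xs = foldtree1 merge $ map (: []) xs

foldtree1 :: (a -> a -> a) -> [a] -> a
foldtree1 _ [x] = x
foldtree1 f xs  = foldtree1 f (pairs xs)
  where
    pairs (x:x':xt) = f x x' : pairs xt
    pairs xt        = xt          -- [] or a single leftover element

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- The k smallest elements; laziness ensures the tail is never fully sorted.
kSmallest :: Ord a => Int -> [a] -> [a]
kSmallest k = take k . mergesort

main :: IO ()
main = print (kSmallest 3 [5,1,4,2,8,0,3])
```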

Dave Bayer wrote:

Learning Haskell, the Prelude.ShowS type stood out as odd, exploiting the implementation of lazy evaluation to avoid explicitly writing an efficient concatenable list data structure.

`ShowS` has nothing to do with lazy evaluation; the same trick can be done in a call-by-value language. The idea is to represent a string `xs` as a context `(xs ++ •)`.

```
merge xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : merge xt ys
    EQ -> x : merge xt yt
    GT -> y : merge xs yt

diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : diff xt ys
    EQ -> diff xt yt
    GT -> diff xs yt

merge' (x:xt) ys = x : merge xt ys

primes = ps ++ (diff ns $ foldr1 merge' $ map f $ tail primes)
    where
    ps = [2,3,5]
    ns = [7,9..]
    f p = [ m*p | m <- [p,p+2..] ]
```
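Assembled into a self-contained program, these definitions do produce the primes. The type signatures, the fallback equations for finite lists, and `main` are my additions for completeness; the algorithm is exactly the one above:

```haskell
-- Dave Bayer's primes: an implicit heap of suspended merges.
merge, diff :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : merge xt ys
    EQ -> x : merge xt yt
    GT -> y : merge xs yt
merge xs ys = xs ++ ys        -- finite-list fallback, unused here

diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : diff xt ys
    EQ -> diff xt yt
    GT -> diff xs yt
diff xs _ = xs                -- finite-list fallback, unused here

-- Left-biased merge: emits the head without inspecting the second list.
merge' :: Ord a => [a] -> [a] -> [a]
merge' (x:xt) ys = x : merge xt ys
merge' []     ys = ys

primes :: [Integer]
primes = ps ++ (diff ns $ foldr1 merge' $ map f $ tail primes)
  where
    ps = [2,3,5]
    ns = [7,9..]
    f p = [ m*p | m <- [p,p+2..] ]  -- odd multiples of p, from p*p

main :: IO ()
main = print (take 10 primes)
```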

The code is very fast for its size; I haven’t seen Haskell code posted on the web that comes close, and it is faster than any of my other tries (I posted this code to http://www.haskell.org/haskellwiki/Prime_numbers). Effectively, it steals a heap data structure out of thin air by exploiting the implementation of lazy evaluation. It would seem that GHC’s core data structures are coded closer to the machine than anything I can write in Haskell. So much for studying how to explicitly write a good heap!

While your observation that `merge` may create an implicit heap is true, it doesn’t happen in your code :). When unfolding the `foldr1`, we get something like

```
2:.. `merge'` (3:.. `merge'` (5:.. `merge'` (...)))
```

i.e. just a linear chain of merges. Retrieving the least element is linear time in the worst case. This shape will not change with subsequent reductions of `merge`. In other words, it’s the responsibility of the `fold` to build a heap. Mergesort shows how a fold can build a heap:

http://thread.gmane.org/gmane.comp.lang.haskell.general/15007 (replicated above)

For primes, the heap shape has to be chosen carefully in order to ensure termination. It’s the same problem that forces you to use `foldr1 merge'` instead of `foldr1 merge`.

There’s also a long thread about prime sieves: http://thread.gmane.org/gmane.comp.lang.haskell.cafe/19699

Regards,

apfelmus

*In answer to: message 27612*

Dave Bayer wrote:

What I’m calling a “venturi”

`venturi :: Ord a => [[a]] -> [a]`

merges an infinite list of infinite lists into one list, under the assumption that each list, and the heads of the lists, are in increasing order.

I wrote this as an experiment in coding data structures in Haskell’s lazy evaluation model, rather than as explicit data. The majority of the work done by this code is done by “merge”; the multiples of each prime percolate up through a tournament consisting of a balanced tree of suspended “merge” function calls. In order to create an infinite lazy balanced tree of lists, the datatype

`data List a = A a (List a) | B [a]`

is used as scaffolding. One thinks of the root of the infinite tree as starting at the leftmost child, and crawling up the left spine as necessary.

After some pondering, the `List a` data structure for merging is really ingenious! :) Here’s an attempt to explain how it works:

The task is to merge a number of sorted and infinite lists when it’s known that their heads are in increasing order. In particular, we want to write

`primes = (2:) $ diff [3..] $ venturi $ map multiple primes`

Thus, we have a (maybe infinite) list

`xss = [xs1, xs2, xs3, ...]`

of infinite lists with the following properties

```
all sorted xss
sorted (map head xss)
```

where `sorted` is a function that returns `True` if the argument is a sorted list. A first try at implementing the merging function is

```
venturi xss = foldr1 merge xss
= xs1 `merge` (xs2 `merge` (xs3 `merge` ...
```

where `merge` is the standard function to merge two sorted lists.

However, there are two problems. The first problem is that this doesn’t work for infinite lists, since `merge` is strict in both arguments. But the property `head xs1 < head xs2 < head xs3 < ...`, which we have not yet exploited, can now be used in the following way:

```
venturi xss = foldr1 merge' xss
merge' (x:xt) ys = x : merge xt ys
```
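The left-biased fold can be checked on any family of infinite lists whose heads increase. A runnable sketch; the triangular stream family below is my own example, not from the thread:

```haskell
-- Standard merge of two sorted (possibly infinite) lists.
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys
merge xs ys = xs ++ ys

-- Left-biased variant: the head passes through without forcing
-- the (possibly not-yet-built) right argument.
merge' :: Ord a => [a] -> [a] -> [a]
merge' (x:xt) ys = x : merge xt ys
merge' []     ys = ys

-- A linear chain of merges over infinitely many lists.
-- Stream i is [i, i+1 ..], so the heads 1 < 2 < 3 < ... are increasing,
-- and the value v appears exactly v times in the merged output.
chain :: [Integer]
chain = foldr1 merge' [ [i ..] | i <- [1 ..] ]

main :: IO ()
main = print (take 10 chain)
```

With plain `foldr1 merge` instead, forcing even the first element of `chain` would diverge, because `merge` must inspect both arguments before producing anything.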

In other words, `merge'` is biased towards the left element

`merge' (x:_|_) _|_ = x : _|_`

which is correct since we know that `head xs < head ys`.

The second problem is that we want the calls to `merge` to be arranged as a balanced binary tree, since that gives an efficient heap. It’s not so difficult to find a good shape for the infinite tree; the real problem is to adapt `merge'` to this situation, since it’s not associative:

```
merge' xs (merge' (y:_|_) _|_) = merge' xs (y:_|_) = x1:x2:..:y:_|_
=/=
merge' (merge' xs (y:_|_)) _|_ = merge' (x1:x2:...:y:_|_) _|_
= x1:_|_
```

The problem is that the second form doesn’t realize that `y` is also smaller than the third argument. In other words, the second form has to treat more than one element as “privileged”, namely `x1,x2,...` and `y`. This can be done with the aforementioned list data structure

`data People a = VIP a (People a) | Crowd [a]`

The people (VIPs and crowd) are assumed to be *sorted*. Now, we can start to implement

`merge' :: Ord a => People a -> People a -> People a`

The first case is

`merge' (VIP x xt) ys = VIP x (merge' xt ys)`

In other words, the invariant is that every VIP on the left of a `merge'` is guaranteed to be smaller than anyone on the right and thus will be served first. The next case

`merge' (Crowd xs) (Crowd ys) = Crowd (merge xs ys)`

is clear since it doesn’t involve the invariant. What about the last case

`merge' (Crowd xs) (VIP y yt) = ??`

Here, someone from the crowd `xs`

may be smaller than `y`

. But should we return a crowd or a VIP? The crucial answer is to always return a VIP

```
merge' xs@(Crowd (x:xt)) ys@(VIP y yt)
| x <= y = VIP x (merge' (Crowd xt) ys)
| otherwise = VIP y (merge' xs yt)
```

because doing otherwise would turn a VIP into a member of some crowd. But turning `x` into a VIP is no problem, since that doesn’t violate the invariant.

Now `merge'` is associative and everything works as we want.
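Putting the three cases together gives a complete, runnable sketch. The `serve` and `venturi` wrappers below are my own reconstruction following the ideas in this thread, not verbatim from the post:

```haskell
-- Sorted people: every VIP is known to precede everyone to its right.
data People a = VIP a (People a) | Crowd [a]

-- Standard merge of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys
merge xs ys = xs ++ ys

-- Associative merge on People: left VIPs are served unconditionally,
-- and a crowd member who beats a VIP is promoted to VIP.
merge' :: Ord a => People a -> People a -> People a
merge' (VIP x xt) ys         = VIP x (merge' xt ys)
merge' (Crowd xs) (Crowd ys) = Crowd (merge xs ys)
merge' xs@(Crowd (x:xt)) ys@(VIP y yt)
  | x <= y                   = VIP x (merge' (Crowd xt) ys)
  | otherwise                = VIP y (merge' xs yt)
merge' (Crowd []) ys         = ys

-- Make each list's head a VIP, fold, and read the result back off.
venturi :: Ord a => [[a]] -> [a]
venturi = serve . foldr1 merge' . map (\(x:xs) -> VIP x (Crowd xs))
  where
    serve (VIP x xt) = x : serve xt
    serve (Crowd xs) = xs

main :: IO ()
main = print (take 12 (venturi [ [10*i, 10*i+1 ..] | i <- [1 ..] ]))
```

The first `merge'` equation is lazy in its second argument, which is what lets the fold over infinitely many lists terminate.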
