This article was first published in issue 15 of The Monad.Reader.
Another monad tutorial? Oh my god, why!? Fear not, this article is aimed at Haskellers who are already familiar with monads, though I have of course tried to keep the material as accessible as possible; the first two sections may serve as an initial introduction to monads and the monad laws for the brave.
In this tutorial, I would like to present monads from the viewpoint of operational semantics and how it makes designing and implementing new monads a piece of cake. Put differently, s -> (a,s)
is not the only way to implement the state monad and this tutorial aims to present a much more systematic way. I think it is still regrettably underused, hence this text.
The main idea is to view monads as a sequence of instructions to be executed by a machine, so that the task of implementing monads is equivalent to writing an interpreter. The introductory example will be a stack automaton, followed by a remark on a monad for random numbers. Then, to showcase the simplicity of this approach, we will implement backtracking parser combinators, culminating in a straightforward breadth-first implementation equivalent to Claessen’s parallel parsing processes.
For those in the know, I’m basically going to present the principles of Chuankai Lin’s Unimo paper. The approach is neither new nor unfamiliar; for example, John Hughes already used it to derive the state monad. But until I read Lin’s paper, I did not understand how valuable it is when done systematically and in Haskell. Ryan Ingram’s MonadPrompt
package is another recent formulation.
To encourage reuse, I have also released a package operational
on hackage which collects the generic bits of these ideas in a small library. For convenient study, the source code from each section of this article is also available.
Our introductory example will be a stack machine, i.e. an imperative mini-language featuring two instructions push
and pop
for pushing and popping values onto and from a stack.
In other words, I have imperative programs like the following in mind:
push 5; push 42; pop;
Instructions are separated by semicolons. As shown in the following picture, this program first puts the number 5
on the stack, then puts the number 42
on top of the stack and proceeds to remove it again.
How can we embed such programs into Haskell?
First we need some way of representing the program text, for instance as a list of instructions:
type Program instr = [instr]
type StackProgram = Program StackInstruction
data StackInstruction = Push Int | Pop
Our example is represented as
example = Push 5 : Push 42 : Pop : []
In a sense, the colon (:)
for building lists takes the role of the semicolon for sequencing instructions.
Note that this representation gives us a very convenient tool for assembling bigger programs from smaller subprograms: list concatenation (++)
. For instance,
exampleTwice = example ++ example
= Push 5 : Push 42 : Pop : Push 5 : Push 42 : Pop : []
is a program that executes example
twice. Together with the empty program
empty = []
concatenation obeys the following three well-known laws:
empty ++ is = is -- left unit
is ++ empty = is -- right unit
(is ++ js) ++ ks = is ++ (js ++ ks) -- associativity
which seem almost too evident to be worth mentioning. For example, it is customary to leave out the parentheses in the last line altogether.
Once accustomed to the notion of programs and (++)
to combine them, the special case of single instructions and (:)
for sequencing them is unnecessary. The user of our language does not care that we deem push
and pop
to be primitive operations but not, for example, the program
replace a = Pop : Push a : []
which replaces the topmost stack element with a
; he is entirely content to be given two programs
push :: Int -> StackProgram
pop :: StackProgram
and two general combinators for building new ones
empty :: StackProgram
(++) :: StackProgram -> StackProgram -> StackProgram
without any mention of the distinction between single instruction and compound program. Their difference is but an implementation detail.
Well, to be entirely content, the user also needs a way to run programs. In particular, we need to implement a function interpret
that maps the program text to its intended meaning, here a function that transforms a stack of integers.
type Stack a = [a]
interpret :: StackProgram -> (Stack Int -> Stack Int)
The implementation follows the style of operational semantics: inspect the first instruction, change the stack accordingly, and recursively proceed with the remaining list of instructions is
:
interpret (Push a : is) stack = interpret is (a : stack)
interpret (Pop : is) stack = interpret is (tail stack)
interpret [] stack = stack
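The pieces of this section can be assembled into one small module. This is a minimal sketch rather than the article's own source file; the only deviation is that Pop on an empty stack is treated as a no-op (via drop 1), which is an assumption made here to keep the function total.

```haskell
-- Sketch: the list-of-instructions stack machine from this section.
data StackInstruction = Push Int | Pop

type Program instr = [instr]
type StackProgram  = Program StackInstruction
type Stack a       = [a]

-- Inspect the first instruction, update the stack, continue with the rest.
interpret :: StackProgram -> Stack Int -> Stack Int
interpret (Push a : is) stack = interpret is (a : stack)
interpret (Pop    : is) stack = interpret is (drop 1 stack) -- assumption: Pop on [] does nothing
interpret []            stack = stack

example :: StackProgram
example = Push 5 : Push 42 : Pop : []

-- Bigger programs are built with list concatenation.
exampleTwice :: StackProgram
exampleTwice = example ++ example
```

Running interpret example on an empty stack leaves only the 5 behind, since the 42 is pushed and popped again.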
“All well and good, but why all the fuss with ‘monads’ then, when lists of instructions will do?” you may ask. Alas, the problem is of course that lists won’t do! We forgot something very important: our programs are completely unable to inspect values from the stack.
For instance, how to write a program that pops the two topmost values and pushes their sum onto the stack? Clearly, we want something like
a <- pop;
b <- pop;
push (a+b);
where each pop
returns the element just removed and the arrow <-
binds it to a variable. But binding variables is simply impossible to express with our current representation of programs as lists of instructions.
Well, if ordinary lists of instructions are not enough to represent programs that involve binding variables like
a <- pop; b <- pop; push (a+b);
then let’s invent some fancy kind of list of instructions that will! The following presentation will be in close analogy to the structure of the previous section.
First, if we want to interpret pop
as a function that returns something, we had better label it with the type of the value returned! Hence, instead of a plain type
Pop :: StackInstruction
we need an additional type argument
Pop :: StackInstruction Int
which indicates that the Pop
instruction somehow returns a value of type Int
.
For simplicity, we attribute a return type to push
as well, even though it doesn’t really return anything. This can be modeled just fine with the unit type ()
.
Push 42 :: StackInstruction ()
Push :: Int -> StackInstruction ()
Putting both together, our type of instructions will become
data StackInstruction a where
    Pop  :: StackInstruction Int
    Push :: Int -> StackInstruction ()
If this syntax is alien to you: this is a Generalized Algebraic Data Type (GADT) which allows us to define a data type by declaring the types of its constructors directly. As of Haskell 2010, GADTs are not yet part of the language standard, but they are supported by GHC.
Like instructions, we also have to annotate programs with their return type, so that the definition for StackProgram
becomes
data Program instr a where ...
type StackProgram a = Program StackInstruction a
As before, instr
is the type of instructions, whereas a
is the newly annotated return type.
How to represent the binding of variables? Lambda abstractions will do the trick; imagine the following:
take a binding
a <- pop; rest

turn the arrow to the right
pop -> a; rest

and use a lambda expression to move it past the semicolon
pop; \a -> rest

Voila, the last step can be represented in Haskell, with a constructor named Then
taking the role of the semicolon:
Pop `Then` \a -> rest
The idea is that Then
plugs the value returned by pop
into the variable a
. By the way, this is akin to how let
expressions can be expressed as lambda abstractions in Haskell:
let a = foo in bar <=> (\a -> bar) foo
Anyway, our motivating example can now be represented as
example2 = Pop `Then` (\a -> Pop `Then`
    (\b -> Push (a+b) `Then` Return))
where Return
represents the empty program which we will discuss in a moment. Remember that parentheses around the lambda expressions are optional, so we can also write
example2 = Pop `Then` \a ->
           Pop `Then` \b ->
           Push (a+b) `Then`
           Return
It is instructive to think about the type of Then
. It has to be
Then :: instr a -> (a -> Program instr b) -> Program instr b
Except for the return type a
in instr a
and the lambda abstraction, this is entirely analogous to the “cons” operation (:)
for lists.
The empty program, corresponding to the empty list []
, is best represented by a constructor
Return :: a -> Program instr a
that is not “entirely empty” but rather denotes a trivial instruction that just returns the given value a
(hence the name). This is very useful, since we can now choose return values freely. For instance,
example3 = Pop `Then` \a -> Pop `Then` \b -> Return (a*b)
is a program that pops two values from the stack but whose return value is their product.
Taking everything together, we obtain a fancy list of instructions, once again a GADT:
data Program instr a where
    Then   :: instr a -> (a -> Program instr b) -> Program instr b
    Return :: a -> Program instr a
And specialized to our stack machine language, we get
type StackProgram a = Program StackInstruction a
Before thinking further about our new representation, let’s first write the interpreter to see the stack machine in action. This time, however, we are not interested in the final stack, only in the value returned.
interpret :: StackProgram a -> (Stack Int -> a)
interpret (Push a `Then` is) stack = interpret (is ()) (a:stack)
interpret (Pop `Then` is) (b:stack) = interpret (is b ) stack
interpret (Return c) stack = c
The implementation is like the previous one, except that now, we also have to pass the return values like ()
and b
to the remaining instructions is
.
Our example program executes as expected:
GHCi> interpret example3 [7,11]
77
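For readers who want to try this in GHCi, the definitions so far fit into one file. The following is a sketch under a few assumptions: the GADTs language extension is switched on, an extra equation reports Pop on an empty stack (the article's interpreter leaves that case undefined), and sumTop is a hypothetical second example that is not part of the original text.

```haskell
{-# LANGUAGE GADTs #-}

-- The "fancy list of instructions".
data Program instr a where
  Then   :: instr a -> (a -> Program instr b) -> Program instr b
  Return :: a -> Program instr a

data StackInstruction a where
  Pop  :: StackInstruction Int
  Push :: Int -> StackInstruction ()

type StackProgram a = Program StackInstruction a
type Stack a        = [a]

-- Run a program and yield its return value, discarding the final stack.
interpret :: StackProgram a -> Stack Int -> a
interpret (Push a `Then` is) stack       = interpret (is ()) (a : stack)
interpret (Pop    `Then` is) (b : stack) = interpret (is b) stack
interpret (Pop    `Then` _ ) []          = error "pop: empty stack" -- assumption, not in the article
interpret (Return c)         _           = c

example3 :: StackProgram Int
example3 = Pop `Then` \a -> Pop `Then` \b -> Return (a * b)

-- Hypothetical example: replace the two topmost values by their sum,
-- then pop the sum again so that it becomes the return value.
sumTop :: StackProgram Int
sumTop = Pop `Then` \a -> Pop `Then` \b ->
         Push (a + b) `Then` \_ -> Pop `Then` Return
```

Pattern matching on Push and Pop is what tells the type checker that the continuation expects a () or an Int, respectively.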
Just as with lists, we can build large programs by concatenating smaller subprograms. And as before, we don’t want the user to bother with the distinction between single instruction and compound program.
We begin with the latter: the function
singleton :: instr a -> Program instr a
singleton i = i `Then` Return
takes the role of \x -> [x]
and helps us blur the line between program and instructions:
pop :: StackProgram Int
push :: Int -> StackProgram ()
pop = singleton Pop
push = singleton . Push
Now, we define the concatenation operator (often dubbed “bind”) that glues two programs together:
(>>=) :: Program i a -> (a -> Program i b) -> Program i b
(Return a) >>= js = js a
(i `Then` is) >>= js = i `Then` (\a -> is a >>= js)
Apart from the new symbol (>>=)
and the new type signature, the purpose and implementation is entirely analogous to (++)
. And as before, together with the empty program,
return = Return
it obeys three evident laws
return a >>= is = is a -- left unit
is >>= return = is -- right unit
(is >>= js) >>= ks = is >>= (\a -> js a >>= ks) -- associativity
also called the monad laws. Since we need to pass return values, the laws are slightly different from the concatenation laws for ordinary lists, but their essence is the same.
The reason that these equations are called the “monad laws” is that any data type supporting two such operations and obeying the three laws is called a monad. In Haskell, monads are assembled in the type class Monad
, so we’d have to make an instance
instance Monad (Program instr) where
    (>>=) = ...
    return = ...
This is similar to lists which are said to constitute a monoid.
We conclude the first part of this tutorial by remarking that the (>>=)
operator is the basis for many other functions that build big programs from small ones; these can be found in the Control.Monad
module and are described elsewhere.
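The instance can be spelled out as follows. This is a sketch for current GHC versions, where Functor and Applicative are superclasses of Monad; deriving them via liftM and ap is a choice made here, not something the article prescribes. The program addTwo is a hypothetical example showing that do-notation becomes available once the instance exists.

```haskell
{-# LANGUAGE GADTs #-}
import Control.Monad (ap, liftM)

data Program instr a where
  Then   :: instr a -> (a -> Program instr b) -> Program instr b
  Return :: a -> Program instr a

-- Superclass instances, expressed through the monad operations.
instance Functor (Program instr) where
  fmap = liftM

instance Applicative (Program instr) where
  pure  = Return
  (<*>) = ap

-- The concatenation operator from the text.
instance Monad (Program instr) where
  Return a      >>= js = js a
  (i `Then` is) >>= js = i `Then` (\a -> is a >>= js)

singleton :: instr a -> Program instr a
singleton i = i `Then` Return

data StackInstruction a where
  Pop  :: StackInstruction Int
  Push :: Int -> StackInstruction ()

type StackProgram a = Program StackInstruction a

pop :: StackProgram Int
pop = singleton Pop

push :: Int -> StackProgram ()
push = singleton . Push

interpret :: StackProgram a -> [Int] -> a
interpret (Push a `Then` is) stack       = interpret (is ()) (a : stack)
interpret (Pop    `Then` is) (b : stack) = interpret (is b) stack
interpret (Pop    `Then` _ ) []          = error "pop: empty stack" -- assumption
interpret (Return c)         _           = c

-- Hypothetical example: the motivating program, written with do-notation.
addTwo :: StackProgram Int
addTwo = do
  a <- pop
  b <- pop
  push (a + b)
  pop
```

Note that the instance mentions neither StackInstruction nor interpret, which is why the monad laws hold for every instruction set.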
Those familiar with the state monad will recognize that the whole stack machine was just
State (Stack Int)
in disguise. But surprisingly, we haven’t used the pattern s -> (a,s)
for threading state anywhere! Instead, we were able to implement the equivalent of
evalState :: State s a -> (s -> a)
directly, even though the type s -> a
by itself is too “weak” to serve as an implementation of the state monad.
This is a very general phenomenon and it is of course the main benefit of the operational viewpoint and the new Program instr a
type. No matter what we choose as interpreter function or instruction set, the monad laws for (>>=)
and return
will always hold, for they are entirely independent of these choices. This makes it much easier to define and implement new monads and the remainder of this article aims to give a taste of its power.
A first advantage of the operational approach is that it allows us to equip one and the same monad with multiple interpreters. We’ll demonstrate this flexibility with an example monad Random
that expresses randomness and probability distributions.
The ability to write multiple interpreters is also very useful for implementing games, specifically to account for both human and computer opponents as well as replaying a game from a script. This is what prompted Ryan Ingram to write his MonadPrompt
package.
At the heart of random computations is a type Random a
which denotes random variables taking values in a
. Traditionally, the type a
would be a numeric type like Int
, so that Random Int
denotes “random numbers”. But for the Haskell programmer, it is only natural to generalize it to any type a
. This generalization is also very useful, because it reveals hidden structure: it turns out that Random
is actually a monad.
There are two ways to implement this monad: one way is to interpret random variables as a recipe for creating pseudorandom values from a seed, which is commonly written
type Random a = StdGen -> (a,StdGen)
The other is to view them as a probability distribution, as for example expressed in probabilistic functional programming as
type Probability = Double
type Random a = [(a,Probability)]
Traditionally, we’d have to choose between one way or the other depending on the application. But with the operational approach, we can have our cake and eat it, too! The two ways of implementing random variables can be delegated to two different interpreter functions for one and the same monad Random
.
For demonstration purposes, we represent Random
as a language with just one instruction uniform
that randomly selects an element from a list with uniform probability
type Random a = Program RandomInstruction a
data RandomInstruction a where
    Uniform :: [a] -> RandomInstruction a
uniform :: [a] -> Random a
uniform = singleton . Uniform
For example, a roll of a die is modeled as
die :: Random Int
die = uniform [1..6]
and the sum of two dice rolls is
sum2Dies = die >>= \a -> die >>= \b -> return (a+b)
Now, the two different interpretations are: sampling a random variable by generating pseudorandom values
sample :: Random a -> StdGen -> (a,StdGen)
sample (Return a) gen = (a,gen)
sample (Uniform xs `Then` is) gen = sample (is $ xs !! k) gen'
    where (k,gen') = System.Random.randomR (0,length xs-1) gen
and calculating its probability distribution
distribution :: Random a > [(a,Probability)]
distribution (Return a) = [(a,1)]
distribution (Uniform xs `Then` is) =
    [(a,p/n) | x <- xs, (a,p) <- distribution (is x)]
    where n = fromIntegral (length xs)
Truth be told, the distribution
interpreter has a flaw, namely that it never tallies the probabilities of equal outcomes. That’s because this would require an additional Eq a
constraint on the types of return
and (>>=)
, which is unfortunately not possible with the current Monad
type class. A workaround for this known limitation can be found in the norm
function from the paper on probabilistic functional programming.
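To see the distribution interpreter and the missing tallying step side by side, here is a small sketch that uses only the base library. The function tally is a hypothetical stand-in written for this illustration (it is not the norm function from the paper), and bind is a plain function used in place of (>>=) so that no Monad instance is needed; the sampling interpreter is left out because it depends on the random package.

```haskell
{-# LANGUAGE GADTs #-}
import Data.List (nub)

data Program instr a where
  Then   :: instr a -> (a -> Program instr b) -> Program instr b
  Return :: a -> Program instr a

data RandomInstruction a where
  Uniform :: [a] -> RandomInstruction a

type Random a    = Program RandomInstruction a
type Probability = Double

-- Plain-function version of (>>=), to keep the sketch short.
bind :: Program i a -> (a -> Program i b) -> Program i b
bind (Return a)    js = js a
bind (i `Then` is) js = i `Then` (\a -> is a `bind` js)

uniform :: [a] -> Random a
uniform xs = Uniform xs `Then` Return

-- The interpreter from the text: one entry per path through the program.
distribution :: Random a -> [(a, Probability)]
distribution (Return a) = [(a, 1)]
distribution (Uniform xs `Then` is) =
  [ (a, p / n) | x <- xs, (a, p) <- distribution (is x) ]
  where n = fromIntegral (length xs)

-- Hypothetical helper: merge equal outcomes after the fact.
-- This needs Eq a, which is why it cannot live inside (>>=).
tally :: Eq a => [(a, Probability)] -> [(a, Probability)]
tally dist = [ (a, sum [ p | (b, p) <- dist, b == a ]) | a <- nub (map fst dist) ]

die :: Random Int
die = uniform [1..6]

sum2 :: Random Int
sum2 = die `bind` \a -> die `bind` \b -> Return (a + b)
```

Without tally, distribution sum2 lists all 36 paths separately; with it, the eleven possible sums from 2 to 12 each appear once.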
Now, it is time to demonstrate that the operational viewpoint also makes the implementation of otherwise advanced monads a piece of cake. Our example will be monadic parser combinators and for the remainder of this article, I will assume that you are somewhat familiar with them already. The goal will be to derive an implementation of Koen Claessen’s ideas from scratch.
At their core, monadic parser combinators are a monad Parser
with just three primitives:
symbol :: Parser Char
mzero :: Parser a
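mplus :: Parser a -> Parser a -> Parser a
which represent a parser that reads a single character from the input stream, a parser that always fails, and a parser that runs two parsers in parallel and collects their results,
respectively. (The last two operations define the MonadPlus
type class.) Furthermore, we need an interpreter, i.e. a function
interpret :: Parser a -> (String -> [a])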
that runs the parser on the string and returns all successful parses.
The three primitives are enough to express virtually any parsing problem; here is an example of a parser number
that recognizes integers:
satisfies p = symbol >>= \c -> if p c then return c else mzero
many p = return [] `mplus` many1 p
many1 p = liftM2 (:) p (many p)
digit = satisfies isDigit >>= \c -> return (ord c - ord '0')
number = many1 digit >>= return . foldl (\x d -> 10*x + d) 0
The instruction set for our parser language will of course consist of these three primitive operations:
data ParserInstruction a where
    Symbol :: ParserInstruction Char
    MZero :: ParserInstruction a
    MPlus :: Parser a -> Parser a -> ParserInstruction a
type Parser a = Program ParserInstruction a
A straightforward implementation of interpret
looks like this:
interpret :: Parser a -> String -> [a]
interpret (Return a) s = if null s then [a] else []
interpret (Symbol `Then` is) s = case s of
    c:cs -> interpret (is c) cs
    [] -> []
interpret (MZero `Then` is) s = []
interpret (MPlus p q `Then` is) s =
    interpret (p >>= is) s ++ interpret (q >>= is) s
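Put together with the number parser from above, the depth-first interpreter can be tried out as follows. This is a sketch with a few assumptions: the names failP, orElse and manyP are hypothetical stand-ins for mzero, mplus and many, chosen so that no MonadPlus instance has to be declared, and the Functor and Applicative instances are added because current GHC requires them.

```haskell
{-# LANGUAGE GADTs #-}
import Control.Monad (ap, liftM, liftM2)
import Data.Char (isDigit, ord)

data Program instr a where
  Then   :: instr a -> (a -> Program instr b) -> Program instr b
  Return :: a -> Program instr a

instance Functor (Program instr) where
  fmap = liftM
instance Applicative (Program instr) where
  pure  = Return
  (<*>) = ap
instance Monad (Program instr) where
  Return a      >>= js = js a
  (i `Then` is) >>= js = i `Then` (\a -> is a >>= js)

data ParserInstruction a where
  Symbol :: ParserInstruction Char
  MZero  :: ParserInstruction a
  MPlus  :: Parser a -> Parser a -> ParserInstruction a

type Parser a = Program ParserInstruction a

symbol :: Parser Char
symbol = Symbol `Then` Return

failP :: Parser a                        -- stands in for mzero
failP = MZero `Then` Return

orElse :: Parser a -> Parser a -> Parser a  -- stands in for mplus
orElse p q = MPlus p q `Then` Return

-- Depth-first interpreter, as in the text.
interpret :: Parser a -> String -> [a]
interpret (Return a) s = if null s then [a] else []
interpret (Symbol `Then` is) s = case s of
  c:cs -> interpret (is c) cs
  []   -> []
interpret (MZero `Then` _) _ = []
interpret (MPlus p q `Then` is) s =
  interpret (p >>= is) s ++ interpret (q >>= is) s

satisfies :: (Char -> Bool) -> Parser Char
satisfies p = symbol >>= \c -> if p c then return c else failP

manyP, many1 :: Parser a -> Parser [a]
manyP p = return [] `orElse` many1 p
many1 p = liftM2 (:) p (manyP p)

digit :: Parser Int
digit = satisfies isDigit >>= \c -> return (ord c - ord '0')

number :: Parser Int
number = many1 digit >>= return . foldl (\x d -> 10*x + d) 0
```

Because Return only succeeds on empty input, interpret number yields a result only when the whole string consists of digits.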
For each instruction, we specify the intended effects, often calling interpret
recursively on the remaining program is
. In prose, the four cases are
Return at the end of a program will return a result if the input was parsed completely.
Symbol reads a single character from the input stream if available and fails otherwise.
MZero returns an empty result immediately.
MPlus runs two parsers in parallel and collects their results.
The cases for MZero
and MPlus
are a bit roundabout; the equations
interpret mzero = \s -> []
interpret (mplus p q) = \s -> interpret p s ++ interpret q s
express our intention more plainly. Of course, these two equations do not constitute valid Haskell code for we may not pattern match on mzero
or mplus
directly. The only thing we may pattern match on is a constructor, for example like this
interpret (MPlus p q `Then` is) = ...
But even though our final Haskell code will have this form, this does not mean that jotting down the left hand side and thinking hard about the ...
is the best way to write Haskell code. No, we should rather use the full power of purely functional programming and use a more calculational approach, deriving the pattern matches from more evident equations like the ones above.
In this case, we can combine the two equations with the MonadPlus
laws
mzero >>= m = mzero
mplus p q >>= m = mplus (p >>= m) (q >>= m)
which specify how mzero
and mplus
interact with (>>=)
, to derive the desired pattern match
interpret (MPlus p q `Then` is)
= { definition of concatenation and mplus }
interpret (mplus p q >>= is)
= { MonadPlus law }
interpret (mplus (p >>= is) (q >>= is))
= { intended meaning }
\s -> interpret (p >>= is) s ++ interpret (q >>= is) s
Now, in light of the first step of this derivation, I even suggest forgetting about constructors entirely and instead regarding
interpret (mplus p q >>= is) = ...
as “valid” Haskell code; after all, it is straightforwardly converted to a valid pattern match. In other words, it is once again beneficial to not distinguish between single instructions and compound programs, at least in notation.
Unfortunately, our first implementation has a potential space leak, namely in the case
interpret (MPlus p q `Then` is) s =
    interpret (p >>= is) s ++ interpret (q >>= is) s
The string s
is shared by the recursive calls and has to be held in memory for a long time.
In particular, the implementation will try to parse s
with the parser p >>= is
first, and then backtrack to the beginning of s
to parse it again with the second alternative q >>= is
. That’s why this is called a depth-first or backtracking implementation. The string s
has to be held in memory as long as the second parser has not started yet.
To ameliorate the space leak, we would like to create a breadth-first implementation, one which does not try alternative parsers in sequence, but rather keeps a collection of all possible alternatives and advances them at once.
How to make this precise? The key idea is the following equation:
(symbol >>= is) `mplus` (symbol >>= js)
= symbol >>= (\c -> is c `mplus` js c)
When the parsers on both sides of mplus
are waiting for the next input symbol, we can group them together and make sure that the next symbol will be fetched only once from the input stream.
Clearly, this equation readily extends to more than two parsers, like for example
(symbol >>= is) `mplus` (symbol >>= js) `mplus` (symbol >>= ks)
= symbol >>= (\c -> is c `mplus` js c `mplus` ks c)
and so on.
We want to use this equation as a function definition, mapping the left hand side to the right hand side. Of course, we can’t do so directly because the left hand side is not one of the four patterns we can match upon. But thanks to the MonadPlus
laws, what we can do is to rewrite any parser into this form, namely with a function
expand :: Parser a -> [Parser a]
expand (MPlus p q `Then` is) = expand (p >>= is) ++
                               expand (q >>= is)
expand (MZero `Then` is) = []
expand x = [x]
The idea is that expand
fulfills
foldr mplus mzero . expand = id
and thus turns a parser into a list of summands which we now can pattern match upon. In other words, this function expands parsers matching mzero >>= is
and mplus p q >>= is
until only summands of the form symbol >>= is
and return a
remain.
With the parser expressed as a big “sum”, we can now apply our key idea and group all summands of the form symbol >>= is
; and we also have to take care of the other summands of the form return a
. The following definition will do the right thing:
interpret :: Parser a -> String -> [a]
interpret p = interpret' (expand p)
    where
    interpret' :: [Parser a] -> String -> [a]
    interpret' ps [] = [a | Return a <- ps]
    interpret' ps (c:cs) = interpret'
        [p | (Symbol `Then` is) <- ps, p <- expand (is c)] cs
Namely, how to handle each of the summands depends on the input stream:
If there are still input symbols to be consumed, then only the summands of the form symbol >>= is
will proceed; the other parsers have ended prematurely.
If the input stream is empty, then only the summands of the form return x
have parsed the input correctly, and their results are to be returned.
That’s it, this is our breadth-first interpreter, obtained by using laws and equations to rewrite instruction lists. It is equivalent to Koen Claessen’s implementation.
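The breadth-first interpreter can be exercised with the same number parser. In this sketch, the two list comprehensions are replaced by top-level helper functions finished and feed; these names are hypothetical and were introduced only so that every GADT pattern match has an explicit type signature. As before, failP, orElse and manyP stand in for mzero, mplus and many.

```haskell
{-# LANGUAGE GADTs #-}
import Control.Monad (ap, liftM, liftM2)
import Data.Char (isDigit, ord)

data Program instr a where
  Then   :: instr a -> (a -> Program instr b) -> Program instr b
  Return :: a -> Program instr a

instance Functor (Program instr) where
  fmap = liftM
instance Applicative (Program instr) where
  pure  = Return
  (<*>) = ap
instance Monad (Program instr) where
  Return a      >>= js = js a
  (i `Then` is) >>= js = i `Then` (\a -> is a >>= js)

data ParserInstruction a where
  Symbol :: ParserInstruction Char
  MZero  :: ParserInstruction a
  MPlus  :: Parser a -> Parser a -> ParserInstruction a

type Parser a = Program ParserInstruction a

symbol :: Parser Char
symbol = Symbol `Then` Return

failP :: Parser a
failP = MZero `Then` Return

orElse :: Parser a -> Parser a -> Parser a
orElse p q = MPlus p q `Then` Return

-- Rewrite a parser into a list of summands (symbol >>= is) or (return a).
expand :: Parser a -> [Parser a]
expand (MPlus p q `Then` is) = expand (p >>= is) ++ expand (q >>= is)
expand (MZero `Then` _)      = []
expand x                     = [x]

-- Input exhausted: only finished summands contribute a result.
finished :: Parser a -> [a]
finished (Return a) = [a]
finished _          = []

-- Input available: only summands waiting for a symbol proceed.
feed :: Char -> Parser a -> [Parser a]
feed c (Symbol `Then` is) = expand (is c)
feed _ _                  = []

interpret :: Parser a -> String -> [a]
interpret p = go (expand p)
  where
    go ps []     = concatMap finished ps
    go ps (c:cs) = go (concatMap (feed c) ps) cs

manyP, many1 :: Parser a -> Parser [a]
manyP p = return [] `orElse` many1 p
many1 p = liftM2 (:) p (manyP p)

digit :: Parser Int
digit = symbol >>= \c -> if isDigit c then return (ord c - ord '0') else failP

number :: Parser Int
number = many1 digit >>= return . foldl (\x d -> 10*x + d) 0
```

Each input character is inspected once and handed to all surviving alternatives, so the input string does not need to be retained for backtracking.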
As an amusing last remark, I would like to mention that our calculations can be visualized as high school algebra if we ignore that (>>=)
has to pass around variables, as shown in the following table:
Term | Mathematical operation
---|---
return | 1
(>>=) | × multiplication
mzero | 0
mplus | + addition
symbol | x indeterminate
For example, our key idea corresponds to the distributive law
x × a + x × b = x × (a + b)
and the monad and MonadPlus
laws have well-known counterparts in algebra as well.
I hope I have managed to convincingly demonstrate the virtues of the operational viewpoint with my choice of examples.
There are many other advanced monads whose implementations also become clearer when approached this way, such as the list monad transformer (where the naive m [a]
is known not to work), Oleg Kiselyov’s LogicT
, Koen Claessen’s poor man’s concurrency monad, as well as coroutines like Peter Thiemann’s ingenious WASH, which includes a monad for tracking session state in a web server.
The operational
package includes a few of these examples.
Traditionally, the continuation monad transformer
data Cont m a = Cont { runCont :: forall b. (a -> m b) -> m b }
has been used to implement these advanced monads. This is no accident; both approaches are capable of implementing any monad. In fact, they are almost the same thing: the continuation monad is the refunctionalization of instructions as functions
\k -> interpret (Instruction `Then` k)
But alas, I think that this unfortunate melange of instruction, interpreter and continuation does not explain or clarify what is going on; it is the algebraic data type Program
that offers a clear notion of what a monad is and what it means to implement one. Hence, in my opinion, the algebraic data type should be the preferred way of presenting new monads and also of implementing them, at least before program optimizations.
Actually, Program
is not a plain algebraic data type, it is a generalized algebraic data type. It seems to me that this is also the reason why the continuation monad has found more use, despite being conceptually more difficult: GADTs simply weren’t available in Haskell. I believe that the Program
type is a strong argument to include GADTs in a future Haskell standard.
Compared to specialized implementations, like for example s -> (a,s)
for the state monad, the operational approach is not entirely without drawbacks.
First, the given implementation of (>>=)
has the same quadratic running time problem as (++)
when used in a left-associative fashion. Fortunately, this can be ameliorated with a different (fancy) list data type; the operational
library implements one.
Second, and this cannot be ameliorated, we lose laziness. The state monad represented as s -> (a,s)
can cope with some infinite programs like
evalState (sequence . repeat . state $ \s -> (s,s+1)) 0
whereas the list of instructions approach has no hope of ever handling that, since only the very last Return
instruction can return values.
I think that this loss of laziness also makes value recursion a la MonadFix
very difficult.
After some initial programming experience in Pascal, Heinrich Apfelmus picked up Haskell and purely functional programming just at the dawn of the new millennium. He has never looked back ever since, for he not only admires Haskell’s mathematical elegance, but also its practicality in personal life. For instance, he was always too lazy to tie knots, but that has changed and he now accepts shoe laces instead of velcro.
This article previously appeared in issue 14 of The Monad.Reader.
In a post to reddit, user CharlieDancey presented a challenge to write a short and clever morse code decoder. Of course, what could be more clever than writing it in Haskell? ;)
In the following, I’ll present a series of solutions that gradually include dichotomic search (the straightforward generalization of binary search), stack-based languages or reverse Polish notation, and finally deforestation (also called fusion), an optimization technique specific to purely functional languages like Haskell.
For your convenience, source code for the individual sections is available.
Morse code was designed to be produced and read, or rather heard by humans, who have to memorize the following table:
To write a program that decodes sequences of dots and dashes like
-- --- .-. ... . -.-. --- -.. .
into letters, we will have to store this table in some form. For instance, we could store each letter and corresponding morse code in a big association list
type MorseCode = String
dict :: [(MorseCode, Char)]
dict =
    [(".-"  , 'A'),
     ("-...", 'B'),
     ("-.-.", 'C'),
...
which we name dict
because it’s a dictionary translating between the Latin alphabet and morse code. To decode a letter, we simply browse through this list to determine whether it contains a given code of dots and dashes
decodeLetter :: MorseCode -> Char
decodeLetter code = maybe ' ' id $ lookup code dict
Decoding a whole sequence of letters is done with
decode = map decodeLetter . words
so that we have for example
> decode "-- --- .-. ... . -.-. --- -.. ."
"MORSECODE"
Of course, association lists are a rather slow data structure. A binary search tree, trie, or hash table would be a better choice. But fortunately, it is very easy to change data structures in Haskell. All you have to do is add a qualified import of a different module, such as Data.Map, and change the definition of dict
to
dict :: Data.Map.Map MorseCode Char
dict = Data.Map.fromList $
    [(".-"  , 'A'),
     ("-...", 'B'),
     ("-.-.", 'C'),
...
and use Data.Map.lookup
instead of Data.List.lookup
.
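As a quick check of the association-list decoder, here is a sketch with a deliberately partial dictionary: it contains only the seven letters needed for the example message, so it is not the full table the article has in mind. Using fromMaybe instead of maybe ' ' id is a small stylistic substitution.

```haskell
import Data.Maybe (fromMaybe)

type MorseCode = String

-- Partial dictionary (assumption: only the letters of "MORSECODE").
dict :: [(MorseCode, Char)]
dict =
  [ ("."   , 'E')
  , ("--"  , 'M')
  , ("---" , 'O')
  , (".-." , 'R')
  , ("..." , 'S')
  , ("-.-.", 'C')
  , ("-.." , 'D')
  ]

-- Unknown codes decode to a blank.
decodeLetter :: MorseCode -> Char
decodeLetter code = fromMaybe ' ' (lookup code dict)

-- Letters are separated by whitespace.
decode :: String -> String
decode = map decodeLetter . words
```

Switching to Data.Map would only change the definition of dict and the lookup function; decode stays the same.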
Faithfully replicating the morse code table is clear and preferable, but makes for quite large source code. So, let’s make an exception today and think of something as clever as we can.
The idea is the following: whenever an encoded letter starts with a dash '-'
, we know that it can’t be for example an A or E, because those start with a dot '.'
. And if the next symbol is a dot, then it can’t be a G or M because their second symbol is a dash. With each symbol we read, more and more alternatives disappear until we are left with a single alternative. We can depict this with a binary tree:
At each node, the left subtree contains all the letters that can be obtained by adding a dot to the current letter, while the right subtree contains those letters that can be obtained with a dash. This is illustrated by means of a dotted line connected to the left subtree and a dashed line connected to the right subtree.
Now, to decode a letter, we simply start at the root of the tree and follow the dotted or dashed lines depending on the symbols read. For instance, to decode the sequence "-..."
, we have to go right once and then left thrice, ending up at the letter B. This procedure is also called dichotomic search because at each point in our search for the right letter, we ask a yes/no question “Dot or dash?” (a dichotomy) that partitions the remaining search space into two disjoint parts.
Let’s implement this in Haskell. First, we assume that our dictionary is given as a tree
data Tree a = Leaf
            | Branch { tag :: a
                     , left :: Tree a
                     , right :: Tree a }
dict :: Tree Char
dict = Branch ' ' (Branch 'E' ... ) (Branch 'T' ...)
Of course, writing out the dict
tree in full is going to be repetitive and boring, but let’s just imagine we have already done that. Then, decoding a letter by walking down the tree is simply a left fold over the code word
decodeLetter = tag . foldl (flip step) dict
    where
    step '.' = left
    step '-' = right
In this case, the accumulated value of the fold is the current subtree, although the term “accumulate” is a bit misleading in that we don’t make the dictionary bigger; we make it smaller by passing to a subtree.
The curious reader may notice that we have actually implemented a trie.
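The fold can be tried on a cut-down tree. In this sketch, smallDict is a hypothetical tree holding only the first two levels of the morse code tree (E, T, I, A, N, M), and the catch-all equation for step is an addition that ignores characters other than dot and dash.

```haskell
data Tree a = Leaf
            | Branch { tag   :: a
                     , left  :: Tree a
                     , right :: Tree a }

-- Hypothetical two-level excerpt of the dictionary tree.
smallDict :: Tree Char
smallDict = Branch ' '
  (Branch 'E' (Branch 'I' Leaf Leaf) (Branch 'A' Leaf Leaf))
  (Branch 'T' (Branch 'N' Leaf Leaf) (Branch 'M' Leaf Leaf))

-- Walk down the tree: dot goes left, dash goes right.
decodeLetter :: String -> Char
decodeLetter = tag . foldl (flip step) smallDict
  where
    step '.' = left
    step '-' = right
    step _   = id   -- assumption: skip any other character
```

Codes longer than two symbols run off the excerpt and hit a Leaf, where tag is undefined; the full tree in the article does not have this limitation for valid codes.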
Now, let’s reduce the source code needed for the dictionary tree. In Haskell, we’d have to write something like
dict = Branch ' ' (Branch 'E' ... ) (Branch 'T' ...)
Compared to the association list, we got rid of the dots '.'
and dashes '-'
, they have been made implicit in the structure of our tree. But we still need a lot of parentheses and applications of the Branch
function.
A clever way to get rid of parentheses is reverse Polish notation, commonly used in stack-based languages. The idea is that a program in such a language is a sequence of instructions that pop and push values from a stack. For example,
1 2 +
is a program to calculate 1 plus 2. Reading from left to right, the first instruction is 1
which pushes the integer 1 onto the stack. The second instruction is 2
, pushing 2 onto the stack. And finally, the instruction +
pops the two topmost integers from the stack and pushes their sum onto the stack. This procedure is visualized as follows:
To see how that eliminates the need for parentheses, take a look at the program
1 2 + 3 4 + +
which calculates the sum (1+2)+(3+4).
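The evaluation of such programs can be sketched as a left fold over the tokens. The function evalRPN is a hypothetical illustration, not part of the morse decoder; it supports only + and integer literals and makes no attempt at error handling.

```haskell
-- Evaluate a whitespace-separated reverse Polish program.
-- The result is the final stack, topmost element first.
evalRPN :: String -> [Int]
evalRPN = foldl exec [] . words
  where
    -- "+" pops the two topmost integers and pushes their sum
    exec (x : y : stack) "+" = y + x : stack
    -- any other token is read as an integer and pushed
    exec stack token         = read token : stack
```

For the two programs above, the final stack holds the single value 3 and 10, respectively.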
To build our dictionary tree, we are going to devise a very similar language. Instead of integers, the stack will store whole trees. In analogy to the numerals, there will be an instruction for pushing a Leaf
onto the stack; and in analogy to +
, there is going to be an instruction for applying the Branch
constructor to the two topmost subtrees.
Here’s the program for building the tree, stored as a string:
program :: String
program = "__5__4H___3VS__F___2 UI__L__+_ R__P___1JWAE"
++ "__6__=B__/_XD__C__YKN__7_Z__QG__8_ __9__0 OMT "
Each character represents one instruction. The underscore '_'
pushes a Leaf
onto the stack while all the other characters push a Branch
tagged with the character in question onto the stack. The interpreter for this tiny programming language can be written using a simple left fold:
dict = interpret program
    where
    interpret = head . foldl exec []
    exec xs '_' = Leaf : xs               -- push Leaf
    exec (x:y:xs) c = Branch c y x : xs   -- combine subtrees
The stack xs
is represented as a list of trees.
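Put together, the pieces can be run as a small standalone program. The following is a sketch under one assumption: the Tree type with the field selectors tag, left and right is reconstructed here from how the article uses it, since its definition lives in the earlier section on dichotomic search. The two extra error clauses are additions for this sketch.

```haskell
-- Assumed shape of the tree type from the section on dichotomic search.
data Tree a = Leaf
            | Branch { tag :: a, left :: Tree a, right :: Tree a }

type MorseCode = String

program :: String
program = "__5__4H___3VS__F___2 UI__L__+_ R__P___1JWAE"
       ++ "__6__=B__/_XD__C__YKN__7_Z__QG__8_ __9__0 OMT "

dict :: Tree Char
dict = interpret program
  where
    interpret = head . foldl exec []
    exec xs       '_' = Leaf : xs           -- push Leaf
    exec (x:y:xs) c   = Branch c y x : xs   -- combine subtrees
    exec _        c   = error ("stack underflow at " ++ show c)

decodeLetter :: MorseCode -> Char
decodeLetter = tag . foldl (flip step) dict
  where
    step '.' = left
    step '-' = right
    step c   = error ("not a Morse symbol: " ++ show c)

main :: IO ()
main = putStrLn (map decodeLetter (words "... --- ..."))  -- prints SOS
```

Note the argument order in Branch c y x: the subtree pushed first (y) becomes the dot branch, the one pushed last (x) the dash branch.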
We have found a concise way to represent the morse code dictionary in source code, but cleverness does not end here. For instance, how about eliminating the tree entirely?
More precisely, it seems wasteful to construct the dictionary tree by creating leaves and connecting branches only to deconstruct them again when decoding letters. Wouldn’t it be more efficient to interpret the morse code tree as a call graph rather than as a data structure Tree Char
?
In other words, imagine, say, a function e corresponding to the node labeled ‘E’ in the tree, and working as follows:
e :: MorseCode -> Char
e ('.':ds) = i ds
e ('-':ds) = a ds
e [] = 'E'
If the next symbol is a dot or dash, we proceed with the functions i
or a
corresponding to 'I'
and 'A'
respectively. And in case we have already reached the end of the code, we know that we have decoded an 'E'
. The functions i
and a
work just like e
, except that they proceed with other letters; and the morse code tree becomes their call graph.
Now, writing all these functions by hand would be most error-prone and tedious, since they all look the same. But abstracting repetitive patterns is exactly where functional programming shines; we are to devise a combinator that automates the boring parts of the code for us.
What does this combinator look like? Well, the non-boring parts are the letter it represents and the two follow-up functions, so these will be parameters:
combinator :: Char                   -- letter
           -> (MorseCode -> Char)    -- function for dot
           -> (MorseCode -> Char)    -- function for dash
           -> (MorseCode -> Char)    -- result function
That’s all we need; we can implement
combinator c x y = \code -> case code of
    '.':ds -> x ds
    '-':ds -> y ds
    []     -> c
which allows us to write
e = combinator 'E' i a
i = combinator 'I' s u
... etc ...
But wait, what have we done? The type signature of combinator
looks exactly like that of
Branch :: Char         -- letter
       -> Tree Char    -- tree for dot
       -> Tree Char    -- tree for dash
       -> Tree Char    -- result tree
but with the type Tree Char
replaced by (MorseCode > Char)
! In fact, let’s rename the combinator to lowercase branch
branch c x y = \code -> case code of
    '.':ds -> x ds
    '-':ds -> y ds
    []     -> c
and observe that the tree of functions now reads
e = branch 'E' i a
i = branch 'I' s u
... etc ...
or
e = branch 'E' (branch 'I' ... ) (branch 'A' ...)
if we inline the function definitions. But this is of course just the tree from the section on dichotomic search with each Branch
replaced by branch
!
In other words, branch
is like a drop-in replacement for the constructor Branch
. To implement the morse code tree directly as functions instead of as an algebraic data type, we simply replace every occurrence of Branch
with branch
in our previous code. Thus, we have a new implementation
dict :: MorseCode -> Char
dict = interpret program
    where
    interpret = head . foldl exec []
    exec xs '_' = leaf : xs               -- push leaf
    exec (x:y:xs) c = branch c y x : xs   -- combine subtrees
decodeLetter = dict
where
leaf = undefined
is a trivial replacement for the Leaf
constructor.
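For reference, the deforested variant can be assembled into one standalone file. This is a sketch rather than the article's exact source: the type synonym MorseCode is assumed to be String, and the catch-all clauses that report malformed input are additions for this sketch.

```haskell
type MorseCode = String

program :: String
program = "__5__4H___3VS__F___2 UI__L__+_ R__P___1JWAE"
       ++ "__6__=B__/_XD__C__YKN__7_Z__QG__8_ __9__0 OMT "

-- Function-valued stand-ins for the constructors Leaf and Branch.
leaf :: MorseCode -> Char
leaf = undefined

branch :: Char -> (MorseCode -> Char) -> (MorseCode -> Char)
       -> (MorseCode -> Char)
branch c x y = \code -> case code of
    '.':ds -> x ds      -- continue with the function for dot
    '-':ds -> y ds      -- continue with the function for dash
    []     -> c         -- end of code: this node's letter
    _      -> error "not a Morse symbol"

-- The stack now holds functions instead of trees.
dict :: MorseCode -> Char
dict = interpret program
  where
    interpret = head . foldl exec []
    exec xs       '_' = leaf : xs             -- push leaf
    exec (x:y:xs) c   = branch c y x : xs     -- combine subtrees
    exec _        c   = error ("stack underflow at " ++ show c)

decodeLetter :: MorseCode -> Char
decodeLetter = dict

main :: IO ()
main = putStrLn (map decodeLetter (words ".... .."))  -- prints HI
```

Because of lazy evaluation, pushing leaf = undefined onto the stack is harmless; it only blows up if a code actually walks off the bottom of the tree.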
To further understand how and why the replacement of Branch
by branch
works, it is instructive to derive it systematically from just the program text.
In particular, consider the implementation of decodeLetter
from the earlier section on dichotomic search:
decodeLetter :: MorseCode -> Char
decodeLetter = tag . foldl (flip step) dict
    where
    step '.' = left
    step '-' = right
In this formulation, the focus is on the recursion over the input string performed by foldl
, neglecting the recursive descent into the dictionary tree. Let’s systematically rewrite this code to highlight the latter.
First, we interpret it as converting the dictionary tree into a function that decodes letters
decodeLetter = decodeWith dict
decodeWith :: Tree Char -> (MorseCode -> Char)
decodeWith dict = tag . foldl (flip step) dict
    where
    step '.' = left
    step '-' = right
Then, we turn the foldl
into explicit recursion
decodeWith dict [] = tag dict
decodeWith dict (d:ds) = decodeWith (step d dict) ds
    where
    step '.' = left
    step '-' = right
and dissolve the step
function into the pattern matching
decodeWith dict [] = tag dict
decodeWith dict ('.':ds) = decodeWith (left dict) ds
decodeWith dict ('-':ds) = decodeWith (right dict) ds
We group the pattern matching on the input code into a case expression
decodeWith dict = \code -> case code of
    []       -> tag dict
    ('.':ds) -> decodeWith (left dict) ds
    ('-':ds) -> decodeWith (right dict) ds
and finally, we replace the field selectors by pattern matching on the tree, also noting the case of Leaf
:
decodeWith Leaf = undefined
decodeWith (Branch c x y) = \code -> case code of
    []       -> c
    ('.':ds) -> (decodeWith x) ds
    ('-':ds) -> (decodeWith y) ds
The recursion on the tree is apparent now: decodeWith
simply traverses both subtrees and combines the results. We can make this even more evident by appealing to the combinator we defined in the previous section:
decodeWith Leaf = leaf
decodeWith (Branch c x y) = branch c (decodeWith x) (decodeWith y)
In other words, decodeWith
takes a tree and simply substitutes each Leaf
constructor with leaf
and each Branch
constructor with branch
.
Of course, first using Branch
and then replacing it with branch
is a waste; we should rather use branch
from the start and thus cut away the intermediate tree. That’s exactly what we’ve done in the previous section!
Replacing constructors is a very general pattern, not restricted to binary trees. For instance, you probably know the following function:
foldr f z [] = z
foldr f z (x:xs) = x `f` foldr f z xs
It’s the good old fold! And at its heart, it just replaces the constructors of the list data type: z
is the substitute for the empty list []
, and f
is put in lieu of the (:)
constructor.
In its honor, any function that substitutes constructors in such fashion is known as a generalized fold. A more exotic but commonly used alternative name is “catamorphism”. In other words, decodeWith
is a catamorphism.
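The analogy with foldr can be made explicit by writing the generalized fold for trees once and for all. The sketch below is an illustration, not code from the article: foldTree, size and small are hypothetical names, the Tree type is assumed to have the two constructors used above, and the leaf replacement raises an error instead of being undefined.

```haskell
data Tree a = Leaf | Branch a (Tree a) (Tree a)

type MorseCode = String

-- Generalized fold: substitute `leaf` for Leaf and `branch` for Branch.
foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b
foldTree leaf branch = go
  where
    go Leaf           = leaf
    go (Branch c x y) = branch c (go x) (go y)

-- decodeWith is the fold that uses the function-valued replacements.
decodeWith :: Tree Char -> (MorseCode -> Char)
decodeWith = foldTree leaf branch
  where
    leaf = \_ -> error "no such letter"
    branch c x y = \code -> case code of
        []       -> c
        ('.':ds) -> x ds
        ('-':ds) -> y ds
        _        -> error "not a Morse symbol"

-- Another catamorphism on the same type: count the branches.
size :: Tree a -> Int
size = foldTree 0 (\_ l r -> 1 + l + r)

-- A tiny example tree with just the letters E and T.
small :: Tree Char
small = Branch ' ' (Branch 'E' Leaf Leaf) (Branch 'T' Leaf Leaf)

main :: IO ()
main = do
    putStrLn (map (decodeWith small) ["", ".", "-"])  -- prints " ET"
    print (size small)                                -- prints 3
```

Different choices for the two replacement arguments give different functions over the same tree, just as different f and z give different functions with foldr.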
In this generality, the idea of deforestation is that instead of first constructing a data structure and then chopping it up with a catamorphism, it is more efficient to saw the pieces properly at the time of creation. For example, creating a list and then adding all elements
sum [1..100] = foldr (+) 0 (enumFromTo 1 100)
is less efficient than adding the elements as they are created
sumFromTo a b                              -- changes to enumFromTo
    | a > b     = 0                        -- 0 replaces []
    | otherwise = a + sumFromTo (a+1) b    -- + replaces (:)
Merging two functions in this fashion is also called fusion.
Performing deforestation manually, as we did, sacrifices reusability: the morse code tree only exists as a call graph and cannot be printed out anymore, and sumFromTo is not nearly as useful as sum and enumFromTo were.
Hence, the goal is to teach the compiler to automatically fuse catamorphisms with their structure-creating counterparts, the so-called anamorphisms. That’s what efforts like a shortcut to deforestation and the recent stream fusion set out to do, yielding dramatic gains in efficiency while preserving the compositional style.
I hope you had fun doodling around with morse code; I certainly did. If you long for more, here are a few suggestions:
While we’re at it, the intermediate lists created by words
in
decode = map decodeLetter . words
can be deforested as well and fused into the letter decoding. Try it yourself, or see the example source code.
Polish notation, the converse of reverse Polish notation, puts function symbols before their arguments and is thus closer to the Haskell syntax. When the arities of the functions are known in advance, like 2 for Branch and 0 for Leaf, this notation doesn’t require parentheses either. Write a program in Polish notation and a corresponding tiny interpreter to create the morse code tree.
Since deforestation is supposed to make things faster, how about comparing our deforested morse code tree to, say, an implementation in C? But instead of doing benchmarks, let me give two general remarks on the machine representation.
First, one might think that the call graph we’ve built is compiled to function calls, with e
calling i
or a
and each function having different machine code, just as we originally intended. But this is actually not the case. When defined with branch
, the functions are represented as closures, sharing the same machine code but carrying different records that store their free variables c
, x
and y
. This is not very different from storing c
, x
and y
in a Branch
constructor. Template Haskell or some kind of partial evaluation would be needed to hardcode the free variables into the executable. For more on the Haskell execution model, take a look at the GHC commentary.
Second, all our implementations decipher a letter by following a chain of pointers: descending a tree means following pointers to deeper nodes, and making procedure calls means repeatedly jumping to different code parts. This of course raises questions of cache locality, branch prediction and memory requirements for storing all these pointers.
In C, however, there is another technique available: instead of following a chain of pointers to get to the destination, the address of the final stop is calculated directly with pointer arithmetic. The following example illustrates this:
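#include <stdio.h>
#include <string.h>
char buf[99], tree[] = " ETIANMSURWDKGOHVF L PJBXCYZQ ";
int main(void) {
    /* scanf returns EOF (nonzero) at end of input, so compare against 1 */
    while (scanf("%98s", buf) == 1) {
        size_t n, t = 0;
        for (n = 0; buf[n] != 0; n++)
            t = 2*t + 1 + (buf[n] & 1);   /* compute destination */
        /* fetch letter; '?' guards against codes outside the table */
        putchar(t < strlen(tree) ? tree[t] : '?');
    }
    return 0;
}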
Here, the tree is represented as an array and paths to nodes are encoded as (zeroless) binary numbers. The program calculates the path t
to the proper letter and then fetches it with an O(1) memory lookup. Compare also the morse code translator by reddit user kayamon.
Since the number of memory accesses is kept to a minimum, this technique is the most efficient; but it is impossible to reproduce with algebraic data types alone. Fortunately, array libraries are readily available in Haskell as well.
The main drawback of this approach, and of arrays in general, is the lack of clarity; indices are notorious for being nondescript and messy. The raw index arithmetic in the example C code above sure seems like magic!
Such magic is best hidden behind a descriptive abstract data type. How about rewriting the example in Haskell such that decodeLetter
looks exactly like the one in the section on dichotomic search? In other words, the abstract data type is to support the functions left
, right
and tag
. One possible solution can be found in the source code accompanying this article.
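As a hint of what such an abstract data type could look like, here is a minimal sketch that hides the index arithmetic behind left, right and tag. It is not the solution from the accompanying source code. It assumes the letter table from the C example above and uses Data.Array from the array package that ships with GHC. The newtype Tree and its constructor Node are hypothetical names, and returning '?' for positions outside the table is a choice made for this sketch.

```haskell
import Data.Array (Array, bounds, inRange, listArray, (!))

type MorseCode = String

-- Letters laid out as in the C example: the children of index t
-- sit at 2*t+1 (dot) and 2*t+2 (dash).
letters :: Array Int Char
letters = listArray (0, length s - 1) s
  where
    s = " ETIANMSURWDKGOHVF L PJBXCYZQ "

-- A "tree" is merely a position in the array (hypothetical type).
newtype Tree = Node Int

dict :: Tree
dict = Node 0

left, right :: Tree -> Tree
left  (Node t) = Node (2*t + 1)
right (Node t) = Node (2*t + 2)

tag :: Tree -> Char
tag (Node t)
    | inRange (bounds letters) t = letters ! t
    | otherwise                  = '?'   -- position outside the table

-- Same shape as decodeLetter in the section on dichotomic search.
decodeLetter :: MorseCode -> Char
decodeLetter = tag . foldl (flip step) dict
  where
    step '.' = left
    step '-' = right
    step c   = error ("not a Morse symbol: " ++ show c)

main :: IO ()
main = putStrLn (map decodeLetter (words "-- --- .-. ... ."))  -- prints MORSE
```

Since the table from the C example stops after the four-symbol codes, digits are not covered by this sketch.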
Want to write a small GUI thing but forgot to sacrifice to the giant rubber duck in the sky before trying to install wxHaskell or Gtk2Hs? Then this library is for you! Threepenny is easy to install because it uses the web browser as a display.
The library also has functional reactive programming (FRP) built in, which makes it a lot easier to write GUI applications without getting caught in spaghetti code. For an introduction to FRP, see for example my slides from a tutorial I gave in 2012. (The API is slightly different in Reactive.Threepenny.)
In version 0.6, the communication with the web browser has been overhauled completely. On a technical level, Threepenny implements an HTTP server that sends JavaScript code to the web browser and receives JSON data back. However, this is not the right level of abstraction to look at the problem. What we really want is a foreign function interface for JavaScript, i.e. we want to be able to call arbitrary JavaScript functions from our Haskell code. As of this version, Threepenny implements just that: the module Foreign.JavaScript gives you the essential tools you need to interface with the JavaScript engine in a web browser, very similar to how the module Foreign and related modules from the base library give you the ability to call C code from Haskell. You can manipulate JavaScript objects, call JavaScript functions and export Haskell functions to be called from JavaScript.
However, the foreign calls are still made over an HTTP connection (Threepenny does not compile Haskell code to JavaScript). This presents some challenges, which I have tried to solve with the following design choices:
Garbage collection. I don’t know any FFI that has attempted to implement cross-runtime garbage collection. The main problem is cyclic references, which happen very often in a GUI setting, where an event handler references a widget, which in turn references the event handler. In Threepenny, I have opted to leave garbage collection entirely to the Haskell side, because garbage collectors in current JavaScript engines are vastly inferior to what GHC provides. The module Foreign.RemotePtr gives you the necessary tools to keep track of objects on the JavaScript (“remote”) side where necessary.
Foreign exports. Since the browser and the HTTP server run concurrently, there is no shared “instruction pointer” that keeps track of whether you are currently executing code on the Haskell side or the JavaScript side. I have chosen to handle this in the following way: Threepenny supports synchronous calls to JavaScript functions, but Haskell functions can only be called as “asynchronous event handlers” from the JavaScript side, i.e. the calls are queued and they don’t return results.
Latency, fault tolerance. Being a GUI library, Threepenny assumes that both the browser and the Haskell code run on localhost, so all network problems are ignored. This is definitely not the right way to implement a genuine web application, but of course, you can abuse it for writing quick and dirty GUI apps over your local network (see the Chat.hs example).
To see Threepenny in action, have a look at the following applications:
Daniel Austin’s FNIStash Editor for Torchlight 2 inventories. 
Chaddai’s CurveProject Plotting curves for math teachers. 
Get the library here:
Note that the API is still in flux and is likely to change radically in the future. You’ll have to convert frequently or develop against a fixed version.
Want to write a small GUI thing but forgot to sacrifice to the giant rubber duck in the sky before trying to install wxHaskell or Gtk2Hs? Then this library is for you! Threepenny is easy to install because it uses the web browser as a display.
The library also has functional reactive programming (FRP) built in, which makes it a lot easier to write GUI applications without getting caught in spaghetti code. For an introduction to FRP, see for example my slides from a tutorial I gave in 2012 (the API is slightly different in Reactive.Threepenny) and the preliminary widget design guide.
Version 0.5 is essentially a maintenance release, allowing for newer versions of the libraries that it depends on. It also incorporates various contributions by other people, including a small Canvas API by Ken Friis Larsen and Carsten König, and a complete set of SVG elements and attributes by Steve Bigham. Many thanks also to Yuval Langer and JP Moresmau.
However, while it’s great that the library begins to grow in breadth, incorporating larger and larger parts of the DOM API, I also feel that the current backend code is unable to cope with this growth. In the next version, I intend to overhaul the server code and put the JavaScript FFI on a more solid footing.
To see Threepenny in action, have a look at the following applications:
Daniel Austin’s FNIStash Editor for Torchlight 2 inventories. 
Daniel Mlot’s Stunts Cartography Track Viewer Map viewer for the Stunts racing game. 
Get the library here:
Note that the API is still in flux and is likely to change radically in the future. You’ll have to convert frequently or develop against a fixed version.
Reactive.Banana.Prim which can be used to implement your own, custom FRP library.
After a long wait, I have finally found the time to implement a push-driven algorithm that actually deserves that name – the previous implementations had taken a couple of shortcuts that rendered the performance closer to that of a pull-driven implementation. There was also an unexpected space leak, which I have fixed using a reasoning principle I’d like to call space invariants. Note that this release doesn’t include garbage collection for dynamically switched events just yet. But having rewritten the core algorithm over and over again for several years now, I finally understand its structure so well that garbage collection is easy to add – in fact, I have already implemented it in the development branch for the future 0.9 release.
Starting with this release, the development of reactive-banana will focus on performance – the banana is ready to pull out the guns and participate in the benchmarking game. (You see, the logo is no idle threat!) In fact, John Lato has put together a set of benchmarks for different FRP libraries. Unfortunately, reactive-banana took a severe beating there, coming out as the slowest contender. Oops. The main problem is that the library uses a stack of monad transformers in an inner loop – bad idea.
Now, optimizing monad transformers seems to be an issue of general interest, but the only public information I could find was a moderately useful wiki page. If you have any tips or benchmarks for optimizing monad transformer stacks, please let me and the community know!
The other major addition to the reactive-banana library is a module Reactive.Banana.Prim
which presents the core algorithm in such a way that you can use it to implement your very own FRP API. Essentially, it implements a basic FRP system on top of which you can implement richer semantics – like observable sharing, recursion, simultaneous events, change notification, and so on. Of course, few people will ever want to do that, also given that reactive-banana is currently not the fastest fruit in the basket.
But this new module is my main motivation for releasing version 0.8. It contains the lessons that I’ve learned from implementing yet another toy FRP system for my threepenny-gui project, and it’s time to put the latter on a solid footing. In particular, it appears that widget abstractions greatly benefit from dynamic event switching, which means that future versions of Threepenny cannot do without a solid FRP subsystem.
Want to write a small GUI thing but forgot to sacrifice to the giant rubber duck in the sky before trying to install wxHaskell or Gtk2Hs? Then this library is for you! Threepenny is easy to install because it uses the web browser as a display.
The library also has functional reactive programming (FRP) built in, which makes it a lot easier to write GUI applications without getting caught in spaghetti code. For an introduction to FRP, see for example my slides from a tutorial I gave in 2012. (The API is slightly different in Reactive.Threepenny.)
Version 0.4 is an incremental improvement over the previous version. A new UI
monad has been introduced to simplify the JavaScript FFI and allow recursion for FRP. DOM elements are now subject to garbage collection. (Unfortunately, this also leads to a bug when using custom HTML files.)
To see Threepenny in action, have a look at the following applications:
Daniel Austin’s FNIStash Editor for Torchlight 2 inventories. 
Chaddai’s CurveProject Plotting curves for math teachers. 
Get the library here:
Note that the API is still in flux and is likely to change radically in the future. You’ll have to convert frequently or develop against a fixed version.
Many thanks to Daniel Austin and Daniel Mlot for their help with this project and to Chris Done for implementing the Ji library which is the basis for this effort.
Want to write a small GUI thing but forgot to sacrifice to the giant rubber duck in the sky before trying to install wxHaskell or Gtk2Hs? Then this library is for you! Threepenny is easy to install because it uses the web browser as a display.
The special feature of version 0.3 is that it has functional reactive programming (FRP) built in. This makes it a lot easier to program GUI applications without getting caught in spaghetti code. It’s completely optional, though; you can freely switch between FRP and traditional event handlers. Note that the FRP variant used in this release is a little underpowered: it does not have dynamic event switching yet. But it works today and is already extremely useful.
For an introduction to FRP, see for example my slides from a tutorial I gave last year. (The API is slightly different in Reactive.Threepenny
.)
To see Threepenny in action, have a look at the following applications:
Daniel Austin’s FNIStash Editor for Torchlight 2 inventories. 
Daniel Mlot’s Stunts Cartography Track Viewer Map viewer for the Stunts racing game. 
Get the library here:
Note that the API is still in flux and is likely to change radically in the future. You’ll have to convert frequently or develop against a fixed version.
Many thanks to Daniel Austin and Daniel Mlot for their help with this project and to Chris Done for implementing the Ji library which is the basis for this effort.
Another reason for writing this post is that some time ago, several people pointed out to me that my library reactive-banana had a couple of unpleasant space leaks. I finally found the time to investigate and fix them, but I had a hard time proving to myself that I had actually fixed the space leak for good, until space invariants gave me an excellent way to reason about it.
Most importantly, space invaders make a great pun on space invariants.
Anyway, space invariants are about expressing the folklore advice of adding strictness annotations to your data types as an actual property of the data structure that can be reasoned about. The essence of space invariants is
to ensure that whenever we evaluate a value to weak head normal form (WHNF), we automatically know a bound on its memory usage.
The idea of attaching invariants to WHNFs has proven very successful (see Chris Okasaki’s book [pdf]) for reasoning about the time complexity of lazy functional programs; this is known as the debit method. Hopefully, this will help us with space usage as well.
In the next section, I will give a short overview aimed at more experienced Haskellers. Then we’ll have a section aimed at Haskell beginners and a section concerning the relation to strictness. Finally, I’ll present a short case study describing how I fixed a space leak in my reactive-banana library.
Consider a container like Data.Map
. Here is an example container:
container = insert "key1" 'a'
. delete "key2" . insert "key2" 'b' $ empty
As written, this expression is a rather inefficient representation of the container which contains a single value 'a'
at the key "key1"
. After all, the expression still contains a reference to another value 'b'
, which will take up space in memory. It is only when this expression is evaluated that the container will contain a single value. Or will it?
spine-strict = if the container is in WHNF, then the space used by the container is the total of the space used by the values that are contained in the container (times a constant)
Apparently, when evaluating this expression, only a spine-strict container will give you the seemingly obvious guarantee that it stores references only to those values that it actually contains. You see, the value of the container
is denotationally equivalent to
container2 = fromList [("key1",'a')]
but these two expressions are represented very differently operationally in the computer memory. To be able to reason about the space usage, it is convenient to link the denotational side to the operational side via an invariant. The great thing about denotation is that it is independent of how the expression is represented. Using an invariant like spine-strictness, you can now say “Oh, I know which elements are in the container, so I can conclude how much space it takes up.”, regardless of how you constructed the container in the course of the program. For languages based on lazy evaluation, such a link is usually not available.
It is not necessarily a bad thing that denotation and space usage can be separate from each other, because the denotation can also be much larger than the actual expression. For instance, lists are a data structure that is not spine-strict and we can easily express lists of infinite (denotational) size in small space
squares = map (^2) [1..]
By the way, this distinction between denotation and operational behavior is also why the word “strict” in the term “spine-strict” is a bad idea. Strictness is a purely denotational concept and hence cannot give you any of the operational guarantees that we need to control space usage. (For time complexity, this is actually not a problem because up to a constant factor, lazy evaluation is never slower than eager evaluation.) The concepts are certainly related, as I will discuss below, but strictness alone doesn’t help you with space usage at all.
The other important space invariant for a container is the property of value-strictness
value-strict = if the container is in WHNF, then each value stored in the container is in WHNF
In other words, the container will make sure that all values are stored in an evaluated form, which in turn allows us to apply other space invariants for the values and hence control the total memory usage of the container. Again, my reservations about the use of the word “strict” apply here as well.
As you can see, space invariants are usually of the form “if this is in WHNF, then other things are in WHNF as well”. It is generally hard to say something about expressions that are not in WHNF, their size usually doesn’t bear any relation to their denotation.
How do you create space invariants? Well, that is easy, by using the seq
function. A fresh view on the definition
x = a `seq` b
is to not see it as a specification of evaluation order, but rather as a declaration of a relation between the evaluation of x
and the evaluation of a
: whenever x
is in WHNF, so is a
. Taking this further, another example of a space invariant is a property that I would like to call deepspace:
deepspace = if the value is in WHNF, then it is already in full normal form (NF)
In other words, a value is deepspace if it is either unevaluated or already in full normal form. This time, I have avoided the word “strictness” for good effect. :)
For algebraic data types, we can encode applications of the seq
combinator directly in the constructors via “strictness annotations”. The plain data type
data Tree a = Bin (Tree a) (Tree a) | Leaf a
has the spine-strict variant
data Tree a = Bin !(Tree a) !(Tree a) | Leaf a
and the variant that is both spine-strict and value-strict
data Tree a = Bin !(Tree a) !(Tree a) | Leaf !a
The exclamation mark before a constructor argument indicates that the corresponding value is in WHNF whenever the constructor of the algebraic data type is in WHNF. Note that binary trees do not have a value-strict variant that is not spine-strict as well, because you can build trees such as Bin thunk1 thunk1 out of unevaluated leaves. On the other hand, abstract container types can be made purely value-strict by making the insert function use seq on the value first.
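The effect of the annotations can be observed from within a program. The following sketch uses undefined as a stand-in for an unevaluated value whose evaluation fails, and checks whether a constructor application can be brought to WHNF. All names in it (LazyTree, StrictTree, Assoc, insertStrict, reachesWHNF) are made up for this illustration; the association list is only a toy container.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Plain variant and the spine- and value-strict variant, renamed so that
-- both fit into one module.
data LazyTree a   = LBin (LazyTree a) (LazyTree a) | LLeaf a
data StrictTree a = SBin !(StrictTree a) !(StrictTree a) | SLeaf !a

-- A toy container that is made value-strict by its insert function.
newtype Assoc k v = Assoc [(k, v)]

insertStrict :: k -> v -> Assoc k v -> Assoc k v
insertStrict k v (Assoc kvs) = v `seq` Assoc ((k, v) : kvs)

-- Does evaluating the argument to WHNF succeed without an exception?
reachesWHNF :: a -> IO Bool
reachesWHNF x = do
    r <- try (evaluate x)
    return $ case r of
        Left e  -> const False (e :: SomeException)
        Right _ -> True

main :: IO ()
main = do
    -- lazy leaf: the constructor is in WHNF, the value stays a thunk
    reachesWHNF (LLeaf (undefined :: Int)) >>= print            -- True
    -- strict leaf: WHNF of the constructor forces the value
    reachesWHNF (SLeaf (undefined :: Int)) >>= print            -- False
    -- value-strict insert: WHNF of the container forces the value
    reachesWHNF (insertStrict 'k' (undefined :: Int) (Assoc []))
        >>= print                                               -- False
```

An exception is of course only an approximation of a thunk that has not been evaluated yet, but it makes visible which values WHNF of the container drags along.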
Before continuing with space invariants and their relation to strictness, I would like to make an interlude for readers that are Haskell beginners and spell out some general information about lazy evaluation and space usage that I have already assumed implicitly in the previous section. If you’re an experienced Haskeller, you can skip to the next section.
The main reason why space usage in a lazy language like Haskell is hard to reason about is that it is heavily context-dependent. For example, the expression [1..n]
uses O(n) memory when evaluated in full, but when it appears in a different context, for instance in the expression head [1..n]
, evaluating the whole program takes only O(1) memory. In contrast, the value represented by the expression [1..n]
is independent of the context, it is compositional. It does not matter how you represent the list of integers from 1
to n
in the computer, calculating its head
will always give the same result 1
.
Since memory usage depends on context, it seems that in order to reason about space usage, you have to trace the evaluation of a Haskell program in detail. Edward Yang has a beautiful visualization for how to do that, but unfortunately, it is utterly impossible to trace the evaluation for anything but trivial programs in this way. We need tools that abstract the details away: we don’t want to know about the details of how GHC manipulates heap objects, and we don’t really want to know how lazy evaluation works on a more abstract level either, we just want some sort of compositionality, just like we have in the case of semantics.
As a first step, understanding GHC’s evaluation machinery has fortunately never been necessary. You can understand lazy evaluation entirely on the source code level, in terms of graph reduction. The basic idea is that lazy evaluation deals with expressions. For instance, every one of the following lines
5
2*3 - 1
1 + (1 + (1 + (1 + 1)))
is an expression that denotes the value 5, but they are still different expressions for the same semantic value. The size of an expression tells you how much memory is needed to store it in the computer. The last expression is very long and takes five memory cells to store, while the first expression is small and takes only one memory cell to store.
Executing a Haskell program is about evaluating expressions, essentially by applying the definitions of functions like *
or +
repeatedly. Usually, evaluation of an expression stops when it has reached weak head normal form (WHNF). Of the example expressions above, only 5
is in WHNF. For treelike data types, WHNF means that we can see the outermost constructor, but its arguments may still contain arbitrary expressions. For instance, for the type
data Tree a = Bin (Tree a) (Tree a) | Leaf a
the expressions
Leaf (1+1)
Bin (insert 17 (Leaf 1)) undefined
are in WHNF.
Concerning terminology, we will call an expression that is not in WHNF an unevaluated expression, or thunk for short. However, keep in mind that the latter term originally referred to a particular representation of unevaluated expressions in GHC. I would like to repeat again: no understanding of GHC internals is necessary to reason about space usage in Haskell, it can all be done on the level of expressions (graphs). The more details you have to keep track of, the less you understand your program. But it seems to me that the term “thunk” has become synonymous with “unevaluated expression” in current usage, so I will use it in this way as well.
An even higher level of abstraction is the notion of strictness. It answers the following question: does evaluating the result of a function require evaluating its argument? A function foo is called strict if evaluating foo expr to WHNF will also evaluate the expression expr to WHNF. In other words, to evaluate the result, you have to look at the argument. The great thing about strictness is that it can be formulated in terms of denotations alone, which are independent of evaluation order. In particular, the function foo is strict iff
foo ⊥ = ⊥
where ⊥
is called “bottom” and denotes a “diverging” value, for instance an expression whose evaluation does not terminate. In other words, if evaluating the argument to WHNF will throw you into an infinite loop, then evaluating the function result will also do that, because it needs to peek at the argument. (See the Wikibook for more on ⊥).
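The equation foo ⊥ = ⊥ can be probed experimentally, with one caveat: a program cannot detect non-termination, so the sketch below uses undefined, which throws an exception, as a stand-in for ⊥. The functions strictFoo, lazyFoo and diverges are hypothetical names chosen for this illustration.

```haskell
import Control.Exception (SomeException, evaluate, try)

strictFoo :: Int -> Int
strictFoo n = n + 1      -- has to inspect its argument

lazyFoo :: Int -> Int
lazyFoo _ = 42           -- never looks at its argument

-- Does evaluating the argument to WHNF raise an exception?
-- (Exceptions stand in for bottom here; infinite loops are not detected.)
diverges :: a -> IO Bool
diverges x = do
    r <- try (evaluate x)
    return $ case r of
        Left e  -> const True (e :: SomeException)
        Right _ -> False

main :: IO ()
main = do
    diverges (strictFoo undefined) >>= print   -- True:  strictFoo is strict
    diverges (lazyFoo undefined)   >>= print   -- False: lazyFoo is not
```

Such a test can only ever refute or support strictness for the arguments tried; the definition itself is a statement about denotations.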
So much for the general setup.
As we have seen, strictness is a denotational property that can express some relations between WHNF, but it is not powerful enough to express space invariants in general. Neither do space invariants imply strictness properties. However, I feel there is some correspondence between the two.
To explore the correspondence, let us focus once again on container structures. The main functionality that a container offers is the lookup
function
lookup :: key -> Map key value -> Maybe value
If the container is spine-strict, then the following property seems natural for the lookup
function:
If looking up any one key gives ⊥, then the container was ⊥ to begin with.
In formulas:
∀ key. (lookup key container = ⊥ → container = ⊥)
This formula may take a while to digest; it looks a bit like a “reverse strictness” of the lookup
function. The idea is that if you have evaluated the container to WHNF (container ≠ ⊥
), then you already “evaluated the possible lookups” to WHNF as well (lookup key container ≠ ⊥
). If you imagine a tree, then this means that the branches may not contain thunks; otherwise, you could replace the thunk with ⊥ and blow up when you try to evaluate a lookup that takes this particular branch.
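The tree picture can be sketched in code. The two tree types below are hypothetical and exist only for this illustration: one has a lazy spine, the other uses strictness annotations on its subtrees. As an assumption, ⊥ is modelled by undefined and detected with a small exception-catching helper.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Spine-lazy search tree: subtrees may be thunks.
data LazyTree v = LTip | LBin (LazyTree v) Int v (LazyTree v)

-- Spine-strict variant: the bangs force both subtrees (and the key)
-- whenever the node itself is evaluated to WHNF. Values stay lazy.
data StrictTree v = STip | SBin !(StrictTree v) !Int v !(StrictTree v)

lookupL :: Int -> LazyTree v -> Maybe v
lookupL _ LTip = Nothing
lookupL k (LBin l k' v r) = case compare k k' of
  LT -> lookupL k l
  EQ -> Just v
  GT -> lookupL k r

lookupS :: Int -> StrictTree v -> Maybe v
lookupS _ STip = Nothing
lookupS k (SBin l k' v r) = case compare k k' of
  LT -> lookupS k l
  EQ -> Just v
  GT -> lookupS k r

-- The left branch is ⊥, but the lazy tree itself is a perfectly fine WHNF.
lazyTree :: LazyTree String
lazyTree = LBin undefined 5 "five" LTip

-- The same construction with the strict type is ⊥ as a whole.
strictTree :: StrictTree String
strictTree = SBin undefined 5 "five" STip

-- Hypothetical helper: True if evaluating x to WHNF throws an exception.
isBottom :: a -> IO Bool
isBottom x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either SomeException ())
  return (either (const True) (const False) r)
```

Here lookupL 1 lazyTree diverges although lazyTree does not, so the lazy tree violates the property. With the strict type, the only way to get a diverging branch is for the whole container to diverge.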
Unfortunately (and contrary to what I excitedly thought for some time), this property alone is not enough to imply a space invariant. In fact, it is not necessarily implied by a space invariant either. You can always have a funny implementation like
lookup key container = container `seq` ...
that has this strictness property, but doesn’t give any guarantees about space usage of the container itself. Likewise, it is also possible to have extra data in the container, so that a diverging lookup
is not enough to guarantee that the whole container has to be ⊥ as well.
Still, I think that this denotational property is very much a close relative of a space invariant. In particular, this formulation allows us to gain some partial confidence about whether a container library really does provide a spine-strict data structure, for example by using StrictCheck for testing.
In the same spirit, a strictness property corresponding to the space invariant of value-strictness (oh, the misnomer!) would be
If looking at a value gives ⊥, then the container was ⊥ to begin with.
In formulas:
∀ key. (lookup key container = Just ⊥ → container = ⊥)
This description may take some time to understand as well. Note how we make use of the lazy Maybe
type here: in the spine-strict case, our premise is that the lookup yields a result at all, while in the case of value-strictness, the premise is that the lookup succeeds with a value (but we don’t say anything about the case where it diverges).
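As a small illustration, the following sketch contrasts the lazy and strict interfaces of Data.Map. It assumes that the containers package is available; the names lazyMap, strictMap, lookupIsJustBottom and isBottom are invented for this example, and ⊥ is again modelled by undefined.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Value-lazy insertion: the map is fine, only the stored value is ⊥.
lazyMap :: ML.Map Int Int
lazyMap = ML.insert 1 undefined ML.empty

-- Value-strict insertion: the value is forced on insert,
-- so the whole map is ⊥.
strictMap :: MS.Map Int Int
strictMap = MS.insert 1 undefined MS.empty

-- Hypothetical helper: True if evaluating x to WHNF throws an exception.
isBottom :: a -> IO Bool
isBottom x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either SomeException ())
  return (either (const True) (const False) r)

-- Does the lookup succeed with a diverging value, i.e. Just ⊥?
lookupIsJustBottom :: IO Bool
lookupIsJustBottom = case ML.lookup 1 lazyMap of
  Just v  -> isBottom v
  Nothing -> return False
```

The lazy map yields Just ⊥ for key 1 while being defined itself, so it violates the property. The strict map can only contain a ⊥ value by being ⊥ altogether.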
I’m sure that “reverse strictness” formulas such as the ones above have been considered in projection-based strictness analysis.
I found all of the above considerations very helpful for fixing some space leaks in my reactive-banana library, as it gave me a clear picture of what I can rely on and what is impossible.
Namely, trying to understand the details of the evaluation process is absolutely hopeless: you never really know how soon an expression is going to be evaluated to WHNF. If you have an expression foo
that you want to be in WHNF, you can try to force its evaluation by writing evaluate foo
instead, but that doesn’t help at all, because how do you ensure that the larger expression (evaluate foo)
itself is evaluated? Oops. This is the same as putting a bang pattern on a function that is already strict, a common act of desperation amongst Haskell programmers when confronted with the evil space leak invaders.
However, what you can do is to link the evaluation of two expressions. This is precisely what a space invariant expresses and what the seq
combinator can do: by writing seq expr1 expr2
, you indicate that expr1
will be in WHNF whenever expr2
is. “If you shoot my friend, then you also have to shoot me”. No idea if a value will ever be evaluated, but when it is, then another value will be evaluated as well.
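A minimal sketch of such a link, with names (linkedPair, plainPair, isBottom) that I made up for the illustration: the pair built by linkedPair can only be in WHNF if its first component is in WHNF as well. As an assumption, ⊥ is modelled by undefined.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- No link: the pair constructor is in WHNF regardless of x.
plainPair :: a -> b -> (a, b)
plainPair x y = (x, y)

-- Link: whenever the pair is in WHNF, x is in WHNF too.
-- Nothing is forced until somebody evaluates the pair.
linkedPair :: a -> b -> (a, b)
linkedPair x y = x `seq` (x, y)

-- Hypothetical helper: True if evaluating x to WHNF throws an exception.
isBottom :: a -> IO Bool
isBottom x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either SomeException ())
  return (either (const True) (const False) r)
```

Note that const 0 (linkedPair undefined 'a') still evaluates to 0: the seq does not force anything by itself, it only ties the fate of the first component to that of the pair.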
The situation in reactive-banana was quite hairy. Namely, I had a lazy state monad
type Eval a = Control.Monad.State.Lazy.State (Container,Hangman) a
and profiling indicated that the state of type Container
was leaking like a bucket without a bottom. The obvious try would be to use a strict state monad, but unfortunately, I couldn’t do that: I needed the laziness to support a tie-a-knot-around-your-neck kind of recursion with the Hangman
type, and the values in the Container
type got populated in a somewhat recursive fashion as well. It was simply not possible to make this state monad strict and control the evaluation of the Container
type in a step-by-step fashion.
At first, I was not sure whether using a spine-strict and value-strict container would help me in this situation, but then I became happy with the idea of only being able to link evaluation instead of being able to force it. Running the state monad was part of a larger computation step, so sneaking an evaluate
in the IO monad at the end of every step should allow the laziness I needed and yet force my container to attain a predictable size in the end. Thanks to space invariants, I can be confident that this strategy works.
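The strategy can be sketched roughly as follows. This is not the actual reactive-banana code: the Container type, the step function and runSteps are stand-ins invented for the illustration, and I assume Data.Map.Strict from the containers package as the spine-strict and value-strict container.

```haskell
import Control.Exception (evaluate)
import qualified Data.Map.Strict as M

-- Stand-in for the real container type.
type Container = M.Map Int Int

-- A hypothetical pure computation step; it may be as lazy as it likes.
step :: Int -> Container -> Container
step n c = M.insertWith (+) (n `mod` 3) n c

-- Driver in IO: evaluate the container to WHNF at the end of every step.
-- The space invariants of the container link this to its spine and values.
runSteps :: [Int] -> Container -> IO Container
runSteps []     c = return c
runSteps (n:ns) c = do
  c' <- evaluate (step n c)
  runSteps ns c'
```

The point of the design is that evaluate only demands WHNF of the outermost constructor; everything else follows from the invariants of the container, not from knowledge about the evaluation order inside step.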
Unfortunately, that didn’t quite help yet, though. The reason was that building the new container involved reading values from the old container, and many values were actually of function type, like
value :: [A] -> [B]
value = map (fromJust $ lookup key oldContainer)
The problem is that this expression is (almost) in WHNF already, so the lookup
didn’t get evaluated and oldContainer
is still lingering about. In other words, evaluating the values to WHNF was not sufficient to ensure a predictable memory size. Fortunately, the solution is to add a new space invariant that links the WHNF of the value
to the WHNF of the expression involving lookup
. Problem solved!
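Here is a sketch of what such a link might look like. It uses an association list and the Prelude lookup as a stand-in for the real container, and the names unlinked, linked and isBottom are invented for the example. A missing key (which makes fromJust diverge) serves as a way to observe whether the lookup has been evaluated.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Maybe (fromJust)

-- Stand-in for the real container: an association list of functions.
type Container = [(String, Int -> Int)]

-- No link: the partial application of map is already in WHNF,
-- so the lookup stays a thunk that keeps the old container alive.
unlinked :: String -> Container -> ([Int] -> [Int])
unlinked key old = map (fromJust (lookup key old))

-- Link: whenever the resulting function is in WHNF, the lookup has
-- been evaluated to WHNF as well.
linked :: String -> Container -> ([Int] -> [Int])
linked key old = f `seq` map f
  where
    f = fromJust (lookup key old)

-- Hypothetical helper: True if evaluating x to WHNF throws an exception.
isBottom :: a -> IO Bool
isBottom x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either SomeException ())
  return (either (const True) (const False) r)
```

Evaluating unlinked "missing" [] to WHNF goes unnoticed, whereas linked "missing" [] diverges right away, which shows that the lookup is now performed as soon as the value reaches WHNF. For keys that are present, both versions behave the same when applied to a list.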
So much for space invariants. I think they offer a fresh perspective on how to reason about space usage in Haskell.
In particular, they explain why the changes to the unordered-containers library mentioned in the introduction were a rather odd idea: the strictness of a function has, unfortunately, little to do with a guarantee on space usage.