This article was first published in issue 15 of The Monad.Reader.
Another monad tutorial? Oh my god, why!? Fear not, this article is aimed at Haskellers who are already familiar with monads, though I have of course tried to keep the material as accessible as possible; the first two sections may serve as an initial introduction to monads and the monad laws for the brave.
In this tutorial, I would like to present monads from the viewpoint of operational semantics and how it makes designing and implementing new monads a piece of cake. Put differently, s > (a,s)
is not the only way to implement the state monad and this tutorial aims to present a much more systematic way. I think it is still regrettably underused, hence this text.
The main idea is to view monads as a sequence of instructions to be executed by a machine, so that the task of implementing monads is equivalent to writing an interpreter. The introductory example will be a stack automaton, followed by a remark on a monad for random numbers. Then, to showcase the simplicity of this approach, we will implement backtracking parser combinators, culminating in a straightforward breadthfirst implementation equivalent to Claessen’s parallel parsing processes.
For those in the know, I’m basically going to present the principles of Chuankai Lin’s Unimo paper. The approach is neither new nor unfamiliar; for example, John Hughes already used it to derive the state monad. But until I read Lin’s paper, I did not understand how valuable it is when done systematically and in Haskell. Ryan Ingram’s MonadPrompt
package is another recent formulation.
To encourage reuse, I have also released a package operational
on hackage which collects the generic bits of these ideas in a small library. For convenient study, the source code from each section of this article is also available.
Our introductory example will be a stack machine, i.e. an imperative minilanguage featuring two instructions push
and pop
for pushing and popping values onto and from a stack.
In other words, I have imperative programs like the following in mind:
push 5; push 42; pop;
Instructions are separated by semicolons. As shown in the following picture, this program first puts the number 5
on the stack, then puts the number 42
on top of the stack and proceeds to remove it again.
How can we embed such programs into Haskell?
First we need some way of representing the program text, for instance as a list of instructions:
type Program instr = [instr]
type StackProgram = Program StackInstruction
data StackInstruction = Push Int  Pop
Our example is represented as
example = Push 5 : Push 42 : Pop : []
In a sense, the colon (:)
for building lists takes the role of the semicolon for sequencing instructions.
Note that this representation gives us a very convenient tool for assembling bigger programs from smaller subprograms: list concatenation (++)
. For instance,
exampleTwice = example ++ example
= Push 5 : Push 42 : Pop : Push 5 : Push 42 : Pop : []
is a program that executes example
twice. Together with the empty program
empty = []
concatenation obeys the following three wellknown laws:
empty ++ is = is  left unit
is ++ empty = is  right unit
(is ++ js) ++ ks = is ++ (js ++ ks)  associativity
which seem almost too evident to be worth mentioning. For example, it is customary to leave out the parenthesis in the last line altogether.
Once accustomed to the notion of programs and (++)
to combine them, the special case of single instructions and (:)
for sequencing them is unnecessary. The user of our language does not care that we deem push
and pop
to be primitive operations but not, for example, the program
replace a = Pop : Push a : []
which replaces the topmost stack element with a
; he is entirely content to be given two programs
push :: Int > StackProgram
pop :: StackProgram
and two general combinators for building new ones
empty :: StackProgram
(++) :: StackProgram > StackProgram > StackProgram
without any mention of the distinction between single instruction and compound program. Their difference is but an implementation detail.
Well, to be entirely content, the user also needs a way to run programs. In particular, we need to implement a function interpret
that maps the program text to its intended meaning, here a function that transforms a stack of integers.
type Stack a = [a]
interpret :: StackProgram > (Stack Int > Stack Int)
The implementation follows the style of operational semantics: inspect the first instruction, change the stack accordingly, and recursively proceed with the remaining list of instructions is
:
interpret (Push a : is) stack = interpret is (a : stack)
interpret (Pop : is) stack = interpret is (tail stack)
interpret [] stack = stack
“All well and good, but why all the fuss with ‘monads’ then, when lists of instructions will do?” you may ask. Alas, the problem is of course that lists won’t do! We forgot something very important: our programs are completely unable to inspect values from the stack.
For instance, how to write a program that pops the two topmost values and pushes their sum onto the stack? Clearly, we want something like
a < pop;
b < pop;
push (a+b);
where each pop
returns the element just removed and the arrow <
binds it to a variable. But binding variables is simply impossible to express with our current representation of programs as lists of instructions.
Well, if ordinary lists of instructions are not enough to represent programs that involve binding variables like
a < pop; b < pop; push (a+b);
then let’s invent some fancy kind of list of instructions that will! The following presentation will be in close analogy to the structure of the previous section.
First, if we want to interpret pop
as a function that returns something, we had better label it with the type of the value returned! Hence, instead of a plain type
Pop :: StackInstruction
we need an additional type argument
Pop :: StackInstruction Int
which indicates that the Pop
instruction somehow returns a value of type Int
.
For simplicity, we attribute a return type to push
as well, even though it doesn’t really return anything. This can modeled just fine with the unit type ()
.
Push 42 :: StackInstruction ()
Push :: Int > StackInstruction ()
Putting both together, our type of instructions will become
data StackInstruction a where
Pop :: StackInstruction Int
Push :: Int > StackInstruction ()
If this syntax is alien to you: this is a Generalized Algebraic Data Type (GADT) which allows us to define a data type by declaring the types of its constructors directly. As of Haskell 2010, GADTs are not yet part of the language standard, but they are supported by GHC.
Like instructions, we also have to annotate programs with their return type, so that the definition for StackProgram
becomes
data Program instr a where ...
type StackProgram a = Program StackInstruction a
As before, instr
is the type of instructions, whereas a
is the newly annotated return type.
How to represent the binding of variables? Lambda abstractions will do the trick; imagine the following:
take a binding 
a < pop; rest

turn the arrow to the right 
pop > a; rest

and use a lambda expression to move it past the semicolon 
pop; \a > rest

Voila, the last step can be represented in Haskell, with a constructor named Then
taking the role of the semicolon:
Pop `Then` \a > rest
The idea is that Then
plugs the value returned by pop
into the variable a
. By the way, this is akin to how let
expressions can be expressed as lambda abstractions in Haskell:
let a = foo in bar <=> (\a > bar) foo
Anyway, our motivating example can now be represented as
example2 = Pop `Then` (\a > Pop `Then`
(\b > Push (a+b) `Then` Return))
where Return
represents the empty program which we will discuss in a moment. Remember that parentheses around the lambda expressions are optional, so we can also write
example2 = Pop `Then` \a >
Pop `Then` \b >
Push (a+b) `Then`
Return
It is instructive to think about the type of Then
. It has to be
Then :: instr a > (a > Program instr b) > Program instr b
Except for the return type a
in instr a
and the lambda abstraction, this is entirely analogous to the “cons” operation (:)
for lists.
The empty program, corresponding to the empty list []
, is best represented by a constructor
Return :: a > Program instr a
that is not “entirely empty” but rather denotes a trivial instruction that just returns the given value a
(hence the name). This is very useful, since we can now choose return values freely. For instance,
example3 = Pop `Then` \a > Pop `Then` \b > Return (a*b)
is a program that pops two values from the stack but whose return value is their product.
Taking everything together, we obtain a fancy list of instructions, once again a GADT:
data Program instr a where
Then :: instr a > (a > Program instr b) > Program instr b
Return :: a > Program instr a
And specialized to our stack machine language, we get
type StackProgram a = Program StackInstruction a
Before thinking thinking further about our new representation, let’s first write the interpreter to see the stack machine in action. This time, however, we are not interested in the final stack, only in the value returned.
interpret :: StackProgram a > (Stack Int > a)
interpret (Push a `Then` is) stack = interpret (is ()) (a:stack)
interpret (Pop `Then` is) (b:stack) = interpret (is b ) stack
interpret (Return c) stack = c
The implementation is like the previous one, except that now, we also have to pass the return values like ()
and b
to the remaining instructions is
.
Our example program executes as expected:
GHCi> interpret example3 [7,11]
77
Just as with lists, we can build large programs by concatenating smaller subprograms. And as before, we don’t want the user to bother with the distinction between single instruction and compound program.
We begin with the latter: the function
singleton :: instr a > Program instr a
singleton i = i `Then` Return
takes the role of \x > [x]
and helps us blur the line between program and instructions:
pop :: StackProgram Int
push :: Int > StackProgram ()
pop = singleton Pop
push = singleton . Push
Now, we define the concatenation operator (often dubbed “bind”) that glues two programs together:
(>>=) :: Program i a > (a > Program i b) > Program i b
(Return a) >>= js = js a
(i `Then` is) >>= js = i `Then` (\a > is a >>= js)
Apart from the new symbol (>>=)
and the new type signature, the purpose and implementation is entirely analogous to (++)
. And as before, together with the empty program,
return = Return
it obeys three evident laws
return a >>= is = is a  left unit
is >>= return = is  right unit
(is >>= js) >>= ks = is >>= (\a > js a >>= ks)  associativity
also called the monad laws. Since we need to pass return values, the laws are slightly different from the concatenation laws for ordinary lists, but their essence is the same.
The reason that these equations are called the “monad laws” is that any data type supporting two such operations and obeying the three laws is called a monad. In Haskell, monads are assembled in the type class Monad
, so we’d have to make an instance
instance Monad (Program instr) where
(>>=) = ...
return = ...
This is similar to lists which are said to constitute a monoid.
We conclude the first part of this tutorial by remarking that the (>>=)
operator is the basis for many other functions that build big programs from small ones; these can be found in the Control.Monad
module and are described elsewhere.
Those familiar with the state monad will recognize that the whole stack machine was just
State (Stack Int)
in disguise. But surprisingly, we haven’t used the pattern s > (a,s)
for threading state anywhere! Instead, we were able to implement the equivalent of
evalState :: State s > (s > a)
directly, even though the type s > a
by itself is too “weak” to serve as an implementation of the state monad.
This is a very general phenomenon and it is of course the main benefit of the operational viewpoint and the new Program instr a
type. No matter what we choose as interpreter function or instruction set, the monad laws for (>>=)
and return
will always hold, for they are entirely independent of these choices. This makes it much easier to define and implement new monads and the remainder of this article aims to give a taste of its power.
A first advantage of the operational approach is that it allows us to equip one and the same monad with multiple interpreters. We’ll demonstrate this flexibility with an example monad Random
that expresses randomness and probability distributions.
The ability to write multiple interpreters is also very useful for implementing games, specifically to account for both human and computer opponents as well as replaying a game from a script. This is what prompted Ryan Ingram to write his MonadPrompt
package.
At the heart of random computations is a type Random a
which denotes random variables taking values in a
. Traditionally, the type a
would be a numeric type like Int
, so that Random Int
denotes “random numbers”. But for the Haskell programmer, it is only natural to generalize it to any type a
. This generalization is also very useful, because it reveals hidden structure: it turns out that Random
is actually a monad.
There are two ways to implement this monad: one way is to interpret random variables as a recipe for creating pseudorandom values from a seed, which is commonly written
type Random a = StdGen > (a,StdGen)
The other is to view them as a probability distribution, as for example expressed in probabilistic functional programming as
type Probability = Double
type Random a = [(a,Probability)]
Traditionally, we’d have to choose between one way or the other depending on the application. But with the operational approach, we can have our cake and eat it, too! The two ways of implementing random variables can be delegated to two different interpreter functions for one and the same monad Random
.
For demonstration purposes, we represent Random
as a language with just one instruction uniform
that randomly selects an element from a list with uniform probability
type Random a = Program RandomInstruction a
data RandomInstruction a where
Uniform :: [a] > RandomInstruction a
uniform :: [a] > Random a
uniform = singleton . Uniform
For example, a roll of a die is modeled as
die :: Random Int
die = uniform [1..6]
and the sum of two dice rolls is
sum2Dies = die >>= \a > die >>= \b > return (a+b)
Now, the two different interpretations are: sampling a random variable by generating pseudorandom values
sample :: Random a > StdGen > (a,StdGen)
sample (Return a) gen = (a,gen)
sample (Uniform xs `Then` is) gen = sample (is $ xs !! k) gen'
where (k,gen') = System.Random.randomR (0,length xs1) gen
and calculating its probability distribution
distribution :: Random a > [(a,Probability)]
distribution (Return a) = [(a,1)]
distribution (Uniform xs `Then` is) =
[(a,p/n)  x < xs, (a,p) < distribution (is x)]
where n = fromIntegral (length xs)
Truth to be told, the distribution
interpreter has a flaw, namely that it never tallies the probabilities of equal outcomes. That’s because this would require an additional Eq a
constraint on the types of return
and (>>=)
, which is unfortunately not possible with the current Monad
type class. A workaround for this known limitation can be found in the norm
function from the paper on probabilistic functional programming.
Now, it is time to demonstrate that the operational viewpoint also makes the implementation of otherwise advanced monads a piece of cake. Our example will be monadic parser combinators and for the remainder of this article, I will assume that you are somewhat familiar with them already. The goal will be to derive an implementation of Koen Claessen’s ideas from scratch.
At their core, monadic parser combinators are a monad Parser
with just three primitives:
symbol :: Parser Char
mzero :: Parser a
mplus :: Parser a > Parser a > Parser a
which represent
respectively. (The last two operations define the MonadPlus
type class.) Furthermore, we need an interpreter, i.e. a function
interpret :: Parser a > (String > [a])
that runs the parser on the string and returns all successful parses.
The three primitives are enough to express virtually any parsing problem; here is an example of a parser number
that recognizes integers:
satisfies p = symbol >>= \c > if p c then return c else mzero
many p = return [] `mplus` many1 p
many1 p = liftM2 (:) p (many p)
digit = satisfies isDigit >>= \c > return (ord c  ord '0')
number = many1 digit >>= return . foldl (\x d > 10*x + d) 0
The instruction set for our parser language will of course consist of these three primitive operations:
data ParserInstruction a where
Symbol :: ParserInstruction Char
MZero :: ParserInstruction a
MPlus :: Parser a > Parser a > ParserInstruction a
type Parser a = Program ParserInstruction a
A straightforward implementation of interpret
looks like this:
interpret :: Parser a > String > [a]
interpret (Return a) s = if null s then [a] else []
interpret (Symbol `Then` is) s = case s of
c:cs > interpret (is c) cs
[] > []
interpret (MZero `Then` is) s = []
interpret (MPlus p q `Then` is) s =
interpret (p >>= is) s ++ interpret (q >>= is) s
For each instruction, we specify the intended effects, often calling interpret
recursively on the remaining program is
. In prose, the four cases are
Return
at the end of a program will return a result if the input was parsed completely.Symbol
reads a single character from the input stream if available and fails otherwise.MZero
returns an empty result immediately.MPlus
runs two parsers in parallel and collects their results.The cases for MZero
and MPlus
are a bit roundabout; the equations
interpret mzero = \s > []
interpret (mplus p q) = \s > interpret p s ++ interpret q s
express our intention more plainly. Of course, these two equations do not constitute valid Haskell code for we may not pattern match on mzero
or mplus
directly. The only thing we may pattern match on is a constructor, for example like this
interpret (Mplus p q `Then` is) = ...
But even though our final Haskell code will have this form, this does not mean that jotting down the left hand side and thinking hard about the ...
is the best way to write Haskell code. No, we should rather use the full power of purely functional programming and use a more calculational approach, deriving the pattern matches from more evident equations like the ones above.
In this case, we can combine the two equations with the MonadPlus
laws
mzero >>= m = mzero
mplus p q >>= m = mplus (p >>= m) (q >>= m)
which specify how mzero
and mplus
interact with (>>=)
, to derive the desired pattern match
interpret (Mplus p q `Then` is)
= { definition of concatenation and mplus }
interpret (mplus p q >>= is)
= { MonadPlus law }
interpret (mplus (p >>= is) (q >>= is))
= { intended meaning }
\s > interpret (p >>= is) s ++ interpret (q >>= is) s
Now, in light of the first step of this derivation, I even suggest to forget about constructors entirely and instead regard
interpret (mplus p q >>= is) = ...
as “valid” Haskell code; after all, it is straightforwardly converted to a valid pattern match. In other words, it is once again beneficial to not distinguish between single instructions and compound programs, at least in notation.
Unfortunately, our first implementation has a potential space leak, namely in the case
interpret (MPlus p q `Then` is) s =
interpret (p >>= is) s ++ interpret (q >>= is) s
The string s
is shared by the recursive calls and has to be held in memory for a long time.
In particular, the implementation will try to parse s
with the parser p >>= is
first, and then backtrack to the beginning of s
to parse it again with the second alternative q >>= is
. That’s why this is called a depthfirst or backtracking implementation. The string s
has to be held in memory as long the second parser has not started yet.
To ameliorate the space leak, we would like to create a breadthfirst implementation, one which does not try alternative parsers in sequence, but rather keeps a collection of all possible alternatives and advances them at once.
How to make this precise? The key idea is the following equation:
(symbol >>= is) `mplus` (symbol >>= js)
= symbol >>= (\c > is c `mplus` js c)
When the parsers on both sides of mplus
are waiting for the next input symbol, we can group them together and make sure that the next symbol will be fetched only once from the input stream.
Clearly, this equation readily extends to more than two parsers, like for example
(symbol >>= is) `mplus` (symbol >>= js) `mplus` (symbol >>= ks)
= symbol >>= (\c > is c `mplus` js c `mplus` ks c)
and so on.
We want to use this equation as a function definition, mapping the left hand side to the right hand side. Of course, we can’t do so directly because the left hand side is not one of the four patterns we can match upon. But thanks to the MonadPlus
laws, what we can do is to rewrite any parser into this form, namely with a function
expand :: Parser a > [Parser a]
expand (MPlus p q `Then` is) = expand (p >>= is) ++
expand (q >>= is)
expand (MZero `Then` is) = []
expand x = [x]
The idea is that expand
fulfills
foldr mplus mzero . expand = id
and thus turns a parser into a list of summands which we now can pattern match upon. In other words, this function expands parsers matching mzero >>= is
and mplus p q >>= is
until only summands of the form symbol >>= is
and return a
remain.
With the parser expressed as a big “sum”, we can now apply our key idea and group all summands of the form symbol >>= is
; and we also have to take care of the other summands of the form return a
. The following definition will do the right thing:
interpret :: Parser a > String > [a]
interpret p = interpret' (expand p)
where
interpret' :: [Parser a] > String > [a]
interpret' ps [] = [a  Return a < ps]
interpret' ps (c:cs) = interpret'
[p  (Symbol `Then` is) < ps, p < expand (is c)] cs
Namely, how to handle each of the summands depends on the input stream:
symbol >>= is
will proceed, the other parsers have ended prematurely.return x
have parsed the input correctly, and their results are to be returned.That’s it, this is our breadthfirst interpreter, obtained by using laws and equations to rewrite instruction lists. It is equivalent to Koen Claessen’s implementation.
As an amusing last remark, I would like to mention that our calculations can be visualized as high school algebra if we ignore that (>>=)
has to pass around variables, as shown in the following table:
Term  Mathematical operation 

return 
1 
(>>=) 
× multiplication 
mzero 
0 
mplus 
+ addition 
symbol 
x indeterminate 
For example, our key idea corresponds to the distributive law
x × a + x × b = x × (a + b)
and the monad and MonadPlus
laws have wellknown counterparts in algebra as well.
I hope I have managed to convincingly demonstrate the virtues of the operational viewpoint with my choice of examples.
There are many other advanced monads whose implementations also become clearer when approached this way, such as the list monad transformer (where the naive m [a]
is known not to work), Oleg Kiselyov’s LogicT
, Koen Claessen’s poor man’s concurrency monad, as well coroutines like Peter Thiemann’s ingenious WASH which includes a monad for tracking session state in a web server.
The operational
package includes a few of these examples.
Traditionally, the continuation monad transformer
data Cont m a = Cont { runCont :: forall b. (a > m b) > m b }
has been used to implement these advanced monads. This is no accident; both approaches are capable of implementing any monad. In fact, they are almost the same thing: the continuation monad is the refunctionalization of instructions as functions
\k > interpret (Instruction `Then` k)
But alas, I think that this unfortunate melange of instruction, interpreter and continuation does not explain or clarify what is going on; it is the algebraic data type Program
that offers a clear notion of what a monad is and what it means to implement one. Hence, in my opinion, the algebraic data type should be the preferred way of presenting new monads and also of implementing them, at least before program optimizations.
Actually, Program
is not a plain algebraic data type, it is a generalized algebraic data type. It seems to me that this is also the reason why the continuation monad has found more use, despite being conceptually more difficult: GADTs simply weren’t available in Haskell. I believe that the Program
type is a strong argument to include GADTs into a future Haskell standard.
Compared to specialized implementations, like for example s > (a,s)
for the state monad, the operational approach is not entirely without drawbacks.
First, the given implementation of (>>=)
has the same quadratic running time problem as (++)
when used in a leftassociative fashion. Fortunately, this can be ameliorated with a different (fancy) list data type; the operational
library implements one.
Second, and this cannot be ameliorated, we lose laziness. The state monad represented as s > (a,s)
can cope with some infinite programs like
evalState (sequence . repeat . state $ \s > (s,s+1)) 0
whereas the list of instructions approach has no hope of ever handling that, since only the very last Return
instruction can return values.
I also think that this loss of laziness also makes value recursion a la MonadFix
very difficult.
After some initial programming experience in Pascal, Heinrich Apfelmus picked up Haskell and purely functional programming just at the dawn of the new millenium. He has never looked back ever since, for he not only admires Haskell’s mathematical elegance, but also its practicality in personal life. For instance, he was always too lazy to tie knots, but that has changed and he now accepts shoe laces instead of velcro.
]]>This article previously appeared in issue 14 of The Monad.Reader.
In a post to reddit, user CharlieDancey presented a challenge to write a short and clever morse code decoder. Of course, what could be more clever than writing it in Haskell? ;)
In the following, I’ll present a series of solutions that gradually include dichotomic search (the straightforward generalization of binary search), stackbased languages or reverse polish notation, and finally deforestation (also called fusion), an optimization technique specific to purely functional languages like Haskell.
For your convenience, source code for the individual sections is available.
Morse code was designed to be produced and read, or rather heard by humans, who have to memorize the following table:
To write a program that decodes sequences of dots and dashes like
  .. ... . ..  .. .
into letters, we will have to store this table in some form. For instance, we could store each letter and corresponding morse code in a big association list
type MorseCode = String
dict :: [(MorseCode, Char)]
dict =
[("." , 'A'),
("..." , 'B'),
(".." , 'C'),
...
which we name dict
because it’s a dictionary translating between the latin alphabet and morse code. To decode a letter, we simply browse through this list to determine whether it contains a given code of dots and dashes
decodeLetter :: MorseCode > Char
decodeLetter code = maybe ' ' id $ lookup code dict
Decoding a whole sequence of letters is done with
decode = map decodeLetter . words
so that we have for example
> decode "  .. ... . ..  .. ."
"MORSECODE"
Of course, association lists are a rather slow data structure. A binary search tree, trie, or hash table would be a better choice. But fortunately, it is very easy to change data structures in Haskell. All you have to do is to a qualified import of a different module, such as Data.Map, and change the definition of dict
to
dict :: Data.Map.Map MorseCode Char
dict = Data.Map.fromList $
[("." , 'A'),
("..." , 'B'),
(".." , 'C'),
...
and use Data.Map.lookup
instead of Data.List.lookup
.
Faithfully replicating the morse code table is clear and preferable, but makes for quite large source code. So, let’s make an exception today and think of something as clever as we can.
The idea is the following: whenever an encoded letter starts with a dash ''
, we know that it can’t be for example an A or E, because those start with a dot '.'
. And if the next symbol is a dot, then it can’t be a G or M because their second symbol is a dash. With each symbol we read, more and more alternatives disappear until we are left with a single alternative. We can depict this with a binary tree:
At each node, the left subtree contains all the letters that can be obtained by adding a dot to the current letter, while the right subtree contains those letters that can be obtained with a dash. This is illustrated by means of a dotted line connected to the left subtree and a dashed line connected to the right subtree.
Now, to decode a letter, we simply start at the root of the tree and follow the dotted or dashed lines depending on the symbols read. For instance, to decode the sequence "..."
, we have to go right once and then left thrice, ending up at the letter B. This procedure is also called dichotomic search because at each point in our search for the right letter, we ask a yes/no question “Dot or dash?” (a dichotomy) that partitions the remaining search space into to disjoint parts.
Let’s implement this in Haskell. First, we assume that our dictionary is given as a tree
data Tree a = Leaf
 Branch { tag :: a
, left :: Tree a
, right :: Tree a }
dict :: Tree Char
dict = Branch ' ' (Branch 'E' ... ) (Branch 'T' ...)
Of course, writing out the dict
tree in full is going to be repetitive and boring, but let’s just imagine we have already done that. Then, decoding a letter by walking down the tree is simply a left fold over the code word
decodeLetter = tag . foldl (flip step) dict
where
step '.' = left
step '' = right
In this case, the accumulated value of the fold is the current subtree, although the term “accumulate” is a bit misleading in that we don’t make the dictionary bigger; we make it smaller by passing to a subtree.
The curious reader may notice that we have actually implemented a trie.
Now, let’s reduce the source code needed for the dictionary tree. In Haskell, we’d have to write something like
dict = Branch ' ' (Branch 'E' ... ) (Branch 'T' ...)
Compared to the association list, we got rid of the dots '.'
and dashes ''
, they have been made implicit in the structure of our tree. But we still need a lot of parenthesis and applications of the Branch
function.
A clever way to get rid of parenthesis is reverse polish notation, commonly used in stackbased languages. The idea is that a program in such a language is a sequence of instructions that pop and push values from a stack. For example,
1 2 +
is a program to calculate 1 plus 2. Reading from left to right, the first instruction is 1
which pushes the integer 1 onto the stack. The second instruction is 2
, pushing 2 onto the stack. And finally, the instruction +
pops the two topmost integers from the stack and pushes their sum onto the stack. This procedure is visualized as follows:
To see how that eliminates the need for parenthesis, take a look the program
1 2 + 3 4 + +
which calculates the sum (1+2)+(3+4).
To build our dictionary tree, we are going to devise a very similar language. Instead of integers, the stack will store whole trees. In analogy to the numerals, there will be an instruction for pushing a Leaf
onto the stack; and in analogy to +
, there is going to be an instruction for applying the Branch
constructor to the two topmost subtrees.
Here’s the program for building the tree, stored as a string:
program :: String
program = "__5__4H___3VS__F___2 UI__L__+_ R__P___1JWAE"
++ "__6__=B__/_XD__C__YKN__7_Z__QG__8_ __9__0 OMT "
Each character represents one instruction. The underscore '_'
pushes a Leaf
onto the stack while all the other character push a Branch
tagged with the character in question onto the stack. The interpreter for this tiny programming language can be written using a simple left fold:
dict = interpret program
where
interpret = head . foldl exec []
exec xs '_' = Leaf : xs  push Leaf
exec (x:y:xs) c = Branch c y x : xs  combine subtrees
The stack xs
is represented as a list of trees.
We have found a concise way to represent the morse code dictionary in source code, but cleverness does not end here. For instance, how about eliminating the tree entirely?
More precisely, it seems wasteful to construct the dictionary tree by creating leaves and connecting branches only to deconstruct them again when decoding letters. Wouldn’t it be more efficient to interpret the morse code tree as a call graph rather than as a data structure Tree Char
?
In other words, imagine say a function e
corresponding to the node labeled ‘E’ in the tree, and working as follows:
e :: MorseCode > Char
e ('.':ds) = i ds
e ('':ds) = a ds
e [] = 'E'
If the next symbol is a dot or dash, we proceed with the functions i
or a
corresponding to 'I'
and 'A'
respectively. And in case we have already reached the end of the code, we know that we have decoded an 'E'
. The functions i
and a
work just like e
, except that they proceed with other letters; and the morse code tree becomes their call graph.
Now, writing all these functions by hand would be most errorprone and tedious, they all look the same. But abstracting repetitive patterns is exactly where functional programming shines; we are to devise a combinator that automates the boring parts of the code for us.
What does this combinator look like? Well, the nonboring parts are the letter it represents and the two followup functions, so these will be parameters:
combinator :: Char  letter
> (MorseCode > Char)  function for dot
> (MorseCode > Char)  function for dash
> (MorseCode > Char)  result function
That’s all we need; we can implement
combinator c x y = \code > case code of
'.':ds > x ds
'':ds > y ds
[] > c
which allows us to write
e = combinator 'E' i a
i = combinator 'I' s u
... etc ...
But wait, what have we done? The type signature of combinator
looks exactly like that of
Branch :: Char  letter
> Tree Char  tree for dot
> Tree Char  tree for dash
> Tree Char  result tree
but with the type Tree Char
replaced by (MorseCode > Char)
! In fact, let’s rename the combinator to lowercase branch
branch c x y = \code > case code of
'.':ds > x ds
'':ds > y ds
[] > c
and observe that the tree of functions now reads
e = branch 'E' i a
i = branch 'I' s u
... etc ...
or
e = branch 'E' (branch 'I' ... ) (branch 'A' ...)
if we inline the function definitions. But this is of course just the tree from the section on dichotomic search with each Branch
replaced by branch
!
In other words, branch
is like a dropin replacement for the constructor Branch
. To implement the morse code tree directly as functions instead of as an algebraic data type, we simply replace every occurrence of Branch
with branch
in our previous code. Thus, we have a new implementation
dict :: MorseCode > Char
dict = interpret program
where
interpret = head . foldl exec []
exec xs '_' = leaf : xs  push leaf
exec (x:y:xs) c = branch c y x : xs  combine subtrees
decodeLetter = dict
where
leaf = undefined
is a trivial replacement for the Leaf
constructor.
To further understand how and why the replacement of Branch
by branch
works, it is instructive to derive it systematically from just the program text.
In particular, consider the implementation of decodeLetter
from the earlier section on dichotomic search:
decodeLetter :: MorseCode > Char
decodeLetter = tag . foldl (flip step) dict
where
step '.' = left
step '' = right
In this formulation, the focus is on the recursion over the input string performed by foldl
, neglecting the recursive descent into the dictionary tree. Let’s systematically rewrite this code to highlight the latter.
First, we interpret it as converting the dictionary tree into a function that decodes letters
decodeLetter = decodeWith dict
decodeWith :: Tree Char > (MorseCode > Char)
decodeWith dict = tag . foldl (flip step) dict
where
step '.' = left
step '' = right
Then, we turn the foldl
into explicit recursion
decodeWith dict [] = tag dict
decodeWith dict (d:ds) = decodeWith (step d dict) ds
where
step '.' = left
step '' = right
and dissolve the step
function into the pattern matching
decodeWith dict [] = tag dict
decodeWith dict ('.':ds) = decodeWith (left dict) ds
decodeWith dict ('':ds) = decodeWith (right dict) ds
We group the pattern matching on the input code into a case expression
decodeWith dict = \code > case code of
[] > tag dict
('.':ds) > decodeWith (left dict) ds
('':ds) > decodeWith (right dict) ds
and finally, we replace the field selectors by pattern matching on the tree, also noting the case of Leaf
:
decodeWith Leaf = undefined
decodeWith (Branch c x y) = \code > case code of
[] > c
('.':ds) > (decodeWith x) ds
('':ds) > (decodeWith y) ds
The recursion on the tree is apparent now, decodeWith
simply traverses both subtrees and combines the results. We can make this even more evident by appealing to the combinator we defined in the previous section:
decodeWith Leaf = leaf
decodeWith (Branch c x y) = branch c (decodeWith x) (decodeWith y)
In other words, decodeWith
takes a tree and simply substitutes each Leaf
constructor with leaf
and each Branch
constructor with branch
.
Of course, first using Branch
and then replacing it with branch
is a waste; we should rather use branch
from the start and thus cut away the intermediate tree. That’s exactly what we’ve done in the previous section!
Replacing constructors is a very general pattern, not restricted to binary trees. For instance, you probably know following function:
foldr f z [] = z
foldr f z (x:xs) = x `f` foldr f z xs
It’s the good old fold! And at its heart, it just replaces the constructors of the list data type: z
is the substitute for the empty list []
, and f
is put in lieu of the (:)
constructor.
In its honor, any function that substitutes constructors in such fashion is known as a generalized fold. A more exotic but commonly used alternative name is “catamorphism”. In other words, decodeWith
is a catamorphism.
In this generality, the idea of deforestation is that instead of first constructing a data structure and then chopping it up with a catamorphism, it is more efficient to saw the pieces properly at the time of creation. For example, creating a list and then adding all elements
sum [1..100] = foldr (+) 0 (enumFromTo 1 100)
is less efficient than adding the elements as they are created
sumFromTo a b  changes to enumFromTo
 a > b = 0  0 replaces []
 otherwise = a + sumFromTo (a+1) b  + replaces (:)
Merging two functions in this fashion is also called fusion.
Performing deforestation manually, as we did, sacrifices reusability: the morse code tree only exists as a call graph and cannot be printed out anymore, and sumFromTo
is not nearly as useful as are sum
and enumFromTo
were.
Hence, the goal is to teach the compiler to automatically fuse catamorphisms with their structure creating counterparts, the so called anamorphisms. That’s what efforts like a shortcut to deforestation and the recent stream fusion are set out to do, yielding dramatic gains in efficiency while preserving the compositional style.
I hope you had fun doodling around with morse code; I certainly did. If you long for more, here a few suggestions:
While we’re at it, the intermediate lists created by words
in
decode = map decodeLetter . words
can be deforested as well and fused into the letter decoding. Try it yourself, or see the example source code.
Polish notation, the converse of reverse polish notation, puts function symbols before their arguments and is thus closer to the Haskell syntax. When the arities of the functions are known in advance, like 2 for Branch
and 0 for Leaf
, this notation doesn’t require parenthesis either. Write a program
in polish notation and a corresponding tiny interpreter to create the morse code tree.
Since deforestation is supposed to make things faster, how about comparing our deforested morse code tree to say an implementation in C? But instead of doing benchmarks, let me give two general remarks on the machine representation.
First, one might think that the call graph we’ve built is compiled to function calls, with e
calling i
or a
and each function having different machine code, just as we originally intended. But this is actually not the case. When defined with branch
, the functions are represented as closures, sharing the same machine code but carrying different records that store their free variables c
, x
and y
. This is not very different from storing c
, x
and y
in a Branch
constructor. Template Haskell or some kind of partial evaluation would be needed to hardcode the free variables into the executable. For more on the Haskell execution model, take a look at the GHC commentary.
Second, all our implementations decipher a letter by follow a chain of pointers: descending a tree means following pointers to deeper nodes, and making procedure calls means repeatedly jumping to different code parts. This of course raises questions of cache locality, branch prediction or memory requirements for storing all these pointers.
In C, however, there is another technique available: instead of following a chain of pointers to get to the destination, the address of the final stop is calculated directly with pointer arithmetic. The following example illustrates this:
#include <stdio.h>
#include <string.h>
char buf[99], tree[] = " ETIANMSURWDKGOHVF L PJBXCYZQ ";
int main() {
while (scanf("%s", buf)) {
int n, t=0;
for(n=0; buf[n]!=0; n++)
t = 2*t + 1 + (buf[n]&1); /* compute destination */
putchar(tree[t]); /* fetch letter */
}
}
Here, the tree is represented as an array and paths to nodes are encoded as (zeroless) binary numbers. The program calculates the path t
to the proper letter and then fetches it with an O(1) memory lookup. Compare also the morse code translator by reddit user kayamon.
Since the number of memory accesses is kept to a minimum, this technique is the most efficient; but it is impossible to reproduce with algebraic data types alone. Fortunately, arrays libraries are readily available in Haskell as well.
The main drawback of this approach, and of arrays in general, is the lack of clarity; indices are notorious for being nondescript and messy. The raw index arithmetic in the example C code above sure seems like magic!
Such magic is best hidden behind a descriptive abstract data type. How about rewriting the example in Haskell such that decodeLetter
looks exactly like the one in the section on dichotomic search? In other words, the abstract data type is to support the functions left
, right
and tag
. One possible solution can be found in the source code accompanying this article.
HyperHaskell
— the strongly hyped Haskell interpreter —
version 0.2.1.0
HyperHaskell is a graphical Haskell interpreter (REPL), not unlike GHCi, but hopefully more awesome. You use worksheets to enter expressions and evaluate them. Results are displayed using HTML.
HyperHaskell is intended to be easy to install. It is crossplatform and should run on Linux, Mac and Windows. That said, the binary distribution is currently only built for Mac. Help is appreciated!
HyperHaskell’s main attraction is a Display class that supersedes the good old Show class. The result looks like this:
Version 0.2.1.0 supports HTML output, binding variables, text cells and GHC language extensions.
For more on HyperHaskell, see its project page on Github.
(EDITED 07 Aug 2018): Related projects that I know of:
Unfortunately, IHaskell has a reputation for being difficult to install, since it depends on Jupyter. Also, I don’t think the integration of Jupyter with the local Desktop environment and file system is particularly great. Haskell for Mac is partially proprietary and Mac only. Hence this new effort.
]]>BOB is the forum for developers, architects and builders to explore technologies beyond the mainstream and to discover the best tools available today for building software.
I gave a talk on functional reactive programming; slides and videos are available.
I particularly liked the talk by Andres Löh on the Servant library, which allows you to scaffold a REST API entirely on the type level, and the talk by Stefan Wehr on the Swift language by Apple, which showed that functional programming already has become mainstream.
]]>This is essentially a maintenance release that fixes an important type mistake. Atze van der Ploeg has kindly pointed out to me that the switchB
and switchE
combinators need to be in the Moment
monad
switchB :: Behavior a > Event (Behavior a) > Moment (Behavior a)
switchE :: Event (Event a) > Moment (Event a)
This is necessary to make their efficient implementation match the semantics.
Apart from that, I have mainly improved the API documentation, adding graphical figures to visualize the main types and adding an introduction to recursion in FRP. Atze has also contributed a simplified implementation of the model semantics, it should now be much easier to digest and understand. Thanks!
The full changelog can be found in the repository. Happy hacking!
]]>bytestring
library, which is currently at version 0.10.6.0.
Every now and then, however, a library author feels that his work has reached the level of completion that he originally envisioned before embarking on the unexpectedly long and perilous journey of actually building the library. Today is such a day. I am very pleased to announce the release of version 1.0 of my reactivebanana library on hackage!
As of now, reactivebanana is a mature library for functional reactive programming (FRP) that supports firstclass Events and Behaviors, continuous time, dynamic event switching, pushbased performance characteristics and garbage collection.
As planned, the API has changed significantly between the versions 0.9 and 1.0. The major changes are:
Dynamic event switching has become much easier to use. On the other hand, this means that all operations that depend on the previous history, like accumE
or stepper
are now required to be in the Moment
monad.
Event
s may no longer contain occurrences that are simultaneous. This is mainly a stylistic choice, but I think it makes the API simpler and makes simultaneity more explicit.
If you have been using the sodium FRP library, note that Stephen Blackheath has deprecated the Haskell variant of sodium so that he can focus on his upcoming FRP book and on the sodium ports for other languages. The APIs for sodium and reactivebanana 1.0 are very similar, porting should be straightforward.
Thanks to Markus Barenhoff and Luite Stegeman, reactivebanana 1.0 now also works with GHCJS. If it doesn’t, in particular due to premature garbage collection, please report any offending programs.
Of course, the library will continue to evolve in the future, but I think that it now has a proper foundation.
Now, go forth and program functional reactively!
]]>Since its early iterations (version 0.2), the goal of reactivebanana has been to provide an efficient pushbased implementation of functional reactive programming (FRP) that uses (a variation of) the continuoustime semantics as pioneered by Conal Elliott and Paul Hudak. Don’t worry, this will stay that way. The planned API changes may be radical, but they are not meant to change the direction of the library.
I intend to make two major changes:
The API for dynamic event switching will be changed to use a monadic approach, and will become more similar to that of the sodium FRP library. Feedback that I have received indicates that the current approach using phantom types is just too unwieldy.
The type Event a
will be changed to only allow a single event occurrence per moment, rather than multiple simultaneous occurrences. In other words, the types in the module Reactive.Banana.Experimental.Calm
will become the new default.
These changes are not entirely cast in stone yet, they are still open for discussion. If you have an opinion on these matters, please do not hesitate to write a comment here, send me an email or to join the discussion on github on the monadic API!
The new API is not without precedent: I have already implemented a similar design in my threepennygui library. It works pretty well there and nobody complained, so I have good reason to believe that everything will be fine.
Still, for completeness, I want to summarize the rationale for these changes in the following sections.
One major impediment for early implementations of FRP was the problem of socalled time leaks. The key insight to solving this problem was to realize that the problem was inherent to the FRP API itself and can only be solved by restricting certain types. The first solution with firstclass events (i.e. not arrowized FRP) that I know is from an article by Gergeley Patai [pdf].
In particular, the essential insight is that any FRP API which includes the functions
accumB :: a > Event (a > a) > Behavior a
switchB :: Behavior a > Event (Behavior a) > Behavior a
with exactly these types is always leaky. The first combinator accumulates a value similar to scanl
, whereas the second combinator switches between different behaviors – that’s why it’s called “dynamic event switching”. A more detailed explanation of the switchB
combinator can be found in a previous blog post.
One solution the problem is to put the result of accumB
into a monad which indicates that the result of the accumulation depends on the “starting time” of the event. The combinators now have the types
accumB :: a > Event (a > a) > Moment (Behavior a)
switchB :: Behavior a > Event (Behavior a) > Behavior a
This was the aforementioned proposal by Gergely and has been implemented for some time in the sodium FRP library.
A second solution, which was inspired by an article by Wolfgang Jeltsch [pdf], is to introduce a phantom type to keep track of the starting time. This idea can be expanded to be equally expressive as the monadic approach. The combinators become
accumB :: a > Event t (a > a) > Behavior t a
switchB :: Behavior t a
> Event t (forall s. Moment s (Behavior s a)
> Behavior t a
Note that the accumB
combinator keeps its simple, nonmonadic form, but the type of switchB
now uses an impredicative type. Moreover, there is a new type Moment t a
, which tags a value of type a
with a time t
. This is the approach that I had chosen to implement in reactivebanana.
There is also a more recent proposal by Atze van der Ploeg and Koen Claessen [pdf], which dissects the accumB
function into other, more primitive combinators and attributes the time leak to one of the parts. But it essentially ends up with a monadic API as well, i.e. the first of the two mentioned alternatives for restricting the API.
When implementing reactivebanana, I intentionally decided to try out the second alternative, simply in order to explore a region of the design space that sodium did not. With the feedback that people have sent me over the years, I feel that now is a good time to assess whether this region is worth staying in or whether it’s better to leave.
The main disadvantage of the phantom type approach is that it relies not just on rankn types, but also on impredicative polymorphism, for which GHC has only poor support. To make it work, we need to wrap the quantified type in a new data type, like this
newtype AnyMoment f a = AnyMoment (forall t. Moment t (f t a))
Note that we also have to parametrize over a type constructor f
, so that we are able to write the type of switchB
as
switchB :: forall t a.
Behavior t a
> Event t (AnyMoment Behavior a)
> Behavior t a
Unfortunately, wrapping and unwrapping the AnyMoment
constructor and getting the “forall”s right can be fairly tricky, rather tedious, outright confusing, or all three of it. As Oliver Charles puts it in an email to me:
Right now you’re required to provide an
AnyMoment
, which in turn means you have totrim
, and then you need aFrameworksMoment
, and then anexecute
, and then you’ve forgotten what you were donig! :)
Another disadvantage is that the phantom type t
“taints” every abstraction that a library user may want to build on top of Event
and Behavior
. For instance, image a GUI widget were some aspects are modeled by a Behavior
. Then, the type of the widget will have to include a phantom parameter t
that indicates the time at which the widget was created. Ugh.
On the other hand, the main advantage of the phantom type approach is that the accumB
combinator can keep its simple nonmonadic type. Library users who don’t care much about higherorder combinators like switchB
are not required to learn about the Moment
monad. This may be especially useful for beginners.
However, in my experience, when using FRP, even though the firstorder API can carry you quite far, at some point you will invariably end up in a situation where the expressivitiy of dynamic event switching is absolutely necessary. For instance, this happens when you want to manage a dynamic collection of widgets, as demonstrated by the BarTab.hs example for the reactivebananawx library. The initial advantage for beginners evaporates quickly when faced with managing impredicative polymorphism.
In the end, to fully explore the potential of FRP, I think it is important to make dynamic event switching as painless as possible. That’s why I think that switching to the monadic approach is a good idea.
The second change is probably less controversial, but also breaks backward compatibility.
The API includes a combinator for merging two event streams,
union :: Event a > Event a > Event a
If we think of Event
as a list of values with timestamps, Event a = [(Time,a)]
, this combinator works like this:
union ((timex,x):xs) ((timey,y):ys)
 timex < timey = (timex,x) : union xs ((timey,y):ys)
 timex > timey = (timey,y) : union ((timex,x):xs) yss
 timex == timey = ??
But what happens if the two streams have event occurrences that happen at the same time?
Before answering this question, one might try to argue that simultaneous event occurrences are very unlikely. This is true for external events like mouse movement or key presses, but not true at all for “internal” events, i.e. events derived from other events. For instance, the event e
and the event fmap (+1) e
certainly have simultaneous occurrences.
In fact, reasoning about the order in which simultaneous occurrences of “internal” events should be processed is one of the key difficulties of programming graphical user interfaces. In response to a timer event, should one first draw the interface and then update the internal state, or should one do it the other way round? The order in which state is updated can be very important, and the goal of FRP should be to highlight this difficulty whenever necessary.
In the old semantics (reactivebanana versions 0.2 to 0.9), using union
to merge two event streams with simultaneous occurrences would result in an event stream where some occurrences may happen at the same time. They are still ordered, but carry the same timestamp. In other words, for a stream of events
e :: Event a
e = [(t1,a1), (t2,a2), …]
it was possible that some timestamps coincide, for example t1 == t2
. The occurrences are still ordered from left to right, though.
In the new semantics, all event occurrences are required to have different timestamps. In order to ensure this, the union
combinator will be removed entirely and substituted by a combinator
unionWith f :: (a > a > a) > Event a > Event a > Event a
unionWith f ((timex,x):xs) ((timey,y):ys)
 timex < timey = (timex,x) : union xs ((timey,y):ys)
 timex > timey = (timey,y) : union ((timex,x):xs) yss
 timex == timey = (timex,f x y) : union xs ys
where the first argument gives an explicit prescription for how simultaneous events are to be merged.
The main advantage of the new semantics is that it simplifies the API. For instance, with the old semantics, we also needed two combinators
collect :: Event a > Event [a]
spill :: Event [a] > Event a
to collect simultaneous occurrences within an event stream. This is no longer necessary with the new semantics.
Another example is the following: Imagine that we have an input event e :: Event Int
whose values are numbers, and we want to create an event that sums all the numbers. In the old semantics with multiple simultaneous events, the event and behavior defined as
bsum :: Behavior Int
esum :: Event Int
esum = accumE 0 ((+) <@> e)
bsum = stepper 0 esum
are different from those defined by
bsum = accumB 0 ((+) <@> e)
esum = (+) <$> bsum <@ e
The reason is that accumE
will take into account simultaneous occurrences, but the behavior bsum
will not change until after the current moment in time. With the new semantics, both snippets are equal, and accumE
can be expressed in terms of accumB
.
The main disadvantage of the new semantics is that the programmer has to think more explicitly about the issue of simultaneity when merging event streams. But I have argued above that this is actually a good thing.
In the end, I think that removing simultaneous occurrences in a single event stream and emphasizing the unionWith
combinator is a good idea. If required, s/he can always use an explicit list type Event [a]
to handle these situations.
(It just occurred to me that maybe a type class instance
instance Monoid a => Monoid (Event a)
could give us the best of both worlds.)
This summarizes my rationale for these major and backward incompatible API changes. As always, I appreciate your comments!
]]>This means that the library finally features all the ingredients that I consider necessary for a mature implementation of FRP:
union
, …)The banana is ripe! In celebration, I am going to drink a strawberry smoothie and toast Oliver Charles for his invaluable bug reports and Samuel Gélineau for tracking down a nasty bug in detail. Cheers!
While the library internals are now in a state that I consider very solid, the library is still not quite done yet. When introducing the API for dynamic event switching in version 0.7, I had the choice between two very different regions of the design space: An approach using a monad and an approach using phantom types. I had chosen the latter approach, mostly because the sodium FRP library had chosen to explore the former region in the design space, so we could cover more of the design space together this way. But over the years, people have sent me questions and comments, and it is apparent that the phantom type approach is too unwieldy for practical use. For the next version of reactivebanana, version number 1.0, I plan to radically change the API and switch to the monadic approach. While we’re at it, I also intend to remove simultaneous occurences in a single event. I will discuss these upcoming API changes more thoroughly in a subsequence blog post.
]]>