Jonathan Lam

Core Developer @ Hudson River Trading


Continuations, macros, and monads

On 7/12/2021, 7:57:53 PM

Return to blog

(I'll use the abbreviations FP for functional programming and IP for imperative programming.)

These three are topics that seem to be fundamental in FP (at least from my limited experience in Haskell and Scheme) but somewhat foreign in IP. To me, these three concepts are difficult to understand either in what exactly they are or what they achieve (monads and continuations) or in their implementation (macros1).

There are already plenty of explanations out there on the web, but I wanted to write down the things I found hardest to understand in a simple way.2

These explanations were supposed to be short and sweet, but the process of writing this pulled in many questions and additional thoughts (as the writing process usually does, but it the effect was exaggerated by how loaded each of these concepts is). So I've given each section a tl;dr for those in a hurry.

(First-class (reified)) Continuations

Update 9/6/21: my final project for the I.S. was about continuations. In the report I go into greater depth exploring the meaning of continuations3.


A continuation (at a given point in the code in the IP, or for an expression in the FP sense) can be thought of as the next instruction the code will take, or the expression in which the current expression's value will be used, along with the context of that "next expression," or "continuation of the expression."

Long version

The idea is more natural in the FP sense with expressions. Take the following Scheme code as example (from the recent post about streams):

(define (stream-ref s n)
  ;; get the nth value of a stream
  (if [zero? n]
      (stream-car s)
      (stream-ref (stream-cdr s) (1- n))))

Take the expression (stream-car s). Its continuation is in the if expression, whose continuation is the context of whatever calls the procedure stream-ref. It's easy to say that the continuation of an expression is the expression that calls or uses the value of the expression in question.

The continuation also includes its context -- all of the lexically scoped values that come along with the current expression. Thus, for the following snippet:

(let ([x 2] [y 3])
  (* x (1+ (* y 4))))

the continuation of the expression (1+ (* y 4)) is the multiplication expression (* x ...), with the value 2 bound to x.

Now that we have an intuitive idea of an expression's continuation, now what? Well, we can use it as a control flow structure and jump to some continuation.

This is different than a C/Assembly label (goto label, switch case or default labels, or the implicit labels used by break or continue statements in loops). Those labels are simply branch instructions, and do not capture the context at all, which (among other reasons) makes goto a bad general programming practice. Other issues with gotos are skipped declarations (which is either an error or very dangerous) and the lack of cross-procedure jumps (which can be achieved with setjmp/longjmp, but these also don't capture context and should be handled with care).

It is more similar to calling a function and returning from that function to the previous function. In this case, we restore the previous context by popping the last stack frame off the stack. In fact, continuations are often used in Scheme to provide non-immediate returns4. However, continuations are inherently more powerful than a return point for two reasons: they are "multiply-resumable," which means we can call the continuation multiple times; and they are "optionally-resumable," which means that we don't ever have to call them (similar to the implicit labels in a for loop used for continue and break).

Because continuations require saving the entire context (call stack), we can think of it as a snapshot of a thread5. Because it is multiply-resumable, we must save a copy of the thread -- then this thread representation becomes a reification (concrete implementation) of continuations. Reification is important in order to have first-class continuations as we do in Scheme.

In IP languages, we don't typically have a reified version of continuations. We can emulate continuations using lambdas by explicitly passing the continuation of an expression to each of its subexpressions -- this is called continuation-passing style (CPS), as opposed to the direct style we are used to. This depends on lambdas capturing their context. Some examples are given in the links below.

(Another interesting topic that continuations seem to be useful for is coroutines, but I'm not too familiar with them. See the last link. There is also a famous "yin/yang" problem, which is stated on the Wikipedia page.)


We need to first get a handle to a continuation we (might) want to use later using the call/cc ("call with current continuation"), and then we can "call" that continuation at a later point, which brings us back to that earlier context. The general form is as so:

  (lambda (cc)

where cc is the current continuation (a procedure) of the lambda6. If we want to use the continuation, then we call cc with the value we want to send to the current continuation as a value. If we don't call the continuation, then the evaluated value of the lambda will be passed to its continuation as usual. As a very simple example, consider the following nonsensical example (which is basically a nonlocal goto):

(define (h x cc)
  (if [< x 0] (cc -1) 1))

(define (g x)
  (1+ (call/cc (lambda (cc) (1+ (h x cc))))))

(display (g -1))    ; displays 0, calls cc
(display (g 1))     ; displays 3, doesn't call cc

A more illustrative example comes from the Wikipedia page for call/cc:

(define (f return)
  (return 2)

(display (f (lambda (x) x)))    ; displays 3, doesn't call cc (return)
(display (call/cc f))           ; displays 2, calls cc (return)

It is also easy to see that continuations are first-class: they look just like regular procedures, and can be handled just as any other procedure can be handled.


(Syntactic) Macros


Macros are the simplest concept of these three. A macro is a compile-time (preprocessor-time) code rewriting. At the most basic level, as we often see in C/C++, a macro7 can act as a compile-time variable that gets replaced with a constant value. A more complicated version is a macro that takes arguments, which we also see in C/C++. However, while C-style macros give us greater expressive power, they can be dangerous (unhygienic) and only provide basic text-replacement, while macros in Scheme8 are syntactic.

Long version

A few general notes about macros before we start talking about Scheme's macros:

There are some major problems with C-style lexical (text-replacement) macros:

The answer to these problems are syntactic macros13. These can only generate code that is syntactically correct and closed by (functionally) manipulating the input AST into some valid output AST14. We take as input a list (of valid expressions), and output some other (valid) expression, whether it be an atom or a list. The nice thing about Scheme is that each function call or expression is an ordinary list (homoiconicity), so we naturally output valid Scheme code15.

Homoiconicity not only addresses the first point (structural/syntactic correctness), but also the second point. Because Lisp is so great for LISt Processing, we can transform Lisp code in arbitrary ways just as if we were transforming an ordinary list. This elevates macros from being a text-replacement tool to an arbitrary compile-time procedure that takes an expression as an argument and returning another expression16.

Syntactic macros also allow for hygiene. Hygienic macros were added to the core standard as of R5RS, and Scheme was the first language to implement hygienic macros17. I don't know much about the implementation details18, but my understanding is that Scheme macros automatically prevent capture for any local variables.

Some examples off the top of my head for macros is to introduce new syntactic constructs ("syntactic sugar") (e.g., list comprehensions, loops), new control flow behavior (delayed evaluation, custom short-circuiting behavior), and attaching metadata to expressions (e.g., automatic differentiation).

A quick note about decorators: they also can be thought of as metaprogramming by modifying the behavior of functions or other constructs, but they are (AFAIK) a run-time concept and do not have access to the syntax of the language like macros do. See the SWE Stack Exchange link below about attempting to make Python decorators as powerful as Lisp macros.

(I am very interested in macros in other languages, such as Scala and Rust. I am especially interested in the latter and if/how it manages to stay safe and hygienic.)


I found that learning the syntax for macros from the standard was abstruse (see §9.2, §11.2.2, §11.18, and §11.19 from R6RS), and this actually kept me from learning how to use macros for some time. I only started learning how to use macros when I wanted to handle automation differentiation transparently (I never got around to implementing this, but I still do), or when macros are required in order to create lambda expressions for lazily-evaluated streams. The syntax is not actually that hard, and there are plenty of tutorials out there that are better reading than R6RS, but the following is my intuitive understanding of it.

The example of lazy streams is very simple. Recall from the post about lazy streams that we want the stream primitive:

(stream-cons a b)

to be equivalent to the following:

(cons a (lambda () b))

We cannot rewrite stream-cons as an ordinary function because of applicative-order evaluation (the expression b would be evaluated eagerly). Of course, the user could type out the latter form every time they wanted to use the former, but this makes streams and lazy evaluation a lot less interesting. We want the laziness to be transparent and implicit.

In this case, all we need is a simple replacement, and C macros would do just as well in this case because we don't have the hygiene issue. The code for this is:

(define-syntax stream-cons
  ;; have to define as a macro so b is not eagerly evaluated
  (syntax-rules ()
    ([_ a b]
     (cons a (lambda () b)))))

There are a few moving parts here. First of all, we have the keyword define-syntax. Just like with , we bind a value (a syntax transformer created by syntax-rules) to a name (in this case, stream-cons). There are also let-syntax and letrec-syntax, which are analogous to let and letrec.

The syntax-rules expression looks like the following:

(syntax-rules literals (pattern template) ...)
literals is a list of symbols that should only be matched literally. pattern represents an input expression pattern to match, and template is the transformed code to emit. We can have multiple patterns-template pairs; the first one that matches the sexpr will be transformed into the corresponding template.

In the simple example above, we won't match any literals in the patterns, so literals is an empty list. We also only have one pattern and template, but there can be multiple patterns in the same way that a cond statement can have multiple conditions.

Just like in C, the use of macros is indistinguisable from a function invocation: the infinite stream of ones, defined by the following:

(define ones (stream-cons 1 ones))

looks like a function call but it actually gets rewritten to the following:

(define ones (cons 1 (lambda () ones)))

Here, the pattern (_ a b) matches with (stream-cons 1 ones). Thus, _ matches with stream-cons19, a matches with 1, and b matches with ones. Of course, a and b can be more complicated expressions.

I don't think I can do justice to a full explanation of the pattern and template syntax. Pattern-matching is common in Haskell, where it is used in every declaration. The easiest way for me to describe patterns to the imperative programmer is that they are very much like Python's or Javascript's destructuring assignment of (potentially nested) lists20, except that we can match literals and handle variable-length lists of patterns using the ellipses:

I've only described the syntax-rules construct to generate a macro. There is also the more low-level syntax-case. I personally don't understand how to use it yet (again, the (Guile) documentation is somewhat confusing to me) but we can compare/contrast it with syntax-rules at a high-level. The first example shown on the Guile documentation is telling:

(define-syntax when
  (syntax-rules ()
    ((_ test e e* ...)
     (if test (begin e e* ...)))))

(define-syntax when
  (lambda (x)
    (syntax-case x ()
      ((_ test e e* ...)
       #'(if test (begin e e* ...))))))

If the two forms are equivalent, this shows that syntax-rules actually returns a procedure. This makes sense, as we've established that macros are functions mapping code to other code.


Monads (and functors, applicatives)

It will be hard for me to do justice to these three very difficult ideas. I have not really used monads to any significant degree, so I am not sure I am qualified to write this in the first place, but the topic is so fascinatingly pervasive that I had to write about it to some degree. The article "Functors, Applicatives, and Monads in Pictures," and the Stack Overflow answer to "Why are side effects modeled as monads in Haskell?" were very helpful to me. I will defer a long explanation to them and all of the other articles mentioned in the Haskell wiki, and only provide a very high-level overview here with a few concrete examples for motivation.


If you've ever felt that FP is inherently lacking something that you can do in IP, especially handling non-pure functions, dealing with sequences of statements rather than expressions, or managing state, chances are the solution is monads. Monads offer a catch-all for all non-functional aspects in a functional manner, handling notably difficult concepts such as I/O, PRNGs, sequential or other custom control flows24, hashtables, and optional expressions in a uniform way. It also automatically allows us to be aware of any impure behavior in our code because any expression that contains a monad will itself have to be a monadic expression, and its type is indicated as a monadic type.

The general idea

A monad can be thought of as a wrapper around a value. This wrapper is imbued with certain properties, has to implement the Monad typeclass, and should follow some general laws25. This monad "marks" the wrapped result as the result of some impure/nondeterministic computation ("action").

All a monadic type26 has to do to implement Monad is to implement the return method, which returns a wrapped monadic version of its input argument, and a method (the bind operator) that takes a monad, a function that converts an unwrapped value to a monadic (wrapped) value (such as return), and returns a monad. It's a little confusing to hear from words alone, so here is a simplified definition of the Monad typeclass:

class Monad m where
  (>>=)         :: m a -> (a -> m b) -> m b
  return        :: a -> m a

This seemingly-obscure bind operator can be interpreted to mean: on the left we have a monad, and on the right we have a function to create another monad using the previous monad. In this way we can "pass" values from one monad to another, allowing us to sequence impure computations by passing the result of one value to the next. A really oversimplified interpretation in Scheme might look something like:

(define (>>= m1 a->m2)
  (let ([m1-result (monad-get-value m1)])
    ;; can do some logic here with m1-result to produce m1-result'
    (a->m2 m1-result'))

Of course, monads are a generic type and the specific logic heavily depends on the monadic type, but this is just to get the idea27. A sequence of monadic values essentially translates to a nested let statements. This is nice because it feels more functional: we are expressing a sequence of mutations as a series of nested expressions with the state being passed around as parameters rather than as a global side effect28.

Other nice features of monads are that we can support optional values and control flow with impure values. The former is simply the result of having a container type, and the well-known example is the Maybe monad. The latter allows us to implement all sorts of nice language features, such as continuations, and non-functional data structures, such as hash tables, all within the same neat framework.

It should also be noted that monads are not a language construct specific to Haskell, but a generic programming design pattern that can be implemented in any language (similar to how continuations can be implemented in any language with lambdas by using CPS). The "Haskell wiki: Monad" article has a section about monad implementations in other languages.

The monad hierarchy

Monads are a subclass of applicative functors (or simply applicatives), which are themselves a subclass of functors. These are all discussed in the article "Functors, Applicatives, and Monads in Pictures." The (really oversimplified) general idea is that functors are wrapped values on which you can apply a function, applicatives are wrapped values on which you can apply wrapped functions, and monads are wrapped values on which you can apply more general functions that can return other wrapped values29.

From the visual article, I'm able to easily see how applicatives are a subclass of functors, but the hierarchy between applicatives and monads was less clear. This was made clear to me by this Stack Overflow answer, which states that Applicative and Monad both support the return (monad) or pure (applicative) method as well as the >> (monad) or *> (applicative) methods, but Monads have to additionally implement the >>= (bind) operator. The bind operator is a more general version of the >> operator that allows you to use the previous wrapped monad value, and thus allows monads to support sequential (chained) operations. To put it even more concretely, here are simplified versions of the Functor, Applicative, and Monad typeclasses all next to one another:

class Functor f where
  fmap          :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
  pure          :: a -> f a
  (<*>)         :: f (a -> b) -> f a -> f b

class Applicative f => Monad f where
  return        :: a -> f a
  (>>=)         :: f a -> (a -> f b) -> f b

In particular, we have the following equivalences (copied from the Hackage page on Control.Monad):

pure            = return
m1 <*> m2       = m1 >>= (x1 -> m2 >>= (x2 -> return (x1 x2)))
(*>)            = (>>)

We also have the following default definitions of *> (in the Applicative class) and >> (in the Monad class) that shows their relationships to <*>:

u *> v          = (id <$ u) <*> v
m1 >> m2        = m1 >>= const m2

Monads are also a subclass of arrows, which are probably around the same on the hierarchy as applicatives, but not the same. I don't know enough about arrows to talk about them, but they seem to be less popular in Haskell discussions than monads.

Examples of monads

I've already mentioned a few common types of monads, and I'll mention them here again. The Haskell wiki and the Hackage website have many more examples.

The stereotypical example of an impure calculation is that which requires input from a human or some other external interface System.IO. Another type of nondeterministic calculation is a random number generator, and this too is a monad Control.Monad.Random.*.

The list class is a monad. This is actually very intuitive upon inspection of the implementation (taken from Learn you a Haskell):

instance Functor [] where
  fmap          = map

instance Applicative [] where
  pure x        = [x]
  fs <*> xs     = [f x | f <- fs, x <- xs]

instance Monad [] where
  return x      = [x]
  xs >>= f      = concat (map f xs)

The other common example in the introductory study of monads is Data.Maybe, which we've already used many times in examples. This is a data type that wraps an optional value and has custom behavior depending on whether the value is present or not.

data Maybe a    = Nothing | Just a

instance Functor Maybe where
  fmap f (Just x) = Just (f x)
  fmap f Nothing = Nothing

instance Applicative Maybe where
  pure          = Just
  Nothing <*> _ = Nothing
  (Just f) <*> something = fmap f something

instance Monad Maybe where
  return x      = Just x
  Nothing >>= f = Nothing
  Just x >>= f  = f x

Other monads relevant to this discussion include Control.Monad.Reader, Control.Monad.Writer, Data.Hashtable.Class (a typeclass that is based on the Control.Monad.ST monad), and Control.Monad.Cont.



1. At least the Scheme syntax for it.

2. Another concept I could add to this list is laziness, which has enormous implications but the and motivation for which is intuitive.

3. I would've put this in a new blog post but there's no easy way to convert LaTeX to this Pug format I'm using.

4. Extending off of this idea, we can use continuations to return from nested procedures without having to rewind the whole stack. Similarly, exception handling control flow can be handled this way, also with the potential to cross procedural boundaries. I used this when I wrote a checkers-playing AI, for example, to easily break out of the DFS when the time limit was up.

5. The various wiki pages below note that continuations and threads are roughly equivalent in power, in that they can be used to implement one another. It is left as an exercise to the reader to implement continuations in C using fork().

6. A regular named procedure taking one argument can be used instead of a lambda, as shown in the next example.

7. Technically, "macros" in C are preprocessor directives.

8. And presumably other FP languages. Template Haskell is an extension for Haskell that allows macros, but I'm not aware of many other FP languages' macro support. An SWE Stack Exchange post seems to indicate that Julia and Nim have powerful macro systems.

9. For example, the common MIN and MAX macros.

10. Usually this means we want our macros to generate a semantically-closed code snippet, but there is no way of enforcing this with lexical macros. Syntactic macros enforce semantic closure by default.

11. The reason it is called hygienic, from c2 wiki: Define Syntax: "Hygienic means clean or sanitary. Something is hygienic if it does not contaminate unrelated things. For example, surgical practices are hygienic if they do not leave blood and guts lying around to contaminate a wound. Scheme macros are hygienic if they do not affect code outside of their proper scope." This is a nice definition, but I would also add that a simple characterization of a potential hygiene issue is when the macro expansion includes symbols that are not any of its pattern variables.

12. My basic understanding of hygiene is that it deals with two very similar problems. (Both deal with symbols that are not any of its pattern variables, as mentioned in the previous footnote.) The first is when the symbol is already defined in the macro's lexical scope, in which the macro expansion will bind that value, no matter the lexical scope where the macro is expanded. The second is when the symbol is undefined in the macro's lexical scope, in which a new variable is created to avoid capturing an outside variable. This feels like the GNU "dwim" behavior, which might limit some obscure use cases but generally provides the expected behavior. I could give examples of all three of these but this post is already getting far too long-winded.

13. Another name for syntactic macros is simply macros without a qualifier -- this is because Lisp's macros probably predated C's macros, which was originally (and still technically) called the preprocessor. Ref.

14. I say "ostensibly" because (in Scheme at least) the syntax naturally lends itself to an AST representation, and the syntax is only sexprs (lists), so rewriting any list to any other list acts like transforming an AST into another (valid) AST.

15. Of course, even so we can generate semantically incorrect code -- a macro may put a non-function expression as the first element of a list when it should be a function. But this is not a syntactical error, and leads to the ordinary run-time exception that is easy to detect and handle.

16. This can also be emulated at runtime using quoted expressions and then evaluating the transformed expression. This can be the basis for a very basic Lisp metacircular evaluator.

17. Of course, the hygiene problem was known for a long time and macro writers had to deal with it. There are several possible solutions listed on the Wikipedia page, but none of them as elegant as automatically supporting hygiene in the language.

18. The Wikipedia article states that the KFFD algorithm was the first hygienic macro expansion algorithm, but I do not know if that is the one currently in place.

19. The first variable of the pattern always matches the name of the macro, and thus we customarily match it with _ and ignore it.

20. Consider the following Javascript snippet: [a, [b, c]] = [2, ["Hello, world", [3.14, {foo:"bar"}]]]; which sets a to 2, b to "Hello, world", and c to [3.14, {foo:"bar"}].

21. While we have a variable binding, note that there is no potential hygiene issue here, since all of the symbols in the template are pattern variables. (Except for let, in which case overwriting let would both be stupid and be handled correctly due to the hygienic macro.)

22. This Stack Overflow answer suggests that some libraries (in particular Racket's syntax/parse library do allow for this. I haven't tried it out.

23. At least not until recently -- PEP 638 -- Syntactic Macros (Sept. 2020) may change this.

24. Including continuations, which are not implemented in Haskell as a monad.

25. The monad laws. These are not enforced in Haskell, but they should be followed in order for a monad to have predictable behavior. See link in references.

26. I.e., a concrete implementation of a monad. This must be a parameterized type, i.e., the type constructor must take exactly one parameter, the type of its value -- in other words, a generic container type.

27. Also, as a general reminder, extracting the value of a monad outside of the bind operator is generally A Bad Idea™. Here, we are extracting the value within the bind operator, which is fine, but usually this would be done by pattern matching rather than an external function like monad-get-value.

28. This is an oversimplified version of the description in "Stack Overflow: Why are side effects modeled as monads in Haskell?" That answer uses a more generic model where we define the type alias type MonadicType m = RealWorld -> (a, RealWorld). In this case, we pass the global state as a parameter to a monad, and so technically all operations become pure, since we are only operating on our parameters' values. Then again, this model is not perfect, as passing around RealWorld would definitely be dangerous, and (as the comments mention) practical monads like IO support "interaction, concurrency, and nondeterminism" which are not captured by this model. The comments also point to the article: "You could have invented Monads! (And maybe you already have)."

29. An exact distinction and characteristic for the three is very difficult for me to understand at the moment. A better interpretation of wrapped values (see "Stack Overflow: What advantage does a Monad give us over an Applicative?") are some nondeterministic (unpure) computation or action. This action may have some "side effects" or "state" (e.g., in the Maybe applicative, the effect is whether there is a value or not; and a result (e.g., in the Maybe applicative, the result is the value x in Just x). Just by looking at the typeclass definitions for Functor, Applicative, and Monad, we can somewhat see the distinction: fmap only allows us to combine a pure function with a functor to produce another functor; <*> allows us to combine the results of two actions (the first of which returns a function) into a third action; and >>= allows us to use the result of the first action to generate another action. In other words, functors cannot combine actions; the actions given to an applicative are performed independently and then their results ("sequenced") using <*> (the "sequencing operator"); and the next action in a monadic sequence can depend on the result of the previous action using the bind operator. The Stack Overflow post gives an example when the Maybe monad and applicative behave differently, in particular because the <*> operation between Maybes returns Nothing if any of its operands is Nothing, but in a monad we can introduce a "control flow" such that this is not always the case.

© Copyright 2023 Jonathan Lam