Jonathan Lam

Core Developer @ Hudson River Trading

Blog

SICP notes: functional programming

On 7/6/2021, 9:46:07 PM

This post is about Chapter 3.1-3 of SICP. Ironically (in a post about the meaning of functional programming), these sections are about mutation and state, which are more commonly associated with non-functional paradigms.

The purpose of this post is to describe what functional programming is. In the small group discussions we hold for this independent study, Prof. S¹ and I both had some revelations about what it means according to Abelson and Sussman, as opposed to what we had originally thought.

Prof. S had worked with Scheme in the past, and his impression was that functional programming primarily dealt with (linked-)list processing -- as opposed to dealing with more concrete data structures like structs and arrays.²

My previous intuition was that functional programming was all about dealing with functions as first-class objects, like lambda calculus. This means a lot of clean code and aggregate operations like filter, map, and reduce (or fold-left/fold-right).

As we went through the first two chapters³, we were always pulling parallels to grogramming constructs that we see in other languages, especially object-oriented languages. This is almost definitely because we are most familiar with these languages, like C++ and Java. By the end of Chapter 2, we see things that look quite a bit like objects: we design primitive "type systems" and consider inheritance patterns, and we see object-like structures: functions that bind state to a variable. These objects have "constructors" and "selectors" ("getters"). There are no "mutators" ("setters") mentioned before Chapter 3, but that is okay -- we can simply construct a new "object" if we want to change the value of a "field," and all is well.

At multiple times throughout the independent study thus far, Prof. S has said something along the lines of: "I see how [templates/polymorphism/lambdas] is powerful, but I still feel like it's a lot easier to do this using [C++/Java]." And I agree, but I also think that it's not the point of this book -- rather, SICP is about considering the design of programs and the design of programming languages, in order to become a better programmer. I think the question keeps coming up because the stark minimialism of Scheme still hasn't registered as "normal" with Prof. S, and thus we want to relate to concepts in more familiar languages.

But up to Chapter 3, the comparisons we made were implicitly between a functional programming language and an imperative or object-oriented one. The authors are cunning and deliberate⁴: they sneak in important concepts by means of example and connect names to those concepts only after the examples have practically spoken for themselves.⁵ Most noticeably, the "object" still felt very different from the standard idea of objects in OOP languages, perhaps because of the lack of mutators, and we don't have advanced data structures such as hashtables that require mutation of memory cells⁶. Actually, we haven't memory cells at all -- maybe only some hand-wavy mentions of "pointers." What gives?

The missing piece is found in Chapter 3. Suddenly, we are handed the set! function (and the similar set-car! and set-cdr!). The first or second concept we learn in a conventional programming class are variables. We learn how to set a variable to one variable, and then change its value to another. We learn that x = x+1 is not an equation but rather than an assignment (or we learn the walrus operator, which has the same effect), and we accept it immediately. It makes sense, right? x stores a value, and maybe that value can change over time as we mutate it.

(At this point, we should be very clear what we mean by assignment versus mutation. Assignment means the binding of a value to a variable, whereas mutation means changing the value that a variable is bound to. In Scheme, we can think of every variable as a pointer to some memory location. Assignment or reassignment means setting the pointer to point to some value in memory, and this is what define or let do. Mutation means to change the value in the memory cell that the variable is pointing to (via a pointer dereference), and this is what set! does.)

Without assignment, we've been able to survive two interesting chapters which build increasingly complex abstractions. We are given the define and let operators, but these only allow you to bind a name once in a scope⁷. It's as if we lived in the world of ANSI C89, where declarations are either top-level declarations or must be located at the top of a scope, and all variables were declared const⁸. And so far, the only composite data structures we've worked with are the lambda (whose closure can store values) and cons, as if we had a generic Pair type in Java as the only basic type. With these two basic building blocks, we've been able to write binary search trees, an N-queens solver, paintings, a type system, and pseudo-objects. Everything that can be achieved with mutation can be achieved with immutable bindings⁹ ¹⁰ ¹¹.

Despite the fact that we still can achieve the same things without mutation, mutation feels very useful. With mutation we can be satisfied with making a "true" object, where we change the values of variables by mutating their memory cells, as the authors show in the example about a bank account. We can now have a closed form for a PRNG, which is somewhat impossible in a purely functional form, because we have to store the previous state.¹² As I've mentioned in the footnotes multiple times by now, we also get all the array-based data structures, which are extremely useful because of the way RAM is linear and constant-time random-access.

We've just described what we gain -- and this is intuitive, because we already deal with mutation all the time and see that it is a very practical concept. But what do we lose? What happens if we don't allow mutation, if we only allow for immutable bindings?¹³ The concrete examples of why immutability is helpful are the following:

The substitution model The basis of any program is computing expressions. In Scheme, expressions are function applications. To evaluate an expression, we must first evaluate all of its arguments by substitution. This process happens naturally and makes it easy for us to manually trace program flow, but it is broken by assignment. With mutation, a variable may refer to multiple values at different times, which forces us to use the more complicated "environment model" that is proposed in sections 3.2-3.
Referential transparency¹⁴ In a referentially transparent program or programming language, variables are synonymous with their values. Checking equality of two values is the same as checking the equality of two variables (i.e., their references). The most obvious version of this are quoted symbols in Scheme (e.g., 'apple or '()). These symbols can be thought of as variables or values.
Structural sharing If a value needs to be shared by multiple objects, then all those objects can share the same value. This is seen in many popular languages in which string literals are immutable (e.g., C, Java): if the same string literal appear multiple times in a program, they all refer to the same place in memory. This goes hand-in-hand with referential transparency, and has huge implications in performance and memory consumption (see below note about Redux).
History of states See below note about Redux.
Race conditions Race conditions are by definition when there are multiple concurrent I/O operations to a mutable memory location. By avoiding mutation when possible, we can attempt to avoid race conditions. Of course, this is not always possible, such as in highly parallelized DBMSes. But immutability is useful parallel programming (e.g., parallel streams) by eliminating locking.

A famous example of points 2, 3, and 4 is the Redux Javascript library, a state management library, and other immutable-object libraries like Immutable.JS. For the unfamiliar, you can think of Redux as a lightweight database for your web application. Referential transparency implies that you can check that an object has changed simply by checking if it has been reassigned -- no deep check is needed except for objects that have been reassigned (a huge performance boost). Structural sharing means a value that is a member of multiple objects only appears once in memory (a huge memory saver). Another benefit of the memory savings is that we can efficiently store a history of the (immutable) states as an array or stream, which allows us to "replay history" -- which is both useful in debugging and very cool! While this means we have to store each new value in each state somewhere in memory, reused values (which likely form the majority of the Redux state) don't get duplicated.

These practical points above are specific examples of general considerations that I understand to be the design principles of immutable design:

The idea of "sameness" (and time) With referential transparency, we are able to say that two variables are the same iff. their values are the same. Practically, this manifests when we have two variables pointing to the same immutable object, and we can say that reference equality is the same as value equality. (In Java, for example, this would look like: == would mean the same as .equals()). This makes it easier to reason about our code. A value (a memory cell) has no concept of time: it always holds the same value when someone accesses it. This also has to do with eliminating race conditions. When we encounter streams in section 3.5, we show that we can emulate "time" by having a history of immutable states, and can act on this like a sequence.
Predictability A value doesn't change over the course of a program. A variable is only bound once. We can predict the behavior of programs more easily with the simple substitution model, and we get rid of undefined behavior scenarios due to possible mutation of variables (e.g., avoid the UB situations in C when using multiple post-increment operators in the same expression, or caring about the order of evaluation of function parameters).
Performance and efficiency Increased performance is due to the practical considerations above, such as in the Redux library¹⁵. Increased programmer efficency is due to the higher predictability of code, and may greatly outweigh the small performance hit, if any.
Expression-based (as opposed to statement-based) We are more focused with dealing with calculating values and expressions rather than a sequence of operations. (A sequence acts like change over "time.")

Okay, I've talked long enough about immutability, but what does this have to do with functional programming? And it turns out that this is functional programming. Abelson and Sussman provide us the following definition of FP:

Programming without any use of assignments, as we did throughout the first two chapters of this book, is accordingly known as functional programming.

(Here, they use the word "assignments" as I used the word "mutation". They may call the use of define or let "binding" rather than "assignment.")

This definition surprised both Prof. S and me.

Compare this to the description provided by Wikipedia:

In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map values to other values, rather than a sequence of imperative statements which update the running state of the program.
In functional programming, functions are treated as first-class citizens, meaning that they can be bound to names (including local identifiers), passed as arguments, and returned from other functions, just as any other data type can. This allows programs to be written in a declarative and composable style, where small functions are combined in a modular manner.
Functional programming is sometimes treated as synonymous with purely functional programming, a subset of functional programming which treats all functions as deterministic mathematical functions, or pure functions. When a pure function is called with some given arguments, it will always return the same result, and cannot be affected by any mutable state or other side effects. This is in contrast with impure procedures, common in imperative programming, which can have side effects (such as modifying the program's state or taking input from a user).

This description (of non-pure FP) is somewhat closer to what I had in mind: composing functions and focused on expressions more than statements. It's probably closer to what Prof. S had in mind too, as we often map certain datatypes more than others, with list being the most basic type other than cons. This description also makes it sound like SICP's definition is closer to a definition of purely functional programming.

I would argue that, given these definitions, immutability plays as much a role in functional programming than the composition of functions, the namesake of FP. I guess you could say that enforcing pure functions leads to immutability, and immutability leads to pure functions¹⁶. And all of the other properties of FP are due to these core concepts.

The antithesis of this would be, of course, imperative programming, which we can now define as code that mutates state. This idea comes bundled naturally with the idea of statements and highly linear code, which is often easier for people to reason about and for computers to use, at the cost of FP's idealism.

Nowadays, the programming language world is not so black-and-white. All programming languages must allow mutation and impure functions if they are to be at all practical -- otherwise we couldn't have an efficient PRNG or even user input. We can then say that a purely functional programming language is one that only allows mutation in a very controlled and deliberate way, such as how Haskell handles side-effect inducing code (monads) explicitly.¹⁷ There are still a number of low-level languages that are highly procedural (imperative) and widely used, such as Assembly (not really procedural), C, Rust, and Go (Go is not necessarily low-level, but keeps to single-paradigm simply to keep code simple and consistent). Many modern languages (e.g., Scala, Javascript, Python, Scheme) are now multi-paradigm, which generally means they support some degree of object-oriented and functional programming.

Now that many modern languages support some degree of functional programming, it simply feels like another option, and it's easy to take the imperative route instead, especially if it's the more familiar one.

The final footnote in section 3.1 mentions that:

In view of [problems related to careless statement ordering in imperative programming], it is ironic that introductory programming is most often taught in a highly imperative style. This may be a vestige of a belief ... that programs that call procedures must inherently be less efficient than programs that perform assignments. ... Alternatively it may reflect a view that step-by-step assignment is easier for beginners to visualize than procedure call. Whatever the reason, it often saddles beginning programmers with "should I set this variable before or after that one" concerns that can complicate programming and obscure the important ideas.

at which point Prof. S asked me if I thought that this was true. Is functional programming (e.g., in Scheme) better and/or easier for a beginner than imperative programming?

My response was that I don't think it was easier for me at first, if only because I had a background in procedural and OOP programming before reading this book and the syntax felt alien to me. Now, it feels almost as natural as writing code in most languages, provided that equivalent libraries are provided and I don't have to deal directly with memory pointers or allocation. Perhaps it might be easier on mathematics students, because Scheme resembles lambda calculus and algebraic forms.

But even if I can't reset my programming knowledge and start learning programming from scratch using this book, I definitely appreciate why the first two chapters were taught in a functional style. It seems that the authors are determined to show that FP is more fundamental to computation -- building up a complex computation from primitives -- and has its benefits over imperative programming that are often missing in the common introductory course to programming. And only when its readers have a foundation of functional paradigm is imperative programming revealed, in such a way that the reader can appreciate its glory and understand its flaws.

Footnotes

1. This is the convention I used in my previous blog when referring to real people, and I will stick to it.

2. Somewhat related: When I proposed this independent study, his understanding was that this book uses Lisp because it teaches functional programming, and suggested that we use a more modern functional language like Python or Haskell. My understanding was that this book was not about functional programming at all, but rather that it chooses Lisp as a lens for studying program design for its minimalism and metaprogramming capabilities, and it happens to be a mostly-functional language. It took a while to convince him that the independent study was not about Lisp or functional programming, but about the shape of programs.

3. Chapter 1 deals with procedural abstractions, and Chapter 2 with data abstractions.

4. Schemers, if you will.

5. While this is perhaps the best part about SICP, this is also one of my criticisms -- for the unaware reader, it may seem like the book is throwing you examples before you're ready to deal with them. Sometimes the tools needed are not given to you! The authors sometimes don't give implementations of crucial functions, such as put and get in 2.4 (these require material from Chapter 3), some of the concurrency primitives in 3.4 (this involves non-standard Scheme to support asm instructions), and make-delay in 3.5 (this requires macros). This is great for the inquisitive reader with some background in programming, but it requires a decent amount of prior knowledge. I believe this is the most common criticism of the book.

6. This is a common criticism of functional languages. See this.

7. In a REPL or a non-standard program, define may allow redefinitions in the same scope. But in a standards-compliant program, that's not allowed.

8. It would be very interesting and probably very ugly to code like this in C.

9. If you allow enough nested scopes.

10. This is a laymen's way of saying that lambda calculus and Turing machines are equivalent models of computation.

11. Efficiency may not be equivalent without mutation for certain data structures and algorithms, such as in arrays and array-based structures (e.g., heaps, hashtables) or the Sieve of Eratosthenes. See the earlier footnote. But we can still emulate them in functional programming. The authors are very deliberate not to mention hashtables at all (at least up to Chapter 4). They mention the Sieve of Eratosthenes in 3.5 when mentioning streams, which is still a functional programming concept, but their implementation is very inefficient. I'll mention this in a later post about 's discussion of streams.

12. Actually, we can achieve this using streams (section 3.5), which actually stores the entire history of a variable's state. In other words, the stream is an infinite list of immutable states. More to come in a future post about streams.

13. This is supposed to be the climax of the post, but I can't think of a climactic way to display everything, so it's gonna go in a boring list. These are my key takeaways from these sections and are, in general, what I believe to be the strengths of immutability and functional programming, but it is of course not exhaustive.

14. There is a more precise definition of referential transparency than what I present here, which is just for the general idea.

15. However, there are also performance tradeoffs as mentioned many times before, but SICP suggests that the widespread belief is more pessimistic about functional programming performance than the reality. Some Scheme flavors and other functional languages like OCaml or Haskell can achieve very close to C speeds.

16. In a purely functional language like Haskell, pure functions are enforced. For example, in Haskell, non-pure functions due to I/O are "marked" with the IO monad. In a non-purely functional language, we enforce immutability in order to adhere to functional programming. Because Javascript or Scheme are not purely functional, we emulate functional programming by forcing the programmer to avoid mutation.

17. Haskell is the only purely functional language I know of without having to look up purely functional languages, which says something of their rarity.