Jonathan Lam

EE/CS @ The Cooper Union


Blog

SICP notes: scoping and namespaces

On 6/5/2021, 8:21:15 PM

Return to blog


Notes spawned off of reading Structure and Interpretation of Computer Programs. I've been following along with the exercises, and my work can be found here. More of these posts may follow.


Free vs. bound variables (section 1.3)

I never saw these variables in a language like C, because there are no nested functions. With arbitrarily nested functions and function values, this becomes important.

Bound variables are variables that are declared as parameters of some function or closure (i.e., a define procedure or let statement in Scheme). The process of assigning a bound variable by a formal parameter is called binding or capturing the variable. A variable that is not bound is called free.1

In laymen's terms, a free variable is one that is defined in some outside scope, while a bound variable has a definition in the local scope. (In C, with only no nested functions, free variables are necessarily global.)

A free variable can be thought of as a "slot" that should be filled by some variable in some outer scope. (This will come up again in the next section.)

Bound variables, since they are newly defined in the scope, can be arbitrarily named as long as they do not shadow any of the free variables.


Lexical vs. dynamic scope (section 1.3)

What is the "outer scope" that a free variable comes from? In lexical (static) binding, a free variable is found in the enclosing scope of the defining instance of a closure. In dynamic binding, a free variable is found in the scope of a procedure at runtime.

Consider the following example in Scheme, a language with lexical binding:

(define (f1 x)
  (lambda (y)
    (+ x y)))

(define f2
  (lambda (y)
    (+ x y)))

(define f3 (f1 1))

(let ((x 2))
  (f3 3)      ; 4
  (f2 3))     ; exception: variable x is not bound

In this example, the value of x in the lambda in f1 is bound to the formal parameter of f1. Thus x is bound to 1 in f3. In f2, however, Scheme is searching for a definition of x in the enclosing scope, the global scope. Since there is no x defined in the global scope, it throws an exception.

Now consider the following code from Elisp, a language with dynamic binding.

(defun f1 (x)
  (lambda (y)
    (+ x y)))

(defun f2 (y)
  (+ x y))

(defvar f3 (f1 1))

(let ((x 2))
  (funcall f3 3)  ; 5
  (f2 3))         ; 5

Now, both produce the same result. Both the x in f1 and f2 are dynamically-bound free variables, which means they will look for a definition of x in the runtime context. Both functions are called within the body of the let statement that binds x, so x is bound to 2 in both cases.

Almost all languages use lexical binding, as it is easier for a compiler to optimize (name resolution happens at compile-time rather than runtime) and allows for variable capture via closures. However, dynamic binding doesn't require storage of free variables for a function (one of the difficulties of Lisp-1's, mentioned later) and allows you to pass variables without explicit parameter-binding (see Richard Stallman's explanation of why Emacs uses dynamic binding). Elisp is one of the few languages that implement dynamic binding. The concept also goes under the name of early/late binding or lexical/dynamic scoping.

More information about lexical vs. dynamic scoping (as well as how to emulate lexical scoping in a dynamically-scoped language) can be found on the Emacs wiki, and another thought example is found in this Stack Overflow answer.


Closures as an abstract data type (ADT) (section 2.2)

This follows immediately from the discussion of lexical scoping2. Each closure in a lexically-scoped language has to store its free variables, or else they will go out of scope and disappear.

In other words, each closure acts as a storage of all its free variables. We can exploit this to implement3 composite data structures using only closures. SICP does exactly this in section 2.1:

(define (cons x y)
  (define (dispatch m)
    (cond ((= m 0) x)
	  ((= m 1) y)
	  (else (error 'cons "input not 0 or 1"))))
  dispatch)

(define (car z) (z 0))
(define (cdr z) (z 1))

I think this is brilliant4.


Lisp-1 vs. Lisp-2

The superficial difference is that Lisp-1's have a single namespace (for functions and variables), while Lisp-2's have (at least) two namespaces. Scheme is a Lisp-1, while Common Lisp and Elisp are Lisp-2's.

A Lisp-1 works nicely in the context of SICP, especially as we try to build abstractions and think of "functions as data." The book tries to emphasize the fact that the line between procedures and data is thin, and we can even implement data as functions (e.g., closures as an ADT and ). With functions as the primitives of lambda calculus, it seems natural not to separate them from other primitive type.

However, this leads to implementation difficulties. This is briefly mentioned in SICP in section 1.3:

Lisp, unlike other common programming languages, awards procedures full first-class status. This poses challenges for efficient implementation, but the resulting gain in expressive power is enormous.

(in a footnote) The major implementation cost of first-class procedures is that allowing procedures to be returned as values requires reserving storage for a procedure's free variables even while the procedure is not executing. In the Scheme implementation we will study in section 4.1, these variables are stored in the procedure's environment.

A discussion of the differences (with several good references in the comments, most notably "Technical Issues of Separation in Function Cells and Value Cells" and Lisp in Small Pieces), can be found here.

tl;dr: Scheme's abstractions of "functions as data" makes for a difficult efficient implementation (Lisp-1). So some implementations of Lisp (Lisp-2), such as CL, create a separation between function and data values.


Footnotes

1. The definition of free and bound may become a little harder to define if variables can be bound not at the beginning of some closure: it would be free up to its definition, at which point it would be bound. I'm not sure if this is technically correct but this is my understanding of it.

2. The concept of capturing variables contradicts dynamic scoping.

3. A more appropriate word might be "emulate" rather than "implement," because this is not as efficient as native data structures without any extra baggage that closures require.

4. Not directly related to closures, but still related to representing data as functions: the exercise that follows this example about Church numerals is even more fascinating. I wrote about them in this blog post.


© Copyright 2021 Jonathan Lam