Paul Khuong mostly on Lisp

Thu, 06 Sep 2007

Hacking serialisable closures with SBCL

The key element in Common Cold was being able to generate serialisable closures without having to transform all the lexically enclosing code. To do that, I had to have access to two things: all the functions in the lexical environment — remember that CL is a Lisp-2, so even if lambda are serialisable, local functions won't benefit from that automatically (that would also be pretty bad for performance) — and all the variables in the enviroment (along with their values, at runtime).

As would be expected, there is a list of all the lexical variable definitions (whether they're symbol-macros, lexical variables or dynamically scoped ones) in the lexical environment (lexenv) object. After all, most of that is needed for macroexpand to work correctly. As is logical, the innermost variables are at the head. Even better: SBCL's compiler seems to always store the information needed to inline local functions (in the lexenv), which is, luckily enough, more than enough to recover their source and their ordering. Some of that information is a bit hidden, however, since the goals of the compiler are different from ours. What is directly available in the lexenv object is a list of function 'nodes', with the most nested function(s) first. Note how there is no direct way to know how functions and variables are interleaved, or which functions were defined in the same labels or flet (this is not an issue for variables, since their value will already have been computed when we'll unserialise the closures).

To recover that information, we can use the lexenv object associated with each function node. To group functions together, we simply have to detect when they close over the same set of functions. Recursive functions (defined with labels) also close over themselves, while those defined with flet don't. Local macros are represented as closures, so there is no need to get their grouping right, only the ordering, which is implicit in the list of functions. Note that, since local functions must be defined where they are bound (CL is a Lisp-2), we know that simply getting this ordering and grouping right will be enough to rebuild the correct lexical environment.

In order to know where each variable should be defined, we can, again, use the lexenv object associated with each function node to find the outermost (group of) function that closes over that variable. Now, we can easily generate a form that will have all the function/macros definition and variable binding/symbol macro definition forms nested in the right order to recreate the lexical environment. To bind the variables to the right values, the simplest way is to wrap the form in a lambda, and take the values as arguments (bound to gensymmed variables, otherwise there would be issues with variables that have the same names). There is also the issue of local macros: macro functions are stored as functions. Trying to store them in the form would be quite painful: no dumping to fasls, no easy way to share the data across processes, ... Instead, they are left in the form only to expand all the macros away with sb-walker, and then removed from the form. We can finally splice in whatever form we want inside all those definitions and bindings (before macroexpanding everything). Once we evaluate the complete form and pass it the values of the lexical variables, we will have the value of the spliced in form, evaluated inside a recreation of the lexical environment represented by the lexenv object.

That still leaves the question of how to get the values of the variables. Naively, since we have access to all the variables' names, it seems like we could just generate a form to create a list of all the variables' values. That dies when lexical scope is used to have two variables with the same name. We also cannot only capture the value of innermost variable of each name: the values are needed for surrounding local functions. The solution that I found is to make sure I get a fresh lexenv, side effect it to associate all the variable objects with fresh names, and create the list inside that new environment.

In slightly more than 400 LOC, this gives us a generic way to evaluate forms in serialisable lexical environments, and thus serialisable closures, and a global registry to unserialise these closures (and to detect nearly duplicate unserialisers, which may happen with nested serialisable closures, or serialisable closures inside local functions that surround another serialisable closure, although that blow-up is, at most, quadratic in the number of environment serialising forms).

posted at: 00:00 | /Lisp/CommonCold | permalink

Contact me by email: pvk@pvk.ca.