[Contents]   [Back]   [Prev]   [Up]   [Next]   [Forward]  


Principles of Compilation

Expansion and Analysis

  1. Macros defined by defmacro and all the quasiquotes are expanded and compiled into equivalent form without macros and quasiquotes. For example, `(a , x) will be converted to (cons 'a (cons x '())).
  2. Define-s with the nonessential syntax like
    (define (foo x) ...)
    
    are converted to defines with the essential syntax
    (define foo (lambda (x) ...))
    
    Non-top-level defines are converted into equivalent letrec-s.
  3. Variables are renamed to avoid name clashes, so that any local variable may have a whole procedure as its scope. This renaming also converts let-s to let*-s. Variables which do not introduce potential name clashes are not renamed. For example,
    (define (foo x y)
       (let ((x y)
             (z x))
         (let* ((x (+ z x)))
           x)))
    
    is converted to
    (define foo
      (lambda (x y)
        (let* ((x__1 y)
               (z x)
               (x__2 (+ z x__1)))
           x__2)))
    
  4. In case the set of procedures defined in one letrec is actually not wholly mutually recursive (eg, f1 calls f2, but f2 does not call f1, or there are three procedures, f1, f2, f3 so that f1 and f2 are mutually recursive but f3 is not called from f1 or f2 and it does not call them, etc), it is possible to minimize the number of additional variables passed to procedures. Thus letrec-s are split into ordered chunks using dependency analysis and topological sorting, to reduce the number of mutually passed variables. Wherever possible, letrec-s are replaced by let*-s inside these chunks.
  5. Normalization is performed. This converts a majority of scheme control procedures like cond, case, or, and into equivalent terms using a small set of primitives. New variables may be introduced in this phase. In case a procedure like or or and occurs in the place where its value is treated as a boolean (eg. first argument of if), it is converted into an analogous boolean-returning procedure, which will finally be represented by an analogous C procedure (eg. || or &&). Associative procedures are converted into structures of corresponding nonassociative procedures. List is converted to a structure of cons-s. Map and for-each with more than two arguments are converted into an equivalent do-cycle. map-s and for-each-s with two arguments are treated as if they were defined in the compiled file -- the definitions map1 and for-each1 are automatically included, if needed. There is an option in `hobbit.scm' to make all map-s and for-each-s be converted into equivalent do-loops, avoiding the use of map1 and/or for-each1 altogether.
  6. Code is analysed for determining which primitive names and compiled procedure names are assumed to be redefinable.
  7. Analysing HOP clonability: hobbit will find a list of clonable HOP-s with information about higher-order arguments. Criterias for HOP clonability are given in the section 6.4.
  8. Analysis of liftability: hobbit will determine which lambda-terms have to be built as real closures (implemented as a vector where the first element is a pointer to a function and the rest contain values of environment variables or environment blocks of surrounding code) and which lambda-terms are liftable. Liftability analysis follows the criterias given in section 6.5 and 6.6.

Building Closures

Here Hobbit produces code for creating real closures for all the lambda-terms which are not marked as being liftable by the previous liftability analysis.

Global variables (eg x-glob) are translated as pointers (locations) to SCM objects and used via a fetch: *x_glob (or a fetch macro GLOBAL(x-glob) which translates to *x_glob).

While producing closures hobbit tries to minimize the indirection levels necessary. Generally a local variable x may have to be translated to an element of a vector of local variables built in the procedure creating x. If x occurs in a non-liftable closure, the whole vector of local variables is passed to a closure.

Such a translation using a local vector will only take place if either x is set! inside a non-liftable lambda-term or x is a name of a recursively defined non-liftable function, and the definition of x is irregular. The definition of x is irregular if x is given the non-liftable recursive value t by extra computation, eg as

(set! x (let ((u 1)) (lambda (y) (display u) (x (+ u 1)))))

and not as a simple lambdaterm:

(set! x (lambda (y) (display x) (x (+ y 1))))

In all the other cases a local scheme variable x is translated directly to a local C variable x having the type SCM (a 32-bit integer). If such an x occurs in a non-liftable closure, then only its value is passed to a closure via the closure-vector. In case the directly-translated variable x is passed to a liftable lambda-term where it is set!, then x is passed indirectly by using its address &x. In the lifted lambda-term it is then accessed via *.

If all the variables x1, ..., xn created in a procedure can be translated directly as C variables, then the procedure does not create a special vector for (a subset of) local variables.

An application (foo ...) is generally translated to C by an internal apply of the SCM interpreter: apply(GLOBAL(foo), ...). Using an internal apply is much slower than using direct a C function call, since:

However, in case foo is either a name of a non-redefined primitive or a name of a non-redefined liftable procedure, the application is translated to C directly without the extra layer of calling apply: foo(...).

Sometimes lambda-lifting generates the case that some variable x is accessed not directly, but by *x. See the next section.

Undefined procedures are assumed to be defined via interpreter and are called using an internal apply.

Lambda-lifting

When this pass starts, all the real (nonliftable) closures have been translated to closure-creating code. The remaining lambda-terms are all liftable.

Lambda-lifting is performed. That is, all procedures defined inside some other procedure (eg. in letrec) and unnamed lambda-terms are made top-level procedure definitions. Any N variables not bound in such procedures which were bound in the surrounding procedure are given as extra N first parameters of the procedure, and whenever the procedure is called, the values of these variables are given as extra N first arguments.

For example:

(define foo
  (lambda (x y)
     (letrec ((bar (lambda (u) (+ u x))))
        (bar y) )))

is converted to

(define foo
  (lambda (x y)
    (foo-fn1 x y) ))

(define foo-fn1
  (lambda (x u)
    (+ u x) ))

The case of mutually recursive definitions in letrec needs special treatment -- all free variables in mutually recursive funs have, in general, to be passed to each of those funs. For example, in

(define (foo x y z i)
  (letrec ((f1 (lambda (u) (if x (+ (f2 u) 1))))
           (f2 (lambda (v) (if (zero? v) 1 (f1 z)))) )
    (f2 i) ))

the procedure f1 contains a free variable x and the procedure f2 contains a free variable z. Lambda-lifted f1 and f2 must each get both of these variables:

(define (foo x y z i)
   (foo-fn2 x z i) )

(define foo-fn1
  (lambda (x z u) (if x (+ (foo-fn2 x z u) 1))) )

(define foo-fn2
  (lambda (x z v) (if (zero? v) 1 (foo-fn1 x z z))) )

Recall that hobbit has already done dependency analysis and has split the original letrec into smaller chunks according to this analysis: see pass 1.

Whenever the value of some free variable is modified by set! in the procedure, this variable is passed by reference instead. This is not directly possible in scheme, but it is possible in C.

(define foo
  (lambda (x y z)
     (letrec ((bar (lambda (u) (set! z (+ u x z)))))
        (bar y)
        z)))

is converted to incorrect scheme:

(define foo
  (lambda (x y z)
    (foo-fn1 x (**c-adr** z) y)
    z))

(define foo-fn1
  (lambda (x (**c-adr** z) u)
    (set! (**c-fetch** z) (+ u x (**c-fetch** z))) ))

The last two will finally be compiled into correct C as:

SCM foo(x, y, z)
SCM x, y, z;
{
  foo_fn1(x, &z, y);
  return z;
}

SCM foo_fn1(x, z, u)
SCM x, u;
SCM *z;
{
  return (*z =  (u + x) + *z);
}

Statement-lifting

As the scheme do-construction is compiled into C for, but for cannot occur in all places in C (it is a statement), then if the do in a scheme procedure occurs in a place which will not be a statement in C, the whole do-term is lifted out into a new top-level procedure analogously to lambda-lifting. Any statement-lifted parts of some procedure foo are called foo_auxn, where n is a number.

The special C-ish procedure **return** is pushed into a scheme term as far as possible to extend the scope of statements in the resulting C program. For example,

(define foo
  (lambda (x y)
      (if x (+ 1 y) (+ 2 y)) ))

is converted to

(define foo
   (lambda (x y)
     (if x (**return** (+ 1 y)) (**return** (+ 2 y))) ))

Immediate tailrecursion (foo calling foo tailrecursively) is recognized and converted into an assignment of new values to args and a jump to the beginning of the procedure body.

Higher-order Arglists

All procedures taking a list argument are converted into ordinary non-list taking procedures and they are called with the list-making calls inserted. For example,

(define foo
  (lambda (x . y)
    (cons x (reverse y)) ))

is converted to

(define foo
  (lambda (x y)
    (cons x (reverse y)) ))

and any call to foo will make a list for a variable y. For example,

(foo 1 2 3)

is converted to

(foo 1 (cons 2 (cons 3 '()))).

All higher-order procedure calls where an argument-term contains unbound variables will generate a new instance (provided it has not been created already) of this higher-order procedure, carrying the right amount of free variables inside to right places.

For example, if there is a following definition:

(define (member-if fn lst)
   (if (fn (car lst))
       lst
       (member-if fn (cdr lst)) ))

and a call

(member-if (lambda (x) (eq? x y)) lst),

a new instance of member-if is created (if an analogous one has not been created before):

(define (member-if_inst1 tmp fn lst)
   (if (fn tmp (car lst))
       lst
       (member-if_inst1 tmp fn (cdr lst)) ))

and the call is converted to

(member-if_inst1 y foo lst)

and a top-level define

(define (foo y x) (eq? x y))

In addition, if the higher-order procedure is to be exported, an additional instance is created, which uses apply to call all argument-procedures, assuming they are defined via interpreter. The exportable higher-order procedure will have a name fun_exporthof, where fun is the name of the original procedure.

Typing and Constants

All C<->Scheme conversions for immediate objects like numbers, booleans and characters are introduced. Internal apply is used for undefined procedures. Some optimizations are performed to decrease the amount of C<->Scheme object conversions.

All vector, pair and string constants are replaced by new variables. These variables are instantiated to the right values by init_foo*.

Procedures foo which are to be exported (made accesible to the interpreter), and which have an arity different from one of the following five templates: x, (), (x), (x y), (x y z), are made accessible via an additional procedure foo_wrapper taking a single list argument.

C Code Generation

More or less straightforward.

The type conversion between C objects and immediate Scheme objects of the type boolean, char and num is performed by macros. The scheme object '() is represented by the macro object EOL.

Intermediate files

Experiment yourself by defining:

(define *build-intermediate-files* #t)

instead of the default:

(define *build-intermediate-files* #f).


[Contents]   [Back]   [Prev]   [Up]   [Next]   [Forward]