[Contents]   [Back]   [Prev]   [Up]   [Next]   [Forward]  


Performance of Compiled Code

Gain in Speed

The author has so far compiled and tested a number of large programs (theorem provers for various logics and hobbit itself).

The speedup for the provers was between 25 and 40 times for various provable formulas. Comparison was made between the provers being interpreted and compiled with `gcc -O2 -DRECKLESS' on Sparcstation ELC in both cases.

The provers were written with care to make the compiled version run fast. They do not perform excessive consing and they perform very little arithmetic.

According to experiments made by A. Jaffer, the compiled form of the example program `pi.scm' was approximately 11 times faster than the interpreted form.

As a comparison, his hand-coded C program for the same algorithm of computing pi was about 12 times faster than the interpreted form. `pi.scm' spends most of of its time in immediate arithmetics, vector-ref and vector-set!.

P. Kelloma"ki has reported a 20-fold speedup for his generic scheme debugger. T. Moore has reported a 16-fold speedup for a large gate-level IC optimizer.

Self-compilation speeds Hobbit up only ca 10 times.

However, there are examples where the code compiled by hobbit runs actually slower than the same code running under interpreter: this may happen in case the speed of the code relies on non-liftable closures and proper mutual tailrecursion. See for example the closure-intensive benchmark CPSTAK in the following table.

Benchmarks

We will present a table with the performance of three scheme systems on a number of benchmarks: interpreted SCM, byte-compiled VSCM and hobbit-compiled code. The upper 13 benchmarks of the table are the famous Gabriel benchmarks (originally written for lisp) modified for scheme by Will Clinger. The lower five benchmarks of the table are proposed by other people. Selfcompile is the self-compile time of Hobbit.

Hobbit performs well on most of the benchmarks except CPSTAK and CTAK: CPSTAK is a closure-intensive tailrecursive benchmark and CTAK is a continuations-intensive benchmark. Hobbit performs extremely well on these benchmarks which essentially satisfy the criterias for well-optimizable code outlined in the section 6 above.

FFT is real-arithmetic-intensive.

All times are in seconds.

SCM 4c0(U) and 1.1.5*(U) (the latter is the newest version of VSCM) are compiled and run by Matthias Blume on a DecStation 5000 (Ultrix). VSCM is a bytecode-compiler using continuation-passing style, and is well optimized for continuations and closures.

SCM 4e2(S) and Hobbit4b(S) compiled (with `cc -xO3') and run by Tanel Tammet on a Sun SS10 (lips.cs.chalmers.se). Hobbit is a Scheme-to-C compiler for SCM, the code it produces does not do any checking. SCM and hobbit are not optimized for continuations. Hobbit is not optimized for closures and proper mutual tailrecursion.

SCM and Hobbit benchmarks were run giving ca 8 MB of free heap space before each test.

Benchmark       |SCM 4c0(U) 1.1.5*(U)| SCM 4e2(S)  Hobbit4b(S)
----------------|------------------------------------------------
Deriv           | 3.40      3.86     |  2.9            0.18
Div-iter        | 3.45      2.12     |  2.6            0.083
Div-rec         | 3.45      2.55     |  3.5            0.42
TAK             | 1.81      1.71     |  1.4            0.018
TAKL            |14.50     11.32     | 13.8(1.8 in gc) 0.13
TAKR            | 2.20      1.64     |  1.7 1.5        0.018
Destruct        | ?          ?       |  7.4(1.8 in gc) 0.18
Boyer           | ?          ?       | 27.(3.8 in gc)  1.9
CPSTAK          | 2.72      2.64     |  2.0 1.92       3.46(2.83 in gc)
CTAK            |31.0       4.11     | memory          memory
CTAK(7 6 1)     | ?          ?       |  0.83           0.74
FFT             |12.45     15.7      | 11.4 10.8       1.0
Puzzle          | 0.28      0.41     |  0.46(0.22 gc)  0.03
----------------------------------------------------------------
(recfib 25)     | ?          ?       |  4.1            0.079
(recfib 30)     | ?          ?       | 55. (10.in gc)  0.87
(pi 300 3)      | ?          ?       |  7.4            0.46
(hanoi 15)      | ?          ?       |  0.68           0.007
(hanoi 20)      | ?          ?       | 31. (9. in gc)  0.2
----------------------------------------------------------------

Benchmark Sources

A selection of (smaller) benchmark sources

Destruct

;;;; Destructive operation benchmark
(define (destructive n m)
  (let ((l (do ((i 10 (- i 1))
                (a '() (cons '() a)))
               ((= i 0) a))))
    (do ((i n (- i 1)))
        ((= i 0))
      (if (null? (car l))
          (do ((l l (cdr l)))
              ((null? l))
            (or (car l) (set-car! l (cons '() '())))
            (append! (car l) (do ((j m (- j 1))
                                  (a '() (cons '() a)))
                                 ((= j 0) a))))
          (do ((l1 l (cdr l1))
               (l2 (cdr l) (cdr l2)))
              ((null? l2))
            (set-cdr! (do ((j (quotient (length (car l2)) 2) (- j 1))
                           (a (car l2) (cdr a)))
                          ((zero? j) a)
                        (set-car! a i))
                      (let ((n (quotient (length (car l1)) 2)))
                        (cond ((= n 0) (set-car! l1 '()) (car l1))
                              (else (do ((j n (- j 1))
                                         (a (car l1) (cdr a)))
                                        ((= j 1)
                                         (let ((x (cdr a)))
                                           (set-cdr! a '()) x))
                                      (set-car! a i)))))))))))
;; call:  (destructive 600 50)

Recfib

(define (recfib x)
  (if (< x 2)
      x
      (+ (recfib (- x 1))
         (recfib (- x 2)))))

div-iter and div-rec

;;;; Recursive and iterative benchmark divides by 2 using lists of ()'s.
(define (create-n n)
  (do ((n n (- n 1))
       (a '() (cons '() a)))
      ((= n 0) a)))
(define *ll* (create-n 200))
(define (iterative-div2 l)
  (do ((l l (cddr l))
       (a '() (cons (car l) a)))
      ((null? l) a)))
(define (recursive-div2 l)
  (cond ((null? l) '())
        (else (cons (car l) (recursive-div2 (cddr l))))))
(define (test-1 l)
  (do ((i 300 (- i 1))) ((= i 0))
    (iterative-div2 l)
    (iterative-div2 l)
    (iterative-div2 l)
    (iterative-div2 l)))
(define (test-2 l)
  (do ((i 300 (- i 1))) ((= i 0))
    (recursive-div2 l)
    (recursive-div2 l)
    (recursive-div2 l)
    (recursive-div2 l)))
;; for the iterative test call: (test-1 *ll*)
;; for the recursive test call: (test-2 *ll*)

Hanoi

;;; C optimiser should be able to remove the first recursive call to
;;; move-them.  But Solaris 2.4 cc, gcc 2.5.8, and hobbit don't.
(define (hanoi n)
  (letrec ((move-them
            (lambda (n from to helper)
              (if (> n 1)
                  (begin
                    (move-them (- n 1) from helper to)
                    (move-them (- n 1) helper to from))))))
    (move-them n 0 1 2)))

Tak

;;;; A vanilla version of the TAKeuchi function
(define (tak x y z)
  (if (not (< y x))
      z
      (tak (tak (- x 1) y z)
           (tak (- y 1) z x)
           (tak (- z 1) x y))))
;; call: (tak 18 12 6)

Ctak

;;;; A version of the TAK function that uses continuations
(define (ctak x y z)
  (call-with-current-continuation
   (lambda (k)
     (ctak-aux k x y z))))

(define (ctak-aux k x y z)
  (cond ((not (< y x)) (k z))
        (else (call-with-current-continuation
               (ctak-aux
                k
                (call-with-current-continuation
                 (lambda (k) (ctak-aux k (- x 1) y z)))
                (call-with-current-continuation
                 (lambda (k) (ctak-aux k (- y 1) z x)))
                (call-with-current-continuation
                 (lambda (k) (ctak-aux k (- z 1) x y))))))))

(define (id x) x)

(define (mb-test r x y z)
  (if (zero? r)
      (ctak x y z)
      (id (mb-test (- r 1) x y z))))
;;; call: (ctak 18 12 6)

Takl

;;;; The TAKeuchi function using lists as counters.
(define (listn n)
  (if (not (= 0 n))
      (cons n (listn (- n 1)))
      '()))

(define l18 (listn 18))
(define l12 (listn 12))
(define l6 (listn 6))

(define (mas x y z)
  (if (not (shorterp y x))
      z
      (mas (mas (cdr x) y z)
           (mas (cdr y) z x)
           (mas (cdr z) x y))))

(define (shorterp x y)
  (and (pair? y) (or (null? x) (shorterp (cdr x) (cdr y)))))
;; call: (mas l18 l12 l6)

Cpstak

;;;; A continuation-passing version of the TAK benchmark.
(define (cpstak x y z)
  (define (tak x y z k)
    (if (not (< y x))
        (k z)
        (tak (- x 1)
             y
             z
             (lambda (v1)
               (tak (- y 1)
                    z
                    x
                    (lambda (v2)
                      (tak (- z 1)
                           x
                           y
                           (lambda (v3)
                             (tak v1 v2 v3 k)))))))))
  (tak x y z (lambda (a) a)))
;;; call: (cpstak 18 12 6)

Pi

(define (pi n  . args)
      (let* ((d (car args))
             (r (do ((s 1 (* 10 s))
                     (i 0 (+ 1 i)))
                    ((>= i d) s)))
             (n (+ (quotient n d) 1))
             (m (quotient (* n d 3322) 1000))
             (a (make-vector (+ 1 m) 2)))
        (vector-set! a m 4)
        (do ((j 1 (+ 1 j))
             (q 0 0)
             (b 2 (remainder q r)))
            ((> j n))
          (do ((k m (- k 1)))
              ((zero? k))
            (set! q (+ q (* (vector-ref a k) r)))
            (let ((t (+ 1 (* 2 k))))
              (vector-set! a k (remainder q t))
              (set! q (* k (quotient q t)))))
          (let ((s (number->string (+ b (quotient q r)))))
            (do ((l (string-length s) (+ 1 l)))
                ((>= l d) (display s))
              (display #\0)))
          (if (zero? (modulo j 10)) (newline) (display #\space)))
        (newline)))


[Contents]   [Back]   [Prev]   [Up]   [Next]   [Forward]