Malmesure, quand tu nous tiens
I would say that the most important thing I learned during my short stint in health science is that quality experiment plans are essential. That's true in the most empirical of (pseudo-)sciences, and also in that other pseudo-science, benchmarking. We have fewer problems with sample size and with getting accurate numbers, but the fundamental question still remains: are we really measuring what we believe we are?
Leslie Polzer found the following results, comparing his seqrnd and my random-shuffle:
Here are the results, comparing Paul Khuong's RANDOM-SHUFFLE with SEQRND. [...]

(time (benchmark #'random-shuffle))
(time (benchmark #'seqrnd))

CLISP 2.41 (interpreted):

Real time: 25.72314 sec.
Run time: 10.665971 sec.
Space: 86264000 Bytes
GC: 164, GC time: 0.669959 sec.

Real time: 72.53122 sec.
Run time: 29.028109 sec.
Space: 79356432 Bytes
GC: 152, GC time: 0.5233 sec.

Bottom line: RANDOM-SHUFFLE 3x faster, SEQRND slightly less space.

CLISP 2.41 (compiled):

Real time: 19.819485 sec.
Run time: 8.232797 sec.
Space: 86016072 Bytes
GC: 164, GC time: 0.663291 sec.

Real time: 28.287846 sec.
Run time: 10.196001 sec.
Space: 79108984 Bytes
GC: 151, GC time: 0.519969 sec.

Bottom line: RANDOM-SHUFFLE 1.35x faster, SEQRND slightly less space.

SBCL 1.0.15:

Evaluation took:
  13.861 seconds of real time
  5.013006 seconds of user run time
  0.126658 seconds of system run time
  [Run times include 0.074 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  52,039,552 bytes consed.

Evaluation took:
  4.639 seconds of real time
  2.263186 seconds of user run time
  0.026665 seconds of system run time
  [Run times include 0.009 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  30,302,776 bytes consed.

Bottom line: SEQRND 3x faster, SEQRND 3/5 space.
That's interesting: completely different results depending on the implementation. What's also interesting is the presence of a few obvious flaws in the experiment. Let's keep in mind what we want to decide: which is more space- or time-efficient, an EQ hash table or storing the values in cons cells in-place.
First, the benchmark harness generates a list of random numbers for each run. That's pure noise. We could instead pass a copy of the same sequence for each iteration. It might introduce bias, but, looking at the sources, that seems unlikely. So, from
(defun benchmark (fn)
  (dotimes (x 1000)
    (let ((seq (loop for i from 1 to 1000
                     collect (random i))))
      (funcall fn seq))))

we go to
(defparameter *random-sequence*
  (loop repeat 1000 collect (random 1.0)))

(defun test-shuffle (fn list n)
  (dotimes (i n)
    (funcall fn (copy-seq list))))
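To put numbers on the comparison, a minimal driver is enough; the exact invocations below are our own, assuming the definitions of seqrnd and random-shuffle given next are loaded:

;; Time 1000 shuffles of the same 1000-element list.
;; TEST-SHUFFLE hands each run a fresh COPY-SEQ of the data,
;; so destructive shuffles don't contaminate later iterations.
(time (test-shuffle #'random-shuffle *random-sequence* 1000))
(time (test-shuffle #'seqrnd *random-sequence* 1000))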
We can also see an immediate difference between seqrnd and random-shuffle: seqrnd generates single floats, random-shuffle double floats. Considering we'll not be shuffling millions of items, single floats should be good enough. Obviously, if we're to measure time efficiency, we want to turn optimisations on. Thus, we get these functions:
(defvar *random-id-ht* nil)

(defun initialize-memo ()
  (setf *random-id-ht* (make-hash-table :test #'eq)))

(declaim (sb-ext:maybe-inline consistent-random-id))
(defun consistent-random-id (obj)
  (declare (optimize speed))
  (multiple-value-bind (val found-p)
      (gethash obj *random-id-ht*)
    (if found-p
        val
        (setf (gethash obj *random-id-ht*) (random 1.0)))))

(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (declare (optimize speed)
           (inline consistent-random-id))
  (initialize-memo) ; need to clear between runs
  (sort seq #'> :key (lambda (x) (consistent-random-id x))))
(defun random-shuffle (sequence)
  (declare (optimize speed))
  (map-into sequence #'car
            (sort (map 'vector
                       (lambda (x) (cons x (random 1.0)))
                       sequence)
                  #'<
                  :key #'cdr)))
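As an aside, this memoisation also explains the correctness caveat we come back to at the end: consistent-random-id is keyed on EQ, so elements that are EQ (repeated symbols, small fixnums, ...) always receive the same sort key, and seqrnd cannot interleave such duplicates uniformly. A quick check of that behaviour, assuming the definitions above:

;; EQ duplicates share a single memoised key...
(initialize-memo)
(= (consistent-random-id 'a)
   (consistent-random-id 'a)) ; => T
;; ...so in (seqrnd (list 'a 'a 'b)), B can never end up
;; between the two As.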
For 1000 shuffles of the same 1000-element list, we find 1.82s for seqrnd and 4.11s for random-shuffle. From that, we could conclude that it is faster to use a hash table than to take the car of a cons. The question remains: are we really measuring the relative efficiency of the two methods? In the last test case, random-shuffle does most of its work on vectors, seqrnd on lists. As an experiment, let's shuffle a 1000-element vector. The figures are now reversed! seqrnd takes 3.41s, and random-shuffle 1.47s. It seems rather unlikely that we are really measuring the effects of the two ways to associate random numbers with keys.
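The vector variant needs nothing more than coercing the same data; the name *random-vector* is our own:

;; Same data as a vector; COPY-SEQ in TEST-SHUFFLE preserves the
;; sequence type, so the harness itself needs no change.
(defparameter *random-vector* (coerce *random-sequence* 'vector))

(time (test-shuffle #'random-shuffle *random-vector* 1000))
(time (test-shuffle #'seqrnd *random-vector* 1000))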
Let's time sorts of lists and vectors, passing (lambda (x) (sort x #'< :key #'identity)) to test-shuffle. Sorting lists takes 0.436s, compared to 1.37s for vectors. At least part of the discrepancy between our two methods can be attributed to SORT. Why is there a difference? SBCL's SORT calls different routines depending on the sequence's type: lists are passed to a mergesort, vectors to an in-place heapsort. Their performance envelopes are wildly different. The heapsort is clearly ahead on large datasets, but it varies on smaller inputs.
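Concretely, the isolated sort timings can be reproduced like this, again with our harness:

;; SORT alone, on lists and on vectors, with a no-op key.
(time (test-shuffle (lambda (x) (sort x #'< :key #'identity))
                    *random-sequence* 1000))
(time (test-shuffle (lambda (x) (sort x #'< :key #'identity))
                    *random-vector* 1000))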
We have a hint that working with slightly different data structures can lead to large performance differences in third-party code. What effect can it have on MAP-INTO? Let's test with (lambda (x) (map-into x #'identity x)). Mapping into a list (1000 elements, 1000 times) takes 7.78s, versus 0.114s into a vector. The difference is enormous, and the reason is simple: MAP-INTO for lists sucks. How bad can it suck? ELT- and LENGTH-using bad, surprisingly; since ELT is O(n) on lists, the whole traversal goes quadratic. SBCL's MAP-INTO is in need of some TLC if someone has the time.
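The MAP-INTO measurements come from the same pattern:

;; MAP-INTO alone: write a sequence back onto itself through #'identity.
(time (test-shuffle (lambda (x) (map-into x #'identity x))
                    *random-sequence* 1000))
(time (test-shuffle (lambda (x) (map-into x #'identity x))
                    *random-vector* 1000))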
Since we seem to be primarily interested in shuffling medium-sized lists, we could add a special case for such inputs, to keep working on lists and avoid MAP-INTO, something like:
(defun random-shuffle (sequence)
  (declare (optimize speed))
  (etypecase sequence
    (list (mapcar #'car
                  (sort (mapcar (lambda (x) (cons x (random 1.0)))
                                sequence)
                        #'<
                        :key #'cdr)))
    (t (map-into sequence #'car
                 (sort (map 'vector
                            (lambda (x) (cons x (random 1.0)))
                            sequence)
                       #'<
                       :key #'cdr)))))
With this specialised code, we find random-shuffle takes 0.839s for 1000 shuffles of 1000-element lists (1.82s for seqrnd). It also conses less: 64MB instead of 81MB.
What happens if we make it in-place? First, let's write a non-sucky equivalent to MAP-INTO (something like this could be patched into SBCL):
(defun list-map-into (output function input)
  (declare (optimize speed))
  (let ((function (coerce function 'function))) ; FIXME: that's slightly wrong...
    (loop for outs on output
          for x in input
          do (setf (car outs) (funcall function x))
          finally (return output))))
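A quick sanity check of the semantics, for illustration: the results are written into the cars of the first argument, which is also returned.

(list-map-into (list 0 0 0) #'1+ (list 1 2 3)) ; => (2 3 4)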
We can then use it in random-shuffle, keeping in mind that the conses passed to SORT can be reused arbitrarily.
(defun random-shuffle (sequence)
  (declare (optimize speed))
  (etypecase sequence
    (list (let ((result (sort (list-map-into sequence
                                             (lambda (x) (cons x (random 1.0)))
                                             sequence)
                              #'<
                              :key #'cdr)))
            (list-map-into result #'car result)))
    (t (map-into sequence #'car
                 (sort (map 'vector
                            (lambda (x) (cons x (random 1.0)))
                            sequence)
                       #'<
                       :key #'cdr)))))
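Note that the list branch is destructive and may move the original head cons anywhere, so callers should use the return value; for instance:

;; Use the return value; the argument list is recycled in-place.
(random-shuffle (list 1 2 3 4 5)) ; => some permutation, e.g. (4 1 5 3 2)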
Like memoisation, it could be argued that a good MAP-INTO should be part of every lisper's toolbox (... as part of the implementation). With this new MAP-INTO look-alike, 1000 random-shuffles of 1000-element lists take 0.829s and cons 32MB (1.82s and 81MB for seqrnd).
After some optimisations, guided by a better knowledge of our implementation's library, we see that the correct approach (the one that doesn't fail on sequences with repeated elements) is also faster and more space-efficient (1.47s and 64MB for random-shuffle of vectors, versus 3.41s and 74MB for seqrnd). We can also observe the importance of understanding the performance of third-party, magic-pixie-dust-style code when trying to understand that of our own code, lest we be led to the wrong conclusions.