Malmesure, quand tu nous tiens
I would say that the most important thing I learned during my short stint in health science is that quality experiment plans are essential. That's true in the most empirical of (pseudo-)sciences, and also in that other pseudo-science, benchmarking. We have fewer problems with sample size and with getting accurate numbers, but the fundamental question still remains: are we really measuring what we believe we are?
Leslie Polzer found the following results, comparing his seqrnd and my random-shuffle:
Here are the results, comparing Paul Khuong's RANDOM-SHUFFLE with SEQRND. [...]

(time (benchmark #'random-shuffle))
(time (benchmark #'seqrnd))

CLISP 2.41 (interpreted):

Real time: 25.72314 sec.
Run time: 10.665971 sec.
Space: 86264000 Bytes
GC: 164, GC time: 0.669959 sec.

Real time: 72.53122 sec.
Run time: 29.028109 sec.
Space: 79356432 Bytes
GC: 152, GC time: 0.5233 sec.

Bottom line: RANDOM-SHUFFLE 3x faster, SEQRND slightly less space.

CLISP 2.41 (compiled):

Real time: 19.819485 sec.
Run time: 8.232797 sec.
Space: 86016072 Bytes
GC: 164, GC time: 0.663291 sec.

Real time: 28.287846 sec.
Run time: 10.196001 sec.
Space: 79108984 Bytes
GC: 151, GC time: 0.519969 sec.

Bottom line: RANDOM-SHUFFLE 1.35x faster, SEQRND slightly less space.

SBCL 1.0.15:

Evaluation took:
  13.861 seconds of real time
  5.013006 seconds of user run time
  0.126658 seconds of system run time
  [Run times include 0.074 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  52,039,552 bytes consed.

Evaluation took:
  4.639 seconds of real time
  2.263186 seconds of user run time
  0.026665 seconds of system run time
  [Run times include 0.009 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  30,302,776 bytes consed.

Bottom line: SEQRND 3x faster, SEQRND 3/5 space.
That's interesting: completely different results depending on the implementation. What's also interesting is the presence of a few obvious flaws in the experiment. Let's keep in mind what we want to decide: which is more space- or time-efficient, an EQ hash table or storing the values in cons cells in-place.
First, the benchmark harness generates a list of random numbers for each run. That's pure noise. We could instead pass a copy of the same sequence for each iteration. It might introduce bias, but, looking at the sources, that seems unlikely. So, from
(defun benchmark (fn)
  (dotimes (x 1000)
    (let ((seq (loop for i from 1 to 1000
                     collect (random i))))
      (funcall fn seq))))

we go to
(defparameter *random-sequence*
  (loop repeat 1000 collect (random 1.0)))

(defun test-shuffle (fn list n)
  (dotimes (i n)
    (funcall fn (copy-seq list))))
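To put numbers on the comparison, a minimal driver is enough; the exact invocations below are our own, assuming the definitions of seqrnd and random-shuffle given next are loaded:

;; Time 1000 shuffles of the same 1000-element list.
;; TEST-SHUFFLE hands each run a fresh COPY-SEQ of the data,
;; so destructive shuffles don't contaminate later iterations.
(time (test-shuffle #'random-shuffle *random-sequence* 1000))
(time (test-shuffle #'seqrnd *random-sequence* 1000))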
We can also see an immediate difference between seqrnd and random-shuffle: seqrnd generates single floats, random-shuffle double floats. Considering we'll not be shuffling millions of items, single floats should be good enough. Obviously, if we're to measure time efficiency, we want to turn optimisations on. Thus, we get these functions:
(defvar *random-id-ht* nil)

(defun initialize-memo ()
  (setf *random-id-ht* (make-hash-table :test #'eq)))

(declaim (sb-ext:maybe-inline consistent-random-id))
(defun consistent-random-id (obj)
  (declare (optimize speed))
  (multiple-value-bind (val found-p)
      (gethash obj *random-id-ht*)
    (if found-p
        val
        (setf (gethash obj *random-id-ht*) (random 1.0)))))

(defun seqrnd (seq)
  "Randomize the elements of a sequence. Destructive on SEQ."
  (declare (optimize speed)
           (inline consistent-random-id))
  (initialize-memo) ; need to clear between runs
  (sort seq #'> :key (lambda (x) (consistent-random-id x))))
(defun random-shuffle (sequence)
  (declare (optimize speed))
  (map-into sequence #'car
            (sort (map 'vector
                       (lambda (x) (cons x (random 1.0)))
                       sequence)
                  #'<
                  :key #'cdr)))
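As an aside, this memoisation also explains the correctness caveat we come back to at the end: consistent-random-id is keyed on EQ, so elements that are EQ (repeated symbols, small fixnums, ...) always receive the same sort key, and seqrnd cannot interleave such duplicates uniformly. A quick check of that behaviour, assuming the definitions above:

;; EQ duplicates share a single memoised key...
(initialize-memo)
(= (consistent-random-id 'a)
   (consistent-random-id 'a)) ; => T
;; ...so in (seqrnd (list 'a 'a 'b)), B can never end up
;; between the two As.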
For 1000 shuffles of the same 1000-element list, we find 1.82s for seqrnd and 4.11s for random-shuffle. From that, we could conclude that it is faster to use a hash table than to take the car of a cons. The question remains: are we really measuring the relative efficiency of the two methods? In the last test case, random-shuffle does most of its work on vectors, seqrnd on lists. As an experiment, let's shuffle a 1000-element vector. The figures are now reversed! seqrnd takes 3.41s, and random-shuffle 1.47s. It seems rather unlikely that we are really measuring the effects of the two ways to associate random numbers with keys.
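The vector variant needs nothing more than coercing the same data; the name *random-vector* is our own:

;; Same data as a vector; COPY-SEQ in TEST-SHUFFLE preserves the
;; sequence type, so the harness itself needs no change.
(defparameter *random-vector* (coerce *random-sequence* 'vector))

(time (test-shuffle #'random-shuffle *random-vector* 1000))
(time (test-shuffle #'seqrnd *random-vector* 1000))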
Let's time sorts of lists and vectors, passing (lambda (x) (sort x #'< :key #'identity)) to test-shuffle. Sorting lists takes 0.436s, compared to 1.37s for vectors. At least part of the discrepancy between our two methods can be attributed to SORT. Why is there a difference? SBCL's SORT calls different routines depending on the sequence's type: lists are passed to a mergesort, vectors to an in-place heapsort. Their performance envelopes are wildly different. The heapsort is clearly ahead on large datasets, but it varies on smaller inputs.
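Concretely, the isolated sort timings can be reproduced like this, again with our harness:

;; SORT alone, on lists and on vectors, with a no-op key.
(time (test-shuffle (lambda (x) (sort x #'< :key #'identity))
                    *random-sequence* 1000))
(time (test-shuffle (lambda (x) (sort x #'< :key #'identity))
                    *random-vector* 1000))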
We have a hint that working with slightly different data structures can lead to large performance differences in third-party code. What effect can it have on MAP-INTO? Let's test with (lambda (x) (map-into x #'identity x)). Mapping into a list (1000 elements, 1000 times) takes 7.78s, versus 0.114s into a vector. The difference is enormous, and the reason is simple: MAP-INTO for lists sucks. How bad can it suck? ELT- and LENGTH-using bad, surprisingly; since ELT is O(n) on lists, the whole traversal goes quadratic. SBCL's MAP-INTO is in need of some TLC if someone has the time.
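The MAP-INTO measurements come from the same pattern:

;; MAP-INTO alone: write a sequence back onto itself through #'identity.
(time (test-shuffle (lambda (x) (map-into x #'identity x))
                    *random-sequence* 1000))
(time (test-shuffle (lambda (x) (map-into x #'identity x))
                    *random-vector* 1000))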
Since we seem to be primarily interested in shuffling medium-sized lists, we could add a special case for such inputs, to keep working on lists and avoid MAP-INTO, something like:
(defun random-shuffle (sequence)
  (declare (optimize speed))
  (etypecase sequence
    (list (mapcar #'car
                  (sort (mapcar (lambda (x) (cons x (random 1.0)))
                                sequence)
                        #'<
                        :key #'cdr)))
    (t (map-into sequence #'car
                 (sort (map 'vector
                            (lambda (x) (cons x (random 1.0)))
                            sequence)
                       #'<
                       :key #'cdr)))))
With this specialised code, we find random-shuffle takes 0.839s for 1000 shuffles of 1000-element lists (1.82s for seqrnd). It also conses less: 64MB instead of 81MB.
What happens if we make it in-place? First, let's write a non-sucky equivalent to MAP-INTO (something like this could be patched into SBCL):
(defun list-map-into (output function input)
  (declare (optimize speed))
  (let ((function (coerce function 'function))) ; FIXME: that's slightly wrong...
    (loop for outs on output
          for x in input
          do (setf (car outs) (funcall function x))
          finally (return output))))
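A quick sanity check of the semantics, for illustration: the results are written into the cars of the first argument, which is also returned.

(list-map-into (list 0 0 0) #'1+ (list 1 2 3)) ; => (2 3 4)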
We can then use it in random-shuffle, keeping in mind that the conses passed to SORT can be reused arbitrarily.
(defun random-shuffle (sequence)
  (declare (optimize speed))
  (etypecase sequence
    (list (let ((result (sort (list-map-into sequence
                                             (lambda (x) (cons x (random 1.0)))
                                             sequence)
                              #'<
                              :key #'cdr)))
            (list-map-into result #'car result)))
    (t (map-into sequence #'car
                 (sort (map 'vector
                            (lambda (x) (cons x (random 1.0)))
                            sequence)
                       #'<
                       :key #'cdr)))))
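Note that the list branch is destructive and may move the original head cons anywhere, so callers should use the return value; for instance:

;; Use the return value; the argument list is recycled in-place.
(random-shuffle (list 1 2 3 4 5)) ; => some permutation, e.g. (4 1 5 3 2)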
Like memoisation, it could be argued that a good MAP-INTO should be part of every lisper's toolbox (... as part of the implementation). With this new MAP-INTO look-alike, 1000 random-shuffles of 1000-element lists take 0.829s and cons 32MB (1.82s and 81MB for seqrnd).
After some optimisations, guided by a better knowledge of our implementation's library, we see that the correct approach (the one that doesn't fail on sequences with repeated elements) is also faster and more space-efficient (1.47s and 64MB for random-shuffle of vectors, versus 3.41s and 74MB for seqrnd). We can also observe the importance of understanding the performance of third-party, magic-pixie-dust-style code when trying to understand that of our own code, lest we be led to the wrong conclusions.