A recent post tried to extract information from a microbenchmark, but the author absolutely did not care whether the programs computed the right, or even the same, thing.
The result? Pure noise.
(expt 10 10)
overflows 32-bit signed integers, so the C version
wound up going through 1410065408 iterations instead. In fact, signed
overflow is undefined behaviour in C, so a sufficiently devious compiler could
cap the iteration count at 65536 and still be standards-compliant.
On SBCL/x86-64, we can do the following and explicitly ask for machine unsigned arithmetic:
CL-USER> (lambda (max)
           (declare (type (unsigned-byte 64) max)
                    (optimize speed))
           (let ((sum 0))
             (declare (type (unsigned-byte 64) sum))
             (dotimes (i max sum)
               (setf sum (ldb (byte 64 0) (+ sum i))))))
#<FUNCTION (LAMBDA (MAX)) {1004DA3D6B}>
CL-USER> (disassemble *)
; disassembly for (LAMBDA (MAX))
; 04DA3E02: 31C9 XOR ECX, ECX ; no-arg-parsing entry point
; 04: 31C0 XOR EAX, EAX
; 06: EB0E JMP L1
; 08: 0F1F840000000000 NOP
; 10: L0: 4801C1 ADD RCX, RAX
; 13: 48FFC0 INC RAX
; 16: L1: 4839D0 CMP RAX, RDX
; 19: 72F5 JB L0
[ function epilogue ]
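For reference, the C loop we would expect GCC to compile down to the same add/inc/cmp/jb kernel might look like this (a sketch; the function name is mine):

```c
#include <stdint.h>

/* Sum 0 + 1 + ... + (max - 1). uint64_t addition wraps mod 2^64,
   matching the (ldb (byte 64 0) ...) in the Lisp version above. */
uint64_t sum_to(uint64_t max)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < max; i++)
        sum += i;
    return sum;
}
```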
Now that ldb
portably ensures modular arithmetic, we get virtually the same
code as what GCC outputs, down to the alignment padding. It's
still slower than the C version, simply because it goes through
1e10 iterations of the lossy sum rather than 1.4e9.
Microbenchmarks are useful to improve our understanding of complex systems. Microbenchmarks whose results we completely discard, not so much: if there's nothing keeping us or the compiler honest, we might as well let them compile to no-ops.