SWAR implementation of (some #'zerop ...)
SWAR (SIMD Within A Register) codes can portably express short-vector parallelism for operations on small packed data (e.g. byte or even nybble vectors). A trivial application of the technique is when we test whether any bit in a word is set (equal to 1) by comparing the whole word against 0. Obviously, that also work to test whether any field (of arbitrary width) is not filled with 0. This document, from 1997, provides a fairly clear and complete overview.
Just like testing whether any bit is set, it is easy to find whether some bit is not set (it's simply the opposite of whether every bit in the word is set). Things are more complex when the data are wider than a single bit (but obviously narrower than a full word). I found a short implementation (and barely tested it), but it might be possible to do even shorter. Skip to the series of asterisks if you want to solve that puzzle (to efficiently find whether any field in a sequence of data, itself packed into a single word, is 0) yourself.
To simplify the description, I'll assume that we're working with 4-bit-wide fields. It should be clear how the code can be adapted to other widths or even mixed widths.
Let x = aaaabbbbccccdddd...
be the input.
1. x' = x | (x >> 1)
.
The first bit in each field is now, for our purposes, noise. However,
some of the remaining 3 bits are now non-zero iff some the original 4
were.
2. ones = x' & ~0x8888...
. The noise bits are masked out.
3. borrow = 0x8888... - ones
. The first bit of each field
in borrow
is 0 iff some of the 3 other bits in ones
aren't (iff some
of the 4 bits in x
weren't).
4. result = borrow & 0x8888...
is zero iff the first bit
of every field in borrow
is 0 (iff every field in x
was non-null).
And, finally, that is easy to test for, a word at a time. In the end,
it takes 5 (hopelessly serial) operations (>>
, |
, &
, -
and &
) and a
conditional branch.
****
Testing whether any field in a word is filled with 0 may seem
incredibly obscure. Apart from (some #'zerop
[packed-unboxed-vector])
, what use is there for such code sequences?
One trick is to exploit xor
. xor
lets us compare multiple fields at a
time: a field is 0 in a xor b
iff the corresponding fields in a
and b
are identical (bit for bit). Now that we can determine when at least
one pair of fields is equal, it's simple to implement, e.g., default
FIND
on specialised vectors without having to test each datum
separately (until the very end, when we know that one of the pairs is
equal, but not which). As usual, a full implementation, capable of
dealing with :start
, :end
and displaced vectors is a lot more work.