… with one caveat: the hash functions only generate one bit.
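Here's a minimal sketch of such a function; the name bit_hash and its arguments come from the discussion below, and the exact signature is an assumption.

```c
#include <stdint.h>

/* One-bit hash function: a sketch.  `table` marks which bits of `x`
 * affect the hash, and `bit` is the hash value for x = 0. */
uint64_t
bit_hash(uint64_t x, uint64_t table, uint64_t bit)
{
        return (uint64_t)(__builtin_popcountll(x & table) & 1) ^ bit;
}
```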
With hardware popcount, this compiles to something like the following.
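This is a hand-written x86-64 sketch, not verbatim compiler output.

```asm
bit_hash:
        and     rdi, rsi        ; x & table
        popcnt  rax, rdi        ; count the set bits
        and     eax, 1          ; keep only the parity
        xor     rax, rdx        ; flip by the hash of 0
        ret
```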
This should raise a few questions:
- Why?
- Why does it work?
- Is it useful?
Someone with a passing familiarity with x86 would also ask why we use
popcnt instead of checking the parity flag after xor.
Unfortunately, the parity flag only considers the least significant
byte of the result (:
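To get at the parity of a full 64-bit value via the flags, we would have to fold the value down to a single byte first, along these lines (a sketch):

```asm
        and     rdi, rsi        ; x & table
        mov     rax, rdi
        shr     rax, 32
        xor     eax, edi        ; fold 64 -> 32 bits
        mov     edi, eax
        shr     edi, 16
        xor     eax, edi        ; fold 32 -> 16 bits
        xor     al, ah          ; fold 16 -> 8; PF now covers every bit
        setnp   al              ; al = 1 iff the parity is odd
        movzx   eax, al
        xor     rax, rdx        ; flip by the hash of 0
```

All that folding is more work than a single popcnt.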
One-bit hash functions: but why?
When implementing something like the hashing trick or count sketches (PDF), you need two sets of provably strong hash functions: one to pick the destination bucket, and another to decide whether to increment or decrement by the sketched value.
One-bit hash functions are ideal for the latter use case.
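For example, the update loop of a count sketch might look like the following sketch; bucket_hash, sign_table, and sign_bit are hypothetical stand-ins for the two hash functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two hash functions. */
extern uint64_t bucket_hash(uint64_t key);
extern uint64_t sign_table, sign_bit;

void
sketch_update(int64_t *counters, size_t n_buckets,
              uint64_t key, int64_t value)
{
        size_t bucket = bucket_hash(key) % n_buckets;

        /* The one-bit hash decides between increment and decrement. */
        if (bit_hash(key, sign_table, sign_bit)) {
                counters[bucket] += value;
        } else {
                counters[bucket] -= value;
        }
}
```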
How does that even work?
The bitwise operations in bit_hash implement a degenerate form of
tabulation hashing. It considers the 64-bit input value x as a
vector of 64 bits, and associates two intermediate output values
with each index. The naïve implementation would be something like
the following.
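A sketch, assuming each entry of random_table holds a single random bit:

```c
#include <stdint.h>

uint64_t
bit_hash_naive(uint64_t x, const uint64_t random_table[64][2])
{
        uint64_t acc = 0;

        for (int i = 0; i < 64; i++) {
                /* Pick entry 0 or 1 depending on bit i of x. */
                acc ^= random_table[i][(x >> i) & 1];
        }

        return acc;
}
```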
Of course, the representation of random_table is inefficient, and we
should hand-roll a bitmap. However, the loop itself is a problem.
The trick is to notice that we can normalise the table so that the
value for random_table[i][0] is always 0: each hash XORs in exactly
one entry per index, so we can XOR all the random_table[i][0] values
into the initial value of acc, and replace each random_table[i][1]
with random_table[i][0] ^ random_table[i][1]. That initial value is
the hash value for 0, and the values in random_table[i][1] now
encode whether a non-zero bit i in x flips the hash value or
leaves it as is.
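Concretely, the normalisation could look like this sketch (normalise is a hypothetical helper, not part of the trick itself):

```c
#include <stdint.h>

void
normalise(const uint64_t random_table[64][2],
          uint64_t *table, uint64_t *bit)
{
        *table = 0;
        *bit = 0;

        for (int i = 0; i < 64; i++) {
                /* Fold every entry-0 bit into the hash of 0... */
                *bit ^= random_table[i][0] & 1;
                /* ...and record whether bit i of x flips the hash. */
                *table |= ((random_table[i][0] ^ random_table[i][1]) & 1) << i;
        }
}
```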
The table argument for bit_hash is simply the 64 bits in
random_table[i][1], and bit is the hash value for 0. If bit i
in table is 0, bit i of x is irrelevant to the hash. If bit i in
table is 1, the hash flips when bit i in x is 1. Finally, the
parity of the popcount tells us whether the hash was flipped an
odd number of times.
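As a quick sanity check, the compact version should agree with the naïve loop on random inputs; a throwaway test built on the sketches above:

```c
#include <assert.h>
#include <stdlib.h>

int
main(void)
{
        uint64_t random_table[64][2];
        uint64_t table, bit;

        srand(42);
        for (int i = 0; i < 64; i++) {
                random_table[i][0] = rand() & 1;
                random_table[i][1] = rand() & 1;
        }

        normalise(random_table, &table, &bit);

        for (int i = 0; i < 1000; i++) {
                uint64_t x = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
                assert(bit_hash(x, table, bit)
                       == bit_hash_naive(x, random_table));
        }

        return 0;
}
```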
Is it even useful?
I don’t think so. Whenever we need a hash bit, we also want a hash bucket; we might as well steal one bit from the latter, wider hash. Worse, we usually want a few such bucket/bit pairs, so we could just as well compute one wide hash and carve out individual bits.
I only thought about this trick because I’ve been reading a few
empirical evaluations of sketching techniques, and some authors find
it normal that computing a hash bit doubles the CPU time spent on
hashing. It seems to me the right way to do this is to map
columns/features to not-too-small integers (e.g., universal hashing to
[0, n^2) if we have n features), and to apply strong hashing to
these integers. Hashing machine integers is fast, and we can always
split strong hashes into multiple values.
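For instance, a single 64-bit hash easily yields both a bucket and a sign; strong_hash below stands in for any good 64-bit hash function.

```c
#include <stddef.h>
#include <stdint.h>

extern uint64_t strong_hash(uint64_t);  /* hypothetical stand-in */

void
split_hash(uint64_t feature, size_t n_buckets,
           size_t *bucket, int *sign)
{
        uint64_t h = strong_hash(feature);

        *sign = (h & 1) ? 1 : -1;        /* steal the low bit... */
        *bucket = (h >> 1) % n_buckets;  /* ...bucket from the rest. */
}
```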
In the end, this family of one-bit hash functions seems like a good solution to a problem no one should ever have. But it’s still a cute trick!