Zbr's days.
March
Sun Mon Tue Wed Thu Fri Sat
       
23
2007
Months
Mar

About :: TODO :: Blog :: RSS :: Old blog :: Projects :: GIT :: Gallery :: Notes

Fri, 23 Mar 2007

Jenkins hash analysis and conclusion.


There were several attempts to use Jenkins hash for socket lookup functions instead of old XOR hash, but I was always against that.
My objectinos were based on the tests I ran during netchannels hash selection process.
It especially becomes visible in the following usage case - we use the same constant IP addresses (two 32 bit values) and one constant destination port (16 bits) and variable source port (16 bits), so essentially we hash 3 32 bit values.
It can be done for example using following scheme:

 hash = jhash_3words(saddr, daddr, (dport<<16)|sport, random_seed);
where random_seed is selected early.

Jenkins and XOR hash results

This graph shows chain length on X axis and number of elements with such a lentgs on Y axis.
As you can see, there is a significant artifact with about 3200 elements in some hash chains.

My first (and incorrect) impression was that Jenkins hash has some distribution problems, but further analysis showed that it is not true. I performed distribution checks in full 32 bit field and found, that it has uniform distribution as long as XOR hash.

Problem appeared when I got 16 bit distribution (size of the hash table is 2^16 elements), while XOR hash was still uniform.

Further analysis showed that Jenkins hash itself does not have distribution problems, but they appear when its folded into hash table size boundaries.
This theory was also confirmed by different hash tests (new Jenkins with rotations and RIPEMD).
XOR hash does not have such a problem, because it uses (u32 ^ u16) as one round, which results in the uniform (it is not correct to call that distribution uniform as is, but only getting into account that u16 values used in tests were uniformly distributed) distribution inside F(16), which does not suffer from hash_size boundary folding. Since XOR hash has 3 rounds, only one of them (xor of the final u32 values) will suffer from folding, but tests where is it can be determined for sure use constant addresses, so problem hides again.

So, I was wrong, Jenkins hash does not have problems as is, they appear only because of folding into hash table boundaries, which is unavoidable for general-purpose hash.

/devel/other :: Link / Comments (0)