jch-erl 0.1 – Fast minimal memory consistent hashing for Erlang/OTP

« back to News

Hey folk.

jch-erl 0.1 freshly baked.

NIF wrapper for Jump Consistent Hash algorithm by John Lamping and Eric Veach developed at Google, Inc. Paper: “A Fast, Minimal Memory, Consistent Hash Algorithm”.

 

http://arxiv.org/ftp/arxiv/papers/1406/1406.2294.pdf

This implementation uses the xorshift64* pseudo-random number generator rather than the linear congruential generator in the paper as it is reasonably fast but, more importantly, memory efficient.

Usage

A single function jch:ch/2 is offered. Simply pass in a 64 bit long (or less) integer key argument followed by the desired bucket or continuum partition size. The function returns the partition allocated for the key.

Performance is very stable as bucket size increases and distribution across the ring is stable (standard deviation for a reasonable sample size is typically <2% relative to the mean).

 

Example

% ERL_LIBS=deps erl +sfwi 1 +scl false -pa ebin
Erlang R16B02 (erts-5.10.3) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Eshell V5.10.3  (abort with ^G)

1> jch:ch(1,128).
29
2> jch:ch(13,128).
121
3> jch:ch(13,128).
121
4> jch:ch(trunc(math:pow(2,64))-1,128).
78
5> jch:ch(trunc(math:pow(2,64)),128). %% off by 1 mate
** exception error: bad argument
in function  jch:ch/2
called as jch:ch(18446744073709551616,128)
6> %% off by 1 mate
6>

 

Access times are typically sub-microsecond. Performance tests of the erlang NIF
and underlying C function are included.
Example performance for a space of 1 billion buckets:
- 1B Buckets. Hash performance.
ch1b N: 10 Min: —–1 Max: —–2 Median: —–1 Avg: —–1 Elapsed: 16
ch1b N: 100 Min: —–0 Max: —–1 Median: —–1 Avg: —–1 Elapsed: 102
ch1b N: 1000 Min: —–0 Max: —–9 Median: —–0 Avg: —–0 Elapsed: 1178
ch1b N: 10000 Min: —–0 Max: —-13 Median: —–0 Avg: —–0 Elapsed: 10146
ch1b N: 100000 Min: —–0 Max: —-59 Median: —–0 Avg: —–0 Elapsed: 109772
ch1b N: 1000000 Min: —–0 Max: —886 Median: —–0 Avg: —–0 Elapsed: 1048561
ch1b N: 10000000 Min: —–0 Max: –3734 Median: —–0 Avg: —–0 Elapsed: 11757297
And a sample of how keys partition over the space:
- 32 Buckets. 10M hashes Uniform Distribution Check.

312292 312457 312660 312522 313805 311987 311431 312299
311975 312834 312233 312125 313391 312222 312170 312389
311811 313134 312986 312434 312002 313060 312784 312699
312114 312170 312385 312721 313163 311883 313060 312802
oOo| Min: 311431 Max: 313805 Median: 312389 Avg: 312500 Elapsed: 15748372
Worst: 99.2435 Med: 99.5488 Avg: 99.5841 RSD: 0.9139

The RSD here is the standard deviation relative to the mean (avg) which is less than 1%
after 10M hash operations on random uniform test data. That’s pretty good.
A pint (or fancy fizzy water, or cup of pant wetting tea…) to anyone who can strip off an order of magnitude off the max outlier latencies and explain through scheduler, gc or emulator tuning what insight I missed or runtime vagary i’ve yet to master. Mainly written to toy with micro-benchmarking and get practice with tuning the emulator.
Enjoy!
Cheers,
Darach.

_______________________________________________
erlang-questions mailing list
erlang-questions@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

« back to News

Posted on November 13, 2014

Tags: