Measuring rolling hash throughput
I was looking for a fast rolling hash for a later experiment. I tried a few variations, and this one gave the best throughput. The sources are up on GitHub as rolling-hash.
$ make clean ; make
rm -rf o
mkdir o
cc -O3 -c -o o/main.o sources/main.cpp
cc -O3 -c -o o/ZRollingHash.o sources/ZRollingHash.cpp
cc -O3 -lc++ -o bin/rate o/main.o o/ZRollingHash.o
bin/rate
sizeof char: 1 unsigned: 4 long: 8 char*: 8
Prime buffer (256 MB)...
405 (ms) - prime buffer
Prime rolling hash 1...
Scan buffer for at least 30 seconds...
29562 (ms) - passes: 171 chunks: 11628 total: 45902462976 average: 3947580 rate: 1480 MB/s
$ make
bin/rate
sizeof char: 1 unsigned: 4 long: 8 char*: 8
Prime buffer (256 MB)...
389 (ms) - prime buffer
Prime rolling hash 1...
Scan buffer for at least 30 seconds...
29384 (ms) - passes: 170 chunks: 11560 total: 45634027520 average: 3947580 rate: 1481 MB/s
Both runs come out at ~1480 MB/s on a late-2013 MacBook Pro (2.3 GHz i7 CPU).
I am looking to slice bulk data into variable-sized chunks at data-dependent boundaries, so that identical chunks can be de-duplicated in storage.
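To make the idea of data-dependent boundaries concrete, here is a minimal sketch of how a rolling hash can pick them. It is not the code from the rolling-hash repository: the polynomial hash, the ChunkParams fields, the multiplier, and the name findBoundaries are all assumptions made up for illustration. A cut is declared wherever the top bits of the hash over the last few bytes are zero, subject to a minimum and maximum chunk size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tuning knobs, chosen for illustration only.
struct ChunkParams {
    std::size_t window  = 48;        // bytes covered by the rolling hash
    std::size_t minSize = 1u << 10;  // never cut before this many bytes
    std::size_t maxSize = 1u << 16;  // always cut at this many bytes
    unsigned    bits    = 13;        // cut when the top `bits` hash bits are zero
};

// Returns the end offset (exclusive) of every chunk in data[0, size).
std::vector<std::size_t> findBoundaries(const unsigned char* data, std::size_t size,
                                        const ChunkParams& p = ChunkParams()) {
    assert(p.window > 0 && p.minSize >= p.window && p.maxSize >= p.minSize);
    assert(p.bits > 0 && p.bits < 64);

    // Arbitrary odd multiplier (an assumption); arithmetic wraps modulo 2^64.
    const std::uint64_t kBase = 1099511628211ULL;

    // kBase^(window-1): the weight of the byte that is about to leave the window.
    std::uint64_t outWeight = 1;
    for (std::size_t i = 1; i < p.window; ++i) outWeight *= kBase;

    std::vector<std::size_t> cuts;
    std::size_t start = 0;    // first byte of the current chunk
    std::size_t filled = 0;   // bytes currently inside the window
    std::uint64_t hash = 0;

    for (std::size_t i = 0; i < size; ++i) {
        if (filled == p.window) {
            hash -= outWeight * data[i - p.window];  // roll the oldest byte out
        } else {
            ++filled;
        }
        hash = hash * kBase + data[i];               // roll the new byte in

        const std::size_t len = i + 1 - start;
        const bool hit = filled == p.window && (hash >> (64 - p.bits)) == 0;
        if ((hit && len >= p.minSize) || len == p.maxSize) {
            cuts.push_back(i + 1);
            start = i + 1;
            filled = 0;       // restart the window inside the new chunk
            hash = 0;
        }
    }
    if (start < size) cuts.push_back(size);          // trailing partial chunk
    return cuts;
}
```

Because minSize is at least as large as the window, every hash-triggered cut depends only on the last window bytes before it. That is what should let boundaries settle back onto the same content after an insertion or deletion earlier in the stream, though the minimum and maximum sizes can delay that. With these assumed defaults, random input should average very roughly minSize + 2^bits bytes per chunk.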