[WIP] Add FNV1a hash function #1760

betatim · 2017-08-22T19:06:00Z

Fixes #1757

Test driving the FNV hash. Benchmarks to do:

computation time
"quality" of the hashing (returning a constant would be a fast hash, but low quality) not sure yet how to measure this.

Is it mergeable?
make test Did it pass the tests?
make clean diff-cover If it introduces new functionality in
scripts/ is it tested?
make format diff_pylint_report cppcheck doc pydocstyle Is it well
formatted?
Did it change the command-line interface? Only backwards-compatible
additions are allowed without a major version increment. Changing file
formats also requires a major version number increment.
For substantial changes or changes to the command-line interface, is it
documented in CHANGELOG.md? See keepachangelog
for more details.
Was a spellchecker run on the source code and documentation after
changes were made?
Do the changes respect streaming IO? (Are they
tested for streaming IO?)

betatim · 2017-08-22T19:09:07Z

Surprising how few tests fail with this!

ctb · 2017-08-22T19:32:53Z

nice :).

betatim · 2017-08-25T14:48:28Z

Results from running this script as a benchmark.

$ python fp.py bf # FNV
kind, load, query, t_load, t_queryNP, t_queryP
bf, 9000000, 1000000, 10.381915807723999, 1.5168020725250244, 1.2786040306091309
bf, 9000000, 1000000, 10.489625930786133, 1.3121399879455566, 1.2331571578979492
bf, 9000000, 1000000, 9.589091062545776, 1.1095459461212158, 1.0996849536895752
bf, 9000000, 1000000, 10.481687068939209, 1.2617030143737793, 1.159147024154663

$ python fp.py bf # murmur
kind, load, query, t_load, t_queryNP, t_queryP
bf, 9000000, 1000000, 10.766052007675171, 1.184602975845337, 1.1206409931182861
bf, 9000000, 1000000, 9.716826915740967, 1.169848918914795, 1.0594408512115479
bf, 9000000, 1000000, 10.398908853530884, 1.231614112854004, 1.2056090831756592
bf, 9000000, 1000000, 11.516963958740234, 1.6280360221862793, 1.1633579730987549
bf, 9000000, 1000000, 10.743386030197144, 1.2828280925750732, 1.1859049797058105

The thing to look at are the last three columns which measure how long how it takes to load kmers, lookup a kmer that isn't present and kmers which are present.

Observations: the benchmark isn't very useful because it is pretty noisy and I'd conclude that FNV1a is no faster than murmurhash. I find that at least somewhat surprising.

ctb · 2017-08-26T22:09:24Z

Question for all y'all - are we benchmarking with different *numbers* of hash tables? I'm wondering if the prime number modulus is causing any issues. I am still stuck on the question of (a) whether things could be faster, (b) if so, how easy it would be to speed things up. Right now we seem to be stuck in the never never land of "it's hard to identify any one factor that is slowing things down."

codecov-io · 2017-09-12T09:38:34Z

Codecov Report

Merging #1760 into master will increase coverage by <.01%.
The diff coverage is 0%.

@@            Coverage Diff            @@
##           master   #1760      +/-   ##
=========================================
+ Coverage    0.05%   0.06%   +<.01%     
=========================================
  Files          87      78       -9     
  Lines       11441    9807    -1634     
  Branches     3072    2459     -613     
=========================================
  Hits            6       6              
+ Misses      11435    9801    -1634

Impacted Files	Coverage Δ
include/oxli/hashtable.hh	`0% <ø> (ø)`	⬆️
include/oxli/kmer_hash.hh	`0% <ø> (ø)`	⬆️
src/oxli/kmer_hash.cc	`0% <0%> (ø)`	⬆️
src/oxli/hllcounter.cc	`0% <0%> (ø)`	⬆️
scripts/abundance-dist.py	`0% <0%> (ø)`	⬆️
scripts/trim-low-abund.py	`0% <0%> (ø)`	⬆️
scripts/make-initial-stoptags.py	`0% <0%> (ø)`	⬆️
src/oxli/labelhash.cc	`0% <0%> (ø)`	⬆️
include/oxli/hashgraph.hh	`0% <0%> (ø)`	⬆️
src/khmer/_cpy_khmer.cc	`0% <0%> (ø)`	⬆️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update acdacef...43aaf93. Read the comment docs.

Add FNV1a hash function

6cc0d0b

betatim force-pushed the try/fnv-hash branch from 417b1fa to 6cc0d0b Compare August 22, 2017 19:10

betatim mentioned this pull request Aug 23, 2017

K-mer banding, bit patterns, and the fast (2-bit) hash #1761

Open

Temporary commit to store state of play

43aaf93

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] Add FNV1a hash function #1760

[WIP] Add FNV1a hash function #1760

Uh oh!

betatim commented Aug 22, 2017

Uh oh!

betatim commented Aug 22, 2017

Uh oh!

ctb commented Aug 22, 2017

Uh oh!

betatim commented Aug 25, 2017

Uh oh!

ctb commented Aug 26, 2017 via email

Uh oh!

codecov-io commented Sep 12, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[WIP] Add FNV1a hash function #1760

Are you sure you want to change the base?

[WIP] Add FNV1a hash function #1760

Uh oh!

Conversation

betatim commented Aug 22, 2017

Uh oh!

betatim commented Aug 22, 2017

Uh oh!

ctb commented Aug 22, 2017

Uh oh!

betatim commented Aug 25, 2017

Uh oh!

ctb commented Aug 26, 2017 via email

Uh oh!

codecov-io commented Sep 12, 2017

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants