-
Notifications
You must be signed in to change notification settings - Fork 1
Description
Description
Provide functionality similar to get_int for the seqan3::interleaved_bloom_filter:
uint64_t get_int(size_type idx, uint8_t len = 64) constreturns an uint64_t that includes the bits [idx, idx + len).
This provides an efficient way to access multiple bits of the seqan3::interleaved_bloom_filter.
I already did a benchmark of the different ways one might access the IBF:
----------------------------
Benchmark hashes/sec
----------------------------
native/64/512 23.9736M/s
native/64/16384 24.1353M/s
native/8192/4 207.304k/s
native/8192/128 210.824k/s
iterator/64/512 11.9472M/s
iterator/64/16384 11.3783M/s
iterator/8192/4 92.3077k/s
iterator/8192/128 96k/s
get_int/64/512 133.358M/s
get_int/64/16384 124.121M/s
get_int/8192/4 3.50958M/s
get_int/8192/128 3.50958M/s
data/64/512 132.741M/s
data/64/16384 127.431M/s
data/8192/4 2.96145M/s
data/8192/128 2.89564M/s
The format of the benchmark names is method / number of bins / size per bin. All of them use 2 hash functions and sequences of length 1'000.
Links:
The general setup is that we have a query and want to count the occurrence of all k-mers in all bins.
native uses the bulk_contains and uses the subscript operator to increase the counters of a counting vector.
iterator uses bulk_contains and then iterates over the binning bitvector to increase the counters of a counting vector.
get_int uses bulk_contains and then accesses batches of 64 bit via get_int to increase the counters of a counting vector.
data uses bulk_contains and then accesses batches of 64 bit via data to increase the counters of a counting vector.
Acceptance Criteria
- The
seqan3::interleaved_bloom_filterexposes functionality similar toget_int.
Tasks
- Adapt
seqan3::interleaved_bloom_filter::binning_bitvectorto allow access of multiple bits.
Definition of Done
- Implementation and design approved
- Unit tests pass
- Test coverage = 100%
- Microbenchmarks added and/or affected microbenchmarks < 5% performance drop
- API documentation added
- Tutorial/teaching material added
- Test suite compiles in less than 30 seconds (on travis)
- Changelog entry added