Skip to content

get_int for interleaved_bloom_filter #77

@eseiler

Description

@eseiler

Description

Provide functionality similar to get_int for the seqan3::interleaved_bloom_filter:

uint64_t get_int(size_type idx, uint8_t len = 64) const

returns an uint64_t that includes the bits [idx, idx + len).

This provides an efficient way to access multiple bits of the seqan3::interleaved_bloom_filter.

I already did a benchmark of the different ways one might access the IBF:

----------------------------
Benchmark         hashes/sec
----------------------------
native/64/512     23.9736M/s
native/64/16384   24.1353M/s
native/8192/4     207.304k/s
native/8192/128   210.824k/s

iterator/64/512   11.9472M/s
iterator/64/16384 11.3783M/s
iterator/8192/4   92.3077k/s
iterator/8192/128 96k/s

get_int/64/512    133.358M/s
get_int/64/16384  124.121M/s
get_int/8192/4    3.50958M/s
get_int/8192/128  3.50958M/s

data/64/512       132.741M/s
data/64/16384     127.431M/s
data/8192/4       2.96145M/s
data/8192/128     2.89564M/s

The format of the benchmark names is method / number of bins / size per bin. All of them use 2 hash functions and sequences of length 1'000.
Links:

The general setup is that we have a query and want to count the occurrence of all k-mers in all bins.

native uses the bulk_contains and uses the subscript operator to increase the counters of a counting vector.
iterator uses bulk_contains and then iterates over the binning bitvector to increase the counters of a counting vector.
get_int uses bulk_contains and then accesses batches of 64 bit via get_int to increase the counters of a counting vector.
data uses bulk_contains and then accesses batches of 64 bit via data to increase the counters of a counting vector.

Acceptance Criteria

  • The seqan3::interleaved_bloom_filter exposes functionality similar to get_int.

Tasks

  • Adapt seqan3::interleaved_bloom_filter::binning_bitvector to allow access of multiple bits.

Definition of Done

  • Implementation and design approved
  • Unit tests pass
  • Test coverage = 100%
  • Microbenchmarks added and/or affected microbenchmarks < 5% performance drop
  • API documentation added
  • Tutorial/teaching material added
  • Test suite compiles in less than 30 seconds (on travis)
  • Changelog entry added

Metadata

Metadata

Assignees

No one assigned

    Labels

    ready to tackleThis story was discussed and can be immidietly tackled

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions