Releases: jltsiren/gbwt

GBWT v1.4

08 Mar 06:27

  • GBWT is now silent by default; adjust with Verbosity::set() if necessary.
  • GBWTBuilder (and related tools) will automatically increase buffer size if a sequence is too large for the buffer.
  • Metadata improvements:
    • FullPathName: A standalone version of PathName that stores sample/contig names/ids as strings without requiring Metadata.
    • Metadata::findFragment(): Returns the path identifier of the haplotype fragment possibly covering the (sample, contig, haplotype, offset) represented by a path name.
    • FragmentMap: A helper structure for working with fragmented haplotypes.
  • New functionality:
    • FastLocate::decompressSA() and FastLocate::decompressDA() for decompressing the part of the suffix array / document array corresponding to a node.
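The lookup that Metadata::findFragment() describes can be sketched as a predecessor search. This sketch assumes, hypothetically, that each fragment is known by the (sample, contig, haplotype) key of its path name plus its starting offset, and that the fragment possibly covering an offset is the one with the largest starting offset not exceeding it. FragmentIndex, HaplotypeKey, and find_fragment are illustrative names, not the GBWT API, and this is not how FragmentMap is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <optional>
#include <tuple>

// Hypothetical key for one haplotype: (sample, contig, haplotype).
using HaplotypeKey = std::tuple<std::size_t, std::size_t, std::size_t>;

// Hypothetical index: for each haplotype, fragment start offset -> path id.
class FragmentIndex {
 public:
  void add(const HaplotypeKey& key, std::size_t start, std::size_t path_id) {
    fragments_[key][start] = path_id;
  }

  // Path id of the fragment with the largest start <= offset, i.e. the
  // fragment that may cover the offset. The sketch does not store fragment
  // lengths, so it cannot confirm that the fragment really reaches the offset.
  std::optional<std::size_t> find_fragment(const HaplotypeKey& key,
                                           std::size_t offset) const {
    auto haplotype = fragments_.find(key);
    if (haplotype == fragments_.end()) { return std::nullopt; }  // unknown haplotype
    const auto& starts = haplotype->second;
    auto next = starts.upper_bound(offset);  // first fragment starting after offset
    if (next == starts.begin()) { return std::nullopt; }  // every fragment starts later
    return std::prev(next)->second;
  }

 private:
  std::map<HaplotypeKey, std::map<std::size_t, std::size_t>> fragments_;
};
```

For example, with fragments starting at offsets 0 and 5000 on the same haplotype, a query for offset 4999 returns the path id of the first fragment. An ordered map keeps the predecessor search logarithmic in the number of fragments per haplotype.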

GBWT v1.3.1

18 Feb 06:15

Minor patch release for the GBZ paper.

  • Empty paths are fully supported (but still discouraged).
  • Text input format for build_gbwt (mostly for testing).
  • The broken CMake support has been removed.

GBWT v1.3

15 Nov 19:29

  • Supports 64-bit ARM.
  • File format version 5:
    • Optional serialization using simple-sds structures.
    • Tags structure storing arbitrary key-value pairs.
    • Compatible with versions 1-4.
    • Uses Metadata version 2 (compatible with versions 0-1).
  • inverseLF(): Follow the sequence backward in a bidirectional index.
  • Serialization and loading use exceptions to handle failures.
  • Requires the vgteam fork of SDSL.

GBWT v1.2

23 Jan 04:06

  • Uses C++14 and the vgteam fork of SDSL.
  • Direct GBWT to DynamicGBWT conversion.
  • Temporary files are now thread-safe.
  • An option to use persistent phasing files for haplotype generation. These files persist when the associated object is deleted, but they are still deleted when the program exits.
  • The fast GBWT merging algorithm now works with overlapping node id ranges as long as the non-empty records do not overlap.
  • metadata_tool now prints metadata or removes it completely.
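The merging condition above can be illustrated with a simple check. This sketch assumes, hypothetically, that each input GBWT is summarized by the list of node ids whose records are non-empty, with each id listed once; records_disjoint and NonEmptyNodes are illustrative names, not part of GBWT.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical summary of one input GBWT: node ids with non-empty records,
// each listed once.
using NonEmptyNodes = std::vector<std::uint64_t>;

// True if no node has a non-empty record in more than one input. The node id
// ranges of the inputs may still overlap.
bool records_disjoint(const std::vector<NonEmptyNodes>& inputs) {
  std::unordered_set<std::uint64_t> seen;
  for (const NonEmptyNodes& nodes : inputs) {
    for (std::uint64_t node : nodes) {
      if (!seen.insert(node).second) { return false; }  // already non-empty in an earlier input
    }
  }
  return true;
}
```

Inputs with non-empty records at nodes {2, 4, 6} and {3, 5, 7} have overlapping node id ranges ([2, 6] and [3, 7]) but still pass the check.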

GBWT v1.1

14 Sep 23:25

Major new functionality: FastLocate, an add-on structure for the compressed GBWT that implements the r-index locate() algorithm. It is larger than the existing locate() structure but also much faster. It must be rebuilt whenever the GBWT is changed.
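The core of the r-index locate() algorithm is the phi function, phi(SA[i]) = SA[i-1], which can be evaluated for any text position from just one sample per BWT run. The toy sketch below shows that idea for a single string ending with a unique '$', using a naive suffix sort. It is not GBWT's FastLocate, which works on the multi-sequence GBWT; suffix_array and PhiSamples are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array; assumes the text ends with a unique smallest character '$'.
std::vector<std::size_t> suffix_array(const std::string& text) {
  std::vector<std::size_t> sa(text.size());
  std::iota(sa.begin(), sa.end(), std::size_t{0});
  std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
    return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
  });
  return sa;
}

// Toy r-index-style phi structure: phi(SA[i]) = SA[i-1] for i > 0.
// It stores one sample per BWT run start instead of the whole suffix array.
class PhiSamples {
 public:
  PhiSamples(const std::string& text, const std::vector<std::size_t>& sa) {
    const std::size_t n = text.size();
    assert(n >= 2 && sa.size() == n);  // with n >= 2, text position 0 is always sampled
    for (std::size_t i = 1; i < n; i++) {
      char curr = text[(sa[i] + n - 1) % n];      // BWT[i]
      char prev = text[(sa[i - 1] + n - 1) % n];  // BWT[i-1]
      if (curr != prev) { samples_[sa[i]] = sa[i - 1]; }  // i starts a BWT run
    }
  }

  // For a text position j other than SA[0]: take the largest sampled
  // position k <= j; then phi(j) = phi(k) + (j - k).
  std::size_t phi(std::size_t j) const {
    auto it = std::prev(samples_.upper_bound(j));
    return it->second + (j - it->first);
  }

  std::size_t samples() const { return samples_.size(); }

 private:
  std::map<std::size_t, std::size_t> samples_;  // SA[i] -> SA[i-1] at run starts
};
```

Starting from SA[n-1] and applying phi repeatedly yields SA[n-2], ..., SA[0]. The number of stored samples grows with the number of BWT runs rather than with the text length, which is what makes this approach attractive for highly repetitive collections such as haplotypes.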

Other improvements:

  • Metadata is ignored when merging empty GBWTs.
  • Faster construction when the paths contain many different starting nodes.

GBWT v1.0

06 Sep 01:10

Various minor improvements. The GBWT is now stable enough to reach v1.0.

  • Option to force the phasing of homozygous variants (default on).
  • CachedGBWT: A caching layer over GBWT for workloads that repeatedly access the same subset of nodes.
  • Direct DynamicGBWT to GBWT conversion.
  • Install script.
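A caching layer of the kind CachedGBWT provides can be sketched as memoization over decoded node records. The sketch assumes, hypothetically, that decoding a record is expensive and that a record is simply the list of successor nodes; CachedRecords and Record are illustrative names, not the CachedGBWT interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical decoded record: the successor nodes of one node.
using Record = std::vector<std::uint64_t>;

// Hypothetical caching layer: each record is decoded at most once and then
// served from a hash table on later accesses.
class CachedRecords {
 public:
  explicit CachedRecords(std::function<Record(std::uint64_t)> decode)
      : decode_(std::move(decode)) {}

  const Record& record(std::uint64_t node) {
    auto it = cache_.find(node);
    if (it == cache_.end()) {
      it = cache_.emplace(node, decode_(node)).first;  // cache miss: decode once
    }
    return it->second;
  }

  // Number of distinct records decoded so far.
  std::size_t decodes() const { return cache_.size(); }

 private:
  std::function<Record(std::uint64_t)> decode_;
  std::unordered_map<std::uint64_t, Record> cache_;
};
```

std::unordered_map keeps references to its elements valid when it rehashes, so returning a reference into the cache is safe. This sketch never evicts entries, so it only suits workloads that touch a bounded subset of nodes.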

GBWT v0.9

12 Apr 22:14

Proper metadata: Each path (or a combination of a path and its reverse complement in a bidirectional index) has a name that consists of four integer components: sample, contig, phase, and count. Sample and contig ids may further have strings as names.

  • Extended metadata with path, sample, and contig names.
  • Sample names and the contig name are now stored in the VCF parse.
  • Create full metadata when building GBWT from a VCF parse using build_gbwt.
  • Renamed metadata to metadata_tool.
  • Remove sequences by sample / contig name in remove_seq.
  • New functionality: GBWT::firstNode(), GBWT::empty(node).
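The four-component path names described above can be mirrored with a small struct. The sketch below assumes, hypothetically, that path ids index a vector of names and sample ids index a vector of sample names, and shows the kind of lookup needed to remove sequences by sample name. ToyPathName, SimpleMetadata, and paths_for_sample are illustrative, not the GBWT Metadata interface, and orientation in a bidirectional index is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical path name with the four integer components.
struct ToyPathName {
  std::size_t sample, contig, phase, count;
};

// Hypothetical metadata: path id -> name, sample id -> sample name.
struct SimpleMetadata {
  std::vector<ToyPathName> paths;
  std::vector<std::string> sample_names;

  // Path ids belonging to the sample with the given name; empty if unknown.
  std::vector<std::size_t> paths_for_sample(const std::string& name) const {
    std::vector<std::size_t> result;
    auto found = std::find(sample_names.begin(), sample_names.end(), name);
    if (found == sample_names.end()) { return result; }  // unknown sample name
    std::size_t sample = static_cast<std::size_t>(found - sample_names.begin());
    for (std::size_t id = 0; id < paths.size(); id++) {
      if (paths[id].sample == sample) { result.push_back(id); }
    }
    return result;
  }
};
```

Keeping names as integers makes path names compact and cheap to compare, while the optional string tables translate user-facing sample and contig names into those integers.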

GBWT v0.8

11 Jan 20:36

Construction improvements. This version was used for the benchmarks in the full version of the paper.

  • An algorithm for removing sequences from DynamicGBWT.
  • Multiple parallel merge jobs in BWT-merge. If the temporary disk is fast enough, merging is roughly twice as fast as in v0.7.
  • build_gbwt improvements: Accept file lists, write metadata when building from VCF parse.

GBWT v0.7

22 Nov 05:06

Faster construction for datasets larger than 1000GP.

  • Parallel merging algorithm for quickly merging multiple GBWTs over the same chromosome. It can reduce the index construction time for large datasets by a factor of 2 to 3.
  • Optional metadata in the GBWT index.
  • New functionality: GBWT::extract(position), GBWT::extract(position, max_length), DynamicGBWT::fullLF().

GBWT v0.6

24 Sep 19:03

Various improvements to support building the GBWT for datasets larger than 1000GP.

  • Option to change the path identifier sampling interval.
  • Save the temporary structures from haplotype generation and use them as input for build_gbwt.
  • Decompress the endmarker of the compressed GBWT for faster extract() queries in indexes with millions of paths.
  • Bug fix: Initialize incoming edges correctly when loading DynamicGBWT if alphabet offset is non-zero.
  • Support for Clang.

Full decompression of the endmarker made changes to the index file format unnecessary at the moment. The changes will be made before v1.0, though.