Releases: jltsiren/gbwt
GBWT v1.4
- GBWT is now silent by default; adjust with `Verbosity::set()` if necessary.
- `GBWTBuilder` (and related tools) will automatically increase buffer size if a sequence is too large for the buffer.
- Metadata improvements:
  - `FullPathName`: A standalone version of `PathName` that stores sample/contig names/ids as strings without requiring `Metadata`.
  - `Metadata::findFragment()`: Returns the path identifier of the haplotype fragment possibly covering the (sample, contig, haplotype, offset) represented by a path name.
  - `FragmentMap`: A helper structure for working with fragmented haplotypes.
- New functionality: `FastLocate::decompressSA()` and `FastLocate::decompressDA()` for decompressing the part of the suffix array / document array corresponding to a node.
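
The fragment lookup described above can be illustrated with a small standalone sketch. This is not the GBWT implementation: `NamedPath`, `FragmentIndex`, and `NO_FRAGMENT` are hypothetical names introduced here, and the sketch assumes that the fragments of one haplotype are keyed by their starting offsets. It returns the last fragment that starts at or before the query offset, which is why the result is only "possibly covering": the caller still has to check the fragment length.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical sentinel for "no candidate fragment".
constexpr std::size_t NO_FRAGMENT = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a path name that stores names as strings.
struct NamedPath
{
  std::string sample, contig;
  std::size_t haplotype, offset;
};

// Hypothetical index: (sample, contig, haplotype) -> {starting offset -> path id}.
class FragmentIndex
{
public:
  void insert(const NamedPath& name, std::size_t path_id)
  {
    assert(path_id != NO_FRAGMENT); // The sentinel is not a valid path id.
    this->chains[key(name)][name.offset] = path_id;
  }

  // Returns the id of the last fragment starting at or before name.offset,
  // or NO_FRAGMENT if there is none. The fragment may end before the offset.
  std::size_t find(const NamedPath& name) const
  {
    auto chain = this->chains.find(key(name));
    if(chain == this->chains.end()) { return NO_FRAGMENT; }
    auto iter = chain->second.upper_bound(name.offset); // First start > offset.
    if(iter == chain->second.begin()) { return NO_FRAGMENT; }
    --iter; // Last start <= offset.
    return iter->second;
  }

private:
  typedef std::tuple<std::string, std::string, std::size_t> key_type;

  static key_type key(const NamedPath& name)
  {
    return key_type(name.sample, name.contig, name.haplotype);
  }

  std::map<key_type, std::map<std::size_t, std::size_t>> chains;
};
```

With fragments starting at offsets 100 and 5000, a query at offset 4999 returns the first fragment, a query at offset 5000 returns the second, and a query at offset 50 returns `NO_FRAGMENT`.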
GBWT v1.3.1
Minor patch release for the GBZ paper.
- Empty paths are fully supported (but still discouraged).
- Text input format for `build_gbwt` (mostly for testing).
- The broken CMake support has been removed.
GBWT v1.3
- Supports 64-bit ARM.
- File format version 5:
  - Optional serialization using simple-sds structures.
  - `Tags` structure storing arbitrary key-value pairs.
  - Compatible with versions 1-4.
  - Uses `Metadata` version 2 (compatible with versions 0-1).
- `inverseLF()`: Follow the sequence backward in a bidirectional index.
- Serialization and loading use exceptions to handle failures.
- Requires the vgteam fork of SDSL.
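
As a rough illustration of key-value tags combined with exception-based loading, the sketch below writes a tag map to a stream and throws when the input is unsupported or truncated. The text layout, the `TagMap` typedef, and the functions `write_tags()` and `read_tags()` are assumptions made for this example only; they do not describe the GBWT file format or the `Tags` interface.

```cpp
#include <cstddef>
#include <istream>
#include <limits>
#include <map>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical tag container: arbitrary key-value pairs.
typedef std::map<std::string, std::string> TagMap;

// Toy version range, mirroring "version 5, compatible with versions 1-4".
constexpr unsigned TOY_MIN_VERSION = 1;
constexpr unsigned TOY_VERSION = 5;

// Writes "<version> <count>" followed by one "key<TAB>value" line per tag.
// Keys must not contain tabs, and neither keys nor values may contain newlines.
void write_tags(std::ostream& out, const TagMap& tags)
{
  out << TOY_VERSION << ' ' << tags.size() << '\n';
  for(const auto& tag : tags) { out << tag.first << '\t' << tag.second << '\n'; }
}

// Reads the toy format and reports failures by throwing std::runtime_error.
TagMap read_tags(std::istream& in)
{
  unsigned version = 0;
  std::size_t count = 0;
  if(!(in >> version >> count))
  {
    throw std::runtime_error("read_tags(): cannot read the header");
  }
  if(version < TOY_MIN_VERSION || version > TOY_VERSION)
  {
    throw std::runtime_error("read_tags(): unsupported version " + std::to_string(version));
  }
  in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // Skip the rest of the header line.

  TagMap result;
  for(std::size_t i = 0; i < count; i++)
  {
    std::string line;
    if(!std::getline(in, line)) { throw std::runtime_error("read_tags(): truncated input"); }
    std::size_t tab = line.find('\t');
    if(tab == std::string::npos) { throw std::runtime_error("read_tags(): malformed tag"); }
    result[line.substr(0, tab)] = line.substr(tab + 1);
  }
  return result;
}
```

A caller would wrap `read_tags()` in a `try` block and handle `std::runtime_error` instead of checking a return code.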
GBWT v1.2
- Uses C++14 and the vgteam fork of SDSL.
- Direct `GBWT` to `DynamicGBWT` conversion.
- Temporary files are now thread-safe.
- An option to use persistent phasing files for haplotype generation. These files persist when the associated object is deleted, but they are still deleted when the program exits.
- The fast GBWT merging algorithm now works with overlapping node id ranges as long as the non-empty records do not overlap.
- `metadata_tool` now prints metadata or removes it completely.
GBWT v1.1
Major new functionality: FastLocate. An add-on structure for the compressed GBWT implementing the r-index locate() algorithm. Larger than the existing locate() structure but also much faster. Must be rebuilt whenever the GBWT is changed.
Other improvements:
- Metadata is ignored when merging empty GBWTs.
- Faster construction when the paths contain many different starting nodes.
GBWT v1.0
Various minor improvements. The GBWT is now stable enough to reach v1.0.
- Option to force the phasing of homozygous variants (default on).
- `CachedGBWT`: A caching layer over `GBWT` for workloads that repeatedly access the same subset of nodes.
- Direct `DynamicGBWT` to `GBWT` conversion.
- Install script.
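
The idea behind a caching layer can be sketched as follows. This is a generic memoizing wrapper, not the `CachedGBWT` interface: `CachedIndex`, `Record`, and the decoder callback are hypothetical, and the sketch assumes that decoding a node record is the expensive step worth caching.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::size_t node_type;
typedef std::vector<node_type> Record; // Hypothetical decoded record, e.g. successor nodes.

// Wraps an expensive per-node decoder and remembers the decoded records.
class CachedIndex
{
public:
  explicit CachedIndex(std::function<Record(node_type)> decode) :
    decoder(std::move(decode)), misses(0)
  {
  }

  // Returns the record for the node, decoding it only on the first access.
  // References stay valid until clear(): std::unordered_map does not move
  // its elements when it rehashes.
  const Record& record(node_type node)
  {
    auto iter = this->cache.find(node);
    if(iter == this->cache.end())
    {
      this->misses++;
      iter = this->cache.emplace(node, this->decoder(node)).first;
    }
    return iter->second;
  }

  std::size_t cacheMisses() const { return this->misses; }
  std::size_t size() const { return this->cache.size(); }
  void clear() { this->cache.clear(); }

private:
  std::function<Record(node_type)> decoder;
  std::unordered_map<node_type, Record> cache;
  std::size_t misses;
};
```

Each distinct node is decoded once, so a workload that keeps returning to the same nodes pays the decoding cost only on the first visit. The cache has no eviction policy, which keeps the sketch short but makes it suitable only when the accessed subset is small.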
GBWT v0.9
Proper metadata: Each path (or a combination of a path and its reverse complement in a bidirectional index) has a name that consists of four integer components: sample, contig, phase, and count. Sample and contig ids may further have strings as names.
- Extended metadata with path, sample, and contig names.
- Sample names and contig name in VCF parse.
- Create full metadata when building GBWT from a VCF parse using `build_gbwt`.
- Renamed `metadata` to `metadata_tool`.
- Remove sequences by sample / contig name in `remove_seq`.
- New functionality: `GBWT::firstNode()`, `GBWT::empty(node)`.
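
The naming scheme described for this release (four integer components, with optional string names for sample and contig ids) can be illustrated with a minimal sketch. The types `PathNameSketch` and `NameTable` and the `#`-separated output are assumptions for this example, not the library's `Metadata` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical path name with the four integer components.
struct PathNameSketch
{
  std::size_t sample, contig, phase, count;
};

// Hypothetical metadata: path names plus optional names for sample and contig ids.
struct NameTable
{
  std::vector<std::string> sample_names, contig_names;
  std::vector<PathNameSketch> paths;

  // Formats a path name as sample#contig#phase#count, using string names
  // where they exist and falling back to the integer id otherwise.
  std::string describe(std::size_t path_id) const
  {
    assert(path_id < this->paths.size());
    const PathNameSketch& path = this->paths[path_id];
    return name(this->sample_names, path.sample) + "#" +
           name(this->contig_names, path.contig) + "#" +
           std::to_string(path.phase) + "#" + std::to_string(path.count);
  }

  static std::string name(const std::vector<std::string>& names, std::size_t id)
  {
    return (id < names.size() ? names[id] : std::to_string(id));
  }
};
```

Storing integers in each path name and keeping the strings in separate tables means that a sample or contig name is stored once, no matter how many paths refer to it.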
GBWT v0.8
Construction improvements. This version was used for the benchmarks in the full version of the paper.
- An algorithm for removing sequences from `DynamicGBWT`.
- Multiple parallel merge jobs in BWT-merge. If the temporary disk is fast enough, merging is roughly twice as fast as in v0.8.
- `build_gbwt` improvements: Accept file lists, write metadata when building from VCF parse.
GBWT v0.7
Faster construction for datasets larger than 1000GP.
- Parallel merging algorithm for quickly merging multiple GBWTs over the same chromosome. It can reduce the index construction time for large datasets by a factor of 2 to 3.
- Optional metadata in the GBWT index.
- New functionality: `GBWT::extract(position)`, `GBWT::extract(position, max_length)`, `DynamicGBWT::fullLF()`.
GBWT v0.6
Various improvements to support building GBWT for larger datasets than 1000GP.
- Option to change the path identifier sampling interval.
- Save the temporary structures from haplotype generation and use them as input for `build_gbwt`.
- Decompress the endmarker of compressed GBWT for faster `extract()` queries in indexes with millions of paths.
- Bug fix: Initialize incoming edges correctly when loading `DynamicGBWT` if alphabet offset is non-zero.
- Support for Clang.
Full decompression of the endmarker made changes to the index file format unnecessary at the moment. The changes will be made before v1.0, though.