Mini versions of core bioinformatics algorithms: Smith-Waterman, k-mers, and variant calling (with DeepVariant), all run on a 50 GB whole-genome sequencing (WGS) dataset from Nucleus.
DNA sequence alignment using SIMD instructions. The aligner compares two DNA sequences and scores how well their bases line up: a match scores +2 and a mismatch scores -1.
A = A (+2)
T = T (+2)
C ≠ T (-1)
G = G (+2)
T ≠ G (-1)
...
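The scoring rule above can be sketched in a few lines of scalar Rust. This is only an illustration of the +2 / -1 arithmetic, not the project's SIMD kernel: the function name `score_ungapped` is hypothetical, gaps are not modelled, and ignoring positions past the end of the shorter sequence is an assumption.

```rust
/// Score for two identical bases (value taken from the README text).
const MATCH_SCORE: i32 = 2;
/// Score for two different bases (value taken from the README text).
const MISMATCH_SCORE: i32 = -1;

/// Hypothetical helper: scores two sequences position by position, no gaps.
/// Assumption: positions beyond the shorter sequence are ignored.
fn score_ungapped(a: &[u8], b: &[u8]) -> i32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            if x.eq_ignore_ascii_case(y) {
                MATCH_SCORE
            } else {
                MISMATCH_SCORE
            }
        })
        .sum()
}

fn main() {
    // The five pairs listed above: A/A, T/T, C/T, G/G, T/G.
    let score = score_ungapped(b"ATCGT", b"ATTGG");
    println!("score = {score}"); // 2 + 2 - 1 + 2 - 1 = 4
}
```

A SIMD version would presumably apply the same comparison to many bases per instruction; the per-position arithmetic stays the same.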
Direct alignment: compare against an average reference genome
Complementary alignment: find what percentage of the genome is not perfectly complementary (boooo)
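One possible reading of the complementary check, sketched in Rust: walk two strands position by position and count where the second base is not the Watson-Crick complement of the first (A with T, C with G). The function names are hypothetical, strand orientation is ignored, and treating any non-ACGT base (such as N) as non-complementary is an assumption, so the project's actual definition may differ.

```rust
/// Returns the Watson-Crick complement of a base, or None for anything
/// other than A, C, G, T (for example N).
fn complement(base: u8) -> Option<u8> {
    match base.to_ascii_uppercase() {
        b'A' => Some(b'T'),
        b'T' => Some(b'A'),
        b'C' => Some(b'G'),
        b'G' => Some(b'C'),
        _ => None,
    }
}

/// Hypothetical helper: percentage of compared positions where `b` is not
/// the complement of `a`. Only the overlapping prefix is compared.
fn percent_not_complementary(a: &[u8], b: &[u8]) -> f64 {
    let compared = a.len().min(b.len());
    if compared == 0 {
        return 0.0;
    }
    let mismatches = a
        .iter()
        .zip(b.iter())
        .filter(|(x, y)| complement(**x) != Some(y.to_ascii_uppercase()))
        .count();
    100.0 * mismatches as f64 / compared as f64
}

fn main() {
    // A/T, T/A and C/G pair correctly; G/G does not, so 1 of 4 positions fails.
    let pct = percent_not_complementary(b"ATCG", b"TAGG");
    println!("{pct}% not complementary");
}
```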
Create a .env file in the project root with your WGS data configuration:
# WGS Data Configuration
WGS_DATA_DIR=/path/to/your/wgs/data
WGS_SAMPLE_ID=your-sample-id
WGS_LANES=8
WGS_READS_PER_LANE=2
# GPU Configuration
GPU_CHUNK_SIZE_READS=10000
GPU_CHUNK_SIZE_BASES=1000000

# Test WGS file reading
cargo run -- --test-wgs --gpu
# Process full WGS dataset
cargo run -- --full-wgs --gpu
# Run with Nsight Systems
nsys profile -t opencl,cuda,osrt --output wgs_profile ./target/release/rustseq_mini --full-wgs --gpu

The aligner expects files named: {SAMPLE_ID}_L{LANE:03}_R{READ}_001.fastq.gz
- Example:
SAMPLE_001_L001_R1_001.fastq.gz
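To show how the .env settings and the naming pattern above fit together, here is a minimal Rust sketch that parses KEY=VALUE lines and lists the file names the pattern implies. The helpers `parse_env`, `fastq_name`, and `expected_files` are hypothetical and not taken from the project. The sketch assumes lanes and reads are numbered from 1, so WGS_LANES=8 with WGS_READS_PER_LANE=2 would mean 16 files.

```rust
use std::collections::HashMap;

/// Parses KEY=VALUE lines, skipping blank lines and `#` comments.
fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Builds one name following {SAMPLE_ID}_L{LANE:03}_R{READ}_001.fastq.gz.
fn fastq_name(sample_id: &str, lane: u32, read: u32) -> String {
    format!("{sample_id}_L{lane:03}_R{read}_001.fastq.gz")
}

/// Lists every expected file, or None if a setting is missing or not a number.
/// Assumption: lanes and reads are both numbered starting at 1.
fn expected_files(cfg: &HashMap<String, String>) -> Option<Vec<String>> {
    let sample = cfg.get("WGS_SAMPLE_ID")?;
    let lanes: u32 = cfg.get("WGS_LANES")?.parse().ok()?;
    let reads: u32 = cfg.get("WGS_READS_PER_LANE")?.parse().ok()?;
    let mut names = Vec::new();
    for lane in 1..=lanes {
        for read in 1..=reads {
            names.push(fastq_name(sample, lane, read));
        }
    }
    Some(names)
}

fn main() {
    let env_text = "# WGS Data Configuration\n\
                    WGS_SAMPLE_ID=SAMPLE_001\n\
                    WGS_LANES=8\n\
                    WGS_READS_PER_LANE=2\n";
    let cfg = parse_env(env_text);
    if let Some(files) = expected_files(&cfg) {
        println!("{} files expected", files.len());
        if let Some(first) = files.first() {
            println!("first: {first}");
        }
    }
}
```

Checking the generated names against the contents of WGS_DATA_DIR before a full run is one way to catch a misnamed or missing lane early.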