Dear YaHS Developers and Users,
For large genomes, the .assembly and .hic files generated by YaHS are not fully compatible with Juicebox. Manually setting the scale factor in Juicebox may be necessary. Additionally, Juicebox cannot correctly import and parse a modified assembly (i.e., .review.assembly) in these scenarios. To address this, I made some modifications in juicer.c, asset.c and asset.h to eliminate the need for manually setting the scale factor in Juicebox for YaHS.
This issue arises because YaHS and Juicebox calculate the scale factor differently. Juicebox uses 1 + genome_size / 2,100,000,000 (integer division) as the scale factor, while YaHS uses 2^n, where n is the smallest integer that fulfills genome_size / 2^n < INT_MAX. As a result, Juicebox cannot infer the correct scale factor. For example, for a 9 Gb genome Juicebox computes 1 + 4 = 5, whereas YaHS computes 2^3 = 8.
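The mismatch can be reproduced with a minimal sketch of the two formulas as described above. This is an illustration, not the actual code from Juicebox or YaHS: the function names `juicebox_scale` and `yahs_scale` are hypothetical, and the sketch assumes integer division and a 32-bit `INT_MAX`.

```python
# Assumption: INT_MAX is that of a 32-bit signed C int.
INT_MAX = 2**31 - 1


def juicebox_scale(genome_size: int) -> int:
    """Scale factor as described for Juicebox: 1 + genome_size / 2,100,000,000.

    Hypothetical helper; integer division is assumed.
    """
    return 1 + genome_size // 2_100_000_000


def yahs_scale(genome_size: int) -> int:
    """Scale factor as described for YaHS: 2^n for the smallest n such that
    genome_size / 2^n < INT_MAX.

    Hypothetical helper; the division is done as an integer right shift.
    """
    n = 0
    while (genome_size >> n) >= INT_MAX:
        n += 1
    return 1 << n


if __name__ == "__main__":
    # Compare the two formulas for a few genome sizes (in bp).
    for size in (1_000_000_000, 3_000_000_000, 9_000_000_000):
        print(size, juicebox_scale(size), yahs_scale(size))
```

Under these assumptions the two formulas agree for a 1 Gb genome (both 1) and a 3 Gb genome (both 2), but diverge at 9 Gb (5 versus 8), which is the case described above.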
I also modified the MAPQ filtering in juicer.c. In the original version, MAPQ filtering is enabled only when the BAM file is sorted by query name. However, the filtering should also support unsorted BAM files.
The modified versions of juicer.c, asset.c and asset.h are available at: https://github.com/zengxiaofei/yahs. Please feel free to use them in your work. I can open a pull request if the developers think it appropriate to merge these changes.
Best regards,
Xiaofei