@ekg ekg commented Sep 2, 2024

Previously, settings that slightly improved runtime when aligning pangenomes hurt performance in comparative genomics contexts. Updates related to mashmap3 and alignment have made wfmash much more robust to more sensitive defaults.

In this PR, I'm setting a bunch of defaults which have become standard in my testing:

  • Default minimum mapping identity reduced from 90% to 70%.
  • Set maximum mapping length to 50 kb by default (previously unlimited).
  • Changed block length default from 5x segment length to 3x segment length.
  • Set default chain gap to 30 kb (previously 6x segment length, up to 30 kb).
  • Reduced default segment length from 5 kb to 1 kb.
  • Changed default kmer size from 19 to 15.
  • Modified wflign to run on all fragments except very small ones (less than 1000 bp).
  • Changed filtering logic to use Euclidean distance as an absolute cutoff instead of axis-weighted Euclidean distance, while still ranking based on axis-weighted distance.
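
Several of these defaults interact, because block length and the old chain gap were derived from segment length. The arithmetic can be sketched as below. This is only an illustration in Python, not wfmash code: the function names are hypothetical, and reading "up to 30k" as a hard cap on the old chain gap is an assumption.

```python
# Illustrative arithmetic only; names are hypothetical, not wfmash identifiers.
# All lengths are in base pairs.
OLD_SEGMENT_LENGTH = 5_000
NEW_SEGMENT_LENGTH = 1_000


def old_block_length(segment_length):
    # Old default: 5x segment length.
    return 5 * segment_length


def new_block_length(segment_length):
    # New default: 3x segment length.
    return 3 * segment_length


def old_chain_gap(segment_length):
    # Old default: 6x segment length, assumed here to be capped at 30 kb.
    return min(6 * segment_length, 30_000)


def new_chain_gap(segment_length):
    # New default: fixed 30 kb, independent of segment length.
    return 30_000


print("block length:", old_block_length(OLD_SEGMENT_LENGTH),
      "->", new_block_length(NEW_SEGMENT_LENGTH))
print("chain gap under old rule at new segment length:",
      old_chain_gap(NEW_SEGMENT_LENGTH))
print("chain gap under new rule:", new_chain_gap(NEW_SEGMENT_LENGTH))
```

Under these assumptions, the old rule would have shrunk the chain gap to 6 kb once the segment length dropped to 1 kb, so fixing it at 30 kb keeps the chain gap where it was with the old 5 kb segments.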

These changes should make wfmash more sensitive at the edges of its performance envelope, with minimal cost for easy, low-divergence pangenome alignment problems.
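
The filtering change in the list above separates the cutoff from the ranking. A minimal Python sketch of that idea follows. It is not the wfmash implementation: the function names, the representation of a candidate as a (query gap, target gap) pair, and the axis weights are all assumptions made for illustration.

```python
import math


def euclidean(dq, dt):
    # Plain Euclidean distance between two mappings, given their
    # query-axis gap (dq) and target-axis gap (dt) in bp.
    return math.hypot(dq, dt)


def axis_weighted(dq, dt, w_query=1.0, w_target=0.5):
    # Hypothetical axis weighting: the weights here are placeholders,
    # not the values wfmash uses.
    return math.hypot(w_query * dq, w_target * dt)


def filter_and_rank(gaps, max_dist):
    # Absolute cutoff uses the unweighted distance...
    kept = [g for g in gaps if euclidean(*g) <= max_dist]
    # ...while the surviving candidates are ranked by the weighted one.
    return sorted(kept, key=lambda g: axis_weighted(*g))


candidates = [(3_000, 4_000), (20_000, 25_000), (1_000, 6_000)]
print(filter_and_rank(candidates, max_dist=30_000))
```

With these placeholder weights, the candidate at (20000, 25000) is dropped by the absolute cutoff, and the two survivors are ordered by weighted distance, which can differ from their order by plain Euclidean distance.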

@ekg ekg merged commit 4521c10 into main Sep 2, 2024