par Faure, Roland ;Lavenier, Dominique;Flot, Jean-François
Référence Peer Community Journal, 4, e96
Publication Publié, 2024-10-01
Référence Peer Community Journal, 4, e96
Publication Publié, 2024-10-01
Article révisé par les pairs
Résumé : | Motivation: Long-read assemblers face challenges in discerning closely related viral or bacterial strains, often collapsing similar strains into a single sequence. This limitation has been hampering metagenome analysis, as diverse strains may harbor crucial functional distinctions. Results: We introduce a novel software, HairSplitter, designed to retrieve strains from a partially or totally collapsed assembly and long reads. The method uses a custom variant-calling process to operate with erroneous long reads and introduces a new read binning algorithm to recover an a priori unknown number of strains. On noisy long reads, HairSplitter recovers more strains while being faster than state-of-the-art tools, both in the cases of viruses and bacteria. Availability: Hair-Splitter is freely available on GitHub at https://github.com/RolandFaure/Hairsplitter (https://doi.org/10.5281/zenodo.13753481). |