by Faure, Roland
Jury president Chikhi, Rayan
Supervisor Flot, Jean-François; Lavenier, Dominique
Publication Unpublished, 2024-11-27
Doctoral thesis
Abstract: This thesis presents solutions to improve genome assembly from third-generation sequencing reads, with a specific focus on the assembly of (meta)genomes containing multiple haplotypes, such as polyploid genomes or closely related bacterial strains. Current assemblers struggle to separate highly similar haplotypes, often collapsing all or parts of the haplotypes into one, thereby discarding polymorphisms and heterozygosity. This work introduces a series of methods and software tools to achieve haplotype-separated assemblies. Specifically, GenomeTailor and HairSplitter transform a collapsed assembly obtained with error-prone long reads into a phased assembly, significantly improving on the state of the art when numerous strains are present. The software Alice introduces a method based on the new "MSR" sketching technique for efficiently assembling multiple haplotypes sequenced with high-fidelity reads. Additionally, this thesis proposes a new Hi-C scaffolding strategy that involves untangling assembly graphs, which significantly improves final assemblies, particularly when several haplotypes are present.