Doctoral thesis
Abstract: Thanks to the development of next-generation sequencing (NGS) technology, whole-genome data can be readily obtained from a variety of samples. With the massive increase in available sequencing data, the development of efficient assembly algorithms has become the new bottleneck. Almost every newly released tool is based on the de Bruijn graph method, which focuses on assembling complete datasets with mathematical models. Although decreasing sequencing costs have made whole-genome sequencing (WGS) the most straightforward and least laborious approach to gathering sequencing data, many research projects are interested only in the extranuclear genomes. Unfortunately, few of the available tools are specifically designed to efficiently retrieve these extranuclear genomes from WGS datasets. We developed NOVOPlasty, a seed-and-extend algorithm that assembles circular organelle genomes from WGS data, starting from a single short seed sequence. The algorithm was tested on several new (Gonioctena intermedia and Avicennia marina) and public (Arabidopsis thaliana and Oryza sativa) whole-genome Illumina datasets and always outperformed other assemblers in assembly accuracy and contiguity. In our benchmark, NOVOPlasty assembled all genomes in less than 30 minutes with a maximum RAM requirement of 16 GB. NOVOPlasty is the only de novo assembler that provides a fast and straightforward way to extract the extranuclear sequences from WGS data and that generates a single circular, high-quality contig.

Heteroplasmy, the existence of multiple mitochondrial haplotypes within an individual, has been researched across different fields. Mitochondrial genome polymorphisms have been linked to multiple severe disorders and are of interest to evolutionary studies and forensic science. With ultra-deep sequencing, it is now possible to uncover previously undiscovered patterns of intra-individual polymorphism. However, it remains challenging to determine the source of this polymorphism.
Currently available software can detect polymorphic sites but cannot determine the linkage between them. We therefore developed a new method that not only detects intra-individual polymorphisms within mitochondrial and chloroplast genomes, but also looks for linkage among polymorphic sites by assembling the sequence around each detected polymorphic site. Our benchmark study shows that this method detects heteroplasmy more accurately than any previously available method and that it is the first tool able to completely or partially reconstruct the sequence of origin for each intra-individual polymorphism.
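
To make the seed-and-extend idea mentioned in the abstract concrete, the following is a minimal toy sketch in Python. It is not NOVOPlasty's actual implementation: the function name `seed_and_extend`, the parameters `k` and `max_len`, and the greedy majority-vote extension rule are all assumptions chosen for illustration. The sketch starts from a short seed, repeatedly appends the base that most reads place after the current k-base suffix, and reports the contig as circular once the extension runs back into the start of the seed.

```python
from collections import Counter


def seed_and_extend(reads, seed, k=5, max_len=10_000):
    """Toy greedy seed-and-extend (hypothetical helper, illustration only).

    Returns (contig, is_circular). Assumes len(seed) >= k and error-free reads.
    """
    # Index: for every k-mer seen in the reads, count the bases that follow it.
    next_bases = {}
    for read in reads:
        for i in range(len(read) - k):
            kmer = read[i:i + k]
            next_bases.setdefault(kmer, Counter())[read[i + k]] += 1

    contig = seed
    start = seed[:k]
    while len(contig) < max_len:
        votes = next_bases.get(contig[-k:])
        if not votes:
            # No read supports a further extension: stop with a linear contig.
            return contig, False
        # Extend by the most frequently observed next base.
        contig += votes.most_common(1)[0][0]
        if contig[-k:] == start:
            # The extension reached the start of the seed again: the contig
            # closes on itself, so trim the duplicated k bases and report it.
            return contig[:-k], True
    return contig, False


if __name__ == "__main__":
    # Made-up 20-base circular "genome" whose 5-mers are all distinct.
    genome = "ACGTAGGCTTCAATGCCGAT"
    doubled = genome + genome
    # One error-free 10-base read starting at every position of the circle.
    reads = [doubled[i:i + 10] for i in range(len(genome))]
    print(seed_and_extend(reads, seed=genome[:6], k=5))
```

A real organelle assembler must additionally handle sequencing errors, repeats such as the chloroplast inverted repeat, reverse-complement reads, and extension in both directions. None of these is covered by this sketch.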