Doctoral thesis
Abstract: Rare diseases are a global health issue that affects around 3% of the population worldwide. Some of these diseases affect pathways involved in establishing or maintaining 5-mC DNA methylation (DNAm). This leads to detectable variations in DNA methylation levels that act as proxy markers for those diseases. Identifying these variations may lead to a higher diagnostic yield as well as a better understanding of the etiology of those diseases. Over the last decades, two major approaches to studying DNAm in disease have emerged. The first relies on the identification of differentially methylated regions (DMRs) of biological interest whose modified methylation state correlates with the phenotype linked to the disease. The second is based on machine learning and consists of a classifier that distinguishes patients from controls based on a set of differentially methylated CpGs. These models, termed episignatures, have shown promising utility in a clinical setting. Although the two approaches are complementary, they differ in their implementation and raise several questions when applied to rare diseases.

First, most methods for detecting differentially methylated CpGs and aggregating them into regions have been implemented for common diseases and are based on group comparisons. However, it is often difficult to gather enough patient samples to apply those methods in the context of rare diseases. Those methods are therefore not compatible with this context, and a different approach to detecting DMRs should be used.

Second, no clear guidelines have been proposed for building episignatures. There is therefore a need to evaluate the parameters that affect episignature implementation.

In addition, most episignatures used to discriminate between patients suffering from rare diseases and controls have been trained on small cohorts coming from one genetic center and on a deprecated technology. This raises the question of the transferability of those models to new data from other centers generated with other technologies.

In this thesis, we address these three questions. First, we investigate which other methods for detecting differentially methylated regions may be used in the context of rare diseases. We describe a new method based on the Z-score and the Empirical Brown’s aggregation method to identify DMRs in a single-patient setting and show its diagnostic utility in patients suffering from rare imprinting disorders. Second, we investigate how batch effects such as array technology and the origin of training data affect classifier performance and discuss ways to handle those problems. Third, we propose guidelines on how to implement episignatures and illustrate them in two rare disorders of the methylation machinery. Finally, we discuss those issues in light of clinical diagnosis and try to offer solutions compatible with this setting.
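
To make the single-patient idea concrete, the following is a minimal sketch of the Z-score step only, and not the implementation described in the thesis. It assumes methylation values (for example beta values) are available for one patient and for a set of controls at the same CpGs, and that control values at each CpG are roughly normally distributed. The function name `single_patient_z_scores`, the handling of CpGs with zero variance in controls, and the toy data are all hypothetical. Aggregating the per-CpG p-values into regions with the Empirical Brown's method is not shown.

```python
import math
import statistics


def single_patient_z_scores(patient, controls):
    """Return a (z, two-sided p) pair per CpG for one patient versus controls.

    patient:  list of methylation values, one per CpG
    controls: list of control samples, each a list aligned with `patient`
    """
    results = []
    for i, value in enumerate(patient):
        ref = [sample[i] for sample in controls]
        mean = statistics.fmean(ref)
        sd = statistics.stdev(ref)  # sample standard deviation of the controls
        if sd == 0:
            # Assumption: a CpG with no spread in controls is treated as uninformative.
            results.append((0.0, 1.0))
            continue
        z = (value - mean) / sd
        # Two-sided tail probability under a standard normal distribution.
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((z, p))
    return results


# Toy data (hypothetical): four controls, two CpGs.
controls = [[0.50, 0.10], [0.52, 0.12], [0.48, 0.08], [0.50, 0.10]]
patient = [0.90, 0.10]  # first CpG is strongly hypermethylated, second is typical
scores = single_patient_z_scores(patient, controls)
```

In this toy example the first CpG yields a large positive Z-score and a very small p-value, while the second stays close to zero. In practice, the number of controls and the correlation between neighbouring CpGs both matter, which is why an aggregation step that accounts for dependence is needed before calling a region.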