Doctoral thesis
Abstract: Advances in cell sequencing techniques over the last decade have encouraged increasing adoption and led to the release of numerous publicly available datasets. Single-cell RNA sequencing (scRNA-seq) provides transcriptomic profiles of individual cells and has become the dominant technology for the study of gene expression. Because annotations are typically absent, clustering algorithms are routinely employed in the underlying computational analysis. Several technical challenges (e.g. high dimensionality, sparsity, and dropout) have motivated the proposal of numerous techniques adapted to the specificities of scRNA-seq data. However, despite these efforts, there is no consensus on the best-performing method. We propose a suite of three novel methods for the unsupervised analysis (i.e. clustering) of scRNA-seq data. contrastive-sc applies self-supervised contrastive learning techniques, originally proposed for image processing, to scRNA-seq data in order to create and cluster cell embeddings. graph-sc clusters scRNA-seq data with a graph convolutional autoencoder model and makes it possible to seamlessly integrate various types of external data (e.g. gene correlations) under the same optimization task. discover performs bottom-up subspace clustering on scRNA-seq, bulk RNA-seq and microarray data with a hybrid genetic algorithm. An extensive experimental study assesses each of the proposed methods on simulated and real-world datasets. Our methods compared favorably with more than 10 competing state-of-the-art algorithms. Clustering analysis is typically followed by differential expression (DE) analysis, which identifies the genes expressed differently across the identified clusters and serves as the entry point for downstream biological validation experiments.
Our final contribution demonstrates that applying gradient-based explainability techniques to neural network clustering methods can identify the DE genes, outperforming several state-of-the-art dedicated methods while being significantly faster.