by Schwersensky, Martin
Jury chair Prévost, Martine
Supervisor Rooman, Marianne
Co-supervisor Pucci, Fabrizio
Publication Unpublished, 2023-12-20
Doctoral thesis
Abstract: In recent years, the development of machine learning applications in protein science has seen performance breakthroughs which have further established bioinformatics as an essential discipline to cope with and exploit the huge amount of data coming from biology. However, the problem of predicting the change in a protein’s stability upon the mutation of its residues has struggled to measure up to these breakthroughs. This area of research faces several challenges, such as the insufficient amount of available data, the many biases that can limit prediction performance, and the need to use sufficiently informative features and to combine them in a way that agrees with the principles of physics. In this regard, the integration of protein structural and evolutionary information stands out as a major research area to improve protein stability predictors. Conversely, our understanding of molecular evolution can benefit from the application of protein stability prediction, as protein folding stability is recognised as a strong selection pressure in protein evolution.

The present thesis aims to leverage the study of this relationship between protein stability and evolution in order to contribute to the improvement of both protein stability predictors and our understanding of molecular evolution. We start by reviewing the challenges faced by protein stability predictors, and we then illustrate their relevance to both fundamental and applied research questions. In particular, we uncover patterns of natural selection for protein mutational robustness at various levels of biological information. At the amino acid level, we show how the surface of proteins tends to be more robust against random mutations and to evolve faster than their core. At the genetic code level, our results suggest that the genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both codon usage and the codon usage bias appear to optimize mutational robustness and translation accuracy. Next, we apply predictors of protein structural stability and evolutionary fitness to characterize cancer-associated missense variants in the context of our participation in CAGI, a blind prediction experiment. Our results expose the difficulty of predicting the impact of stabilizing mutations of small magnitude, as well as the impacts of mutations on protein function. These results further emphasise the need to develop improved protein stability predictors that are less prone to biases. Finally, we perform a preliminary analysis of structural and evolutionary features to guide the future improvements of our protein stability predictor. We show how most features are able to inform about destabilizing mutations, particularly in core and conserved residues, and we clarify the contexts in which each evolutionary and each structural feature is most relevant to predict protein stability.

Overall, this thesis constitutes a comprehensive resource to help improve the performance of current protein stability predictors and models of molecular evolution.