by Tsishyn, Matsvei
Jury president: Gilis, Dimitri
Supervisor: Pucci, Fabrizio
Co-supervisor: Rooman, Marianne
Publication: Unpublished, 2026-06-29

Doctoral thesis
| Abstract: | Mutations are the fundamental building blocks of evolution, and understanding their effects on protein biophysical properties and fitness remains a central question in biology. Such knowledge is essential for elucidating the molecular basis of genetic diseases, understanding genotype–phenotype relationships, and guiding rational protein design. Although high-throughput experimental techniques have generated unprecedented amounts of mutational data, experimental characterization remains challenging, assay-dependent, and limited in scope. Computational approaches are therefore needed to complement experiments, explore larger mutational spaces, and provide insights into the mechanisms underlying mutation effects. In contrast to the current trend towards increasingly complex black-box models with billions of parameters, this thesis focuses on interpretable approaches rooted in biological principles. A recurring theme throughout this work is the difficulty of reliably training and evaluating models in the presence of biases, uncertainty, and assay-specific effects in the available data. The core of this thesis investigates how evolutionary information can be used to predict mutational effects. In particular, we compared single-site evolutionary models, which consider each protein position independently, with epistatic models, which additionally account for interactions between positions. We found that current epistatic approaches often fail to outperform much simpler single-site models despite their higher complexity. We further showed that integrating structural information into evolutionary models substantially improves their predictions. Building on these observations, we developed RSALOR, a model that combines residue conservation with solvent accessibility. Despite relying on two old and well-established biological principles, the model achieves prediction accuracy comparable to, or exceeding, that of much more complex evolutionary and structure-based methods, calling into question some of the latest advances in the field. Finally, we revisited epistatic models by incorporating structural information directly into the inference process. By restricting residue couplings to pairs that are in contact in the 3D structure, we drastically reduced model complexity while focusing on physically relevant interactions. This led to the development of StructureDCA, a structure-informed direct coupling analysis model that provides predictions beyond single-site conservation while remaining computationally efficient and interpretable. Overall, this work demonstrates the strength of simple models that are rooted in biologically meaningful principles and that efficiently combine evolutionary and structural information. |
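The abstract describes combining residue conservation with solvent accessibility but does not give a formula. The sketch below is only an illustration of that general idea and should not be read as the RSALOR implementation: it assumes a log-odds ratio computed from pseudocount-regularized column frequencies of a multiple sequence alignment, scaled by a burial factor `(1 - rsa)`, with the convention that higher scores mean a more damaging substitution. The function names, the pseudocount value, and the weighting scheme are all hypothetical choices made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(msa, position, pseudocount=1.0):
    """Pseudocount-regularized amino acid frequencies for one alignment column.

    Gaps and non-standard characters are ignored. The pseudocount keeps every
    frequency strictly between 0 and 1 so the log-odds below stay finite.
    """
    counts = Counter(seq[position] for seq in msa if seq[position] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_odds(freq):
    """Natural-log odds of observing a residue with the given frequency."""
    return math.log(freq / (1.0 - freq))


def burial_weighted_score(msa, position, wild_type, mutant, rsa):
    """Illustrative conservation score weighted by burial (assumed form).

    rsa is the relative solvent accessibility of the position, in [0, 1].
    A buried position (rsa near 0) keeps the full log-odds ratio; a fully
    exposed one (rsa = 1) is scaled to zero. This weighting is an assumption
    of this sketch, not a formula taken from the thesis.
    """
    freqs = column_frequencies(msa, position)
    lor = log_odds(freqs[wild_type]) - log_odds(freqs[mutant])
    return (1.0 - rsa) * lor


if __name__ == "__main__":
    # Toy alignment: position 0 is mostly 'A', occasionally 'G'.
    toy_msa = ["ACD", "ACE", "ACD", "GCD"]
    for rsa in (0.0, 0.5, 1.0):
        print(rsa, burial_weighted_score(toy_msa, 0, "A", "G", rsa))
```

In this toy setting, the same substitution scores highest when the position is assumed to be fully buried and drops to zero when it is assumed to be fully exposed, which is the qualitative behavior one would expect from a model that trusts conservation more in the protein core.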



