Thèse de doctorat
Résumé : Living cells and their components play a key role within biotechnology industry. Cell cultures and their products of interest are used for the design of vaccines as well as in the agro-alimentary field. In order to ensure optimal working of such bioprocesses, the understanding of the complex mechanisms which rule them is fundamental. Mathematical models may be helpful to grasp the biological phenomena which intervene in a bioprocess. Moreover, they allow prediction of system behaviour and are frequently used within engineering tools to ensure, for instance, product quality and reproducibility.

Mathematical models of cell cultures may come in various shapes and be phrased with varying degrees of mathematical formalism. Typically, three main model classes are available to describe the nonlinear dynamic behaviour of such biological systems. They consist of macroscopic models which only describe the main phenomena appearing in a culture. Indeed, a high model complexity may lead to long numerical computation time incompatible with engineering tools like software sensors or controllers. The first model class is composed of the first principles or white box models. They consist of the system of mass balances for the main species (biomass, substrates, and products of interest) involved in a reaction scheme, i.e. a set of irreversible reactions which represent the main biological phenomena occurring in the considered culture. Whereas transport phenomena inside and outside the cell culture are often well known, the reaction scheme and associated kinetics are usually a priori unknown, and require special care for their modelling and identification. The second kind of commonly used models belongs to black box modelling. Black boxes consider the system to be modelled in terms of its input and output characteristics. They consist of mathematical function combinations which do not allow any physical interpretation. They are usually used when no a priori information about the system is available. Finally, hybrid or grey box modelling combines the principles of white and black box models. Typically, a hybrid model uses the available prior knowledge while the reaction scheme and/or the kinetics are replaced by a black box, an Artificial Neural Network for instance.

Among these numerous models, which one has to be used to obtain the best possible representation of a bioprocess? We attempt to answer this question in the first part of this work. On the basis of two simulated bioprocesses and a real experimental one, two model kinds are analysed. First principles models whose reaction scheme and kinetics can be determined thanks to systematic procedures are compared with hybrid model structures where neural networks are used to describe the kinetics or the whole reaction term (i.e. kinetics and reaction scheme). The most common artificial neural networks, the MultiLayer Perceptron and the Radial Basis Function network, are tested. In this work, pure black box modelling is however not considered. Indeed, numerous papers already compare different neural networks with hybrid models. The results of these previous studies converge to the same conclusion: hybrid models, which combine the available prior knowledge with the neural network nonlinear mapping capabilities, provide better results.

From this model comparison and the fact that a physical kinetic model structure may be viewed as a combination of basis functions such as a neural network, kinetic model structures allowing biological interpretation should be preferred. This is why the second part of this work is dedicated to the improvement of the general kinetic model structure used in the previous study. Indeed, in spite of its good performance (largely due to the associated systematic identification procedure), this kinetic model which represents activation and/or inhibition effects by every culture component suffers from some limitations: it does not explicitely address saturation by a culture component. The structure models this kind of behaviour by an inhibition which compensates a strong activation. Note that the generalization of this kinetic model is a challenging task as physical interpretation has to be improved while a systematic identification procedure has to be maintained.

The last part of this work is devoted to another kind of biological systems: proteins. Such macromolecules, which are essential parts of all living organisms and consist of combinations of only 20 different basis molecules called amino acids, are currently used in the industrial world. In order to allow their functioning in non-physiological conditions, industrials are open to modify protein amino acid sequence. However, substitutions of an amino acid by another involve thermodynamic stability changes which may lead to the loss of the biological protein functionality. Among several theoretical methods predicting stability changes caused by mutations, the PoPMuSiC (Prediction Of Proteins Mutations Stability Changes) program has been developed within the Genomic and Structural Bioinformatics Group of the Université Libre de Bruxelles. This software allows to predict, in silico, changes in thermodynamic stability of a given protein under all possible single-site mutations, either in the whole sequence or in a region specified by the user. However, PoPMuSiC suffers from limitations and should be improved thanks to recently developed techniques of protein stability evaluation like the statistical mean force potentials of Dehouck et al. (2006). Our work proposes to enhance the performances of PoPMuSiC by the combination of the new energy functions of Dehouck et al. (2006) and the well known artificial neural networks, MultiLayer Perceptron or Radial Basis Function network. This time, we attempt to obtain models physically interpretable thanks to an appropriate use of the neural networks.