Peer-reviewed article
Abstract: Several types of potentials are derived from a dataset of known protein structures by computing statistical relations between amino acid sequence and different descriptions of the protein conformation. These potentials formulate in different ways backbone dihedral angle preferences, pairwise distance-dependent interactions between amino acid residues, and solvation effects based on accessible surface area calculations. Parameters affecting the characteristics and the performance of the potentials are critically assessed by monitoring recognition of the native fold in a strict screening test, where each sequence in the dataset is threaded through a repertoire of motifs, generated from all corresponding structures. Sequence gaps are not allowed, to avoid additional approximations. Results show that residue interaction potentials computed from distances between average side-chain centroids perform significantly better on this test than those computed considering inter-Cα or inter-Cβ distances. Combining potentials that are based on different structural descriptions and different interactions is also beneficial. The performance of some of these potentials is in fact so good that they recognize the correct fold for all the tested proteins, including subunits known to be unstable in the absence of quaternary interactions. Most strikingly, potentials representing backbone dihedral angle preferences recognize as many as 68 protein chains out of a total of 74, even though they consider solely local interactions along the chain, which, being the same as those considered in secondary structure prediction methods, are well known to be incapable of determining the full three-dimensional fold. This leads us to question the ability of procedures that screen a limited repertoire of structures to act as a stringent test for the potentials.
We concede, however, that they are useful and fast tests, capable of revealing gross shortcomings of the potentials, or possible biases towards native recognition due, for example, to effects of sequence memory.
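The gapless screening test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-letter state strings standing in for structural descriptions, the `TOY_TABLE` values, and the names `toy_energy` and `gapless_threading_rank` are all hypothetical and chosen only for illustration. The sequence is mounted, without gaps, on every contiguous fragment of matching length taken from each structure in the repertoire, and the native fold counts as recognized when no other motif has a strictly lower energy.

```python
from typing import Callable, Sequence

# Hypothetical toy potential: energy of placing a residue in a structural state.
# 'H', 'E', 'C' are illustrative backbone states, not the paper's descriptors.
TOY_TABLE = {("A", "H"): -1.0, ("V", "E"): -1.0}


def toy_energy(sequence: str, motif: str) -> float:
    """Sum per-position energies; unlisted (residue, state) pairs score 0."""
    return sum(TOY_TABLE.get((res, state), 0.0)
               for res, state in zip(sequence, motif))


def gapless_threading_rank(
    sequence: str,
    native_index: int,
    structures: Sequence[str],
    energy: Callable[[str, str], float],
) -> int:
    """Rank of the native fold among all gapless motifs (1 = recognized).

    Assumes the native structure has the same length as the sequence, so the
    native motif is the whole of structures[native_index].
    """
    n = len(sequence)
    native_energy = energy(sequence, structures[native_index])
    lower = 0
    for s_idx, structure in enumerate(structures):
        # Every contiguous window of length n is one motif; no gaps allowed.
        for offset in range(len(structure) - n + 1):
            if s_idx == native_index and offset == 0:
                continue  # skip the native motif itself
            if energy(sequence, structure[offset:offset + n]) < native_energy:
                lower += 1
    return 1 + lower


repertoire = ["HHEE", "EEHHCC", "HHHHHHH"]
rank = gapless_threading_rank("AAVV", 0, repertoire, toy_energy)
print(rank)
```

With a small repertoire, few motifs compete with the native one, which is the limitation the abstract raises: a rank of 1 in such a test shows only that the potential has no gross shortcomings.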