by Kwasigroch, Jean-Marc; Chomilier, Jacques; Mornon, Jean Paul
Reference: Journal of Molecular Biology, 259, 4, pages 855-872
Publication: Published, 1996-06

Peer-reviewed article
Abstract: A bank of loops from three to eight amino acid residues long has been constituted. On the basis of a statistical analysis of the occurrences of conformations and residues, loops can be divided into two parts: the side residues, directly bonded to the flanking secondary structure elements, and the inner part. The conformations of the side residues are correlated with the nature of their neighboring flanks, while the inner residues adopt conformations that are uncorrelated from one residue to the next and are therefore unrelated to the flanks. Two zones of the Ramachandran plot are important: αL and βP. In particular, the high occurrence of αL, mainly occupied by glycine residues, is necessary to induce flexibility and thus allow loops to comply with the geometrical constraints of the flanks. A clustering algorithm has been used to aggregate loops of the same length into families of similar 3D structures. At each position in each cluster, sequence and conformational signatures have been deduced wherever the occurrence of a residue (or a conformation) is higher than expected from an equiprobable distribution over all clusters. As a result, some positions favor particular amino acids and conformations, which are typical of a cluster although not unique to it. This indicates a relation between structure and sequence in loops. A taxonomy is proposed that classifies the various clusters. It relies on two terms: the mean distance between the first and last Cα in one cluster and, perpendicular to this line, the distance to the center of gravity of the cluster. It is noteworthy that the differently populated clusters represented in such 2D plots can be separated. Thus, although the conformations of loops in globular proteins could cover a continuum, it has been possible to cluster them into a limited number of well-populated families and superfamilies. This basic feature of protein architecture could be further exploited to better predict loop geometry.
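The two terms of the taxonomy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which atoms define the center of gravity or how values are averaged within a cluster, so the sketch assumes the centroid of the Cα atoms and a plain arithmetic mean over the loops of a cluster. The function names `loop_descriptors` and `cluster_descriptors` are hypothetical.

```python
import math


def loop_descriptors(ca_coords):
    """Return (end_to_end, centroid_offset) for a single loop.

    ca_coords: list of (x, y, z) C-alpha positions, ordered from the
    first to the last residue of the loop.
    Assumption: the center of gravity is the unweighted C-alpha centroid.
    """
    if len(ca_coords) < 2:
        raise ValueError("need at least two C-alpha positions")
    first, last = ca_coords[0], ca_coords[-1]
    n = len(ca_coords)
    centroid = tuple(sum(p[i] for p in ca_coords) / n for i in range(3))

    # Vector joining the first and last C-alpha, and its length.
    axis = tuple(last[i] - first[i] for i in range(3))
    end_to_end = math.sqrt(sum(a * a for a in axis))
    if end_to_end == 0.0:
        # Degenerate case: no line is defined, fall back to a point distance.
        return 0.0, math.dist(first, centroid)

    # Remove the component of (centroid - first) that lies along the axis;
    # what remains is perpendicular to the first-to-last line.
    rel = tuple(centroid[i] - first[i] for i in range(3))
    t = sum(rel[i] * axis[i] for i in range(3)) / (end_to_end ** 2)
    perp = tuple(rel[i] - t * axis[i] for i in range(3))
    centroid_offset = math.sqrt(sum(c * c for c in perp))
    return end_to_end, centroid_offset


def cluster_descriptors(loops):
    """Average both descriptors over all loops of one cluster.

    Assumption: a plain arithmetic mean; the abstract does not specify it.
    """
    if not loops:
        raise ValueError("cluster is empty")
    values = [loop_descriptors(loop) for loop in loops]
    mean_d = sum(v[0] for v in values) / len(values)
    mean_h = sum(v[1] for v in values) / len(values)
    return mean_d, mean_h


if __name__ == "__main__":
    # Two made-up three-residue loops, for illustration only.
    cluster = [
        [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.0, 3.0, 0.0), (4.0, 0.0, 0.0)],
    ]
    print(cluster_descriptors(cluster))
```

Plotting the two returned values for every cluster gives a 2D map of the kind the abstract describes; the coordinates used here are invented and carry no structural meaning.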