Peer-reviewed article
Abstract: Analysis of the statistical distribution of amino acid compositions within 22 protein families shows that a GC bias generally affects proteins of diverse function from the extreme thermophile Thermus. This bias results in clear enrichment in the amino acids L, V, A, P, R and G and underrepresentation of the amino acids I, M, E, S, T, C and W. The strong amino acid composition biases noted in Thermus proteins are not related to thermoadaptation: they were also found in mesophilic homologues encoded by GC-rich genes. A comparative analysis of large samples of translated sequences from 30 organisms, representing the three major kingdoms of life and including extremophiles, indicates a universal correlation between the usage of particular amino acids and genomic GC content. It is concluded that the first codon letter plays a dominant role in translating the genomic GC signature into protein amino acid composition and sequence.
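The link between codon positions and amino acid usage can be explored with a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) and tabulates, for each amino acid, the fraction of its synonymous codons that carry G or C at each codon position. The helper name `gc_fraction_by_position` is hypothetical. This is not the authors' statistical analysis and reproduces none of their results.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with bases ordered
# T, C, A, G at every codon position; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# product() varies the last base fastest, matching the table layout above.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_fraction_by_position(amino_acid):
    """Return the fraction of G/C at codon positions 1, 2 and 3,
    averaged over all synonymous codons of one amino acid
    (hypothetical helper, for illustration only)."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return tuple(
        sum(codon[i] in "GC" for codon in codons) / len(codons)
        for i in range(3)
    )


if __name__ == "__main__":
    # Groups as listed in the abstract.
    for label, group in (("enriched", "LVAPRG"),
                         ("underrepresented", "IMESTCW")):
        for aa in group:
            first, second, third = gc_fraction_by_position(aa)
            print(f"{label:17s} {aa}  pos1={first:.2f}  "
                  f"pos2={second:.2f}  pos3={third:.2f}")
```

The table treats all synonymous codons as equally likely. A real analysis would weight them by observed codon usage in each genome.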