Peer-reviewed article
Abstract: The study of a few genes has permitted the identification of three elements that constitute a yeast polyadenylation signal: the efficiency element (EE), the positioning element, and the actual site for cleavage and polyadenylation. In this paper we analyse the oligonucleotide composition of the sequences located downstream of the stop codon of all yeast genes. Several families of oligonucleotides (referred to herein as 'words') appear over-represented with high significance. The family with the highest over-representation includes the oligonucleotides shown experimentally to act as EEs. The word with the highest score is TATATA, followed, among others, by a series of single-nucleotide variants (such as TATGTA, TACATA and TAAATA) and one-letter shifts (ATATAT). A position analysis reveals that these words occur preferentially in the 3' flanks of yeast genes, where their distribution is very uneven, with a marked peak around 35 bp after the stop codon. Of the predicted ORFs, 85% contain one or more of these sequences. Similar results were obtained using a data set of EST sequences. Other clusters of over-represented words are also detected, namely T- and A-rich signals. Using these results and previously known data, we propose a general model for the 3' trailers of yeast mRNAs.
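The kind of word counting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the function names are hypothetical, the input is a toy sequence rather than real yeast 3' flanks, and the independent-base background used for the expected count is an assumption chosen for simplicity, since the abstract does not state which statistic was used.

```python
from collections import Counter


def count_kmers(seqs, k=6):
    """Count every overlapping k-mer across a list of DNA strings."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def single_nucleotide_variants(word, alphabet="ACGT"):
    """Return all words that differ from `word` at exactly one position."""
    variants = set()
    for i, base in enumerate(word):
        for alt in alphabet:
            if alt != base:
                variants.add(word[:i] + alt + word[i + 1:])
    return variants


def expected_count(word, seqs):
    """Expected occurrences of `word` if bases were independent.

    Assumption: an order-0 background estimated from the same sequences;
    this is illustrative only and not taken from the paper.
    """
    joined = "".join(seqs).upper()
    base_freq = Counter(joined)
    p = 1.0
    for b in word:
        p *= base_freq[b] / len(joined)
    # Number of start positions at which the word could occur.
    positions = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    return p * positions


if __name__ == "__main__":
    toy = ["TATATATA"]  # hypothetical sequence, not real data
    counts = count_kmers(toy)
    ratio = counts["TATATA"] / expected_count("TATATA", toy)
    print(counts["TATATA"], counts["ATATAT"], round(ratio, 2))
```

On real data, the list passed to `count_kmers` would hold one downstream sequence per gene, and the observed-to-expected ratio would be computed for every hexamer before grouping high-scoring words with their single-nucleotide variants and one-letter shifts.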