Peer-reviewed article
Abstract: Bacteriophage genomes show pervasive mosaicism, indicating the importance of horizontal gene exchange in their evolution. Phage genomes represent unique combinations of modules, each of them with a different phylogenetic history. The traditional classification, based on a variety of criteria such as nucleic acid type (single/double-stranded DNA/RNA), morphology, and host range, has proved inconsistent with sequence analyses. With the genomic era, an ever-increasing number of sequenced phages cannot be classified, in part due to a lack of morphological information and in part due to the intrinsic inability of tree-based methods to deal efficiently with mosaicism. This problem led some virologists to call for a moratorium on the creation of additional taxa in the order Caudovirales, in order to let virologists discuss classification schemes that might better suit phage evolution. In this context, we propose a framework for a reticulate classification of phages based on gene content. Starting from gene families, we build a weighted graph, where nodes represent phages and edges represent phage-phage similarities in terms of shared genes. We then apply various measures of graph topology to analyze the resulting graph. Most double-stranded DNA phages are found in a single component. The values of the clustering coefficient and closeness distinguish temperate from virulent phages, whereas chimeric phages are characterized by a high betweenness coefficient. We apply a 2-step clustering method to this graph to generate a reticulate classification of phages: each phage is associated with a membership vector, which quantitatively characterizes its membership in the set of clusters. Furthermore, we cluster genes based on their "phylogenetic profiles" to define "evolutionary cohesive modules." In virulent phages, evolutionary modules span several functional categories, whereas in temperate phages they correspond better to functional modules.
Moreover, although modules cover only a fraction of all phage genes, phage groups can be distinguished by their different combinations of modules, which serve as the basis for a higher-level reticulate classification. These two classification schemes provide an automatic and dynamic way of representing the relationships within the phage population and can be extended to include newly sequenced phage genomes, as well as other types of genetic elements. © 2008 The Authors.
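
The gene-content graph described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how edge weights are computed, so the Jaccard index of shared gene families is used here purely as an assumed stand-in. The phage names, family names, and the `min_shared` threshold are hypothetical toy values. The sketch builds the weighted graph, lists its connected components, and computes an unweighted local clustering coefficient.

```python
from itertools import combinations


def build_phage_graph(gene_families, min_shared=1):
    """Return {phage: {neighbor: weight}}.

    Assumption: the weight is the Jaccard index of the two phages'
    gene-family sets; the abstract does not specify the weighting.
    """
    graph = {phage: {} for phage in gene_families}
    for a, b in combinations(sorted(gene_families), 2):
        shared = gene_families[a] & gene_families[b]
        union = gene_families[a] | gene_families[b]
        # Connect two phages only if they share enough gene families.
        if union and len(shared) >= min_shared:
            weight = len(shared) / len(union)
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def connected_components(graph):
    """Return the connected components as a list of sets of phages."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node].keys() - component)
        seen |= component
        components.append(component)
    return components


def clustering_coefficient(graph, node):
    """Unweighted local clustering: fraction of neighbor pairs that are linked."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy input: phage -> set of gene-family identifiers.
toy_families = {
    "phage_A": {"fam1", "fam2", "fam3"},
    "phage_B": {"fam2", "fam3", "fam4"},
    "phage_C": {"fam3", "fam5"},
    "phage_D": {"fam9"},
}

graph = build_phage_graph(toy_families)
components = connected_components(graph)
```

In this toy input, three phages are linked through shared families and one phage shares nothing, so it forms its own component. Closeness, betweenness, the 2-step clustering, and the membership vectors are outside the scope of this sketch.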