Abstract: 1. Analyzing the phylogenetic structure of natural communities may illuminate the processes governing the assembly and coexistence of species. For instance, an association between species co-occurrence in local communities and their phylogenetic proximity may reveal the action of habitat filtering, niche conservatism and/or competitive exclusion. 2. Different methods were recently proposed to test such community-wide phylogenetic patterns, based on the phylogenetic clustering or overdispersion of the species in a local community. This provides a much-needed framework for addressing long-standing questions in community ecology as well as the recent debate on community neutrality. The testing procedures are based on (i) a metric measuring the association between phylogenetic distance and species co-occurrence, and (ii) a data set randomization algorithm providing the distribution of the metric under a given 'null model'. However, the statistical properties of these approaches are not well established and their reliability must be tested against simulated data sets. 3. This paper reviews metrics and null models used in previous studies. A 'locally neutral' subdivided community model is simulated to produce data sets devoid of phylogenetic structure in the spatial distribution of species. Using these data sets, the consistency of Type I error rates of tests based on 10 metrics combined with nine null models is examined. 4. This study shows that most tests can become liberal (i.e. they reject too often the null hypothesis that only neutral processes spatially structured the local community) when the randomization algorithm breaks down a structure in the original data set that is unrelated to the null hypothesis being tested. 
Hence, when overall species abundances are distributed non-randomly across the phylogeny or when local abundances are spatially autocorrelated, better statistical performance was achieved by randomization algorithms that preserve these structural features. The most reliable randomization algorithm consists of permuting species with similar abundances among the tips of the phylogenetic tree. One metric, RPD-DO, also proved to be robust under most simulated conditions using a variety of null models. 5. Synthesis. Given the suboptimal performance of several tests, attention must be paid to the testing procedures used in future studies. Guidelines are provided to help choose an adequate test.
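
The two-part testing procedure described above (a metric plus a randomization algorithm) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this study. The metric is an assumed stand-in (mean pairwise phylogenetic distance among co-occurring species), not one of the 10 metrics reviewed, and the way species are grouped into abundance classes (equal-sized rank bins) is likewise an assumption. All function names and the toy data are hypothetical.

```python
import random


def mean_pairwise_distance(present, dist):
    """Mean phylogenetic distance over all pairs of locally present species."""
    if len(present) < 2:
        raise ValueError("need at least two species")
    pairs = [(a, b) for i, a in enumerate(present) for b in present[i + 1:]]
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def abundance_classes(abundance, n_classes):
    """Rank species by overall abundance and cut the ranking into bins
    (assumed binning rule: equal-sized rank classes)."""
    ranked = sorted(abundance, key=abundance.get)
    size = -(-len(ranked) // n_classes)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def permute_within_classes(classes, rng):
    """Reassign species to tips, shuffling only within each abundance class."""
    mapping = {}
    for group in classes:
        shuffled = group[:]
        rng.shuffle(shuffled)
        mapping.update(zip(group, shuffled))
    return mapping


def randomization_test(present, dist, abundance, n_classes, n_perm=999, seed=0):
    """Return the observed metric and a one-sided p-value for clustering."""
    rng = random.Random(seed)
    classes = abundance_classes(abundance, n_classes)
    observed = mean_pairwise_distance(present, dist)
    null = []
    for _ in range(n_perm):
        tip_of = permute_within_classes(classes, rng)
        null.append(mean_pairwise_distance([tip_of[s] for s in present], dist))
    # Share of null values at least as small as the observed one
    p_clustered = (1 + sum(v <= observed for v in null)) / (n_perm + 1)
    return observed, p_clustered


# Toy data (hypothetical): two clades, with abundance conserved within clades.
clade = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
dist = {a: {b: 0.0 if a == b else 2.0 if clade[a] == clade[b] else 10.0
            for b in clade} for a in clade}
abundance = {"A": 50, "B": 40, "C": 45, "D": 5, "E": 4, "F": 6}
local = ["A", "B", "C"]  # the three most abundant species co-occur

obs, p_constrained = randomization_test(local, dist, abundance, n_classes=2)
_, p_free = randomization_test(local, dist, abundance, n_classes=1)
print(obs, p_constrained, p_free)
```

In this toy case the three abundant species form one clade, so the local community looks clustered simply because abundance is phylogenetically structured. With `n_classes=1` the permutation is unconstrained and breaks that structure, whereas with `n_classes=2` species are only exchanged with others of similar abundance, so the null distribution retains it and no clustering is signalled.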