Peer-reviewed article
Summary: The Bayesian inference of demographic parameters under an Isolation-Migration (IM) model of population evolution offers a major improvement over previously available approaches. This method is implemented in a popular program, IMa, widely used in population genetic studies. While the robustness of the method to deviations from the IM model has previously been evaluated, we assess the performance of the program with two populations when the model used to generate the analysed data meets the assumptions of the IM model completely; the goal is to identify the conditions under which the method works best. Overall, we test eighteen sets of conditions and analyse approximately 500 simulated data sets, for a total of over 200,000 hours of analyses using a large computer cluster. Although we find clear differences in the quality of estimates among models, the ranges of demographic parameter values that yield the most accurate estimates differ among parameters. Divergence time is best estimated in the absence of gene flow and when population sizes are large compared to divergence time. In contrast, the classic population parameter θ (= 4Nμ) is best estimated, for the two current populations, when divergence time is large compared to population size, with or without migration. The same parameter is always poorly estimated for the ancestral population. While it is possible to distinguish between scenarios with and without gene flow, estimating the extent of gene flow, when different from 0, is associated with relatively high error rates. In general, increasing the number of loci or the sample size reduces the variance and narrows the credible interval of the estimates; only for the migration rate does it also slightly improve the accuracy of the estimate. Widening the prior distribution range of a parameter can dramatically widen that of its posterior distribution.
Surprisingly, the estimates differ depending on which simulation program generated the sequences, especially for the simulation program SIMDIV. Overall, the performance of the method shown here probably reflects the limitations of the method in general and/or of the historical information contained in DNA sequence data.