Peer-reviewed article
Abstract: The analysis of the empirical distribution of univariate data often includes the computation of location, scale, skewness, and tail-heaviness measures, which are estimates of specific parameters of the underlying population distribution. Several measures are available, but they differ in Gaussian efficiency, robustness to outliers, and interpretation when the distribution is asymmetric. In this article, we briefly compare, for each type of parameter (location, scale, skewness, and tail heaviness), the “classical” estimator based on (centered) moments of the empirical distribution, an estimator based on specific quantiles of the distribution, and an estimator based on pairwise comparisons of the observations. The last of these always performs better than the other two, particularly in terms of robustness, but a direct computation requires time of order n². Fortunately, as explained in Croux and Rousseeuw (1992, Computational Statistics 1: 411-428), the algorithm of Johnson and Mizoguchi (1978, SIAM Journal on Computing 7: 147-153) reduces the computation time to order n log n and hence makes robust estimators based on pairwise comparisons practical even for very large datasets. This motivated us to program the algorithm for Stata. In this article, we describe the algorithm and the associated commands. We also illustrate the computation of these robust estimators by using them in a normality test of Jarque-Bera form (Jarque and Bera 1980, Economics Letters 6: 255-259; Brys, Hubert, and Struyf 2008, Computational Statistics 23: 429-442) applied to real data.
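To make the quadratic cost concrete, here is a minimal Python sketch of two estimators built from pairwise comparisons, computed the naive way by enumerating all n(n-1)/2 pairs. The abstract does not name the estimators, so the choices here are assumptions for illustration: a Hodges-Lehmann-type location estimate (median of pairwise averages over i < j) and an unscaled Qn-type scale estimate (a low order statistic of pairwise absolute differences, with no Gaussian consistency factor applied). The function names are hypothetical and are not the Stata commands described in the article, and the sketch does not implement the faster n log n algorithm.

```python
from itertools import combinations
from statistics import median


def n_pairs(n):
    """Number of distinct pairs (i < j) among n observations: grows like n^2."""
    return n * (n - 1) // 2


def hodges_lehmann(x):
    """Location: median of the pairwise averages (x_i + x_j) / 2 over i < j.

    Naive version: builds all n(n-1)/2 averages explicitly.
    """
    return median([(a + b) / 2 for a, b in combinations(x, 2)])


def pairwise_scale(x):
    """Scale: k-th smallest pairwise distance |x_i - x_j| over i < j,
    with k = h(h-1)/2 and h = floor(n/2) + 1 (a Qn-type statistic).

    Assumption: returned unscaled, without a consistency constant.
    """
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return diffs[k - 1]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]  # one gross outlier
    print("pairs enumerated:", n_pairs(len(data)))
    print("mean:", sum(data) / len(data))
    print("Hodges-Lehmann-type location:", hodges_lehmann(data))
    print("Qn-type scale (unscaled):", pairwise_scale(data))
```

On this toy sample the single outlier pulls the mean far from the bulk of the data while the pairwise location estimate stays near it, which is the kind of robustness the abstract refers to. The price is that the number of pairs, and hence the naive running time, grows quadratically with the sample size.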