Peer-reviewed article
Abstract: The optimization of an information criterion in a variable selection procedure leads to an additional bias, which can be substantial for sparse, high-dimensional data. One can compensate for this bias by applying shrinkage while estimating within the selected models. This paper presents modified information criteria for use in variable selection and estimation without shrinkage. The analysis motivating the modified criteria follows two routes. The first, which we explore for signal-plus-noise observations only, proceeds by comparing estimators with and without shrinkage. The second, discussed for general regression models, describes the optimization or selection bias as a double-sided effect, which we call a mirror effect: among the numerous insignificant variables, those with large, noisy values appear more valuable than an arbitrary variable, while in fact they carry more noise than an arbitrary variable. The mirror effect is investigated for Akaike's information criterion and for Mallows' Cp, with special attention paid to the latter criterion as a stopping rule in a least-angle regression routine. The result is a new stopping rule, which focuses not on the quality of a lasso shrinkage selection but on the quality of the least-squares estimator without shrinkage within the same selection. © 2014 Biometrika Trust.
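The mirror effect can be illustrated with a small Monte Carlo sketch in the signal-plus-noise setting. The example is not the paper's modified criterion. It assumes pure noise (every true coefficient is zero), a known noise level `sigma`, and a selection that keeps the `k` largest |y_i|, followed by least-squares estimation without shrinkage (the selected y_i are kept unchanged). The function name `mirror_effect_demo` and all parameter values are hypothetical choices made for illustration. For each replicate the sketch compares the true loss of the unshrunk estimator with the usual Mallows' Cp estimate of that loss, RSS + 2kσ² − nσ².

```python
import numpy as np


def mirror_effect_demo(n=1000, k=50, sigma=1.0, reps=200, seed=0):
    """Monte Carlo sketch of selection bias in Mallows' Cp under pure noise.

    Illustrative assumptions (not the paper's design): y_i = theta_i + sigma * eps_i
    with theta = 0, sigma known, the k largest |y_i| selected, and no shrinkage
    (theta_hat_i = y_i if selected, 0 otherwise).
    Returns the average true loss and the average Cp estimate of that loss.
    """
    rng = np.random.default_rng(seed)
    true_loss, cp = [], []
    for _ in range(reps):
        y = sigma * rng.standard_normal(n)  # pure noise: all variables insignificant
        order = np.argsort(-np.abs(y))      # indices by decreasing |y|
        selected, rest = order[:k], order[k:]
        # True loss ||theta_hat - theta||^2; theta = 0, so only selected terms count.
        true_loss.append(np.sum(y[selected] ** 2))
        # Mallows' Cp estimate of the same loss: RSS + 2 k sigma^2 - n sigma^2.
        rss = np.sum(y[rest] ** 2)
        cp.append(rss + 2 * k * sigma**2 - n * sigma**2)
    return float(np.mean(true_loss)), float(np.mean(cp))


true_loss, cp = mirror_effect_demo()
print(f"average true loss: {true_loss:.1f}")
print(f"average Cp:        {cp:.1f}")
print(f"loss for a fixed selection of k = 50: {50 * 1.0**2:.1f}")
```

For a selection fixed in advance, both quantities would average kσ² = 50. With the data-driven selection, the true loss lies far above that value and Cp lies far below it. In every replicate, true loss + Cp equals Σy_i² − nσ² + 2kσ², so apart from a zero-mean term the amount by which Cp drops below kσ² mirrors the amount by which the true loss rises above it. The selected noisy variables look more valuable because they reduce the RSS, and they carry more noise because they inflate the loss.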