by Marquis, Bastien; Jansen, Maarten
Reference: Springer Proceedings in Mathematics and Statistics, 339, pp. 357-365
Publication: Published, 2020-09-01
Peer-reviewed article
Abstract: In sparse high-dimensional data, model selection can lead to an overestimation of the number of nonzero variables. Indeed, using an ℓ1-norm constraint while minimising the sum of squared residuals tempers the effect of false positives, so they are more likely to be included in the model. On the other hand, ℓ0 regularisation is a non-convex problem, and finding its solution is a combinatorial challenge that becomes infeasible for more than 50 variables. To overcome this, one can perform selection via an ℓ1 penalisation but estimate the selected components without shrinkage. This introduces an additional bias in the optimisation of an information criterion over the model size. Used as a stopping rule, this IC must be modified to account for the deviation between the estimates with and without shrinkage. By examining the difference between the prediction error and the expected Mallows's Cp, previous work has analysed a correction for this optimisation bias, and an expression can be derived for a signal-plus-noise model under some assumptions. A focus on structured models, in particular grouped variables, shows similar results, although the bias is noticeably reduced.
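The "selection via ℓ1, estimation without shrinkage" idea in the abstract can be sketched as follows. This is a minimal illustration in numpy, not the authors' actual procedure: it fits a lasso by coordinate descent, takes the nonzero support, and refits those coefficients by ordinary least squares, which removes the ℓ1 shrinkage. The function names, the fixed penalty level `lam`, and the synthetic data are all assumptions for the example.

```python
import numpy as np

def soft_threshold(z, t):
    # soft-thresholding operator, the proximal map of the l1 penalty
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # plain coordinate-descent lasso; assumes columns of X are standardised
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with variable j removed
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r / n, lam) / (X[:, j] @ X[:, j] / n)
    return beta

def select_then_refit(X, y, lam):
    # step 1: l1-penalised selection (coefficients are shrunk)
    beta_l1 = lasso_cd(X, y, lam)
    support = np.flatnonzero(beta_l1)
    # step 2: unshrunk estimation — ordinary least squares on the support
    beta = np.zeros(X.shape[1])
    if support.size:
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return support, beta
```

Because the refitted coefficients are no longer shrunk, an information criterion such as Mallows's Cp evaluated on the refit is biased when optimised over the model size, which is exactly the bias the abstract's correction addresses.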