Publication dans des actes
Résumé : Hellinger Distance Decision Trees [10] (HDDT) has been previously used for static datasets with skewed distributions. In unbalanced data streams, state-of-the-art techniques use instance propagation and standard decision trees (e.g. C4.5 [27]) to cope with the unbalanced problem. However it is not always possible to revisit/store old instances of a stream. In this paper we show how HDDT can be successfully applied in unbalanced and evolving stream data. Using HDDT allows us to remove instance propagations between batches with several benefits: i) improved predictive accuracy ii) speed iii) single-pass through the data. We use a Hellinger weighted ensemble of HDDTs to combat concept drift and increase accuracy of single classifiers. We test our framework on several streaming datasets with unbalanced classes and concept drift.