By Latinne, Patrice; Debeir, Olivier; Decaestecker, Christine
Reference: Lecture Notes in Computer Science, 2096, pp. 178-187
Publication: Published, 2001
Peer-reviewed article
Abstract: The aim of this paper is to propose a simple procedure that determines, a priori, a minimum number of classifiers to combine in order to obtain a prediction accuracy similar to that obtained with larger ensembles. The procedure is based on the McNemar non-parametric test of significance. Knowing a priori the minimum size of the classifier ensemble giving the best prediction accuracy saves time and memory, which matters especially for huge databases and real-time applications. Here we applied this procedure to four multiple classifier systems built on C4.5 decision trees (Breiman's Bagging, Ho's Random Subspaces, their combination, which we labeled 'Bagfs', and Breiman's Random Forests) and five large benchmark databases. It is worth noticing that the proposed procedure may easily be extended to base learning algorithms other than decision trees. The experimental results showed that it is possible to limit the number of trees significantly. We also showed that the minimum number of trees required to obtain the best prediction accuracy may vary from one classifier combination method to another.
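The paper defines the exact procedure; as a rough illustration only (not the authors' code), the sketch below shows how McNemar's test can decide whether an ensemble of a given size is statistically equivalent in accuracy to a larger reference ensemble on the same test set. The function names, the continuity correction, and the significance level are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_equivalent(pred_small, pred_large, y_true, alpha=0.05):
    """McNemar test of whether a smaller ensemble and a larger
    reference ensemble differ significantly in accuracy.

    pred_small, pred_large: predicted labels of the two ensembles
    on the same test set; y_true: the true labels.
    Returns True when the null hypothesis of equal accuracy is NOT
    rejected, i.e. the smaller ensemble is statistically equivalent.
    """
    correct_small = pred_small == y_true
    correct_large = pred_large == y_true
    # b: cases only the small ensemble gets right
    # c: cases only the large ensemble gets right
    b = np.sum(correct_small & ~correct_large)
    c = np.sum(~correct_small & correct_large)
    if b + c == 0:  # identical error patterns, trivially equivalent
        return True
    # McNemar statistic with continuity correction, chi-square, 1 df
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = chi2.sf(stat, df=1)
    return p_value > alpha
```

Under these assumptions, a typical use would grow the ensemble tree by tree and report the first size whose predictions pass this equivalence check against the full ensemble; the paper's actual stopping rule may differ in detail.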