Introduction to pattern mining

Calders, Toon

doi:10.1007/978-3-319-05461-2_1

Reference: Lecture Notes in Business Information Processing (LNBIP), vol. 172, pp. 1-32
Publication status: Published, 2014

Peer-reviewed article

Abstract:

We present an overview of data mining techniques for extracting knowledge from large databases, with a special emphasis on the unsupervised technique of pattern mining. Pattern mining is often defined as the automatic search for interesting patterns and regularities in large databases. In practice, this definition most often comes down to listing all patterns that exceed a user-defined threshold for a fixed interestingness measure. The simplest such problem is that of listing all frequent itemsets: given a database of sets, called transactions, list all sets of items that are subsets of at least a given number of the transactions. We revisit the two main strategies for mining all frequent itemsets, the breadth-first Apriori algorithm and the depth-first FPGrowth, after which we show what the main issues are when extending to more complex patterns such as frequent subsequences or subgraphs. In the second part of the paper we look into the pattern explosion problem. Due to redundancy among patterns, the list of all patterns satisfying the frequency threshold is usually so large that post-processing is required to extract useful information from it. We give an overview of some recent techniques for reducing the redundancy in pattern collections, using on the one hand statistical methods that model a user's expectations given background knowledge, and on the other hand the minimum description length (MDL) principle. © Springer International Publishing Switzerland 2014.
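The frequent itemset problem and the breadth-first, level-wise strategy of Apriori described in the abstract can be sketched as follows. This is a minimal illustrative Python version, not the paper's own code: the function name `apriori` and its parameters are hypothetical, transactions are assumed to be Python sets, the threshold is assumed to be an absolute count, and support is counted with a naive scan rather than the optimized structures a real implementation would use.

```python
from itertools import combinations

# Illustrative sketch (hypothetical names), not the paper's implementation.
def apriori(transactions, min_support):
    """Map each frequent itemset (frozenset) to its support count.

    An itemset is frequent if it is a subset of at least
    min_support transactions (absolute threshold, an assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan per level: count the transactions containing each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result
```

For example, with the transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c,d}` and a threshold of 3, the items `a`, `b`, `c` each have support 4 and the pairs `ab`, `ac`, `bc` each have support 3, while `abc` (support 2) and `d` (support 1) are not frequent. The level-by-level search is what makes this strategy breadth-first; a depth-first method such as FPGrowth instead grows one itemset prefix at a time.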