Semi-supervised Classification and Betweenness: Centrality Computation on Large, Sparse, Graphs

Mantrach, Amin; van Zeebroeck, Nicolas; Francq, Pascal; Shimbo, Masashi; Bersini, Hugues; Saerens, Marco

doi:10.1016/j.patcog.2010.11.019


Reference: Pattern Recognition, vol. 44, no. 6, pp. 1212-1224
Publication: Published, 2011

Peer-reviewed article

Abstract:

This work addresses graph-based semi-supervised classification and betweenness computation in large, sparse networks (several million nodes). The objective of semi-supervised classification is to assign a label to each unlabeled node using the whole topology of the graph and the labels already available. Two approaches are developed to avoid explicitly computing the pairwise proximities between the nodes of the graph, which would be impractical for graphs containing millions of nodes. The first approach directly computes, for each class, the sum of the similarities between the nodes to classify and the labeled nodes of that class, as initially suggested in [1,2]. Following this approach, two algorithms exploiting different state-of-the-art kernels on a graph are developed. The same strategy can also be used to compute a betweenness measure. The second approach works on a trellis structure built from biased random walks on the graph, extending an idea introduced in [3]. These random walks make it possible to define a biased, bounded betweenness for the nodes of interest, defined separately for each class. All the proposed algorithms have a computing time that is linear in the number of edges while providing good results, and hence are applicable to large, sparse networks. They are empirically validated on medium-size standard data sets and are shown to be competitive with state-of-the-art techniques. Finally, we processed a novel data set, which is made available for benchmarking, for multi-class classification in a large network: the U.S. patents citation network, containing 3M nodes (of six different classes) and 38M edges. The three proposed algorithms achieve competitive results (around 85% classification rate) on this large network; they classify the unlabeled nodes within a few minutes on a standard workstation. © 2010 Elsevier Ltd. All rights reserved.
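
The first approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the kernels used, so the sketch assumes a truncated power-series kernel K = sum_t alpha^t P^t, where P is the row-normalized adjacency matrix. The function names, the parameters alpha and steps, and the treatment of edges as undirected are all assumptions made for illustration. The point it shows is that the per-class sum of similarities K y_c can be accumulated through repeated sparse matrix-vector products, each costing time proportional to the number of edges, without ever forming the full kernel matrix K.

```python
from collections import defaultdict


def class_scores(edges, n_nodes, labels, n_classes, alpha=0.5, steps=10):
    """Score every node against every class without forming a kernel matrix.

    edges  : iterable of (u, v) pairs, treated as undirected (assumption)
    labels : dict {node: class_index} for the labeled nodes only
    Returns scores[c][i] = sum_{t=0..steps} alpha^t * (P^t y_c)[i], where P is
    the row-normalized adjacency matrix and y_c the indicator vector of class c.
    """
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    scores = []
    for c in range(n_classes):
        y = [1.0 if labels.get(i) == c else 0.0 for i in range(n_nodes)]
        total = y[:]      # t = 0 term of the series
        current = y       # holds alpha^t * P^t * y_c
        for _ in range(steps):
            # One sparse matrix-vector product: cost proportional to |E|.
            nxt = [0.0] * n_nodes
            for i in range(n_nodes):
                nbrs = neighbors.get(i)
                if nbrs:
                    nxt[i] = alpha * sum(current[j] for j in nbrs) / len(nbrs)
            current = nxt
            for i in range(n_nodes):
                total[i] += current[i]
        scores.append(total)
    return scores


def classify(scores, labels):
    """Assign each unlabeled node to the class with the highest score."""
    n_nodes = len(scores[0])
    return {
        i: max(range(len(scores)), key=lambda c: scores[c][i])
        for i in range(n_nodes)
        if i not in labels
    }


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3,
# with one labeled node per class.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: 0, 5: 1}
print(classify(class_scores(edges, 6, labels, 2), labels))
# {1: 0, 2: 0, 3: 1, 4: 1}
```

Each class needs only `steps` passes over the edge list, so the total cost grows linearly with the number of edges for a fixed number of classes and steps, which is the property the abstract emphasizes.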
