Concepts in topics. Using word embeddings to leverage the outcomes of topic modeling for the exploration of digitized archival collections

Coeckelbergs, Mathias; Van Hooland, Seth

doi:10.1007/978-3-030-50072-6_4

Reference: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 319 LNICST, pages 41-52
Publication: Published, 2020-01-01

Peer-reviewed article

Abstract:

Within the field of Digital Humanities, unsupervised machine learning techniques such as topic modeling have gained considerable attention in recent years as a means of exploring vast volumes of unstructured textual data. Although this technique is useful for capturing recurring themes across document sets that have no metadata, the interpretation of topics has been consistently highlighted in the literature as problematic. This paper proposes a novel method based on word embeddings to facilitate the interpretation of the terms that constitute a topic, making it possible to discern different concepts automatically within a topic. To demonstrate this method, the paper uses the “Cabinet Papers” held and digitised by The National Archives (TNA) of the United Kingdom (UK). After a discussion of our results, based on coherence measures, we provide details of how these results can be interpreted linguistically.

