by Coeckelbergs, Mathias; Van Hooland, Seth
Reference Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 319 LNICST, pages 41-52
Publication Published, 2020-01-01
Peer-reviewed article
Abstract: Within the field of Digital Humanities, unsupervised machine learning techniques such as topic modeling have gained considerable attention in recent years as a way to explore vast volumes of unstructured textual data. Although this technique is useful for capturing recurring themes across document sets that have no metadata, the interpretation of topics has been consistently highlighted in the literature as problematic. This paper proposes a novel method based on Word Embeddings to facilitate the interpretation of the terms that constitute a topic, making it possible to discern different concepts automatically within a topic. To demonstrate this method, the paper uses the “Cabinet Papers” held and digitised by The National Archives (TNA) of the United Kingdom (UK). After a discussion of our results, based on coherence measures, we detail how these results can be interpreted linguistically.