Exploring large-scale digital archives - opportunities and limits to use unsupervised machine learning for the extraction of semantics

Van Hooland, Seth; Coeckelbergs, Mathias

doi:10.1515/9783110430295-046

Reference: Handbook of Digital Public History, De Gruyter, pp. 517-529
Publication status: Published, 2022-04

Type: Chapter in an edited volume

Abstract:

The current excitement about machine learning has spurred enthusiasm among collection holders and historians alike for relying on algorithms to reduce the manual labor required to manage and appraise large volumes of unstructured archival content. The Digital Humanities and commercial archival software promote out-of-the-box tools for auto-classification, but is adopting machine learning as straightforward as it is currently presented in both the popular press and the Digital Humanities literature? This chapter brings a sense of pragmatism to the debate by giving an overview of both the possibilities and the limits of machine learning for extracting semantics from large collections of digitized textual archives. Two methods have gained substantial popularity: Topic Modeling (TM) and Word Embeddings (WE). This chapter introduces these unsupervised machine learning methods to the community of historians, based on an experimental case study of digitized archival holdings of the European Commission (EC).
