by de Valeriola, Sébastien
Reference: Histoire & mesure, 35, 2, pages 171-196
Publication: Published, 2020-12
Peer-reviewed article
Abstract: For a historian analysing a corpus of acts, each document must be examined to extract pertinent information, such as the names of the protagonists, dates, and amounts. When the set of documents is large, this process can be problematic. In this article we present a methodology for the semi-automatic analysis of such corpora using quantitative methods. In doing so, we focus on three steps in the process: the division of acts into sub-sections, the lemmatization of anthroponyms, and the extraction of dates. We underline the importance of human intervention after completion of the automatic process, an essential step in any analytical system of this type.