Mining User Queries with Information Extraction Methods and Linked Data

Chardonnens, Anne; Rizza, Ettore; Coeckelbergs, Mathias; Van Hooland, Seth

doi:10.1108/JD-09-2017-0133

Reference: Journal of Documentation, vol. 74, no. 5, pp. 936-950
Publication: Published, 2018

Peer-reviewed article

Abstract:

Purpose: Advanced use of web analytics tools makes it possible to capture the content of user queries. Despite the relevance of these queries, manually analysing them in large volumes is problematic. The purpose of this paper is to address the problem of named entity recognition in digital library user queries.

Design/methodology/approach: The paper presents a large-scale case study conducted at the Royal Library of Belgium on its online historical newspaper platform, BelgicaPress. The object of the study is a data set of 83,854 queries resulting from 29,812 visits over a 12-month period. By making use of information extraction methods, knowledge bases (KBs) and various authority files, the paper explores the possibilities and limits of identifying what percentage of end users are looking for person and place names.

Findings: Based on a quantitative assessment, the method can successfully identify the majority of person and place names in user queries. Due to the specific character of user queries and the nature of the KBs used, a limited number of queries remained too ambiguous to be processed automatically.

Originality/value: This paper demonstrates empirically how user queries can be extracted from a web analytics tool and how named entities can then be mapped to KBs and authority files in order to facilitate automated analysis of their content. The methods and tools used are generalisable and can be reused by other collection holders.
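
To make the idea of matching queries against authority files concrete, here is a minimal Python sketch. It is not the authors' pipeline: it swaps in a plain exact-match lookup against two tiny in-memory name lists, and every name in it (`PERSONS`, `PLACES`, `normalise`, `classify`, `label_shares`) as well as the sample entries are hypothetical, chosen only for illustration. A real study would load entries from knowledge bases and authority files and would need fuzzier matching.

```python
from collections import Counter

# Hypothetical miniature authority lists (assumption, for illustration only).
# "saint-hubert" appears in both lists to show how ambiguity can arise.
PERSONS = {"victor horta", "emile verhaeren", "saint-hubert"}
PLACES = {"bruxelles", "gent", "namur", "saint-hubert"}


def normalise(query: str) -> str:
    """Lower-case a query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def classify(query: str) -> str:
    """Label a query as 'person', 'place', 'ambiguous' or 'unknown'."""
    q = normalise(query)
    in_persons = q in PERSONS
    in_places = q in PLACES
    if in_persons and in_places:
        return "ambiguous"  # matches both lists, needs manual review
    if in_persons:
        return "person"
    if in_places:
        return "place"
    return "unknown"


def label_shares(queries: list[str]) -> dict[str, float]:
    """Return the percentage of queries that received each label."""
    if not queries:
        return {}
    counts = Counter(classify(q) for q in queries)
    return {label: 100 * n / len(queries) for label, n in counts.items()}
```

Calling `label_shares` on a list of raw query strings gives the share of each label, which is the kind of figure the abstract refers to when it asks what percentage of users search for person and place names. Queries that match both lists are kept apart as `ambiguous` rather than forced into one class, mirroring the finding that some queries cannot be resolved automatically.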
