par Hubain, Raphael ;Van Hooland, Seth ;De Wilde, Max
Référence Cataloging & classification quarterly
Publication Publié, 2016-08-03
Article révisé par les pairs
Résumé : Ensuring quick and consistent access to large collections of unstructured documents is one of the biggest challenges facing knowledge-intensive organizations. Designing specific vocabularies to index and retrieve documents is often deemed too expensive, full-text search being preferred despite its known limitations. However, the process of creating controlled vocabularies can be partly automated thanks to natural language processing and machine learning techniques. With a case study from the biopharmaceutical industry, we demonstrate how small organizations can use an automated workflow in order to create a controlled vocabulary to index unstructured documents in a semantically meaningful way.