by Hubain, Raphael; Van Hooland, Seth; De Wilde, Max
Reference: Cataloging & Classification Quarterly
Publication: Published, 2016-08-03



Peer-reviewed article
Abstract: Ensuring quick and consistent access to large collections of unstructured documents is one of the biggest challenges facing knowledge-intensive organizations. Designing specific vocabularies to index and retrieve documents is often deemed too expensive, so full-text search is preferred despite its known limitations. However, the process of creating controlled vocabularies can be partly automated with natural language processing and machine learning techniques. With a case study from the biopharmaceutical industry, we demonstrate how small organizations can use an automated workflow to create a controlled vocabulary that indexes unstructured documents in a semantically meaningful way.