Extracting with Style: Using Natural Language Processing to Generate Summaries of Rare Materials

In our collection, the majority of resources have a blank summary field (MARC 520), presenting an opportunity for us to improve resource discovery for patrons. A good summary helps a patron decide whether they want to access a resource without leaving the library website, and because the summary field is indexed, it can also increase a resource’s discoverability in our search layer. (In-house analysis of our bibliographic and usage data has shown a correlation, though not necessarily causation, between patron usage of a resource and the length of its summary field.)

Using special collections documents here at Notre Dame, I extracted keywords based on word co-occurrence and generated summaries using Natural Language Processing methods for extractive summarization. I present sample results from the project and a quantitative analysis (via the ROUGE metric, a common performance measure in NLP), and discuss further uses for improving library records.
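
For readers unfamiliar with ROUGE, the following is a minimal sketch of ROUGE-1, the unigram-overlap variant, which compares a generated summary against a human-written reference. It is illustrative only and is not the project's evaluation code: the function names `tokenize` and `rouge1` and the simple lowercase tokenizer are assumptions made for this example, and established ROUGE implementations may apply stemming and other normalization.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens, no stemming or stopword removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate, reference):
    """Return ROUGE-1 precision, recall, and F1 for a candidate summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical strings, not drawn from the collection.
    generated = "letters from a soldier"
    reference = "letters written by a soldier in 1863"
    print(rouge1(generated, reference))
```

Recall rewards a generated summary for covering the reference's wording, while precision penalizes padding, which is why the two are usually reported together.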

Presenter(s): Jeremiah Flannery


3:20 PM