The abundance of consumer-grade AI tools and services makes it seem easier than ever to bring machine learning into library production workflows for description. But while the tools have become easy to implement, it is still difficult to ensure usable, reliable outputs or to manage the risk of negative social impact. In this presentation we'll share methodologies for assessing machine learning technologies within a framework of metadata quality and for applying emerging best practices for accountable algorithms. Drawing on our work in the Audiovisual Metadata Platform Pilot Development (AMPPD) project, we'll show how we incorporated both quantitative and qualitative measures to evaluate the accuracy and risks of open-source and commercial tools for tasks such as speech-to-text transcription, named entity recognition, and video OCR. We'll also share tools and techniques for engaging collection managers and other library staff in reviewing machine learning outputs and in defining quality measures for targeted uses of this data.