On the Books, led by the University of North Carolina at Chapel Hill University Libraries, is one of six projects in the first cohort of Collections as Data: Part to Whole, a project funded by the Andrew W. Mellon Foundation to foster the implementation and use of collections as data. On the Books will build on the products of the IMLS-funded project Ensuring Democracy through Digital Access to create a plain-text corpus of over one hundred years of North Carolina session laws and to use text analysis methods to identify discoverable North Carolina Jim Crow laws. This presentation will survey some of the project’s major challenges. We have used image analysis to exclude marginalia from the corpus and have developed a novel method for assessing Optical Character Recognition (OCR) accuracy. We are also working with humanities scholars to carefully communicate machine learning results and their uncertainty in historical context. Time permitting, I will also discuss some of our preliminary text analysis results.