Proposed Conference Talks

In the past year, much debate arose on the Code4Lib mailing list about a survey effort regarding sexual harassment in libraries. While multiple discussions took place, one key theme concerned the sensitive nature of the topic and whether the researcher’s practices properly protected and respected the respondents and their stories. This talk encourages research on sensitive topics and vulnerable populations, in the hope that more research and healthy discussions will occur. That encouragement, however, comes with the caveat that ethical best practices must be used to protect those involved. Insights from human subjects research, institutional review board procedures, journalism ethics, and more will be shared, so that the next time a survey appears on a mailing list, you will know what additional information to ask for.

We are all visualizing data these days, but what happens when you listen to it? At our library we are exploring a large data set of scholarly publications by attempting to represent its metadata sonically and musically. In the process we are learning to collaborate with our campus Center for High Throughput Computing and bridge the worlds of data science and arts and humanities research.

Our campus libraries serve as the local data custodians for a service that enables faculty, staff, and students to explore a raw data dump of the Clarivate Web of Science database. This presentation will talk about our efforts to establish some basic access patterns to facilitate use by researchers and the challenges of working with a quantity of data that far exceeds anything we have managed to date. The data set consists of more than 60M publications, including their cited references, and is stored as 2-3TB of XML data. We will discuss using big data processing techniques on the Condor/Open Science Grid high throughput computing cluster and provide an overview of basic parallelization techniques for operating on a large data graph. Finally, this presentation will touch upon an extracurricular project that attempts to express metadata as music as a way to encourage innovative research approaches and provide insights into how we can establish more comprehensive data services for researchers on campus.
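As a rough illustration of the per-file parallel processing pattern we will describe, the Python sketch below fans XML chunks out to worker processes and tallies publications and cited references. The file layout and element names are assumptions rather than the actual Web of Science schema, and on the cluster this fan-out would be handled by Condor jobs rather than a local process pool.

```python
# Hypothetical sketch: count publications and cited references per XML chunk in parallel.
# Directory name and element names ("record", "reference") are placeholders.
from multiprocessing import Pool
from pathlib import Path
import xml.etree.ElementTree as ET

def summarize(xml_path):
    """Return (file name, publication count, cited-reference count) for one XML chunk."""
    root = ET.parse(xml_path).getroot()
    records = root.findall(".//record")
    references = root.findall(".//reference")
    return xml_path.name, len(records), len(references)

if __name__ == "__main__":
    chunks = sorted(Path("wos_dump").glob("*.xml"))  # one task per file
    with Pool(processes=8) as pool:                  # Condor would fan these out as jobs
        for name, n_pubs, n_refs in pool.imap_unordered(summarize, chunks):
            print(f"{name}\t{n_pubs} publications\t{n_refs} cited references")
```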

The Public Knowledge Project Documentation Interest Group (DIG) hosts bi-weekly sprints to work on documentation for open-source scholarly publishing software used across libraries internationally. In this presentation I’ll talk about our processes, what we’ve learned as we’ve built out our documentation, and ideas for forming and sustaining a successful documentation initiative alongside a changing codebase.

Systems reflect the people and organizations with the privilege to build them. If our libraries and museums are not neutral, then the applications and services we create and maintain cannot be neutral. Our values, our biases, and our histories manifest in our open source projects. Both the individuals with the power to make local technology decisions and the organizations with the resources to influence collaborative projects impact global users and shape communities in myriad ways. Embracing an intentional and caring practice is as essential to the health of our projects as functional code—and without both, we put our digital cultural heritage at risk. This presentation serves as a starting point for a reflective and challenging dialogue spanning those who code and those who care, building shared understanding of the ways race, gender, and wealth come to bear on our products and communities. With this common knowledge, we can seek to build systems that truly deliver on the social mission of libraries.

On the Books, led by the University of North Carolina at Chapel Hill University Libraries, is one of six projects in the first cohort of Collections as Data: Part to Whole, a project funded by the Andrew W. Mellon Foundation to foster the implementation and use of collections as data. On the Books will build on the products of the IMLS-funded project Ensuring Democracy through Digital Access to create a plain-text corpus of over one hundred years of North Carolina session laws and use text analysis methods to identify discoverable North Carolina Jim Crow laws. This presentation will survey some of the project’s major challenges. We used image analysis to exclude marginalia from the corpus and have developed a novel method for assessing Optical Character Recognition (OCR) accuracy. Time allowing, I can also discuss some of our preliminary text analysis results. Finally, we are working with humanities scholars to carefully communicate machine learning results and uncertainty in historical context.
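For context only, and not the project’s novel method: a common baseline for OCR accuracy is character error rate against a hand-keyed ground-truth transcription, which in Python might look like the following sketch.

```python
# Illustrative baseline only: character error rate between an OCR transcription
# and a hand-keyed ground-truth string.
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    return levenshtein(ocr_text, ground_truth) / max(len(ground_truth), 1)

print(character_error_rate("Jim Crovv laws", "Jim Crow laws"))  # roughly 0.15
```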

The Brazilian Institute of Museums is responsible for the direct management of 30 museums. It holds an estimated cultural heritage collection of over 200,000 items representing different cultural and artistic themes. In 2016, the Institute launched a project for the reorganization, treatment, and dissemination of its museum documentation, the Tainacan project, which has already released more than 30,000 items for public access. The project consists of two fronts: applying data science principles to the treatment of museum documentation and organizing digital repositories for information dissemination and management. The project’s core technology is Tainacan, a plugin built on WordPress (https://wordpress.org/plugins/tainacan/). The plugin offers highly flexible configuration of metadata, filters, facets, taxonomies, and the other elements necessary for the organization and management of collections. In addition to providing an installation for each museum, the project has also invested in research on repository aggregation and the design of an integrated search tool to improve the user experience. The project has used the Linked.art conceptual model, based on CIDOC-CRM, as a proposed ontology for museum data aggregation.

“Has a VPAT been completed for Hyku?” This question came from a potential adopter of Hyku - a Samvera digital repository solution - and opened the rabbit hole of accessibility for me, the next step of which was to google “VPAT.” This entry point to accessibility has informed my understanding of software and some of my goals for my company. Creating accessible applications is socially responsible, and implementing these practices can help us set better expectations as community leaders. Samvera is a community dedicated to preservation and to providing access to a broad range of assets, and this access should not be limited by a lack of robust UX development that includes accessibility features. But there are challenges to developing features for community-driven, open source software like Hyku, when any contribution is either volunteered or sponsored. Starting with an explanation of accessibility reporting tools (VPAT) and standards (WCAG), I’ll assess the state of accessibility in Hyku. I’ll also summarize the results of a preliminary audit of Hyku, discuss features that could provide significant value to digital repositories, and describe the challenges of bringing this work to fruition.

Many aspects of librarianship have been automated to one degree or another. Now, in a time of “big data”, is it possible to go beyond mere automation and toward the more intelligent use of computers? The use of algorithms and machine learning is an integral part of future library collection building and service provision. To make the point, this presentation first compares & contrasts library automation with the “intelligent” use of computers. Next, it introduces an application/system called the Distant Reader which puts some of these ideas into practice. The Distant Reader is a high performance computing system that takes an arbitrary amount of unstructured data (text) as input and outputs sets of structured data for analysis, use, and understanding – reading. The Distant Reader “reads” and makes sense of dozens of websites, hundreds of books, or thousands of journal articles. Thus, the Reader enables researchers to observe trends, uncover patterns & anomalies, and thoroughly grasp nuances in a given collection; the Distant Reader addresses the proverbial problem of “drinking from a firehose”. This presentation outlines how & why.

As cultural heritage work becomes increasingly defined by code–code as artifact to be preserved, code as infrastructure for preservation–we need new models for sustaining code that is 4 libraries. In this collaborative talk, representatives with diverse job descriptions from across the IMLS FCoP Software Preservation and Emulation cohort (https://www.softwarepreservationnetwork.org/fcop/) will discuss what “software preservation” meant to them at the start of their projects and how working in a cohort has challenged and inspired those meanings to evolve. Team members, from undergraduate student software developer to oral history archivist, project manager to museum software preservationist, will explore the role of communication and community in preserving software and code, how overlapping library code cohorts can support each other, and ways of integrating long-term thinking about software maintenance into existing cultural heritage ecosystems. They’ll explore lessons learned and future opportunities for cohorts as a model of maintaining library technology work.

Archipelago is a new open source repository solution developed at the Metropolitan New York Library Council (METRO). Archipelago is designed with an open metadata schema, with a system in place that learns from you as you describe your collections however you like. Because the architecture was born at our multi-type library consortium, it was built with multi-tenancy in mind from day one. In this presentation, lead architect and developer Diego Pino will show you around Archipelago and set you up to dream big about the future of open repositories.

The adoption of open source platforms continues to grow. According to the Library Systems Report 2019, Koha and Evergreen remain big players, with services-based companies offering support for implementation and maintenance. The report also notes that these open source ILSes tend to be implemented at small to midsized institutions, with few large institutions among them. For academic libraries, Ex Libris’s proprietary Alma dominates the market. Is it possible to implement an open source LSP that meets the needs of a large academic library, and a consortium to boot? The FOLIO LSP has already been implemented at a small academic library, Chalmers in Sweden. Early adopters stand ready to go live in 2020, and the Five Colleges Consortium is among the first consortial early adopters. This presentation will highlight what it means to be an early adopter at the University of Massachusetts as a Five Colleges Consortium member.

Liaison librarians want to know about new faculty publications – for acquisition, promotion, or relationship building – but faculty don’t always keep their librarians in the loop. While tools like Scopus can create a feed of journal publications by affiliated authors, new books are a more slippery beast. Without constant checking, a new book might already be on the shelf before the librarian knows about it, or even remain unpurchased. This presentation describes a solution using WorldCat’s API with R that continually updates librarians on forthcoming faculty books, usually well before the publication date. The results can be easily imported into a citation manager to keep a running archive.

Libraries have long positioned themselves as champions of intellectual freedom, but how has technology changed intellectual freedom? While the internet and other communication technologies give the illusion that information is more free than ever before, the logic of capitalism means that knowledge and information that carry market value have become more commodified, protected, and controlled. We are moving toward a world where on one side there is an avalanche of individualized, extreme, and possibly useless information and on the other there is tightly restricted, patented information hidden behind impossibly complex encryption software and legal frameworks. On many online platforms, intellectual freedom has become the freedom for the loudest and most disturbing voices to emerge as victorious in the “marketplace of ideas”. In addition, the algorithms of the technological tools we use are not neutral – capitalism underpins these algorithms and profit is almost always the goal. Can libraries reject the false choice between protecting the worst of speech and mediating access to commodified knowledge? How can we actively participate in questioning the way that information is produced, circulated, and acknowledged within our technocapitalist society?

“Print & Probability” is an interdisciplinary and inter-institutional project to develop new techniques for visual anomaly detection in the OCR of early printed books. By detecting damaged letterforms that create consistent aberrations, the project aims to allow direct inference of letterpress printers at scale.

This presentation will detail the unique data management issues that the resulting 13 billion+ character images present, and how CMU Libraries is strategizing to publish extracts of these data in ways that are both sustainable and usable. The team’s research software engineer will outline the design and technologies behind their management pipeline: a REST API to a database managed at the Pittsburgh Supercomputing Center that stores and filters image data and metadata from the automated extraction pipeline, and a Vue.js-based web interface to assess results and provide new annotations for model training. Finally, this talk will present plans to distill this massive research database into a data deposit of interest to computer scientists and digital humanities researchers, as well as a sustainable static site presenting a human-and-machine-curated collection of distinctive early type usable by historians and librarians of rare books.

Automated tasks and batch processes are integral components of successful, efficient metadata workflows. Creating and cleaning metadata for user-submitted content, like electronic theses and dissertations (ETDs), can prove time-consuming and overwhelming for small metadata teams. At Oklahoma State University (OSU), Digital Resources and Discovery Services has expedited the metadata creation and cleanup process for ETDs by incorporating tools and software that automate batch processing into their workflow. ProQuest’s UMI Dissertation Publishing helps stipulate required metadata fields upon content submission, producing consistently tagged metadata as XML files. Two XSLT transformations are run to sort out restricted submissions, extract select metadata from individual files, and merge this metadata into a single XML file. This batch file is then uploaded to OpenRefine for mass editing and final formatting, before export as a .csv file. The manual effort of the metadata team is whittled down to verifying the accuracy and completeness of metadata before upload to the institutional repository. This presentation will illustrate the success and efficiency this combination of tools and software has brought the OSU metadata team by producing good-quality metadata in minimal time.
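As an illustration of the merge-and-flatten step (the OSU workflow itself uses XSLT and OpenRefine; the directory and element names here are placeholders), a minimal Python sketch might look like this:

```python
# Hypothetical illustration of merging per-ETD XML metadata into one batch table.
# Element names like "embargo", "title", and "author" are assumptions.
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

rows = []
for etd_file in Path("etd_submissions").glob("*.xml"):
    root = ET.parse(etd_file).getroot()
    if root.findtext("embargo", default="") == "restricted":
        continue  # sort out restricted submissions
    rows.append({
        "file": etd_file.name,
        "title": root.findtext("title", default=""),
        "author": root.findtext("author", default=""),
        "date": root.findtext("date", default=""),
    })

with open("etd_batch.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=["file", "title", "author", "date"])
    writer.writeheader()
    writer.writerows(rows)  # ready for cleanup and final formatting in OpenRefine
```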

Data cleanup during a migration is more than an opportunity to standardize data practices; it is also a necessary step to ensure the data can be ingested by the new system. This presentation will therefore outline the data cleanup process undertaken during two major system migrations at McGill University, including shared challenges, themes, and the lessons learned from this work. In 2019, McGill University Library underwent two major system migrations, of the library’s integrated library system (ILS) and of its institutional repository (IR), and I was heavily involved in both. In total, over 4 million items were migrated. I will discuss challenges that were common to both migrations, as well as issues that were unique to each.

The migration of our ILS from Aleph (Ex Libris) to WorldShare (OCLC) had the advantage of retaining similar data structures (MARC records, for example). However, the scale of the migration and the need to meet the vendor’s requirements meant adjusting the data and devising creative solutions to retain data that couldn’t be ingested by the system.

Conversely, the biggest data challenge faced during the migration of the repository to an open-source solution (Samvera) was restructuring the data as it moved from a relational database to linked data.

Much of what makes documents accessible to people with disabilities and their assistive technologies boils down to the file format. Every format comes with benefits and drawbacks when it comes to accessibility. Which format or formats should libraries choose to best serve their patrons and the needs of archives and discovery? This talk will cover the vagaries of different formats, including insights into popular formats such as Markdown (avoid!) and the potential of relatively new accessibility features in EPUB3 that may one day permit moving away from PDFs.

The J. Paul Getty Trust has been publishing images over the IIIF protocol for some time, both on the J. Paul Getty Museum Collection website and for selected projects of the Getty Research Institute.

During the last year, a thorough refactoring of the infrastructure supporting the production and delivery of IIIF media has been carried out. Aside from a performance speedup, end users will barely notice a difference. Where, then, is the value of a full year of refactoring effort?

This presentation will describe the strategy behind the Getty’s expansion and structuring of its IT infrastructure to support upcoming digitization projects from major acquisitions, some of which consist of millions of digital objects.

The challenges posed by the increasing scale and complexity of the Getty’s archives require highly coordinated and self-sufficient systems, in which manual labor is shifted from handling and fixing one-off data points to optimizing the systems and software that automate as many repetitive tasks as possible.

Software architecture, error handling, and system deployment will be covered.

This talk will discuss an implementation of the Measure the Future project. Measure the Future, currently in public beta, is designed to allow library workers to generate heat maps of their public spaces, collecting more robust space usage data than is possible with gate counts, and generating new questions about how our spaces are used. As we are in the process of significantly reimagining and redesigning our building, we are particularly interested in how users are interacting with both our brand-new and existing spaces. This presentation will cover our implementation steps and the results of our data collection. We’ll include: our project goals, timeline to completion, overall staff commitment and skill sets required, mistakes we made and lessons learned, what new questions we’ve generated, and our next steps.

The presenters will outline efforts to create a unified discovery platform for digital collections from libraries, archives, and museums. They will guide attendees through the process of designing harmonized metadata profiles for these related but very different domains, including solutions for reconciling controlled vocabularies. They will discuss the software stack’s approach to moving the harmonized data from varied back-end systems to a singular point from which they build out multiple sites leveraging cloud services and open source technologies. Overall, the presenters will describe a comprehensive plan for unifying diverse digital collections to allow for serendipitous discovery of cultural heritage collections.

The stock version of Innovative’s WebPAC Pro OPAC is built according to outdated web standards. With a small number of tweaks and integration of the open source Bootstrap front-end framework, WebPAC Pro can be customized to be responsive and mobile friendly. This session will highlight how WebPAC Pro is designed, the steps developers should take to identify areas for improvement, and a brief overview of the process to integrate Bootstrap into the existing structure.

Abes is a public institution in France that provides French academic libraries with tools and digital services. Since our coworkers in the libraries are our main users, we would like to leverage our position as a central institution to collaborate with them more efficiently at a technical level. To achieve this, we are setting up a co-development model that takes advantage of GitHub and open source. Our idea is to create a space that allows contribution to our projects and applications. We want to show how we build them. We also provide APIs so librarians can interact with our apps. But this space is meant for librarians as well: they can let others know about their own projects and share them with the whole community. In this presentation, I will explain how we are currently putting this process in motion.

When a user visits the library website, they typically have a specific information need. We offer users a host of tools and resources to resolve that need. However, there’s often a fundamental misalignment between our resource-focused approach to wayfinding on the website (“databases,” “course tools,” “books,” “special collections”) and the topic-focused way that users typically frame their need. In addition, those tools and resources are dispersed among data silos such as the website, catalog, and digital collections repository, making it difficult to discover resources without intentionally navigating to specific layers of the library’s discovery ecosystem.

This presentation outlines an ongoing web discovery project to align the library’s web platforms with current user information-seeking behavior. This project will create a more seamless discovery experience across the library’s web platforms by contextually exposing the full range of library offerings throughout all stages of the online user journey. We will share early inspiration and research that underpin the project, including low-fidelity prototype demonstrations; research on user discovery needs; the potential of technologies such as taxonomies and knowledge graphs to harmonize metadata across platforms; and strategies for surveying metadata architecture to form a foundation on which to build the project.

Ohio Wesleyan University Libraries recently retooled its digital preservation strategy to incorporate a local Network Attached Storage (NAS) drive and a shared instance of DuraCloud. Our goal was to automate the process of backing up assets in our digital collections to cloud storage while also a) maintaining the unique file structures and dynamic between master and derivative copies in our various collections and b) curating a subset of these assets to be backed up to DuraCloud in an effort to keep storage costs down. We relied on command line control of DuraCloud’s SyncTool and the Windows Task Scheduler to accomplish these goals. This presentation will cover our assessment of the SyncTool’s capabilities and walk through the scripts used to automate the preservation storage of our varied collections.

This presentation will discuss two open source tools for creating and using annotations on images. The first allows users to create annotations without requiring institutional resources or an installed annotation server. The second is a JavaScript library for displaying and using annotations for display and storytelling purposes. This rich display of annotations demonstrates the reuse value of annotations and provides the opportunity for new forms of scholarly output. The presentation will introduce annotations and IIIF (for which the tools are optimized), demonstrate the low barrier to entry for using the tools, and discuss the challenges of creating and using annotations from multiple data models, potential use cases, future development opportunities, and issues of using annotations as scholarly output.

Jonathan Hamilt, co-founder of Drag Queen Story Hour, shares the international non-profit’s herstory and its mission of giving kids glamorous, positive, and unabashedly queer role models, so they can see people who defy rigid gender restrictions and imagine a world where people can present as they wish.

A migration project that does not take into consideration the people and resources necessary to maintain the system post-migration is incomplete. But the people making decisions about migrations are not necessarily the same people that will be responsible for maintenance. How can we best involve, value, and consider the perspective of staff responsible for maintenance in our migration projects? If a migration is the “shiny object”, how do we equally prepare for the (hopefully) much longer period of maintenance that will come after? This talk will reflect on lessons learned from multiple migration projects and how the plans for future maintainability did or did not work out.

What do you do when you’re getting DDOSed? I’ll go over how we discovered we had a problem; how we diagnosed it; steps we took to resolve the problem; and all the things we learned along the way. It’s one part adventure story – debugging in prod! possible state actors! – and one part practical incident response manual. While some tools we used are specific to Rails, most of what we learned is applicable in any environment. Come learn how to keep your site up when the Internet is trying to take you down.

The collective intelligence of teams improves when everyone can contribute. Unfortunately, factors like impostor syndrome, the Dunning-Kruger effect, and organizational power structures can prevent some voices from being heard. Design Sprints provide a decision-making framework in which the team can work together toward finding the best solution instead of deferring to the loudest voices. Design Sprints can also strengthen and democratize the products built by software teams. This talk will cover design sprint structure, report on a real-world example, and describe how this technique can be incorporated into an agile project plan to align the team’s vision of what will be produced and why.

DPLA is building an intelligent recommendation system for our online collection of approximately 35 million metadata records. The project has two goals: First, to give our users a tool for serendipitous discovery; and second, to test our ability to integrate both the derivation and use of learned intelligence into our production system. Advances in hardware and software make machine learning easier and more affordable than it was even a decade ago. Yet, bringing a machine learning project to production is not only a matter of acquiring bigger, faster computers. It also involves interoperating with existing tools and workflows, deliberate deployment of human oversight, balancing speed efficiency with the quality of intelligence, and designing systems that are responsive to change. Sharing my experiences will help others in the community anticipate and prepare for similar challenges in their own efforts to implement, sustain, and scale machine learning projects.

Starling is a free and open-source application for decentralized storage that is designed specifically for use in archival settings where the ability to demonstrate the authenticity of a file over the course of time is of paramount importance. Built on Filecoin, the tool utilizes powerful cryptography and mathematical protocols to ensure that any files you store with Starling are kept safe and uncorrupted.

This talk will cover: decentralized storage technologies and their applications and challenges in cultural heritage settings; how Starling can be implemented by memory institutions as a digital preservation solution citing real-world use cases; and what is coming up next for the project. If time permits, the presentation will include a demo of storing files with Starling.

While the number of materials available through libraries has grown exponentially, the work of providing discovery of and access to these materials has come to involve an increasingly complex set of systems and interactions to manage. Although the systems used in public and academic libraries often differ, we share many of the same concerns. I will provide a brief historical introduction to the current landscape, share profiles of some of the most common combinations of systems, and discuss the financial and affective consequences of the choices we make and implement.

What are the labor and maintenance burdens of some of the most popular systems in the current catalog/discovery landscape? How did we get here? How do we communicate the complexity of maintaining and improving these systems to the people who evaluate and fund our work?

To replace our previous Sufia-based digital collections app at our small independent academic institution, we decided to write a custom application taking a somewhat new Rails-based approach. It attempts to provide just enough ‘extra’ abstraction to easily support common digital repository patterns, while remaining amenable to development in standard Rails patterns and flexible enough for less standard ways people develop Rails too; our app is now live. The architecture involves ActiveRecord; postgres jsonb; shrine for file attachment handling; some generic HTML form support for complex/repeatable fields; and concise Solr indexing support with traject. The architecture is designed with performant DB query patterns in mind from the start, lets you use the full range of existing Rails and ActiveRecord features including associations and eager loading, and is compatible with most existing Rails-based gems. Some support for common digital collections patterns has been extracted into a shareable gem, to make it easier to build additional applications along these lines without reinventing, but only where the cost of maintaining and designing the shared code seemed justified. This talk will provide a tour of the architecture by showing sample code, along with some reflections on it.

This talk will explore the constellation of non-technical aspects involved in a huge technical project: the replacement of a highly complex and aging statewide library directory and continuing education calendar platform. Without the in-house resources or dedicated staff to do the development ourselves, we’ve had to get creative in order to deliver on astronomical expectations. From the politics of securing buy-in and state funding, to project management with a cross-functional team, managing a Request for Proposals process, working with an outside vendor, wrangling stakeholders in the thousands, managing expectations, planning for data cleanup and migration, and supporting the existing platform along the way, I’ll share how we’re making the seemingly impossible happen.

Digital exhibitions facilitate collaborations and exchanges between libraries, archives, and museums, and they also provide opportunities to examine different approaches to collections and metadata. At Code4Lib, I’d love to talk about my experience as the Mellon Postdoctoral Scholar in Library-Museum Collaboration at the University of Oregon and the ways in which technology has helped us to create a blueprint of library-museum collaboration for years to come. Although there are plenty of hurdles in terms of objects and protocols, our innovative and adaptable approach to inclusive digital inquiry has allowed us to showcase a diverse range of faculty research and to improve student engagement with digital assets as we expand possibilities for the greater UO community. Current projects include a digital exhibition-podcast bundle on the history of psychiatric care in Portland and using high-res images and new dating techniques to study fragile medieval Japanese calligraphic fragments.

This presentation will cover a new project that USC Digital Library (USCDL) is undertaking. USCDL is using AWS Glacier Deep Archive as an inexpensive solution to store over 250TB of assets. We are developing the tools necessary to move assets from our digitization lab servers to the AWS Deep Archive, as well as a web interface to request retrievals. In addition to the technology, we are working out various aspects of this project, including workflow, archive file limits, metadata needs, and the optimal parameters to keep costs low.
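For readers unfamiliar with Deep Archive, here is a minimal sketch, with hypothetical bucket, key, and path names, of the two core S3 calls involved: uploading into the Deep Archive storage class and later requesting a bulk retrieval.

```python
# Sketch using boto3 with placeholder names; not the project's actual tooling.
import boto3

s3 = boto3.client("s3")

# Upload a master file directly into the Deep Archive storage class.
s3.upload_file(
    Filename="/digilab/masters/uscdl_0001.tif",
    Bucket="uscdl-preservation",          # assumed bucket name
    Key="masters/uscdl_0001.tif",
    ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
)

# Later: ask S3 to stage the object for download. The bulk tier is the cheapest
# and typically completes within about 48 hours for Deep Archive.
s3.restore_object(
    Bucket="uscdl-preservation",
    Key="masters/uscdl_0001.tif",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}},
)
```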

With small or non-existent IT departments, many libraries rely on service providers to install and manage software for their organizations, so that staff can focus their time and energy on patrons and day-to-day activities. But how can a library determine if their service provider is following best practices or if a provider’s infrastructure is secure? What questions should organizations ask to determine what (if any) privacy and security standards a service provider follows and if they will align with the library’s IT requirements?

Libraries should treat externally managed services like an extension of their library and never assume that service providers are automatically following best practices. This session will discuss some of the most important questions a library can ask a potential or current service provider. It is never too late to conduct due diligence and protect the integrity of your collections.

Project ReShare (https://projectreshare.org/) is an initiative that seeks to adapt the FOLIO architecture for inter-library loan and resource sharing. This project brings library consortia, institutions, vendors, and institutional developers together around a standards-based architecture (ISO 18626, Z39.50, NCIP, and OpenURL) to support resource sharing between vended library service platforms and integrated library systems. The service uses VuFind as a union catalog where patrons can browse and request materials, but commercial products such as Ex Libris Primo or EBSCO Discovery Service can also serve as the bibliographic shared index. The talk will coincide with a beta release of the software. Our goal is to show what has been built so far - the shared index, the VuFind discovery layer, the communication components to manage and supply requests - and the roadmap for the next set of features. Ideally, this talk will underscore the need for commercial vendors and open-source projects to use NISO standards to support library software choice, innovation, and interoperability.

Over the past several years at the University Libraries, we have continued moving our existing services and new applications into the cloud. We aim to facilitate the software development lifecycle and reduce the burden of server maintenance, and we dedicate ourselves to finding cloud solutions for building resilient, scalable, and cost-effective applications. This presentation will trace the architectural evolution of our projects and show how cloud services (AWS) can be integrated in support of digital library development. It will also cover how we achieved our goals, the challenges we faced, and the lessons we learned through the process - from ground to cloud, and from server to serverless.
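As one hedged illustration of the server-to-serverless shift (with a placeholder queue URL and bucket layout, not our production code), a minimal AWS Lambda handler can react to S3 uploads instead of a long-running server polling a directory:

```python
# Minimal serverless sketch: an AWS Lambda handler triggered by S3 ObjectCreated
# events that queues new digital objects for downstream processing.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/derivative-jobs"  # placeholder

def handler(event, context):
    """Invoked by an S3 event notification; queues each uploaded object."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Queue the object for derivative generation instead of processing inline.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )
    return {"queued": len(records)}
```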

The advent of the Koha open source integrated library system (ILS) has revived the hope of Nigerian academic libraries of catching up with the rest of the world in terms of information dissemination to users and effective library service delivery, using modern technologies in line with global best practices. Koha continues to receive wide acceptance among academic libraries in Nigeria. This paper assesses the adoption and implementation of Koha open source software among selected academic libraries in south-west Nigeria.

The study adopted a descriptive survey design with a purposive sampling technique. A structured questionnaire was created using an online survey tool (SurveyMonkey). The link to the questionnaire was sent to respondents (librarians and staff in charge of automation) through various media, such as email, WhatsApp, and Facebook, to ascertain the extent of Koha (ILS) adoption, its performance, the level of satisfaction, and likely challenges.

Sixty respondents from twenty-three academic libraries (universities, polytechnics, and colleges of education) in the south-west geopolitical zone of Nigeria participated in the research. The data collected were analyzed using simple frequency tables and percentages. The study revealed that Koha adoption has favorably impacted academic library services and that the software continues to receive wide acceptance among academic libraries following its adoption. Koha adoption is clearly a reality among the academic libraries that participated in the study. The paper concludes by proffering solutions to observed problems such as ineffective vendor support services and the ICT skills library personnel need for the effective use and maintenance of the software.

This presentation will discuss a workflow that uses a set of R packages to automate the process of gathering metadata on a researcher’s publications in order to determine which versions are eligible for deposit in an institutional repository. We start with a researcher’s name and/or CV, using rcrossref (the Crossref API), rorcid (the ORCID API), rscopus (the Scopus API), scholar (Google Scholar), and/or microdemic (Microsoft Academic) to gather and deduplicate metadata on those publications. We then merge those publications by DOI with metadata gathered from roadoi (the Unpaywall API) to determine whether open access versions are available and what the pertinent licensing information is. We then merge again by ISSN with rromeo (SHERPA/RoMEO) to determine which versions publishers permit authors to deposit, and we end with crminer, a tool for downloading full-text PDFs for those that are eligible.
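The workflow itself is built on R packages, but the first two steps can be sketched in Python against the public Crossref and Unpaywall REST APIs; the author name and contact email below are placeholders.

```python
# Rough Python analogue of the opening steps: gather works for an author from
# Crossref, then check open access status and licensing by DOI via Unpaywall.
import requests

EMAIL = "repository@example.edu"  # Unpaywall requires a contact email

def crossref_works(author_name, rows=20):
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.author": author_name, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["message"]["items"]

def unpaywall(doi):
    resp = requests.get(
        f"https://api.unpaywall.org/v2/{doi}", params={"email": EMAIL}, timeout=30
    )
    resp.raise_for_status()
    return resp.json()

for item in crossref_works("Jane Q. Researcher"):  # placeholder name
    doi = item.get("DOI")
    if not doi:
        continue
    oa = unpaywall(doi)
    best = oa.get("best_oa_location") or {}
    print(doi, oa.get("is_oa"), best.get("license"), best.get("url"))
```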

This presentation will be based in part on my walkthroughs available at https://ciakovx.github.io/fsci_syllabus.html. By the time of the workshop, I intend to have a fuller set of scripts available for attendees to use when they return. This will be useful for those working in scholarly communications, repositories, publishing, and reference and liaison librarianship, as well as for researchers themselves.

Many academic library consortia are at a crossroads. Strapped library budgets and an increasing number of problems to solve have led many consortia to merge together, disappear entirely, or evolve their role. The PALCI and PALNI consortia are combining forces to develop a low-cost, sustainable model for an institutional repository service that will open up new possibilities for their membership. Working with development partners at Notch8, the consortia have embarked on a project to advance the development of the open-source, Samvera-based Hyku repository into a fully featured, multi-tenant solution for their combined memberships. The project partners will work to scale the Hyku software to create the functionality and configuration options needed for a centralized collaborative repository infrastructure with multiple, library-based portals. This talk will examine the unique consortial needs and opportunities for IR services and how those translated into a suite of features. We will discuss why Hyku is the right choice for this project and the development of features like custom theming, collaborative workflows, and support for unique IR materials like electronic theses and dissertations and open educational resources.

I was between jobs, so I volunteered to help the Prelinger Library with any lengthy projects they’d back-burnered for lack of time. They asked me to “photograph the stacks” — but in the process, I realized that I could build a web interface around the photos.

The Prelingers’ requests for the interface stemmed from the ethos of the library itself: they don’t keep a catalog, and they wanted to emphasize a shelf-browsing rather than a query-based flow (i.e., serendipitous research). Lastly, they asked if I could connect the site to the scanned material on their archive.org page.

The end result was the Stacks Explorer: http://prelingerlibrary.org/stacks

My talk will focus on the thinking/questions which resulted in this project’s odd technical setup, and include some guideposts for others who want to build similar large-scale DIY photo interfaces.

Mass digitization offers an unprecedented opportunity to provide blind and print-disabled users with access to books. However, mass digitized books also pose unique accessibility challenges, as does hosting and managing access to these collections. The HathiTrust Digital Library is the largest not-for-profit digital library collection, at 17 million books, and we serve over 600 thousand visitors from around the world every month. The high volume of visitors, the large scale of the collection, the characteristics of mass digitized books, and the diverse needs of various user groups present unique challenges for updating and changing our platform.

In 2019, we undertook an accessibility assessment of our digital library platform. We’ll share our process for accessibility assessment and updating the digital library, and attendees will learn about conducting an accessibility evaluation and incorporating accessibility techniques in their own websites. Finally, we will reflect on the lessons we’ve learned on the costs of stasis and look forward to additional opportunities to meet the needs of diverse users.

The Hemispheric Institute Digital Video Library (HIDVL) is a longstanding digital resource on performance and politics in the Americas. The project provides access to streaming video and metadata co-created by library professionals and curators from the Hemispheric Institute. This talk will outline the evolution of the project’s web form for collecting descriptive metadata from content experts, from its earliest days as a custom web form to its current iteration as an Airtable base. The development and updating of the form will be contextualized with a discussion of how metadata transformation and publishing workflows have changed over the course of the project’s history.

Library and information science (LIS) environments are increasingly tasked with making data-driven decisions on resource allocation, services, and strategic planning. However, the data often needed for these complex decisions may require intensive analysis before they yield meaningful information about a certain topic in an LIS environment, and data from several sources may be required to fully address said topic. Open source technologies allow for an analytical engagement with these data without the financial cost to the institution associated with subscription-based and proprietary software. Highlighting a use case of data fusion in Python, QGIS, and SuperDecisions, I will describe how LIS environments can leverage open source technology to fuse unique datasets into a single information source, allowing for a fresh approach to data-driven decision making.
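A minimal sketch of the data-fusion idea in Python (with assumed file and column names, and leaving out the QGIS and SuperDecisions stages) might look like this:

```python
# Hypothetical fusion of gate counts, a GIS-derived attribute, and survey data
# into one decision-support table using pandas.
import pandas as pd

gate_counts = pd.read_csv("gate_counts.csv")            # branch, date, visits
service_areas = pd.read_csv("qgis_service_areas.csv")   # branch, population_within_1km
survey = pd.read_csv("survey_responses.csv")            # branch, satisfaction_score

fused = (
    gate_counts.groupby("branch", as_index=False)["visits"].sum()
    .merge(service_areas, on="branch", how="left")
    .merge(survey.groupby("branch", as_index=False)["satisfaction_score"].mean(),
           on="branch", how="left")
)
fused["visits_per_capita"] = fused["visits"] / fused["population_within_1km"]
print(fused.sort_values("visits_per_capita", ascending=False))
```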

Making our websites accessible is a struggle we see every day, in every corner of the internet. As a community, we have all been working hard to alleviate stress when it comes to navigating the web, but a lot of the time we forget about making PDFs accessible! Many of the same principles from web accessibility carry over into PDFs, but successfully implementing and testing these principles is where many need guidance. In this talk, we will go over how to create, evaluate, repair, and enhance documents using Microsoft Word as well as Adobe Acrobat Pro, and make them suitable for all users on the web.

Artificial intelligence (AI) has the power to disrupt but also to revolutionize our services and profession so that libraries can continue to contribute positive changes to the world. We are experimenting with AI as a way to scale the integration of popular information literacy micro-credentials to large numbers of students. By doing so, we will be able to reach more students, gain data richness about our learners, innovate at the micro level, and augment and improve our teaching practices. This session will demonstrate an innovative application of AI to the new form of academic currency–the micro-credential–and discuss how this application could lead to broader uses in libraries and academia.

Archives Portal Europe (APE) is a platform where institutions holding archival material on Europe can publish their inventories. APE aims to be the single entry point for this material, and it currently holds information from archives in more than 30 countries, making it the largest online archival repository in the world. Integrating such vast and heterogeneous material poses a series of important technical challenges that intersect with the different archival traditions in Europe.

The common technical ground for this is the Encoded Archival Description (EAD) XML standard. Because this standard is very broad, different archival conventions and traditions have led to different ways EAD is used from country to country, creating a challenge for unified processing, storage, and display of the provided metadata.

This talk aims to describe how this challenge has been overcome by:

  • Defining a subset of the official standard, apeEAD, to streamline central data processing;
  • Managing variations in the submitted data via conversion to the central format;
  • Ensuring that conversion does not result in losing essential local/national peculiarities, while still enabling joint approaches to indexing, retrieval, display, etc.;
  • Communicating changes and their potential impact to continuously improve the use of the standard.

The vast archival holdings of Columbia University’s special collections provide a rich tableau for data visualization and analysis across several dimensions, with a view toward improving discovery, usability, and management of archival collections. The presentation will show how Google Data Studio’s “data blending” feature can be used to weave multiple sources—site metrics from Google Analytics, finding-aid and accession data from ArchivesSpace, and ILS MARC records, among other possibilities—into compelling interactive visualizations yielding practical and sometimes surprising insights hiding in plain sight. Tools and techniques highlighted include Python, Google Query Language, and some hacky but effective regex maneuvers.

We have a big problem: a lot of scholarship and library infrastructure relies on Git hosting platforms (e.g. GitHub, Bitbucket, GitLab). Unfortunately, there’s no plan to preserve this material. However, the code isn’t the only asset to worry about; there’s rich contextual information in the scholarly ephemera — code reviews on pull requests, threads of discussion on issues — that is at risk. The presence of this scholarship on Git hosting platforms requires feasible ways of capturing source code and ephemera. This is especially so given that these platforms have no commitment to long-term preservation and can make business decisions that are at odds with LIS values (e.g. working with ICE).

We, the IASGE team, have two streams of work to address these problems: 1) qualitative inquiries into how folx in academia use Git hosting platforms, how these platforms meet researchers’ needs, and where gaps in features lie, and 2) an environmental scan of potential ways that source code and its scholarly ephemera can be archived and preserved by professionals.

We’d love to talk to the Code4Lib community (prodigious users of Git hosting platforms also!) about our research and hope for a future where code & its context are safe.

At UCLA we are building our digital library using Agile methodologies. We use quick iterative sprints, work towards a minimum viable product, and continuously deploy as we build. This is a switch from the waterfall approach that resulted in many unfinished projects and long waits for simple changes. I will discuss the Agile tools and methods our team uses, including sprint planning, retrospectives, pair programming, and test-driven development. I will also discuss the benefits of Agile project management and the success we have experienced through using these tools.

Most current systems for the acquisition, housing, and care of cultural objects reflect existing structures of oppression. Typically, digital technology further perpetuates these power differentials. Furthermore, as library, archive, and museum practitioners seek to increase digital access to the history of disenfranchised and marginalized communities, they engage with communities that have been at best ignored by cultural heritage fields, and more likely actively harmed.

Providing access to the history of marginalized communities foremost requires genuine, responsive partnership. Our technical systems and workflows (processing, digitization, metadata, web design) must be equally responsive, but they often aren’t. This problem is not new or unknown – what are the barriers to change?

This talk presents resources from Design for Diversity, an IMLS-funded project exploring strategies for the development of more inclusive information systems in libraries, archives, and museums. We explored two main points of impact: the education of new practitioners, and opportunities for change in the workplace. We collected case studies and study paths, created specifically for the Design for Diversity project, with additional pointers to exemplary readings and projects in our online Toolkit. Through this Toolkit, we explore key moments where practitioners can make decisions that lead to more inclusive information systems.

Frustrated by the limitations and high overhead of traditional digital repository platforms, librarians at the University of Idaho Library have been developing an agile, lightweight approach to creating digital scholarship websites driven by metadata and powered by static web technologies. Our current IMLS-sponsored CollectionBuilder project embodies this methodology in a digital collection generator optimized for non-professional developers and simple hosting solutions. This approach has opened up new opportunities for collaboration between librarians, researchers, and educators to take true ownership of their digital projects. Serving as a toolkit of flexible recipes, CollectionBuilder templates have already been adapted into a variety of custom, data-driven websites for digital scholarship collaborations, from a project management kit to oral history transcript visualizations. This flexible, agile workflow enables rapid iterative development of new features while collaborators develop fundamental web skills. This presentation will outline the key technologies and workflows that enable this approach, reflecting on the challenges and rewards of library-centric development encountered in our work on the CollectionBuilder project.

Creating a digital media wall (a multi-screen display) can be challenging in both a technical and administrative sense. Selecting the right vendor and hardware for the proposed project is important, and there are several options available. Additionally, each library may have different roles in mind for their media wall, and there can be competing content to display.

This presentation will discuss some of the challenges we faced at Western Carolina University’s Hunter Library, and how we solved those issues. We currently use a media wall to display content from Springshare’s LibCal, and I will cover how we used the LibCal API to display study room and event calendar information. I will also discuss some of our other current uses of the media wall, and some of our plans for the future, with regard to content and technical possibilities.

MARC records were gathered from the British Library, National Library of Scotland, National Library of Wales, and HathiTrust as part of an Arts and Humanities Research Council (UK) funded exploration of a global catalogue of digitized texts. Most of this massive pile of records lacked identifiers. We conducted an investigation into matching duplicate records without identifiers, in order to identify an effective, scalable approach to record aggregation. Various matching algorithms were attempted, including raw title matching and classifiers from Python’s machine learning package scikit-learn. We will review our methods and results, some of which show promise for future services (enhanced clustering, collection assessment, authority work), and discuss some of the pitfalls of statistical duplicate detection.
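As a simplified sketch of the statistical matching approach (not our exact pipeline; the threshold and n-gram settings are illustrative), TF-IDF over titles plus cosine similarity can flag likely duplicates:

```python
# Flag likely duplicate records by comparing character n-gram TF-IDF vectors of titles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = [
    "The pilgrim's progress from this world to that which is to come",
    "The pilgrims progress, from this world to that which is to come.",
    "A dictionary of the English language",
]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
matrix = vectorizer.fit_transform(titles)
similarity = cosine_similarity(matrix)

for i in range(len(titles)):
    for j in range(i + 1, len(titles)):
        if similarity[i, j] > 0.85:  # threshold would be tuned against hand-labelled pairs
            print(f"Possible duplicates: {i} <-> {j} (score {similarity[i, j]:.2f})")
```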

Working as a project manager involves vulnerability, emotional risk, and exposure, particularly if one is new to project management and leadership roles. This presentation will briefly review project management strategies for those new to project management and discuss imposter syndrome and vulnerability in the context of technical projects. As a former ‘front-facing’ librarian charged with overseeing the migration of the library’s institutional repository, overcoming imposter syndrome and accepting vulnerability were as much a part of the migration story as data transfer and metadata crosswalks. Without formal project management training, and nominal knowledge of linked data and repository technical infrastructure, I led a team of three programmers, three metadata specialists, and one repository administrator. Managing vulnerability was not only an individual exercise, but one that extended to the team as they grappled with building the first linked data project to run in production at the institution. Therefore, this presentation will also briefly discuss managing vulnerability, emotional risk, and imposter syndrome as a team as we navigated learning new technologies, programming languages, and metadata standards.

Digital Research and Strategy at the New York Public Library would like to present a new project that leverages available statistical information in our various databases to tell a holistic narrative about the digitization lifecycle. Making use of SQL databases, Python, and R, our interface tracks the digitization lifecycle from the creation of a metadata record to digital objects ingested into the repository to rights statement assignment to approval and online publishing. Each stage of the lifecycle had previously been reported separately by individual departments but had never been pulled together into a single snapshot despite the ways each part of the story informs the other. Our goal is to create a comprehensive view into the intertwined processes that constitute the digitization story to help stakeholders better understand the invisible work required to bring digital assets online and provide useful data for strategic decision-making and funding.

Crossref has been working on two projects with different use cases that have successfully used the same approach to process unstructured metadata and match it to an identifier. The two projects, reference matching at Crossref and affiliation matching in the Research Organization Registry (ROR), work with data from different domains. Reference matching involves matching unstructured citations to DOIs in the Crossref registry, while affiliation matching in ROR involves matching organization affiliation strings against the data stored in ROR. Both projects used the respective organization’s search functionality to retrieve results, using the unstructured data as a query term, and then calculated the similarity of the unstructured entities to what was recalled. The talk will address the methodology used, how sample datasets were generated, what hijinks ensued, applications of the project, and conclusions reached from this work.
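A rough outline of that search-then-score pattern, not Crossref’s production matcher, might look like the following Python sketch, which queries the public Crossref API with an unstructured citation and scores the top candidates against the original string.

```python
# Search with the raw citation, then score candidates against it; a match would be
# accepted only if the best score clears a validated threshold.
from difflib import SequenceMatcher
import requests

citation = ("Watson, J. D., & Crick, F. H. C. (1953). Molecular structure of "
            "nucleic acids. Nature, 171, 737-738.")

resp = requests.get(
    "https://api.crossref.org/works",
    params={"query.bibliographic": citation, "rows": 5},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["message"]["items"]:
    candidate = " ".join(item.get("title", [])) + " " + (item.get("container-title") or [""])[0]
    score = SequenceMatcher(None, citation.lower(), candidate.lower()).ratio()
    print(item["DOI"], round(score, 2))
```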

Data visualization dashboards in library apps: how can a data visualization dashboard improve communication and assist in the decision-making process? What is the storytelling behind the dashboard?

For academics, evaluating the fit of a manuscript with a prospective journal’s focus is a hard problem. The manual approach to finding an appropriate journal – exhaustively reading articles or abstracts from many different journals – is a time-consuming solution. An appropriate use of technology can hopefully improve this process. I attempted to build a tool that would programmatically recommend open access journals to authors based on their draft abstracts.

Highlights of my attempts at building a journal recommender include: initially trying to build journal-matching logic from scratch; then thinking the better of it, and learning to stand on the shoulders of other programmers’ work; and finally, finding practical ways to do computationally intensive analyses on large amounts of journal data. The takeaway is that there are many interesting and consequential choices to make when designing a journal recommender. This talk will explore that journey.

This talk will discuss a year-long project to rebuild our metadata services at our organization – including, in particular, a new MARC processing library.

Parsing MARC effectively begins with understanding how to model it for modern concerns, and, unfortunately, the MARC standard provides little assistance in that regard. In order to improve our ability to work with MARC, we started by constructing our own “modern MARC” model and then building application layers on top of that to parse and transform MARC data more powerfully than we could before. The lessons we’ve learned on how to do this are likely to be of interest to anyone who needs to work with imperfect legacy data.
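
To illustrate the general pattern (not the talk’s actual library), here is a short Python sketch that parses raw MARC with pymarc and projects it into a small application-level model, so downstream code works with named attributes instead of tags and subfield codes.

```python
# Not the talk's library, just a hedged illustration of the general idea:
# parse raw MARC with pymarc, then project it into a simplified model.
from dataclasses import dataclass, field
from pymarc import MARCReader

@dataclass
class ModernRecord:
    title: str = ""
    isbns: list = field(default_factory=list)
    subjects: list = field(default_factory=list)

def to_modern(record) -> ModernRecord:
    """Map selected MARC fields into the simplified model."""
    modern = ModernRecord()
    f245 = record.get_fields("245")
    if f245:
        modern.title = " ".join(f245[0].get_subfields("a", "b")).strip()
    modern.isbns = [sf for f in record.get_fields("020") for sf in f.get_subfields("a")]
    modern.subjects = [f.format_field() for f in record.get_fields("650")]
    return modern

with open("records.mrc", "rb") as fh:
    for rec in MARCReader(fh):
        if rec is not None:  # pymarc yields None for records it cannot parse
            print(to_modern(rec))
```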

Publisher names in WorldCat are uncontrolled strings. These uncontrolled names present challenges in converting WorldCat to Linked Data and offering best search results.

Attempts to cluster or “entify” publishers often require expensive manual effort or, if automated, include too many errors to be useful.

This talk will report on recent progress by OCLC Research in clustering and entifying publishers in WorldCat data using Data Science methods.
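
For readers unfamiliar with the problem space, here is a hedged sketch of one common approach to clustering noisy name strings (character n-gram TF-IDF plus density-based clustering); it is illustrative only and should not be read as OCLC’s actual methodology.

```python
# Illustrative only, not OCLC's methodology: cluster noisy publisher strings
# with character n-gram TF-IDF vectors and DBSCAN over cosine distance.
import re
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

publishers = [
    "Penguin Books", "Penguin Bks.", "Penguin", "Oxford University Press",
    "Oxford Univ. Press", "Random House", "Random House, Inc.",
]

def normalize(name: str) -> str:
    """Lowercase, strip punctuation and common corporate suffixes."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return re.sub(r"\b(inc|co|ltd)\b", " ", name).strip()

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vectorizer.fit_transform(normalize(p) for p in publishers)

# The cosine-distance threshold (eps) controls how aggressively names merge.
labels = DBSCAN(eps=0.5, min_samples=1, metric="cosine").fit_predict(X)
for label, name in sorted(zip(labels, publishers)):
    print(label, name)
```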

Transitioning to management can be stressful. You’ve likely done a really great job as an individual contributor, so now you’ll stop doing that and do something else instead - management. If you’re used to being a high performer, the absolute cluelessness you feel as a new manager can be particularly nasty. There are tons of books and trainings out there, but you don’t even know what you don’t know, and most training material is geared towards the corporate world anyway. In this talk, I’ll run through a few key lessons learned and things I wish I had known the day before I started as a manager in a library environment.

At UCLA Library, we have a team of people who build microservices. We built one for converting TIFF images to JPEG 2000 images, for use with our IIIF image server. In the process of building what we thought we needed to build, we also did some measurements to be sure we were building something that could do this job as fast as possible. And in that process of research, we discovered that what we had built wasn’t even close to what we needed to build. This talk will be a bit about what we ended up building, because it’s cool and we’re proud of it. It will also be about that moment when we realized that what we had already built needed to be set aside, so we could build the best solution we could.
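
For context, the core conversion step can be as small as wrapping a JPEG 2000 encoder’s command line; the sketch below uses OpenJPEG’s opj_compress and is purely illustrative of the task, not UCLA’s service, which (as the talk describes) was rebuilt after performance measurements.

```python
# Illustrative only: a minimal wrapper around OpenJPEG's opj_compress CLI for
# TIFF-to-JPEG 2000 conversion. Rate/resolution flags for IIIF-quality output
# would need tuning and are omitted here.
import subprocess
from pathlib import Path

def tiff_to_jp2(tiff_path: Path, out_dir: Path) -> Path:
    """Convert a single TIFF to JPEG 2000, returning the output path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    jp2_path = out_dir / (tiff_path.stem + ".jp2")
    subprocess.run(
        ["opj_compress", "-i", str(tiff_path), "-o", str(jp2_path)],
        check=True,          # raise if the conversion fails
        capture_output=True, # keep the tool's output for logging/debugging
    )
    return jp2_path

if __name__ == "__main__":
    for tiff in Path("masters").glob("*.tif"):
        print(tiff_to_jp2(tiff, Path("derivatives")))
```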

The Digital Scholarship Group at Northeastern University uses a WordPress plugin to connect item records and associated metadata from our institutional repository to presentations and exhibits within WordPress sites for archives, research, and teaching. That plugin was based on creating shortcodes for the display of IR content within WordPress pages. With WordPress 5 and the Gutenberg editing interface, that approach became outdated. This talk will address the various factors – technological, organizational, and philosophical – that are part of the ongoing efforts to rebuild the plugin for WordPress 5.

Topics covered will include:

  • The learning curve for React and WordPress’s wrappers around it for Gutenberg blocks
  • Design choices and constraints that reflect the priorities of our Digital Humanities shop
  • Gains and losses in moving from shortcodes to Gutenberg blocks
  • Gains and losses in moving from a PHP-centric interface to a JavaScript/React-centric interface

In 2016, my library purchased the Summon discovery layer, and it has shown increased usage with each passing year. Yet with more Summon searches came reports of broken connections to full-text sources. As we went about diagnosing and reporting these problems, it became apparent that discovery layer linking is inherently flawed. Librarians are expected to donate tremendous labor investigating problems with vendor software and supplying fixes for them. If we want our users to be able to access the full-text resources we’ve paid for, we have to correct others’ inaccurate metadata or locate identifiers so that direct links can be constructed. In some instances, the software’s structure and assumptions prohibit fixes altogether, so broken links persist at a rate that may not meaningfully decrease over time. In this presentation, I will describe the state of discovery layer linking, categorize the most common failures, and present the methodology and results of exhaustive research I’ve performed into how often links break and the particular nature of the breakages. I will discuss how we automate what we can of the fixing process and advocate for algorithmic changes.

The Folger Shakespeare Library is a major scholarly center for the study of early modern British and European history and literature. It holds the world’s third-largest collection of books printed in Britain or in English before 1640 and is one of the top ten North American holders of British and English-language imprints from the remainder of the 17th century. Until now, though, the collection of narratively and iconographically rich woodcut and engraved illustrations bound into these books has remained unsearchable and largely inaccessible.

British Book Illustration (BBI) was a project to digitize and index 10,000 of these illustrations from the Folger’s collection. The indexing was done in Iconclass, the internationally-accepted standard for the description and retrieval of subjects represented in images, able to account for both the simple—this is a dog—and the complex—this is a dog representing the concept of insatiable-ness.

This presentation will cover the BBI project, and will then take a deep dive into Iconclass.

Over the last year, the University of Texas Libraries has started exploring the use of Neo4j Community Edition, an open source graph database, for managing complex datasets in the Libraries’ collections and modeling digital asset management workflows. Neo4j was selected for these tasks because of its open source license, unique relationship modeling capabilities, large user community, and wide availability of online reference materials. Experimentation with Neo4j has revealed it to be a promising technology that can yield new insights into the Libraries’ collections because of the ease with which data can be queried and visualized based on relationship characteristics. While Neo4j’s graph database model does provide unique benefits, it also has significant limitations compared to more traditionally used relational databases like PostgreSQL. These limitations, such as a lack of integration with GIS software, have thus far prevented incorporation of Neo4j into production workflows, but it is expected that they can be overcome in time. This presentation will discuss specific projects that Neo4j has facilitated, provide an assessment of the unique capabilities that graph database technology has enabled, demonstrate how Python scripts have been used to manage Neo4j workflows, and discuss plans for overcoming current limitations.
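
A small example of the kind of Python-driven Neo4j workflow mentioned above, using the official neo4j Python driver; the node labels, properties, and collection name are hypothetical.

```python
# Hypothetical sketch of a Python-managed Neo4j workflow using the official
# driver. Labels, properties, and data are placeholders, not UT's actual model.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def load_item(tx, item_id: str, collection: str):
    """MERGE a digital asset node and relate it to its collection."""
    tx.run(
        "MERGE (i:Item {id: $item_id}) "
        "MERGE (c:Collection {name: $collection}) "
        "MERGE (i)-[:PART_OF]->(c)",
        item_id=item_id, collection=collection,
    )

def items_in_collection(tx, collection: str):
    result = tx.run(
        "MATCH (i:Item)-[:PART_OF]->(c:Collection {name: $collection}) "
        "RETURN i.id AS id ORDER BY id",
        collection=collection,
    )
    return [record["id"] for record in result]

with driver.session() as session:
    session.execute_write(load_item, "item_0001", "Example Collection")
    print(session.execute_read(items_in_collection, "Example Collection"))
driver.close()
```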

The University of North Texas Health Science Center (UNTHSC) approached the Texas Digital Library (TDL) in 2018 about migrating their scholarly repository from Bepress to a new DSpace 6.x instance at TDL. During this project, TDL developed a workflow for the repository migration that included: using AWS for data storage; developing code for pre-processing, metadata mapping, and generating DSpace item packages; and incorporating existing DSpace import tooling for repository data.

In collaboration with UNTHSC, TDL staff created a process for importing a community/collection hierarchy spreadsheet into DSpace. Additionally, TDL developed code to create Simple Archive Format packages for digital objects in the repository that incorporated 1) a metadata crosswalk (also built in collaboration with UNTHSC staff), 2) data about the repository harvested from its OAI-PMH feed, and 3) the metadata and digital objects themselves located in S3. Finally, TDL worked with UNTHSC staff to customize the configuration and look-and-feel of the new DSpace instance to meet their needs.
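
A condensed sketch of what building a DSpace Simple Archive Format package for a single item involves: a directory containing dublin_core.xml, a contents manifest, and the bitstreams themselves. The metadata fields and file paths here are illustrative, and the S3 retrieval step is omitted.

```python
# Illustrative sketch of a DSpace Simple Archive Format item package:
# dublin_core.xml + contents manifest + bitstream files in one directory.
from pathlib import Path
from xml.etree.ElementTree import Element, SubElement, ElementTree

def build_saf_item(item_dir: Path, metadata: list, bitstreams: list):
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata assertion.
    root = Element("dublin_core")
    for m in metadata:
        dcvalue = SubElement(
            root, "dcvalue",
            element=m["element"], qualifier=m.get("qualifier", "none"),
        )
        dcvalue.text = m["value"]
    ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                            encoding="utf-8", xml_declaration=True)

    # contents: one bitstream filename per line; copy files alongside it.
    with open(item_dir / "contents", "w") as manifest:
        for bs in bitstreams:
            (item_dir / bs.name).write_bytes(bs.read_bytes())
            manifest.write(bs.name + "\n")

build_saf_item(
    Path("saf/item_0001"),
    [{"element": "title", "value": "Example thesis"},
     {"element": "date", "qualifier": "issued", "value": "2019"}],
    [Path("downloads/thesis.pdf")],
)
```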

This presentation will discuss our Bepress to DSpace migration from initial design, to the project execution, to the conclusion and assessment of the project deliverables. I will lay out what we learned and offer suggestions for others considering similar migration projects.

Sorin is a collaborative discovery and research platform produced and recently open sourced by St. Edward’s University’s Munday Library. Sorin fills in the gaps between our existing systems by providing a clean, simple, and integrated web interface for querying our search engine (Primo), creating collections of search results, collaboratively adding notes and file attachments, exporting to other platforms, and publishing work to the rest of the community or the open web.

Written in Elixir and React, Sorin has been in production at St. Edward’s for over a year and has become our primary discovery layer. Adoption has been swift and users have been enthusiastic. This talk will discuss our technology stack, our development process, and some of the lessons we have learned along the way.

How does one enhance access to historical data through mapping and visualization tools? In this presentation, the experience of working with over 21,000 historical records from the Historical Society of Pennsylvania’s school admission registers will be discussed. In particular, the normalizing of historical occupational data using hand coding, OpenRefine, and Tableau will be examined. How many gentlewomen were there in Philadelphia? Where did their children attend school, and did the children of tinmen go there, too? And what, exactly, is a shrimp fiend? Learn the answers to these questions, as well as lessons from this second-year LEADS-4-NDP Fellowship project, in this humorous and informative presentation.

The abundance of consumer-grade AI tools and services makes it seem easier than ever to bring machine learning to library production workflows for description. But while the tools have become easy to implement, it’s still very difficult to ensure usable, reliable outputs or to manage the risk of negative social impact. In this presentation we’ll share methodologies for assessing machine learning technologies within a framework of metadata quality and applying emerging best practices for accountable algorithms. Drawing from work we’ve been doing in the Audiovisual Metadata Platform Pilot Development (AMPPD) project, we’ll show you how we incorporated both quantitative and qualitative measures to evaluate accuracy and risk of open source and commercial tools for tasks such as speech-to-text transcription, named entity recognition, and video OCR. We’ll also share tools and techniques for engaging collection managers and other library staff in reviewing machine learning outputs and defining quality measures for targeted uses of this data.
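
As one example of a quantitative measure in such an evaluation, here is a word error rate (WER) calculation for speech-to-text output, computed as word-level edit distance against a human reference transcript; this is a generic formulation, not AMPPD’s specific tooling.

```python
# Generic word error rate (WER) for evaluating speech-to-text output against a
# human reference transcript; not AMPPD's exact evaluation code.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the quick brown fox", "the quick brown box jumps"))  # 0.5
```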

Mana-kb is an open source knowledge base and microservice platform dedicated to library data for the Koha open source ILS. The service is based on crowdsourcing: librarians can share, view, import, and comment on information for Serials and Reports to see what other libraries around the world are using. They can share subscription numbering patterns, book reviews, and reports.

Although there are numerous publicly available moving image archives, in many cases it would take years of continuous viewing for an individual to scratch the surface of a single archive. My goal is not to make sense of the contents of a given archive. Rather, by leveraging computer algorithms that analyze and organize sonic and visual patterns, I have developed techniques to immerse viewers in a massive bird’s-eye view of sounds and images that surfaces the underlying forms, textures, and atmospheres of the source materials. I have compressed tens of thousands of clips from numerous public film and video archives into minutes-long visualizations, including the U.S. National Archives, the 2016 U.S. Presidential TV Ad Archive from the Internet Archive, and the Macaulay Library from The Cornell Lab of Ornithology. Each archive is run through the same computer algorithm but results in distinct experiences that highlight each archive’s unique fingerprint of sound and image. All the visualizations are generated entirely from custom code that will be released open source to the public. In this talk, I will demo some of the visualizations and walk through my technical and creative process.

The Jon Bilbao Basque Library has digitized around 5,000 photographs in various projects since 2003. In 2018, a new Digital Services unit was created at the University of Nevada, Reno Libraries and charged with implementing a new preservation system based on Islandora. Ingesting “standard” TIFF files into Islandora surfaced several issues related to the images’ embedded metadata. This talk explains the approach taken to identify the origin of those issues and the tools used to fix them.

The proposed application, AICatalog, provides an efficient, replicable path for batch cataloging the millions of uncataloged federal government documents currently held across the Federal Depository Library Program (FDLP) network. Importantly, this project benefits from a formal partnership with the Government Publishing Office (GPO), which will provide key technical support as well as national visibility for the project among FDLP members. Using traditional methods, cataloging 400K+ federal government documents would require approximately 70K personnel hours; even with 5 full-time catalogers dedicated to the work, the minimum timeframe for completion would be 8-10 years. Using AICatalog, the workload could be completed within months. AICatalog is built on two deep learning techniques: computer vision and natural language processing. These features allow AICatalog to outperform other optical character recognition (OCR) products in three ways: 1) it detects text entities from unstructured/unaligned documents, 2) it classifies the entities accordingly (e.g., “SuDoc Class”, “Year”, and “Issuing Agency”), and 3) it wraps these procedures into a one-step solution. Specifically, AICatalog was constructed in R (alternatively Python) with Keras, TensorFlow’s high-level API for building and training deep learning models.
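
For orientation only, here is a minimal Keras text-classification sketch in Python that assigns an OCR’d text snippet to a hypothetical class such as an issuing agency; AICatalog’s real architecture, labels, and training data are not reproduced here.

```python
# Illustrative only, not the AICatalog model: a tiny Keras text classifier that
# assigns an OCR'd snippet to a hypothetical class (e.g., issuing agency).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

texts = np.array([
    "department of agriculture yearbook 1952",
    "congressional record proceedings and debates",
    "geological survey professional paper",
    "department of agriculture farmers bulletin",
])
labels = np.array([0, 1, 2, 0])  # hypothetical agency classes

vectorize = layers.TextVectorization(max_tokens=5000, output_mode="int",
                                     output_sequence_length=16)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,
    layers.Embedding(input_dim=5000, output_dim=32),
    layers.GlobalAveragePooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(3, activation="softmax"),  # three hypothetical classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(texts, labels, epochs=10, verbose=0)
print(model.predict(np.array(["farmers bulletin number 1734"])))
```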

The Social Network and Archival Context project (SNAC, snaccooperative.org) presents a documentary social network of over 3.7 million identities described in finding aids and resource descriptions from archives around the globe; that is, a graph in which individuals, corporate bodies, and families are nodes connected by edges based on their co-occurrence in printed or collected works. SNAC’s network can be considered an evolving network, in which identities are only connected during their lifespans.

I developed an analysis tool and applied three social network analysis metrics–betweenness, harmonic, and degree centrality–temporally across the SNAC network’s 400-year lifespan to understand the network as it changes over time. This analysis uncovered an irregularity in the overall centrality, i.e., a highly dynamic point in time, in the mid-1700s. We expected the cause to include collections of prominent figures in US history, since the majority of holdings referenced by SNAC are in the United States. However, we found that it was instead caused by the connections of English authors James Boswell and Samuel Johnson, whose descriptions from Harvard’s Theatre collection were over-described compared with identities from that time. I explore the use of these metrics to uncover accession and collection bias.
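
A simplified sketch of the temporal analysis: restrict the graph to identities whose lifespans overlap a given year, then compute the three metrics with networkx. Loading the actual SNAC data is omitted, and the lifespan attributes below are illustrative.

```python
# Simplified sketch of the temporal centrality analysis with networkx; the
# tiny example graph and lifespan attributes are illustrative, not SNAC data.
import networkx as nx

def centralities_at(G: nx.Graph, year: int) -> dict:
    """Compute the three metrics on the subgraph of identities alive in `year`."""
    alive = [n for n, d in G.nodes(data=True)
             if d.get("birth", year) <= year <= d.get("death", year)]
    sub = G.subgraph(alive)
    return {
        "betweenness": nx.betweenness_centrality(sub),
        "harmonic": nx.harmonic_centrality(sub),
        "degree": nx.degree_centrality(sub),
    }

G = nx.Graph()
G.add_node("Samuel Johnson", birth=1709, death=1784)
G.add_node("James Boswell", birth=1740, death=1795)
G.add_node("Earlier identity", birth=1600, death=1660)
G.add_edge("Samuel Johnson", "James Boswell")

print(centralities_at(G, 1760)["betweenness"])
```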

In this presentation, we will share and discuss tools & techniques our team has used and developed over the past year to improve and streamline how we work together, how we communicate, and how we interface with management and stakeholders. Areas of focus include formalizing requests for work estimation; documenting architectural decisions; planning for maintenance; sharing current work progress and backlog/roadmap information; triaging production issues without disrupting project work; and recognizing each other’s contributions and accomplishments. We acknowledge that teams are busy, so all techniques are lightweight, and none of them depend on one another; they are an a la carte menu.

In 2018, the Metadata Services department at Iowa State University Library began creating and managing Archival Resource Keys (ARKs) for online resources including finding aids, digital collections, and digitized archival and special collections material. To better manage the several thousand ARKs we’ve created so far, we built a custom Python library called ARKimedes to mint, update, and track our ARKs. This talk will provide an overview of the use and management of ARKs at Iowa State, our early automation efforts, and the pain points ARKimedes aims to solve.
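
Not ARKimedes’s actual API, but a hedged sketch of the core tasks described: minting ARK strings under a placeholder NAAN and tracking each identifier’s target in a local SQLite table so it can later be updated and audited.

```python
# Hedged sketch, not ARKimedes: mint ARK identifiers under a placeholder NAAN
# and track each identifier's target URL in a local SQLite table.
import sqlite3
import uuid

NAAN = "99999"  # placeholder NAAN; a real institution would use its own

def mint_ark(conn: sqlite3.Connection, target_url: str) -> str:
    """Create a new ARK, record its target, and return the identifier."""
    ark = f"ark:/{NAAN}/{uuid.uuid4().hex[:10]}"
    conn.execute("INSERT INTO arks (ark, target) VALUES (?, ?)", (ark, target_url))
    conn.commit()
    return ark

def update_target(conn: sqlite3.Connection, ark: str, new_url: str) -> None:
    conn.execute("UPDATE arks SET target = ? WHERE ark = ?", (new_url, ark))
    conn.commit()

conn = sqlite3.connect("arks.db")
conn.execute("CREATE TABLE IF NOT EXISTS arks (ark TEXT PRIMARY KEY, target TEXT)")
print(mint_ark(conn, "https://findingaids.example.edu/ead/example-2019-001"))
```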