We have a big problem: a lot of scholarship and library infrastructure relies on Git hosting platforms (e.g. GitHub, Bitbucket, GitLab). Unfortunately, there’s no plan to preserve this material. And the code isn’t the only asset to worry about; the scholarly ephemera around it (code reviews on pull requests, discussion threads on issues) hold rich contextual information that is also at risk. Because this scholarship lives on Git hosting platforms, we need feasible ways of capturing both source code and its ephemera. This is especially so given that these platforms have no commitment to long-term preservation and can make business decisions that are at odds with LIS values (e.g. working with ICE).
We, the IASGE team, have two streams of work to address these problems: 1) qualitative inquiries into how folx in academia use Git hosting platforms, how these platforms meet researchers’ needs, and where gaps in features lie, and 2) an environmental scan of potential ways that source code and its scholarly ephemera can be archived and preserved by professionals.
We’d love to talk to the Code4Lib community (prodigious users of Git hosting platforms themselves!) about our research and our hope for a future where code and its context are safe.