As I’ve mentioned in past blog posts, I’ve been working with the LGBTQ Video Game Archive, founded by Adrienne Shaw at Temple University, to record and preserve cases of LGBTQ representation in video games since the 1980s. One of the difficulties the Archive has faced in recent years has been the ephemeral nature of many of the digital sources the Archive draws on to provide evidence and information for its entries. Many of these sources are blogs, personal websites, or social media posts, and as soon as their creators stop maintaining them they can disappear suddenly. An example of this was gaygamer.net, a website for LGBTQ players to discuss games and gaming cultures that went dark without notice in May 2016.
To help prevent the loss of queer representation and culture in games, the Archive has been saving copies of the sources its entries draw on, to be stored at the Strong National Museum of Play. For this blog, I thought I’d lay out the process I’ve been using to copy, store, and preserve those sources, and welcome suggestions for how to improve it in the future!
The first step of the process is saving all of the webpages that the Archive uses as HTML files. We’ve organized these sources according to type (article, blog, etc.), and I plug the list of URLs for these pages into Chrome Download Manager, a Chrome extension that downloads each URL as an HTML file. Chrome Download Manager makes it easy to do this in large batches, and lets me set the filename convention for the resulting HTML files. I usually save them as *URL*.html, where *URL* in each case is the source’s URL. This helps keep them in a specific order so it’s easy to rename and store them.
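For anyone who would rather script this step than use a browser extension, here is a minimal sketch in Python of the same idea: take a list of URLs and save each one as an HTML file named after its URL. This is not how Chrome Download Manager works internally. The function names (`url_to_filename`, `save_pages`) and the `html` output folder are made up for illustration, and the naming rule is only one reasonable way to turn a URL into a safe filename.

```python
import re
import urllib.request
from pathlib import Path


def url_to_filename(url: str) -> str:
    """Turn a URL into a filesystem-safe name ending in .html (hypothetical rule)."""
    # Drop the scheme (http://, https://, ...).
    stripped = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
    # Replace every run of characters other than letters, digits, dots,
    # and hyphens with a single underscore.
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", stripped).strip("_")
    return safe + ".html"


def save_pages(urls, out_dir="html"):
    """Download each URL and write the raw response body to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = out / url_to_filename(url)
        with urllib.request.urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
        saved.append(target)
    return saved
```

Calling `save_pages(list_of_urls)` would write one file per URL into the `html` folder. Note that this only saves the HTML the server returns, not the images, stylesheets, or scripts a page loads, so the saved copy may look different from the live page.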
Once I have all the HTML files, I rename each one to a simple unique identifier: A1, A2, A3, etc. for articles, and so on for the other source types. I then use a Mac Automator script to convert all of them to PDF files (the Strong Museum’s preferred file format for preservation).
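The renaming and conversion steps above could also be scripted. The sketch below is an assumption-laden alternative, not the Automator workflow described here: `plan_renames`, `apply_renames`, and `pdf_command` are hypothetical helper names, and the PDF step swaps in the wkhtmltopdf command-line tool (called as `wkhtmltopdf input.html output.pdf`), which would need to be installed separately and may not render every page cleanly.

```python
from pathlib import Path


def plan_renames(filenames, prefix="A"):
    """Map each .html filename to an identifier such as A1.html, A2.html, ..."""
    html_files = sorted(n for n in filenames if n.lower().endswith(".html"))
    return {name: f"{prefix}{i}.html" for i, name in enumerate(html_files, start=1)}


def apply_renames(folder, prefix="A"):
    """Rename the .html files in folder and return the old-to-new mapping."""
    folder = Path(folder)
    plan = plan_renames([p.name for p in folder.iterdir() if p.is_file()], prefix)
    for old_name, new_name in plan.items():
        target = folder / new_name
        # Refuse to overwrite a file that already has the target name.
        if old_name != new_name and target.exists():
            raise FileExistsError(f"{target} already exists")
        (folder / old_name).rename(target)
    return plan


def pdf_command(html_path):
    """Build a wkhtmltopdf command that writes a PDF next to the HTML file."""
    html_path = Path(html_path)
    return ["wkhtmltopdf", str(html_path), str(html_path.with_suffix(".pdf"))]
```

The mapping returned by `apply_renames` is worth saving somewhere, since it is the only record of which identifier belongs to which original URL-based filename. Each list from `pdf_command` can be passed to `subprocess.run` to do the actual conversion.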
This process has made it relatively easy (and fast!) to store sources as both HTML and PDF files. There are usually a few hiccups when doing this with large batches of files, specifically when converting HTML to PDF. But in general it’s easy to fix those issues and end up with quality PDFs on the other side. For videos, I’ve been using youtube-dl, a command-line tool for downloading videos from URLs.
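As a rough illustration of the video step, here is one way youtube-dl might be driven from Python for a batch of URLs. The helper names (`build_youtube_dl_command`, `download_videos`) and the `videos` folder are assumptions for this example. The only youtube-dl feature relied on is the `-o` option, which sets the output filename template, and it assumes youtube-dl is installed and on the PATH.

```python
import subprocess


def build_youtube_dl_command(url, out_dir="videos"):
    """Build the argument list for one youtube-dl download (hypothetical helper)."""
    # -o sets youtube-dl's output template; youtube-dl itself fills in
    # %(id)s (the video's ID) and %(ext)s (the file extension).
    return ["youtube-dl", "-o", f"{out_dir}/%(id)s.%(ext)s", url]


def download_videos(urls, out_dir="videos"):
    """Run youtube-dl once per URL and return the URLs that failed."""
    failed = []
    for url in urls:
        result = subprocess.run(build_youtube_dl_command(url, out_dir))
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Collecting the failures rather than stopping at the first one means a single dead link does not interrupt a large batch, and the returned list shows which sources still need attention.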
While this process isn’t perfect, it’s functional, and it doesn’t require individually downloading each and every source. If you have suggestions for how to improve on the process (or have gotten wkhtmltopdf, another command-line tool, to be more cooperative), please contact me!