I am pleased to announce the launch of my CHI Project, “The Novelty Project.” This is more of a soft launch, really; this collaboration between Arend Hintze, Devin Higgins, and me has been in the works for the past two years now, yielding one forthcoming publication and a grant. I’ve built a website to serve as a companion landing page, where we will expand on what we have published, include information that we weren’t able to fit into our published work, and share some of the weirder findings that we uncover.

Novelty looks like this!

The major work of my year as a CHI Fellow was not the building of the public site, but the development of a 20th-century corpus with the HathiTrust Research Center. Our team was awarded an HTRC Advanced Collaborative Support Grant last July, which gave us access to in-copyright works in HTRC’s holdings via a virtual machine. Throughout the year, we worked with the incredibly talented and patient Eleanor Dickson to develop a corpus of 20th-century novels, split into two categories: canonical and non-canonical. Our goal was to build a corpus large enough for us to (a) replicate our initial proof of concept, and (b) consider some of our more provocative hypotheses regarding literary modernism, postmodernism, and the periodization of the 20th century. In developing two contrapuntal corpora, we wanted to get at the dynamic identified by Algee-Hewitt et al. in Pamphlet 11 from the Stanford Lit Lab, “Canon/Archive: Large-scale Dynamics in the Literary Field.” Our hope was to develop both an admittedly inclusive canon of the 20th-century novel and an archive against or within which we might understand the canonical and the broader dynamics of the field.

One might suspect that determining the canonicity of our texts would be an impossible task; after all, “what’s in” and “what’s out” has been the subject of much heated debate (to put it lightly). In fact, canonicity was rather straightforward, thanks to a useful tool created by Nathaniel Conroy called Metacanon.* Metacanon collects citation scores from Google Scholar, JSTOR, The New York Times, and several other sources to calculate the most influential novels of any given time period. We used Metacanon’s date-range function to develop our canon list, gathering the top 100 most frequently cited works of fiction published within each decade of the 20th century. This provided us with a relatively even spread of publication dates (though the turn of the century skews Jamesian). Once we had identified these texts, we isolated the novels from our results, queried HathiTrust’s holdings, selected a preferred edition, et voilà: a canon corpus.
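For readers who think in code, the per-decade selection can be sketched roughly as follows. This is only an illustration, not the procedure we actually ran: the `(title, year, score)` record shape, the function name `top_cited_by_decade`, and the placeholder titles and scores are all assumptions made for the example, and nothing here reflects how Metacanon itself stores or computes its scores.

```python
from collections import defaultdict


def top_cited_by_decade(works, per_decade=100, start=1900, end=1999):
    """Group works by decade of publication and keep the highest-scoring ones.

    `works` is an iterable of (title, year, score) tuples. That shape is a
    hypothetical stand-in for whatever a citation-ranking tool exports.
    """
    by_decade = defaultdict(list)
    for title, year, score in works:
        if start <= year <= end:
            # 1903 -> 1900, 1927 -> 1920, and so on.
            by_decade[year - year % 10].append((title, year, score))
    # Within each decade, sort by score (highest first) and truncate.
    return {
        decade: sorted(entries, key=lambda w: w[2], reverse=True)[:per_decade]
        for decade, entries in sorted(by_decade.items())
    }


# Placeholder records, invented purely to show the shape of the result.
sample = [
    ("Novel A", 1903, 12.0),
    ("Novel B", 1908, 30.5),
    ("Novel C", 1925, 44.0),
    ("Novel D", 1922, 18.2),
    ("Novel E", 1927, 51.3),
]
canon = top_cited_by_decade(sample, per_decade=2)
```

Capping each decade separately, rather than taking one overall top list, is what keeps the publication dates evenly spread.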

From there, we set out to build our corpus of non-canonical novels. This corpus isolates a hazy middle in the literary field: novels that were important or influential enough to have been digitized, but not important or influential enough to have been cited by scholars. Because we were not starting from a pre-determined list, but working from within the disorientingly rich and complex Hathi library, this process turned out to be rather tedious. How does one identify a novel according to MARC records? How to distinguish between a book published during the 20th century and a book republished in the 20th century? (Dickens, it seems, gets a reprint every five years.) What of novels spread over multiple volumes (a fad that, fortunately for us, appeared to be falling out of fashion after 1900)? How do we identify, and eliminate, works of criticism that are about novels, but not novels themselves? And what difference does a collection of short stories make in our results? Which versions do we keep, and why? Each of these questions, their answers, and our corresponding actions have the potential to change our results. And while these finely tuned details may make little difference at scale, they mattered significantly to us as we determined what to add and what to cut.

We are in the process of running these texts through our Novelty Filter, in hopes of turning to Phase III of this project over the summer. An online landing page is, unfortunately, a poor substitute for the work that our team has completed. But it provides us a space to consider some weird stuff—such as our comparison between Bestsellers and Prizewinners—and to provide our audience a chance to interact with our (forthcoming) data at a more granular level.

A final note of thanks to HTRC, and, especially, Eleanor Dickson. This project would have been impossible without Eleanor’s efforts, and without HTRC’s generosity. I’m eager to see how The Novelty Project continues to unfold, and hope that you’ll follow along with us.



*While we made great use of Metacanon.org, the site no longer appears to be functioning.