As a CHI Fellow, I’m undertaking a large-scale text analysis of the Armed Services Editions (ASEs), a collection of books sent to US soldiers during WWII to “fight the war of ideas.” My aim is to consider questions of politics and literary form. I first stumbled on the Armed Services Editions a few years ago, while researching Ernest Hemingway’s The Sun Also Rises. You may recall Jake’s description of Robert Cohn early in the novel:

He had been reading W.H. Hudson. That sounds like an innocent occupation, but Cohn had read and reread “The Purple Land.” “The Purple Land” is a very sinister book if read too late in life… For a man to take it at thirty-four as a guide-book to what life holds is about as safe as it would be for a man of the same age to enter Wall Street direct from a French convent, equipped with a set of the more practical Alger books.

I was working on a project on modernist reading networks, and this passage jumped out at me. I looked into The Purple Land and found that it had been chosen as one of the Armed Services Editions in World War II, 16 years after the publication of The Sun Also Rises. Cursory research into the Armed Services Editions led me to the Council on Books in Wartime (CBW), a committee of publishers that formed during World War II and contracted with the US military to produce cheap paperback editions for US soldiers abroad. The goal (and slogan) of the Council on Books in Wartime was to use books as “weapons in the war of ideas.” Books had an important role to play in the war effort, the CBW wrote, because “Books can help us recover our past and teach us what a tough-fibered people we can be when we have to. Books can tell us what our enemies are like. Even prizefighters study their opponents carefully. […] Books can tell us what our allies are like.” All of this was vitally important in such a “total war.”

Yet the process for selecting books for such an important task was fairly opaque. According to a commemorative ASE booklet held in Princeton University’s Mudd Manuscript Library,

“Titles are selected by the following process: Publishers’ lists are combed and copies of books thought desirable are asked for. Each book is then carefully read by a professional editor who makes out a written report. The books and the reports are submitted every two weeks to an Advisory committee consisting of publishers, librarians, booksellers, critics, and authors. Books that meet with the approval of this Advisory Committee are then sent to the Army and Navy, both of which services must agree on a title before it is accepted for publication.”

Presumably, a desirable book would be selected and approved because it fit the general aims of the ASEs: to boost morale, to promote democracy, to learn about the enemy. Histories of the ASEs show very little censorship of books (though, presumably, certain books would never have been “thought desirable” and suggested for publication in the first place: James Joyce didn’t make the cut, nor did D.H. Lawrence). A quick scan of the ASE database reveals some books that make sense as “desirable” for promoting democracy in the war of ideas (at least in the military hive-mind of 1943): Jack London novels, for instance. Others seem out of place, such as Virginia Woolf’s The Waves. Yet over 120 million copies of 1,322 books were distributed on the front lines and in military hospitals, and every one of them met the criteria outlined by the CBW: each was meant to help “fight the war of ideas.”

I’ll be looking at this corpus for my CHI project, analyzing what it would mean for a text to be made into a weapon for democracy.

Big picture: how might an understanding of the CBW corpus help us think about textual politics, politics and style, politics and form? To answer this question, I want to consider how “democracy” might be operationalized and measured; in other words, what formal or stylistic measures might make a text “democratic”? I have other plans for this project down the road, including developing a predictive model. But for the purposes of my CHI project, I’ll be building this corpus and conducting some preliminary analysis in R. Right now, I’m eyes-deep in Phase One: Building the Corpus.

Fortunately, it is quite easy to find a full list of all of the ASEs. Also fortunately, many of the titles assembled by the CBW were published before 1923 and are therefore in the public domain. It is unlikely that I will be able to assemble a corpus of all 1,322 titles, so I plan to do the following:

  • Follow the release of the ASEs chronologically, starting with the A series and moving through ZZ.
  • Keep texts that are in the public domain and that I can find already digitized (HathiTrust, Project Gutenberg, Google Books)
  • Keep a running list of texts that are
    • not digitized but ARE in the public domain
    • still protected by copyright
  • See what I end up with and make some hard choices about scaling, about digitization, and about copyright and fair use.
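The triage plan above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch rather than my actual workflow: the record fields (`series`, `number`, `pub_year`, `digitized`) are assumptions about how I might store the ASE list, and it treats anything published before 1923 as public domain, as described above. The cutoff is a parameter because the US public-domain line moves forward every January.

```python
from dataclasses import dataclass

# Assumed rule from the post: works published before this year are
# treated as US public domain. Adjust as the cutoff moves forward.
PD_CUTOFF_YEAR = 1923


@dataclass
class AseTitle:
    series: str      # hypothetical series code: "A" ... "Z", then "AA" ... "ZZ"
    number: int      # position within the series
    title: str
    pub_year: int    # year of first publication
    digitized: bool  # True if found on HathiTrust, Project Gutenberg, etc.


def release_order(book: AseTitle) -> tuple:
    # Single-letter series sort before double-letter ones ("Z" < "AA"),
    # then alphabetically, then by number within the series.
    return (len(book.series), book.series, book.number)


def triage(books, cutoff=PD_CUTOFF_YEAR):
    """Split titles into the three buckets from the plan, in release order."""
    buckets = {"keep": [], "pd_not_digitized": [], "in_copyright": []}
    for book in sorted(books, key=release_order):
        public_domain = book.pub_year < cutoff
        if public_domain and book.digitized:
            buckets["keep"].append(book)
        elif public_domain:
            buckets["pd_not_digitized"].append(book)
        else:
            buckets["in_copyright"].append(book)
    return buckets


if __name__ == "__main__":
    # Placeholder records, not real ASE data.
    books = [
        AseTitle("B", 2, "Placeholder 3", 1938, False),
        AseTitle("A", 1, "Placeholder 1", 1894, True),
        AseTitle("AA", 1, "Placeholder 4", 1910, False),
        AseTitle("A", 2, "Placeholder 2", 1901, True),
    ]
    for bucket, items in triage(books).items():
        print(bucket, [b.title for b in items])
```

Run on the placeholder records, this prints `keep ['Placeholder 1', 'Placeholder 2']`, then `pd_not_digitized ['Placeholder 4']`, then `in_copyright ['Placeholder 3']`. A digitized but still-copyrighted title (a Google Books scan, say) lands in `in_copyright`, which matches the plan's rule that only public-domain digitizations are kept outright.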

Highly scientific and conclusive, I know. I’ll cross the OCR bridge when I get there.

There are some texts I already know I can discard. The ASEs included some “made texts,” short story collections by famous authors like Ernest Hemingway (whose novels were excluded). There will certainly be more difficult choices to make about inclusion and exclusion. For instance, some texts, such as Moby-Dick, were abridged to fit the specific production dimensions of the ASEs. In these cases, I’ll have to decide whether to use the full-length version or discard the title entirely.

And I’ll also have to think critically about the sort of metadata I hope to assemble in the process. Author gender might be interesting (if infuriating). I was surprised to find that the most popular ASE was Betty Smith’s A Tree Grows in Brooklyn. Perhaps I expected something with more machismo, or perhaps I’ve just got Jonathan Franzen perpetually in the back of my head bashing women writers (god help me). Regardless, I’d be interested to see how author gender impacted the selection of books.

Given the CBW’s aims of “learning about our allies” and “learning about our enemies,” I would also be interested in tracking author nationality, or the book’s primary setting. Some of this can be collected as metadata (though I don’t want to put too much weight on authorship), but some of these questions are best answered through analysis, for instance using named-entity recognition (NER) to pick out place names and track primary settings. Through the process of building, I hope to develop more hypotheses beyond my initial thoughts (to be shared later) that might help guide the analysis phase of the project.
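Real named-entity recognition would normally come from an NLP library (spaCy in Python, or the spacyr wrapper in R). As a much cruder stand-in that only illustrates the “primary setting” idea, here is a minimal Python sketch that counts mentions from a small, hand-built gazetteer and reports the most frequent place. The gazetteer and function names are hypothetical; a real pipeline would need NER to catch places missing from the list and to disambiguate names such as “Georgia.”

```python
import re
from collections import Counter

# Hypothetical, hand-built gazetteer: surface form -> canonical place name.
# This is simple dictionary matching, NOT named-entity recognition.
GAZETTEER = {
    "Brooklyn": "Brooklyn",
    "New York": "New York",
    "Paris": "Paris",
    "Pamplona": "Pamplona",
    "Montevideo": "Montevideo",
}

# Try longer names first so multi-word places match before shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(GAZETTEER, key=len, reverse=True))
    + r")\b"
)


def place_counts(text: str) -> Counter:
    """Count gazetteer place-name mentions in a text."""
    return Counter(GAZETTEER[match] for match in _PATTERN.findall(text))


def primary_setting(text: str):
    """Return the most frequently mentioned place, or None if none is found."""
    counts = place_counts(text)
    return counts.most_common(1)[0][0] if counts else None
```

For example, `primary_setting("They left Paris for Pamplona. Pamplona was loud.")` returns `"Pamplona"`, since it is mentioned twice and Paris once. Swapping the gazetteer lookup for an NER model would leave the counting and the “most frequent place” step unchanged.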

I’ll clean the data and make the corpus (or as much of it as possible) available via GitHub, as I would love for others to join me in this analysis. And I’ll certainly be blogging about the process along the way.