Launching the Armed Services Editions: A Computational Analysis – Cultural Heritage Informatics Initiative

I am happy to announce the launch of my CHI Project, The Armed Services Editions: A Computational Analysis. On my page, users can navigate through three “Data Narratives”: simple analyses that I conducted to answer critical questions about these data. The Gender Data Narrative considers the distribution of gendered pronoun usage throughout the corpus, and features a basic foray into LDA topic modeling. The Genre Data Narrative considers the types of books that were sent to servicemen, and how the generic representation of books may have shifted over time. Finally, the Geography Data Narrative explores the geographic imagination of the corpus, both domestic and international, using named entity recognition (NER).
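As a rough illustration of the kind of count behind the Gender Data Narrative, the sketch below computes the share of feminine and masculine pronouns in a piece of text. It is written in Python rather than the R used for the project, and the pronoun lists, the function name `pronoun_shares`, and the sample sentence are all assumptions made for illustration, not the project's actual code or data.

```python
import re
from collections import Counter

# Illustrative pronoun lists (an assumption, not the project's word lists).
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def pronoun_shares(text):
    """Return each gender's share of all gendered pronouns found in text."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    feminine = sum(counts[word] for word in FEMININE)
    masculine = sum(counts[word] for word in MASCULINE)
    total = feminine + masculine
    if total == 0:
        # No gendered pronouns at all: report zero for both.
        return {"feminine": 0.0, "masculine": 0.0}
    return {"feminine": feminine / total, "masculine": masculine / total}


# Hypothetical sample sentence, not drawn from the ASE corpus.
sample = "He gave her his book, and she thanked him."
print(pronoun_shares(sample))
```

Running the same function over each book in a corpus, and then grouping the results by publication date or genre, would give a distribution along the lines the narrative describes. Note that this simple tokenizer does not split contractions, so a form like "he's" goes uncounted.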

This first phase of this project is, quite simply, a book history project. To date, the ASE Corpus has not been studied in its entirety. Several scholars have published institutional histories of the Council on Books in Wartime, or discussed the role of specific books, or even discussed the ASEs in relation to a larger sociological project. I am interested in assembling a more thorough, stylistic macro-history of the ASEs, one that attends to both their sociological import and their formal properties through computational analysis. The data I’ve assembled is descriptive, working toward that end, and is a necessary foundation for the more advanced analysis I will be conducting this summer.

In addition to an analysis of the ASE Corpus, this website is also a record that chronicles the development of my methodological chops. While I had a basic foundation in R (thanks to a fabulous course at HILT), my skills needed (and still need) development. I used two textbooks to improve my skills, testing my dataset throughout. Users familiar with Text Analysis with R for Students of Literature by Matt Jockers and Humanities Data in R by Lauren Tilton and Taylor Arnold will likely be able to trace my data analysis back to the chapter problem sets.

Full disclosure: I feel insecure about this. I would like, eventually, to publish on the ASEs. A record of my fledgling explorations in R and data analysis is… well, nerve-wracking. Yet, as Ethan Watrall has reminded me in a variety of ways, it’s also an important intervention. Over and over again this year, I have been reminded of and impressed by the generosity of my colleagues in DH; I post this basic data analysis in hopes of inviting that same generous conversation.

Only a fraction of the work that was completed on this project is featured on my project website. I should have foreseen this problem and created a time-lapse video of my hours and hours running OCR on hundreds of documents, or adding metadata to my database. Or, better yet, learning how to analyze data in R. For this project, however, I decided to visualize my data using Tableau. Tableau provides far less specificity, for sure, but it also allows for a greater degree of user interactivity. Since my data is, at this stage, largely descriptive, I wanted users to be able to explore with greater flexibility.

It’s been a long year working on this project, and that long year has turned out to be just the beginning. I’m so excited to see how this project continues to develop. Over the summer, I’ll be continuing this project by running these analyses—and much more interesting, advanced analysis (fingers crossed)—on the entirety of my corpus.

The questions motivating this project are increasingly pressing, and continue to motivate me—particularly as a powerful political candidate has remained consistently hostile toward the free exchange of ideas that should define any democratic discourse. Ultimately, this project asks, what (or whose) ideas are acceptable, and what (or whose) ideas aren’t? And what (and who) makes that so? These questions should be asked about 1940, and they should be asked about 2016.