As a returning CHI fellow, I was tasked with organizing a workshop on a digital humanities tool of my choice. I knew I wanted to do something related to data analysis and visualization and eventually decided on the programming language R, which I used last year to clean and transform a dataset. I like using R, a free software environment, when working with large datasets or projects with multiple datasets. Its advantages include data analysis and visualization packages, such as those in the tidyverse, and an active online user community. Although R can be run from the command line, it is much easier to use in RStudio Desktop, an integrated development environment with a graphical user interface built for R. For this reason, I set up the workshop to showcase two examples of using R and RStudio Desktop in the digital humanities.
The first example comes from my CHI project last year. I used R to analyze and transform the African Commodity Trade Database (ACTD), which includes historical information on imports and exports from Africa. In preparation for the workshop, I renamed many of the fields so they could be read without a codebook. I also saved the updated file as a CSV to be imported into the RStudio project. I included basic commands for summarizing the data and analyzing the different commodities in the dataset. I selected peanut exports from Senegal as an example for creating a scatterplot that can be exported as a JPG file for use elsewhere. Finally, I included a command for converting a data frame in R into a JSON file.
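A minimal sketch of these steps might look like the following. It is not the workshop script: the data frame, its column names, and its values are placeholders invented for illustration rather than taken from the ACTD, and the JSON step assumes the jsonlite package is installed (it is skipped otherwise). The plot is only written if the local R build supports JPEG output.

```r
# Placeholder data: column names and values are invented for illustration,
# NOT taken from the ACTD.
trade <- data.frame(
  country   = c("Senegal", "Senegal", "Senegal", "Gambia"),
  commodity = c("peanuts", "peanuts", "peanuts", "peanuts"),
  year      = c(1900, 1910, 1920, 1910),
  quantity  = c(10, 25, 40, 5)
)

# Basic summaries: an overview of each column, then row counts per commodity
summary(trade)
table(trade$commodity)

# Keep only the rows for peanut exports from Senegal
senegal <- subset(trade, country == "Senegal" & commodity == "peanuts")

# Draw a scatterplot and save it as a JPG file (written to a temporary folder here)
out_file <- file.path(tempdir(), "senegal_peanuts.jpg")
if (capabilities("jpeg")) {
  jpeg(out_file, width = 800, height = 600)
  plot(senegal$year, senegal$quantity,
       xlab = "Year", ylab = "Quantity (placeholder units)",
       main = "Peanut exports from Senegal (placeholder data)")
  dev.off()
}

# Convert the data frame to JSON text, if jsonlite is available (an assumption)
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json_text <- jsonlite::toJSON(senegal, pretty = TRUE)
  writeLines(json_text, file.path(tempdir(), "senegal_peanuts.json"))
}
```

In a real project, the data frame would come from something like read.csv() on the renamed CSV file, and the output paths would point into the project folder instead of tempdir().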
For the second example, I wanted to include text analysis because of its importance in the digital humanities. I found an excellent example on The Programming Historian website by Taylor Arnold and Lauren Tilton. Since I am less familiar with text analysis in R, I decided to use a portion of their article. Towards the end of the article, the authors explain how to read a folder of text files into RStudio and then how to conduct a stylometric analysis by calculating the number of words per sentence in each of the files. Finally, the analysis is displayed as a scatterplot showing change over time. I liked this section of the article for our CHI workshop, since it demonstrates the use of more advanced commands, including for loops and sapply.
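To give a sense of what a for loop and sapply do in this kind of analysis, here is a rough base R sketch of a words-per-sentence calculation. It is an approximation under stated assumptions, not the code from the article: the two short texts are made-up stand-ins for files read from a folder, the helper words_per_sentence is a hypothetical name, and sentences are split naively on end punctuation.

```r
# Made-up stand-ins for text files; a real analysis would read files from a folder.
texts <- c(
  doc_a = "The sun rose. Birds sang in the trees. It was warm.",
  doc_b = "Rain fell all day and the river rose until the fields were flooded."
)

# Hypothetical helper: mean number of words per sentence in one text
words_per_sentence <- function(text) {
  # Naive sentence split: break on whitespace that follows ., ! or ?
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  # sapply applies length() to each sentence's vector of whitespace-separated words
  counts <- sapply(strsplit(sentences, "\\s+"), length)
  mean(counts)
}

# Version 1: a for loop that fills in one result per text
results <- numeric(length(texts))
names(results) <- names(texts)
for (i in seq_along(texts)) {
  results[i] <- words_per_sentence(texts[[i]])
}

# Version 2: the same calculation in one line with sapply
results_sapply <- sapply(texts, words_per_sentence)

print(results)
```

Both versions produce the same named vector, one value per text. The sapply form is shorter, while the for loop makes each step explicit, which can be easier to follow when learning.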
In setting up the workshop, I created an R project folder with all of the commands and underlying data files. I uploaded this folder to a GitHub repository so the other CHI fellows could download it to their personal machines. Once R and RStudio Desktop are installed and the folder is downloaded, the R project can be opened quickly with all of the files needed to complete the workshop.
On the day of the workshop, I decided to rearrange the order of the examples and put the text example first, since more CHI fellows seemed interested in text analysis than in tabular data. At the beginning of the workshop, it took additional time to connect my computer to the monitor, and then more time for the other CHI fellows to download the relevant software and find the workshop folder on GitHub. Although the prepared commands can be run through quickly, I tried to explain what each command does in enough detail that the progression of commands makes sense. I decided not to explain how to write the commands from scratch or how they can be adjusted to account for other factors. Nonetheless, explaining each line of the text analysis example took longer than I expected. I decided to stop at the end of the first example to allow others to look through it themselves and ask questions, rather than jump into the second example. As a result, we didn’t move on to the tabular data example during the session, but it was available for CHI fellows to run on their own.
In the future, I would include instructions in the R script explaining each example and each command. That way, participants could refer back to the script later on their own. Additionally, the workshop could more easily be shared with the larger digital humanities community at MSU.