A major part of my CHI project is cleaning trade data that I collected at the Public Records and Archives Administration Department (PRAAD) in Accra and Tamale, Ghana. The data come from paper records of the goods that traders carried across the Volta River. These statistics do not give a complete picture of trade in the region, since many traders avoided customs stations. Nonetheless, they illustrate trends in internal trade that reflect wider political, economic, and social changes in the first half of the twentieth century.

Transferring the information from old and sometimes decaying colonial documents into digital data was a laborious process. The information that most interested me was handwritten by colonial officials on forms distributed by the customs department. While I could decipher most of the handwriting, some words and descriptions remained illegible to me. To speed up data entry, I typed the information into several separate tables organized by geography and time period. As much as possible, I entered the handwritten information into the tables word for word and number for number.

The dataset that I plan to publish as part of my CHI project will follow the organization of the customs forms, with goods as the main focus. Each entry will include the quantity of a good crossing the Volta River in a monthly or yearly period (depending on the customs station and reporting year), along with additional information about the type of good and the location where the entry was recorded.
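One way to picture a single entry is as a typed record. The sketch below is in Python and assumes field names of its own (`station`, `year`, `month`, `item`, `description`, `origin`, `quantity`, `unit`); these are hypothetical and are not the column headings on the customs forms. The values in the usage lines are invented placeholders, not archival figures.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical schema for one row of the published dataset.
# Field names are illustrative assumptions, not taken from the customs forms.
@dataclass(frozen=True)
class TradeRecord:
    station: str                  # customs station where the entry was recorded
    year: int                     # reporting year
    month: Optional[int]          # 1-12 for monthly returns, None for yearly totals
    item: str                     # standardized name of the good
    description: Optional[str]    # qualifiers such as packaging or quality
    origin: Optional[str]         # place of origin, if the form gives one
    quantity: Optional[float]     # amount crossing the river in this period
    unit: Optional[str]           # unit as written on the form

    def __post_init__(self):
        # Reject impossible months; None marks a yearly return.
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError(f"month must be 1-12 or None, got {self.month}")

    @property
    def period(self) -> str:
        """Return 'monthly' or 'yearly' depending on whether a month is given."""
        return "yearly" if self.month is None else "monthly"


# Invented placeholder values, not real records.
monthly = TradeRecord("Station A", 1925, 3, "salt", "bags", None, 40.0, "bags")
yearly = TradeRecord("Station B", 1931, None, "salt", None, None, 500.0, "bags")
print(monthly.period)  # monthly
print(yearly.period)   # yearly
```

Letting `month` be empty for yearly returns is one way to keep monthly and yearly reporting periods in a single table without inventing monthly figures that the forms do not contain.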

In working with the archival documents, I found that the information written down by customs officials was more descriptive and more comprehensive than the reports issued by the department in Accra. For this reason, I sought to include as many of the original documents from the customs records at PRAAD as possible, rather than summaries. In total, I entered approximately 8,000 records across nine tables, covering the period from the early twentieth century to the 1950s.

My main focus in cleaning the data has been to separate the names of goods from their descriptions and places of origin. To do this, I’ve used what are essentially find-and-replace commands in R, a language for statistical analysis, to move words and phrases into the correct fields. This process, however, has been tedious, as it involves running summary statistics and then looking through the summaries to see which terms still need to be corrected. At the end of this process, I hope to have a more standardized field of trade items that will facilitate future analysis.
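A minimal sketch of this clean-and-summarize loop, written in Python rather than R: `re.sub` does roughly what find-and-replace functions such as R's `gsub()` do, and `collections.Counter` stands in for the summary step. The rule patterns, field names, and input strings are all hypothetical illustrations, not the rules or data actually used for the customs tables.

```python
import re
from collections import Counter

# Hypothetical rules: each regex pulls one kind of text out of the raw item
# field and into another field. Patterns are illustrative and case-sensitive.
RULES = [
    ("origin", re.compile(r"\bfrom\s+(?P<value>[A-Z][\w-]*)")),
    ("description", re.compile(r"\((?P<value>[^)]*)\)")),
]


def split_entry(raw: str) -> dict:
    """Separate a raw item string into item, description, and origin fields."""
    record = {"item": raw, "description": None, "origin": None}
    for field, pattern in RULES:
        match = pattern.search(record["item"])
        if match:
            record[field] = match.group("value").strip()
            # Find and replace: delete the matched phrase from the item name.
            record["item"] = pattern.sub("", record["item"], count=1)
    # Collapse leftover whitespace and lowercase so variants count together.
    record["item"] = re.sub(r"\s+", " ", record["item"]).strip().lower()
    return record


def item_summary(raw_items):
    """Count cleaned item names, most common first, to spot terms still needing fixes."""
    return Counter(split_entry(r)["item"] for r in raw_items).most_common()


# Invented example strings, not transcribed records.
raw = ["Salt (bags) from PlaceA", "salt", "Kola nuts from PlaceB", "SALT  (loose)"]
for name, count in item_summary(raw):
    print(name, count)
# salt 3
# kola nuts 1
```

Sorting the counts puts the well-standardized terms at the top, so near-duplicate spellings that still need a rule tend to stand out further down the list.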