In this blog post I will explore some of the views on the use of crowdsourcing for digital projects in the humanities. I am interested in this literature because, for my future dissertation project, I am considering crowdsourcing data from the local community. In general, researchers and institutions can crowdsource project tasks through self-developed sites (https://www.citizenscience.gov/smithsonian-transcription-center/#) or through platforms that host projects (https://crowdsourced.micropasts.org/ & https://www.zooniverse.org/). While crowdsourcing is a broad term that describes a variety of tools utilized across many disciplines and sectors, crowdsourcing in digital heritage has been defined in two ways:

  • “an emerging form of engagement with cultural heritage that contributes towards a shared, significant goal or research area by asking the public to undertake tasks that cannot be done automatically, in an environment where the tasks, goals (or both) provide inherent rewards for participation” (Ridge 2012)
  • the harnessing of online activities and behaviour [sic] to aid in large-scale ventures such as tagging, commenting, rating, reviewing, text correcting, and the creation and uploading of content in a methodical, task-based fashion to improve the quality of, and widen access to, online collections (Terras 2016).

Researchers, museums, libraries, and archives working with digital cultural heritage collections have expanded their use of crowdsourcing since 2006, when the term was coined. 2010-2011 marks a breakthrough period for the development of crowdsourcing projects in digital heritage, as several large institutions in Europe and the United States utilized crowdsourcing to engage audiences and perform digitization and transcription tasks (Terras 2016). When brought into the digital humanities, crowdsourcing had to be rethought, primarily in light of the long tradition of volunteership and public engagement at museums, archives, and libraries. For digital heritage projects:

  • crowdsourcing has not involved massive crowds, but rather relied on smaller cohorts of “super users” that contribute to projects for their own personal reasons (Terras 2016; Van Hyning 2019)
  • Crowdsourcing has not been a source of “free” or “cheap” labor, but rather a collaboration between external communities and the institution, with a great deal of work performed by institutions on the backend. For the humanities, crowdsourcing is valuable because it brings in new knowledge and increases meaningful collaboration with the public (Deines et al. 2018; Terras 2016).
  • Crowdsourcing has been about connecting institutions, researchers, and/or collections with a community that allows individuals to interact with and explore the historical record in a meaningful way (Terras 2016)

Crowdsourcing projects in digital heritage can be broken down into two general trends based on their goals and within these are several common tasks (Carletti et al. 2013):

  • Crowdsourcing projects that require the “crowd” to integrate/enrich/reconfigure existing institutional resources ask the public to contribute to:
    • curation (e.g., social tagging, image selection, exhibition curation, classification); revision (e.g., transcription, correction); and location (e.g., artworks mapping, map matching, location storytelling).
  • Crowdsourcing projects that ask the “crowd” to create/contribute novel resources ask the public to:
    • share physical or digital objects that document private life (e.g., audio/video of intimate conversations), document historical events (e.g., family memorabilia), or enrich known locations (e.g., location-related storytelling)

In a review of crowdsourcing projects in the digital humanities, Melissa Terras (2016: 432) has shown that crowdsourcing is attached to issues of public engagement, where project success demonstrates the benefits of engaging existing communities of interest and building projects “for, and involving, a wide audience.” The literature suggests that successful crowdsourcing projects should have a clear goal, state clearly the terms of use and use license of the data generated, tap into existing communities of interest, maintain long-term connections and communication with contributors, listen to contributors’ suggestions with regard to workflow, interface, and instructions, and ensure that the project is well developed upon release (Deines et al. 2018; Schreibman 2016; Van Hyning 2019).

My review of these studies shows that researchers in digital humanities have argued for the use of crowdsourcing because (Deines et al. 2018; Terras 2016; Van Hyning 2019):

  • People want to transcribe historic documents
  • It allows researchers and institutions to build or engage with new groups and communities
  • Goals can be achieved more quickly than the institution could manage working alone
  • It provides projects with external knowledge, expertise, and interest
  • It improves the quality of data and the ways data can be discovered
  • It allows researchers and institutions to gain insight into users’ opinions and desires by building a relationship with the community of interest
  • It shows the relevance and importance of the institution and its collections through high levels of public interest
  • It builds trust and loyalty to the institution
  • It encourages a sense of public ownership and responsibility towards heritage collections
  • There is pent-up knowledge in institutions and pent-up expertise in the public
  • It allows members of the public to engage with content in ways that allow them to be authors of the historical record

Despite the advantages crowdsourcing can bring to projects, some researchers who have studied and written on the use of crowdsourcing in digital heritage express concerns about the quality of data and long-term sustainability. Other concerns center on a fundamental tension between researchers’ instinct to control context and authenticity and a desire to share access and promote usage of collections, while others express concerns over the authority and accuracy of crowdsourced transcription (Van Hyning 2019). These include fears that a large quantity of poor-quality work could crowd out better scholarship (Terras 2016: 442-443). There is particular concern with projects involving crowdsourced transcriptions, as it is not possible to ‘average the transcriptions’. To ensure that the crowdsourced data is usable, the teams running these projects must develop robust methodologies for identifying the most accurate transcriptions without knowing what is in the document (Deines et al. 2018). Others warn that the labor involved in creating and sustaining crowdsourced projects should not outweigh the time it would take staff to perform those tasks, and that quality control and data clean-up should not become a larger task than the work offset by crowdsourcing (Van Hyning 2019).
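To make the transcription-accuracy problem concrete: since free text cannot be numerically averaged, one common strategy (used in various forms by platforms like Zooniverse) is to collect several independent transcriptions of the same passage and accept a reading only when a majority of volunteers agree, flagging disagreements for expert review. The sketch below is a minimal, hypothetical illustration of that majority-vote idea; the function name, threshold, and normalization rules are my own assumptions, not any specific project’s method.

```python
from collections import Counter

def consensus_transcription(transcriptions, threshold=0.5):
    """Accept a crowdsourced transcription only if enough volunteers agree.

    transcriptions: list of strings submitted by independent volunteers
    for the same passage. Text is normalized (case and whitespace) before
    comparison, since trivial differences should not count as disagreement.
    Returns (accepted_text, None) when more than `threshold` of volunteers
    agree, or (None, "needs expert review") otherwise.
    """
    def normalize(text):
        # Collapse whitespace and lowercase; a real project would also
        # need rules for punctuation, line breaks, illegible marks, etc.
        return " ".join(text.split()).lower()

    counts = Counter(normalize(t) for t in transcriptions)
    best, votes = counts.most_common(1)[0]
    if votes / len(transcriptions) > threshold:
        return best, None
    return None, "needs expert review"
```

For example, three submissions reading "The  quick fox", "the quick fox", and "the quiet fox" would normalize to two votes for "the quick fox" against one, clearing the 50% threshold; a 1-1 split would instead be routed to expert review. This illustrates why such projects involve substantial backend work: someone must design the normalization rules, choose thresholds, and staff the review queue.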

This overview has provided me with insights into the history of crowdsourcing in digital heritage and some of its goals, methods, benefits, and concerns. It also represents the beginning of my thinking about the ways in which crowdsourcing may benefit my project and the work I will have to undertake if I choose to do this work.

Below are some examples of digital heritage projects that have utilized crowdsourcing.

Sources: