Contemplating Crowdsourcing in Digital Cultural Heritage – Cultural Heritage Informatics Initiative

In this blog post I will explore some of the views on the use of crowdsourcing for digital projects in the humanities. I am interested in this literature because for my future dissertation project I am considering crowdsourcing data from the local community. In general, researchers and institutions can crowdsource project tasks through self-developed sites (https://www.citizenscience.gov/smithsonian-transcription-center/#) or through platforms that host projects (https://crowdsourced.micropasts.org/ & https://www.zooniverse.org/). While crowdsourcing is a broad term that describes a variety of tools used across many disciplines and sectors, crowdsourcing in digital heritage has been defined in two ways:

“an emerging form of engagement with cultural heritage that contributes towards a shared, significant goal or research area by asking the public to undertake tasks that cannot be done automatically, in an environment where the tasks, goals (or both) provide inherent rewards for participation” (Ridge 2012)
“the harnessing of online activities and behaviour [sic] to aid in large-scale ventures such as tagging, commenting, rating, reviewing, text correcting, and the creation and uploading of content in a methodical, task-based fashion to improve the quality of, and widen access to, online collections” (Terras 2016)

Researchers, museums, libraries, and archives working with digital cultural heritage collections have expanded their use of crowdsourcing since 2006, when the term was coined. The years 2010-2011 mark a breakthrough period for the development of crowdsourcing projects in digital heritage, as several large institutions in Europe and the United States used crowdsourcing to engage audiences and perform digitization and transcription tasks (Terras 2016). When brought into the digital humanities, crowdsourcing had to be rethought, primarily through the long tradition of volunteerism and public engagement at museums, archives, and libraries. For digital heritage projects:

Crowdsourcing has not involved massive crowds, but rather relied on smaller cohorts of “super users” who contribute to projects for their own personal reasons (Terras 2016; Van Hyning 2019).
Crowdsourcing has not been a source of “free” or “cheap” labor, but rather a collaboration between external communities and the institution, with a great deal of work performed by institutions on the back end. For the humanities, crowdsourcing is valuable because it brings in new knowledge and increases meaningful collaboration with the public (Deines et al. 2018; Terras 2016).
Crowdsourcing has been about connecting institutions, researchers, and/or collections with a community in a way that allows individuals to interact with and explore the historical record meaningfully (Terras 2016).

Crowdsourcing projects in digital heritage can be broken down into two general trends based on their goals, each of which involves several common tasks (Carletti et al. 2013):

Crowdsourcing projects that require the “crowd” to integrate/enrich/reconfigure existing institutional resources ask the public to contribute to:
- curation (e.g., social tagging, image selection, exhibition curation, classification); revision (e.g., transcription, correction); and location (e.g., artworks mapping, map matching, location storytelling).
Crowdsourcing projects that ask the “crowd” to create/contribute novel resources ask the public to:
- share physical or digital objects in order to document private life (e.g., audio/video of intimate conversations); document historical events (e.g., family memorabilia); and enrich known locations (e.g., location-related storytelling).

In a review of crowdsourcing projects in the digital humanities, Melissa Terras (2016: 432) has shown that crowdsourcing is tied to issues of public engagement: successful projects demonstrate the benefits of engaging existing communities of interest and of building projects “for, and involving, a wide audience.” The literature suggests that successful crowdsourcing projects should have a clear goal; state clearly the terms of use and the license for the data generated; tap into existing communities of interest; maintain long-term connections and communication with contributors; listen to contributors’ suggestions with regard to workflow, interface, and instructions; and be well developed upon release (Deines et al. 2018; Schreibman 2016; Van Hyning 2019).

My review of these studies shows that researchers in digital humanities have argued for the use of crowdsourcing because (Deines et al. 2018; Terras 2016; Van Hyning 2019):

People want to transcribe historic documents
It allows researchers and institutions to build or engage with new groups and communities
Goals can be achieved more quickly than the institution could achieve them working alone
It provides projects with external knowledge, expertise, and interest
It improves the quality of data and the way data can be discovered
It gives researchers and institutions insight into users’ opinions and desires by building a relationship with the community of interest
It shows the relevance and importance of the institution and its collections through high levels of public interest
It builds trust and loyalty to the institution
It encourages a sense of public ownership and responsibility towards heritage collections
There is pent-up knowledge in institutions and pent-up expertise in the public
It allows members of the public to engage with content in ways that allow them to be authors of the historical record

Despite the advantages crowdsourcing can bring to projects, some researchers who have studied and written on its use in digital heritage express concerns about data quality and long-term sustainability. Some concerns center on a fundamental tension between researchers’ instinct to control context and authenticity and their desire to share access and promote use of collections, while others relate to the authority and accuracy of crowdsourced transcription (Van Hyning 2019). These include fears that a large quantity of poor-quality work could crowd out better scholarship (Terras 2016: 442-443). There is particular concern with projects involving crowdsourced transcriptions, as it is not possible to ‘average the transcriptions’. To ensure that the crowdsourced data is usable, the teams running these projects must develop robust methodologies for identifying the most accurate transcriptions without knowing what is in the document (Deines et al. 2018). Others warn that the labor involved in creating and sustaining crowdsourced projects should not exceed the time it would take staff to perform those tasks, and that quality control and data clean-up should not become a larger task than the work offset by crowdsourcing (Van Hyning 2019).
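
To make the transcription problem more concrete for myself, here is a minimal sketch of one possible approach, not the method used by any of the projects cited here. Since free-text transcriptions cannot be averaged, it instead picks the submission that agrees most closely with all the others, using character-level similarity from Python's standard difflib module. The function names and the sample submissions are hypothetical and only for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0 and 1 for two transcriptions."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def pick_consensus(transcriptions: list[str]) -> tuple[str, float]:
    """Return the submission with the highest mean similarity to the others."""
    if len(transcriptions) < 2:
        raise ValueError("need at least two independent transcriptions")
    best_text, best_score = transcriptions[0], -1.0
    for i, candidate in enumerate(transcriptions):
        others = [t for j, t in enumerate(transcriptions) if j != i]
        score = sum(similarity(candidate, t) for t in others) / len(others)
        if score > best_score:  # ties keep the earliest submission
            best_text, best_score = candidate, score
    return best_text, best_score


# Hypothetical volunteer submissions for the same line of a document.
submissions = [
    "Recieved of Mr. Jones ten dollars",
    "Received of Mr. Jones ten dollars",
    "Received of Mr. Jones ten dollars",
    "Rec'd from Mr James 10 dolars",
]
text, score = pick_consensus(submissions)
print(text, round(score, 2))
```

A low agreement score could be used to flag a line for staff review, which is where the back-end work described above comes in. Agreement is only a proxy for accuracy: volunteers can agree on the same misreading, so a sketch like this would supplement expert checking rather than replace it.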

This overview has provided me with insights into the history of crowdsourcing in digital heritage and some of its goals, methods, benefits, and concerns. It also represents the beginning of my thinking about the ways in which crowdsourcing may benefit my project and the work I will have to undertake if I choose to do this work.

Below are some examples of digital heritage projects which have utilized crowdsourcing.

https://transcription.si.edu/
https://whatamericaate.org/
https://genizalab.princeton.edu/crowdsourcing-and-the-humanities
- Scribes of the Cairo Geniza is a multilingual crowdsourcing project launched in 2017 to classify and transcribe manuscript fragments from a medieval Egyptian synagogue. An initiative led by the University of Pennsylvania Libraries and Zooniverse, the project harnesses the power of technology and people to decipher some of the most challenging fragments in the world.
https://ch.uni-leipzig.de/blog/crowdsourcing-class/
- Crowdsourcing in the Digital Humanities – Five Case Studies

Sources:

Carletti, L; Giannachi, G; Price, D; et al. (2013) Digital humanities and crowdsourcing: an exploration, paper presented at Museums and the Web 2013 https://ore.exeter.ac.uk/repository/bitstream/handle/10871/17763/Digital%20Humanities%20and%20Crowdsourcing%20-%20An%20Exploration.pdf?sequence=2
Deines, Nathaniel, Melissa Gill, Matthew Lincoln and Marissa Clifford (2018)
Ridge, Mia (2012) Frequently Asked Questions about crowdsourcing in cultural heritage. Blog. Open Objects, https://www.openobjects.org.uk/2012/06/frequently-asked-questions-about-crowdsourcing-in-cultural-heritage/
Terras, Melissa (2016) Crowdsourcing in the digital humanities. In Schreibman, S., Siemens, R., and Unsworth, J. (eds), A New Companion to Digital Humanities (pp. 420-439). Wiley-Blackwell.
Van Hyning, Victoria (2019) “Curating Crowds: A Review of Crowdsourcing Our Cultural Heritage.” DHQ: Digital Humanities Quarterly 2019 Volume 13 Number 1. http://www.digitalhumanities.org/dhq/vol/13/1/000410/000410.pdf