Every day the research community creates many transcriptions of historical manuscripts from a broad variety of collections and archives. These are important raw data, but they are very time-consuming to create. As they are rarely published and there is no platform to share them, these valuable transcriptions remain inaccessible to the public, existing only in isolated data silos on local computers. Consequently, other researchers must begin at the same starting point as their predecessors: mining data by creating their own transcription of the same document. Sometimes it is only after doing so that they discover the document does not even relate closely to their actual topic of interest.
Empowering the community
Our project provides the infrastructure to collect and share crowdsourced transcriptions. Our goal is to reduce redundant work and improve the transcription process. Everybody will be able to upload and download transcriptions. We exercise little control over the content of the platform and instead pursue a crowd-driven approach: whatever content is of interest to users can be added and shared.
The jigsaw of exploration – piece by piece
Our independence from digitized sources (people can upload digital copies or photos, but do not have to) allows for new ways of thinking about transcriptions. In this way, smaller archives that do not have the funds to digitize their collections can still be well represented.
During the uploading process, users are asked to add metadata for the source they have transcribed in order to provide a rich set of additional information for each record. This allows collections to be explored in further depth and sources to be evaluated for relevance to a particular use. We also want to improve the network within the research community by building bridges between historians, data scientists, students, archives and citizen scientists.
Acting in concert
We believe that working together with other platforms and institutions is essential, for example for community building, sharing, and attracting new users. Additionally, as our transcriptions can be linked with digitized manuscripts, we can provide valuable training data for data scientists interested in developing Handwritten Text Recognition (HTR) models, which can automatically transcribe other sources by the same scribe or scriptorium.
Project Duration
1 January 2020 – 30 June 2020