Abstract Accepted for Presentation at DHNB Conference

2022-01-31 22:59

An abstract submitted to the Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), “Digital Humanities in Action”, has been accepted for presentation. Robert will discuss a toolkit, developed in his New Speakers of Minority Languages project, for collecting linguistic data remotely over the web. Here's the abstract:

The COVID-19 pandemic effectively halted data-gathering activities in many disciplines. In linguistics, particularly in the area of language documentation, researchers have been unable to conduct fieldwork without risking the health and safety of already vulnerable speech communities. Because documentation projects generate records of highly endangered languages, creating primary-data sets that can preserve and potentially revitalize these languages is particularly urgent. Although there is arguably no substitute for in-person fieldwork, COVID-19 has raised many questions about how to make effective use of our now widely available technological infrastructure to generate data remotely.

Clearly, technological developments introduced in response to the pandemic will remain useful whether or not COVID-19 continues to hinder mobility. At least in some research contexts, the available technology and know-how are sufficient to collect primary data remotely or in hybrid form. Recent and ongoing remote data collection in general linguistics can be categorized as either supervised or semi-supervised. Supervised data collection consists of “the usual” data collection activities (elicitation, translation, storytelling, etc.) conducted over video-conferencing software such as Zoom (cf. Mannby 2021), or over a combination of video conferencing and stimulus-display applications (cf. Leemann et al. 2020). In semi-supervised scenarios, researchers provide direction to individuals in the target-language community, who then return primary data either via existing general-purpose technological infrastructure (e.g. WhatsApp; K. Rybka, p.c. 2021) or via dedicated software (Griscom 2020).

Unsupervised data collection methods have also been used for more specific tasks, notably in dialectology (cf. Hinskens et al. 2021; Hasse et al. 2021), where sentence-reading, lexical-selection, and matched-guise tasks were crowdsourced via mobile apps. These apps were extremely successful in terms of the amount of data gathered; however, the tasks are highly specific, and the apps themselves are (apparently) released under closed-access licenses. As a result, anyone attempting similar unsupervised data collection must take on the expensive task of building infrastructure from scratch. There is thus a clear need for generalized, open-access tools.

MoReDaT is just such a toolkit. It is a modular web app for the unsupervised remote collection of linguistic data, written primarily in Python using the Django framework. By the time of the DHNB meeting, it will be available under an open-source license and distributed as a fully functional application. In this talk, I will first discuss the theoretical and practical issues surrounding remote data collection in linguistics and situate the development of MoReDaT within its broader research project. I will then present and demo the main modules of MoReDaT, which have been designed (a) to replicate the field tasks typically used by general linguists, irrespective of prior knowledge of the target language, and (b) to be fully customizable in terms of the stimuli that can be used. I will conclude by discussing the promising future of remote data collection in linguistics. Remote collection is not merely a reactive measure for the COVID-19 era: in the face of increasingly acute environmental costs and scarce research budgets, ‘going (partially) remote’ will become an effective means of continually sourcing more data.


Griscom, Richard. 2020. “Mobilizing Metadata: Open Data Kit (ODK) for Language Resource Development in East Africa.” In Proceedings of the First Workshop on Resources for African Indigenous Languages, 31–35. Marseille, France: European Language Resources Association (ELRA). https://aclanthology.org/2020.rail-1.6.

Hasse, Anja, Sandro Bachmann, and Elvira Glaser. 2021. “Gschmöis – Crowdsourcing Grammatical Data of Swiss German.” Linguistics Vanguard 7 (s1). https://doi.org/10.1515/lingvan-2019-0026.

Hinskens, Frans, Stefan Grondelaers, and David van Leeuwen. 2021. “Sprekend Nederland, a Multi-Purpose Collection of Dutch Speech.” Linguistics Vanguard 7 (s1). https://doi.org/10.1515/lingvan-2019-0024.

Leemann, Adrian, Péter Jeszenszky, Carina Steiner, Melanie Studerus, and Jan Messerli. 2020. “Linguistic Fieldwork in a Pandemic: Supervised Data Collection Combining Smartphone Recordings and Videoconferencing.” Linguistics Vanguard 6 (s3). https://doi.org/10.1515/lingvan-2020-0061.

Mannby, Emil. 2021. “Linguistic E-Fieldwork: How the Lingfil Field-Methods Course Survived a Pandemic.” Presentation given at the Fieldwork in Anthropology and Linguistics event for student fieldworkers, Dept. of Anthropology and Dept. of Linguistics and Philology, Uppsala University. https://www.lingfil.uu.se/digitalAssets/926/c_926886-l_1-k_fal_for-students_1.pdf.

More info to follow in a later post...