S. Adolphs, Dawn Knight, Catherine Smith, Dominic T. Price
{"title":"Crowdsourcing formulaic phrases: towards a new type of spoken corpus","authors":"S. Adolphs, Dawn Knight, Catherine Smith, Dominic T. Price","doi":"10.3366/COR.2020.0192","DOIUrl":null,"url":null,"abstract":"Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process which is both labour intensive and often restrictive in terms of breadth of recording contexts available. To overcome these potential challenges in spoken corpus compilation, we explore the use of crowdsourcing of language samples that are reported by participants. We investigate the level of precision and recall of the ‘crowd’ when it comes to reporting language they have heard in certain contexts, alongside the use of a crowdsourcing toolkit to facilitate this task. As a focussing device for the selection of reported language samples, we draw on the use of formulaic phrases as an area that has received considerable attention by corpus linguists and applied linguists over the years. We argue that while studying reported language usage instead of actual language-in-use is problematic for several reasons, many of which have been highlighted in the literature on Discourse Completion Tasks ( Schauer and Adolphs, 2006 ), our suggested approach presents several advantages and opportunities for spoken corpus linguistics.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.8000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Corpora","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3366/COR.2020.0192","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 4
Abstract
Spoken corpora have traditionally been assembled through careful recording and transcription of discourse events, a process which is both labour intensive and often restrictive in terms of breadth of recording contexts available. To overcome these potential challenges in spoken corpus compilation, we explore the use of crowdsourcing of language samples that are reported by participants. We investigate the level of precision and recall of the ‘crowd’ when it comes to reporting language they have heard in certain contexts, alongside the use of a crowdsourcing toolkit to facilitate this task. As a focussing device for the selection of reported language samples, we draw on the use of formulaic phrases as an area that has received considerable attention by corpus linguists and applied linguists over the years. We argue that while studying reported language usage instead of actual language-in-use is problematic for several reasons, many of which have been highlighted in the literature on Discourse Completion Tasks ( Schauer and Adolphs, 2006 ), our suggested approach presents several advantages and opportunities for spoken corpus linguistics.