Crossroads Corpus creation: Design and case study

Yearbook of the Poznan Linguistic Meeting. Pub Date: 2017-12-20. DOI: 10.1515/yplm-2017-0009

Abbie Hantgan-Sonko


Citation count: 1

Abstract

This paper illustrates a methodological approach to the design of an annotated corpus, using a case study of phonetic convergences and divergences among multilingual speakers in southwestern Senegal’s Casamance region. The newly compiled corpus contains approximately 183,000 annotations of multilingual spoken data, gathered by eight researchers over a ten-year span using methods ranging from structured lexical elicitation in controlled contexts to naturally occurring multilingual conversations. The data were collected in three villages, each with its own primary language, yet many more languages contribute to the linguistic landscape. Detailed metadata inform analyses of variation, recording the context in which a speech act took place and between whom, the speakers’ linguistic repertoires, trajectories, and social networks, as well as the larger language context. A potential path for convergence or divergence that emerged during data collection, and in building and searching the corpus, is the crossroads in the phonetic production of word-initial velar plosives. Word-initial [k] emerges in one language where only [ɡ] is present in another; the third uses both. The corpus design makes it feasible not only to identify areas of accommodation but also to grasp the context, enabling a sociolinguistically informed analysis of the speakers’ linguistic behavior.

