Drawing areal information from a corpus of noisy dialect data

Alfred Lameli, Elvira Glaser, Philipp Stöckle
Journal of Linguistic Geography, vol. 8, no. 1, pp. 31-48. Published 2020-04-01. DOI: https://doi.org/10.1017/jlg.2020.4
Citations: 3

Abstract

This article analyzes linguistic survey data representing German dialects in Switzerland in 1933/34, based on the so-called Wenker sentences. The data are impressionistic in their phonetic transcriptions, which were produced by non-specialists using the Latin alphabet. Because no standard was defined in advance, the transcriptions are very heterogeneous. From a technical perspective, this leads to very noisy data, which is why the validity of the Wenker data in general, and the Swiss Wenker data in particular, has been questioned. Using methods from computational linguistics, we compare, for the first time, Wenker data with linguistic data collected at virtually the same time by professional linguists. Direct comparison with a sample from the published atlas of German-speaking Switzerland (SDS) reveals that, despite their noisiness, the data provide reliable information, e.g., on the spatial structuring of Swiss dialects. The study is thus a successful pilot for other corpus-based studies dealing with unstructured Wenker data in other regions.