Sequence comparison in computational historical linguistics

Journal of Language Evolution (Language & Linguistics), impact factor 2.1. Published 2018-07-01. DOI: 10.1093/JOLE/LZY006
Johann-Mattis List, M. Walworth, Simon J. Greenhill, Tiago Tresoldi, Robert Forkel
Citations: 40

Abstract

With increasing amounts of digitally available data from all over the world, manual annotation of cognates in multilingual word lists is becoming increasingly time-consuming in historical linguistics. Using available software packages to pre-process the data prior to manual analysis can drastically speed up the process of cognate detection. Furthermore, it allows us to get a quick overview of data that have not yet been intensively studied by experts. LingPy is a Python library that provides a large arsenal of routines for sequence comparison in historical linguistics. With LingPy, linguists can not only search automatically for cognates in lexical data, but also align the automatically identified words and output them in various forms designed to facilitate manual inspection. In this tutorial, we briefly introduce the basic concepts behind the algorithms employed by LingPy and then illustrate in concrete workflows how automatic sequence comparison can be applied to multilingual word lists. The goal is to provide readers with all the information they need to (1) carry out cognate detection and alignment analyses in LingPy, (2) select the appropriate algorithm for a given task, (3) evaluate how well automatic cognate detection algorithms perform compared to experts, and (4) export their data into various formats useful for additional analyses or data sharing. While basic knowledge of the Python language is useful for all analyses, our tutorial is structured in such a way that scholars with only basic knowledge of computing can follow all of the steps as well.
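To make the idea of automatic cognate detection by sequence comparison more concrete, the following is a minimal sketch in plain Python. It does not use LingPy and is not the method the tutorial describes: it simply computes a normalized edit distance between segmented words and links words whose distance falls below a threshold. The function names, the threshold value of 0.6, and the simplified toy transcriptions for the concept "hand" are all assumptions made for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of sound segments."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_words(words, threshold=0.6):
    """Single-linkage flat clustering of words for one concept.

    `words` maps a language name to a list of segments. Two words are
    linked when their normalized distance is below `threshold`
    (a hypothetical value chosen for this toy example).
    """
    labels = {lang: idx for idx, lang in enumerate(words)}
    for lang_a, lang_b in combinations(words, 2):
        if normalized_distance(words[lang_a], words[lang_b]) < threshold:
            old, new = labels[lang_b], labels[lang_a]
            # Merge the two clusters by relabeling every member of the old one.
            for lang in labels:
                if labels[lang] == old:
                    labels[lang] = new
    return labels


# Simplified, illustrative transcriptions for the concept "hand".
hand = {
    "German": ["h", "a", "n", "t"],
    "English": ["h", "æ", "n", "d"],
    "Dutch": ["h", "ɑ", "n", "t"],
    "Russian": ["r", "u", "k", "a"],
}

clusters = cluster_words(hand)
for language, cluster_id in clusters.items():
    print(language, cluster_id)
```

In this toy data the German, English, and Dutch words end up in one cluster and the Russian word in another. Plain edit distance treats every segment mismatch as equally costly, which is the main limitation that linguistically informed scoring schemes aim to address.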