Low hanging fruit and the Boasian trilogy in digital lexicography of morphologically rich languages

E. Pankratz, Antti Arppe, Jordan Lachler
Nordlyd Tromso University Working Papers on Language Linguistics, published 2022-08-30. DOI: 10.7557/12.6441 (https://doi.org/10.7557/12.6441)

Abstract

Online lexicographical resources for the morphologically rich Indigenous languages in Canada use a wide range of strategies for conveying their language’s morphological system, i.e., how words are inflected and derived. This paper illustrates these strategies in a survey of seventeen bilingual online resources. The strategies boil down to two basic approaches to the underlying structure of the resource: 1) a lexical database, or 2) a computational model. Most resources we surveyed are constructed around lexical databases. These assume the word(form) as the basic unit, an assumption that makes it difficult to incorporate the language’s sub-word morphological structure in full detail. However, one resource uses a computational morphological model to bring the language’s morphology into the core of the lexicon. This proved to be a “low-hanging fruit” in the application of language technology, one that was accomplished within a reasonable time frame, as has been advocated by Trond Trosterud. We discuss the value created and the questions raised by this approach and argue that it successfully overcomes the traditional Boasian three-way partition of dictionary, grammar, and text, creating integrated language resources that meet the modern needs of low-resource endangered languages and their communities.