Linguist vs. Machine: Rapid Development of Finite-State Morphological Grammars

Special Interest Group on Computational Morphology and Phonology Workshop. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.sigmorphon-1.18

Sarah Beemer, Zak Boston, April Bukoski, Daniel Chen, P. Dickens, Andrew Gerlach, Torin Hopkins, Parth Anand Jawale, Chris Koski, Akanksha Malhotra, Piyush Mishra, S. Muradoglu, Lan Sang, Tyler Short, Sagarika Shreevastava, Eliza Spaulding, Testumichi Umada, Beilei Xiang, Changbing Yang, Mans Hulden


Citations: 8

Abstract

Sequence-to-sequence models have proven highly successful at learning morphological inflection from examples, as the series of SIGMORPHON/CoNLL shared tasks has shown. It is usually assumed, however, that a linguist working with inflectional examples could in principle develop a gold-standard-level morphological analyzer and generator that would surpass a trained neural network model in prediction accuracy, but that doing so may require significant amounts of human labor. In this paper, we discuss an experiment where a group of people with some linguistic training develop 25+ grammars as part of the shared task, and we weigh the cost/benefit ratio of developing grammars by hand. We also present tools that can help linguists triage difficult, complex morphophonological phenomena within a language and hypothesize inflectional class membership. We conclude that a significant development effort by trained linguists to analyze and model morphophonological patterns is required in order to surpass the accuracy of neural models.


