Latest Publications — Special Interest Group on Computational Morphology and Phonology Workshop
SIGMORPHON–UniMorph 2023 Shared Task 0: Typologically Diverse Morphological Inflection
DOI: 10.18653/v1/2023.sigmorphon-1.13
Omer Goldman, Khuyagbaatar Batsuren, Salam Khalifa, Aryaman Arora, Garrett Nicolai, Reut Tsarfaty, Ekaterina Vylomova
The 2023 SIGMORPHON–UniMorph shared task on typologically diverse morphological inflection included a wide range of languages: 26 languages from 9 primary language families. This year's data was entirely lemma-split, to test models' generalization ability, and was structured along the new hierarchical schema presented in Batsuren et al. (2022). The nine systems submitted this year showed ingenuity and innovation, including hard attention for explainability and bidirectional decoding. Many participants also gave special treatment to the newly introduced Japanese data, owing to the high number of unseen Kanji characters in its test set.
Citations: 3
Using longest common subsequence and character models to predict word forms
DOI: 10.18653/v1/W16-2009
A. Sorokin
This paper presents an algorithm for automatic word-form inflection. We use the longest-common-subsequence method to extract abstract paradigms from given pairs of basic and inflected word forms, together with suffix and prefix features to predict the paradigm automatically. We elaborate this algorithm using a combination of affix-feature-based and character n-gram models, which substantially enhances performance, especially for languages with nonlocal phenomena such as vowel harmony. Our system took part in the SIGMORPHON 2016 Shared Task, placing 3rd among 7 participants in 17 of 30 subtasks and 4th in 7 subtasks.
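The LCS-based paradigm extraction described above can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' implementation: it computes the longest common subsequence of a lemma and an inflected form, then replaces maximal runs of shared characters with numbered variables to yield an abstract paradigm. The German pair trinken ~ getrunken is a hypothetical example showing how ablaut surfaces as literal material between shared slots.

```python
def lcs(a, b):
    """Longest common subsequence of strings a and b (one witness)."""
    # standard dynamic-programming table
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb else max(dp[i][j + 1], dp[i + 1][j])
    # backtrack to recover the subsequence itself
    out, i, j = [], len(a), len(b)
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

def to_pattern(word, common):
    """Abstract a word over `common`: maximal shared runs become numbered slots.

    This sketch numbers runs per word and assumes run boundaries coincide
    across the lemma and the inflected form; a real system must align slots.
    """
    pattern, k, var, i = [], 0, 0, 0
    while i < len(word):
        if k < len(common) and word[i] == common[k]:
            var += 1
            # consume a maximal run of shared characters as one variable slot
            while i < len(word) and k < len(common) and word[i] == common[k]:
                i += 1; k += 1
            pattern.append(str(var))
        else:
            pattern.append(word[i]); i += 1
    return "+".join(pattern)

# hypothetical German pair: trinken ~ getrunken
common = lcs("trinken", "getrunken")    # "trnken"
print(to_pattern("trinken", common))    # 1+i+2
print(to_pattern("getrunken", common))  # g+e+1+u+2
```

The two patterns together form the abstract paradigm: slot 1 and slot 2 hold the shared material, while the literals (`i` vs. `g+e`, `u`) encode the inflectional change.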
Citations: 19
The SIGMORPHON 2022 Shared Task on Cross-lingual and Low-Resource Grapheme-to-Phoneme Conversion
DOI: 10.18653/v1/2023.sigmorphon-1.27
Arya D. McCarthy, Jackson L. Lee, Alexandra DeLucia, Travis M. Bartley, M. Agarwal, Lucas F. E. Ashby, L. Signore, Cameron Gibson, R. Raff, Winston Wu
Grapheme-to-phoneme conversion is an important component in many speech technologies, but until recently there were no multilingual benchmarks for this task. The third iteration of the SIGMORPHON shared task on multilingual grapheme-to-phoneme conversion features many improvements over the previous year's task (Ashby et al., 2021), including additional languages, three subtasks varying the amount of available resources, extensive quality-assurance procedures, and automated error analyses. Three teams submitted a total of fifteen systems, at best achieving relative word-error-rate reductions of 14% in the cross-lingual subtask and 14% in the very-low-resource subtask. The generally consistent result is that cross-lingual transfer substantially helps grapheme-to-phoneme modeling, but not to the same degree as in-language examples.
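For concreteness, a "relative reduction of word error rate" measures what fraction of the baseline's errors a system eliminates, not the absolute drop in WER. A minimal illustration with hypothetical numbers (the shared task's actual baseline WERs are not given here):

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction of the baseline's word errors that the system eliminates."""
    return (baseline_wer - system_wer) / baseline_wer

# hypothetical: lowering WER from 30.0% to 25.8% is a 14% *relative* reduction,
# matching the figure quoted above, even though the absolute drop is 4.2 points
print(round(relative_wer_reduction(0.300, 0.258), 4))  # → 0.14
```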
Citations: 0
Morphological Segmentation Can Improve Syllabification
DOI: 10.18653/v1/W16-2016
Garrett Nicolai, Lei Yao, Grzegorz Kondrak
Syllabification is sometimes influenced by morphological boundaries. We show that incorporating morphological information can improve the accuracy of orthographic syllabification in English and German. Surprisingly, unsupervised segmenters, such as Morfessor, can be more useful for this purpose than the supervised ones.
Citations: 6
Morphotactics as Tier-Based Strictly Local Dependencies
DOI: 10.18653/v1/W16-2019
Alëna Aksënova, T. Graf, S. Moradi
It is commonly accepted that morphological dependencies are finite-state in nature. We argue that the upper bound on morphological expressivity is much lower. Drawing on technical results from computational phonology, we show that a variety of morphotactic phenomena are tier-based strictly local and do not fall into weaker subclasses such as the strictly local or strictly piecewise languages. Since the tier-based strictly local languages are learnable in the limit from positive texts, this marks a first important step towards general machine learning algorithms for morphology. Furthermore, the limitation to tier-based strictly local languages explains typological gaps that are puzzling from a purely linguistic perspective.
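To make the tier-based strictly local (TSL) idea concrete, here is a toy checker — an illustrative sketch, not code from the paper. A TSL-2 grammar projects a tier (here, vowels) and forbids certain adjacent pairs on that tier; because non-tier material is erased before the local check, this expresses unbounded dependencies such as vowel harmony with purely local constraints. The front/back vowel inventory below is hypothetical.

```python
def tsl_ok(word, tier, forbidden, k=2):
    """Accept `word` iff its tier projection contains no forbidden k-factor."""
    # project the tier: erase non-tier symbols, add word boundaries
    proj = "#" + "".join(c for c in word if c in tier) + "#"
    # strictly local check on the projection
    return all(proj[i:i + k] not in forbidden for i in range(len(proj) - k + 1))

# toy front/back vowel harmony (hypothetical inventory, not a real grammar):
# no front vowel may be adjacent to a back vowel on the vowel tier
front, back = set("ei"), set("aou")
tier = front | back
forbidden = ({f + b for f in front for b in back}
             | {b + f for b in back for f in front})

print(tsl_ok("evler", tier, forbidden))  # True: e...e agree, however far apart
print(tsl_ok("evlar", tier, forbidden))  # False: e...a clash on the vowel tier
```

Note that the consonants between the vowels are invisible to the constraint, which is exactly what makes a long-distance harmony pattern "local" on the tier.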
Citations: 28
Low-resource grapheme-to-phoneme mapping with phonetically-conditioned transfer
DOI: 10.18653/v1/2023.sigmorphon-1.29
Michael Hammond
In this paper we explore a very simple non-neural approach to mapping orthography to phonetic transcription in a low-resource setting, using transfer data from a related language. We start from a baseline system and focus our efforts on data augmentation. We make three principal moves. First, we start with an HMM-based system (Novak et al., 2012). Second, we augment our basic system by recombining legal substrings in a restricted fashion (Ryan and Hulden, 2020). Finally, we limit our transfer data by using only training pairs whose phonetic form shares all bigrams with the target language.
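The final filtering step — keeping only transfer pairs whose phonetic form shares all bigrams with the target language — can be sketched as follows. This is illustrative code with made-up data, not the paper's implementation:

```python
def bigrams(s):
    """Set of character bigrams of a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

def filter_transfer(transfer_pairs, target_phonetic_forms):
    """Keep (orthography, phonetic) transfer pairs whose phonetic form
    contains only bigrams attested in the target language."""
    # bigram inventory attested in the target language's phonetic forms
    allowed = set().union(*(bigrams(p) for p in target_phonetic_forms))
    return [(orth, phon) for orth, phon in transfer_pairs
            if bigrams(phon) <= allowed]

# hypothetical data: the target inventory licenses {ka, at, ta, ak}
target = ["kat", "tak"]
pairs = [("cat", "kat"), ("pat", "pat")]
print(filter_transfer(pairs, target))  # [('cat', 'kat')] — 'pa' is unattested
```

The subset test `bigrams(phon) <= allowed` implements the "shares all bigrams" condition: a single phonotactically alien bigram is enough to exclude a transfer pair.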
Citations: 1
Findings of the SIGMORPHON 2023 Shared Task on Interlinear Glossing
DOI: 10.18653/v1/2023.sigmorphon-1.20
Michael Ginn, Sarah Moeller, Alexis Palmer, Anna Stacey, Garrett Nicolai, Mans Hulden, Miikka Silfverberg
This paper presents the findings of the SIGMORPHON 2023 Shared Task on Interlinear Glossing. This first iteration of the shared task explores glossing of a set of six typologically diverse languages: Arapaho, Gitksan, Lezgi, Natügu, Tsez, and Uspanteko. The shared task encompasses two tracks: a resource-scarce closed track and an open track, where participants are allowed to utilize external data resources. Five teams participated in the shared task. The winning team Tü-CL achieved a 23.99 percentage-point improvement over a baseline RoBERTa system in the closed track and a 17.42 percentage-point improvement in the open track.
Citations: 4
An Ensembled Encoder-Decoder System for Interlinear Glossed Text
DOI: 10.18653/v1/2023.sigmorphon-1.23
Edith Coates
This paper presents my submission to Track 1 of the 2023 SIGMORPHON shared task on interlinear glossed text (IGT). There is a wide range of techniques for building and training IGT models (see Moeller and Hulden, 2018; McMillan-Major, 2020; Zhao et al., 2020). I describe my ensembled sequence-to-sequence approach, perform experiments, and share my submission's test-set accuracy. I also discuss future areas of research in low-resource token classification methods for IGT.
Citations: 1
SIGMORPHON–UniMorph 2023 Shared Task 0, Part 2: Cognitively Plausible Morphophonological Generalization in Korean
DOI: 10.18653/v1/2023.sigmorphon-1.14
Canaan Breiss, Jinyoung Jo
This paper summarises data collection and curation for Part 2 of the 2023 SIGMORPHON-UniMorph Shared Task 0, which focused on modeling speaker knowledge and generalization of a pair of interacting phonological processes in Korean. We briefly describe how modeling the generalization task could be of interest to researchers in both Natural Language Processing and linguistics, and then summarise the traditional description of the phonological processes that are at the center of the modeling challenge. We then describe the criteria we used to select and code cases of process application in two Korean speech corpora, which served as the primary learning data. We also report the technical details of the experiment we carried out that served as the primary test data.
Citations: 0
Linear Discriminative Learning: a competitive non-neural baseline for morphological inflection
DOI: 10.18653/v1/2023.sigmorphon-1.16
Cheon-Yeong Jeong, Dominic Schmitz, Akhilesh Kakolu Ramarao, Anna Stein, Kevin Tang
This paper presents our submission to SIGMORPHON 2023 Task 2, Cognitively Plausible Morphophonological Generalization in Korean. We implemented both Linear Discriminative Learning and Transformer models and found that the Linear Discriminative Learning model trained on a combination of corpus and experimental data showed the best performance, with an overall accuracy of around 83%. We found that the best model must be trained on both the corpus data and the experimental data of one particular participant. Our examination of speaker variability and speaker-specific information did not explain why that particular participant's data combined well with the corpus data. We recommend Linear Discriminative Learning models as a future non-neural baseline system, owing to their training speed, accuracy, model interpretability, and cognitive plausibility. To improve model performance, we suggest using more data and/or performing data augmentation, and incorporating speaker- and item-specific information.
Citations: 1