
Special Interest Group on Computational Morphology and Phonology Workshop: Latest Publications

Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection
Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.5
Xiang Yu, Ngoc Thang Vu, Jonas Kuhn
We present an iterative data augmentation framework, which trains and searches for an optimal ensemble and simultaneously annotates new training data in a self-training style. We apply this framework to two SIGMORPHON 2020 shared tasks: grapheme-to-phoneme conversion and morphological inflection. With very simple base models in the ensemble, we rank first and fourth in these two tasks, respectively. We show in the analysis that our system works especially well on low-resource languages.
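As an illustration of the iterative ensemble self-training loop sketched in the abstract, the snippet below trains several base models, self-labels pool items on which the ensemble (nearly) agrees, and folds them back into the training data. This is a minimal sketch: `train_model`, `predict`, and the agreement threshold are hypothetical placeholders, and the ensemble-search step on development data is omitted.

```python
from collections import Counter

def ensemble_self_training(labeled, unlabeled, train_model,
                           n_models=5, n_rounds=3, agreement=1.0):
    """Generic iterative ensemble self-training loop (illustrative only)."""
    train = list(labeled)        # (input, output) pairs
    pool = list(unlabeled)       # unlabeled inputs
    models = []
    for _ in range(n_rounds):
        models = [train_model(train, seed=i) for i in range(n_models)]
        confident, remaining = [], []
        for x in pool:
            votes = Counter(m.predict(x) for m in models)
            pred, count = votes.most_common(1)[0]
            if count / n_models >= agreement:    # ensemble (near-)unanimous
                confident.append((x, pred))      # self-annotated training item
            else:
                remaining.append(x)
        train.extend(confident)
        pool = remaining
    return models, train
```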
Citations: 14
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
Pub Date : 2020-06-20 DOI: 10.18653/v1/2020.sigmorphon-1.1
Ekaterina Vylomova, Jennifer C. White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, E. Ponti, R. Maudslay, Ran Zmigrod, Josef Valvoda, S. Toldova, Francis M. Tyers, E. Klyachko, I. Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems’ ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrate the utility of data hallucination and augmentation, ensembles, and multilingual training for low-resource languages. Non-neural learners and manually designed grammars showed competitive and even superior performance on some languages (such as Ingrian, Tajik, Tagalog, Zarma, Lingala), especially with very limited data. Some language families (Afro-Asiatic, Niger-Congo, Turkic) were relatively easy for most systems and achieved over 90% mean accuracy while others were more challenging.
Citations: 61
In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology
Pub Date : 2020-05-27 DOI: 10.18653/v1/2020.sigmorphon-1.28
C. Cathcart, Florian Wandl
This paper investigates the ability of neural network architectures to effectively learn diachronic phonological generalizations in a multilingual setting. We employ models using three different types of language embedding (dense, sigmoid, and straight-through). We find that the Straight-Through model outperforms the other two in terms of accuracy, but the Sigmoid model’s language embeddings show the strongest agreement with the traditional subgrouping of the Slavic languages. We find that the Straight-Through model has learned coherent, semi-interpretable information about sound change, and outline directions for future research.
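The straight-through variant mentioned in the abstract can be illustrated with a standard straight-through estimator: the forward pass uses hard 0/1 language features while gradients flow through the underlying sigmoid. This is a minimal PyTorch sketch of the general technique, not the paper's exact model.

```python
import torch
import torch.nn as nn

class StraightThroughLanguageEmbedding(nn.Module):
    """Discrete 0/1 language embeddings trained with a straight-through estimator."""
    def __init__(self, n_languages, dim):
        super().__init__()
        self.logits = nn.Embedding(n_languages, dim)

    def forward(self, lang_ids):
        soft = torch.sigmoid(self.logits(lang_ids))  # values in (0, 1)
        hard = (soft > 0.5).float()                  # discrete 0/1 features
        # forward pass uses `hard`; gradients flow through `soft`
        return hard + soft - soft.detach()

# usage: emb = StraightThroughLanguageEmbedding(n_languages=13, dim=32)
#        features = emb(torch.tensor([0, 5]))
```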
Citations: 4
The IMS–CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion
Pub Date : 2020-05-25 DOI: 10.18653/v1/2020.sigmorphon-1.9
Manuel Mager, Katharina Kann
In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS–CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020). The task consists of generating the morphological paradigms of a set of lemmas, given only the lemmas themselves and unlabeled text. Our proposed system is a modified version of the baseline introduced together with the task. In particular, we experiment with substituting the inflection generation component with an LSTM sequence-to-sequence model and an LSTM pointer-generator network. Our pointer-generator system obtains the best score of all seven submitted systems on average over all languages, and outperforms the official baseline, which was best overall, on Bulgarian and Kannada.
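The pointer-generator component reduces to mixing the decoder's vocabulary distribution with a copy distribution obtained by scattering attention weights onto the source characters. The function below sketches that mixing step in generic PyTorch; the names and tensor shapes are illustrative assumptions, not the IMS–CUBoulder code.

```python
import torch

def pointer_generator_mix(p_vocab, attn, src_ids, p_gen):
    """Blend generation and copying distributions (generic sketch).

    p_vocab: (batch, vocab)   softmax over the output vocabulary
    attn:    (batch, src_len) attention weights over source positions
    src_ids: (batch, src_len) vocabulary ids of the source characters
    p_gen:   (batch, 1)       probability of generating rather than copying
    """
    copy_dist = torch.zeros_like(p_vocab).scatter_add_(1, src_ids, attn)
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist
```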
Citations: 2
Induced Inflection-Set Keyword Search in Speech
Pub Date : 2019-10-27 DOI: 10.18653/v1/2020.sigmorphon-1.25
Oliver Adams, Matthew Wiesner, J. Trmal, Garrett Nicolai, David Yarowsky
We investigate the problem of searching for a lexeme-set in speech by searching for its inflectional variants. Experimental results indicate how lexeme-set search performance changes with the number of hypothesized inflections, while ablation experiments highlight the relative importance of different components in the lexeme-set search pipeline and the value of using curated inflectional paradigms. We provide a recipe and evaluation set for the community to use as an extrinsic measure of the performance of inflection generation approaches.
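The search strategy amounts to expanding a lemma into hypothesized inflected forms and pooling keyword-search hits over all of them. A minimal sketch, assuming a hypothetical inflection generator and keyword-search backend:

```python
def inflection_set_search(lemma, generate_inflections, keyword_search, n_best=10):
    """Search speech for a lexeme set via its hypothesized inflections (sketch)."""
    hits = {}
    for form in generate_inflections(lemma, n_best=n_best):
        for utterance_id, score in keyword_search(form):
            # keep the best score per utterance across all inflected forms
            if score > hits.get(utterance_id, float("-inf")):
                hits[utterance_id] = score
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
```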
Citations: 1
Morphological reinflection with convolutional neural networks
Pub Date : 2016-08-11 DOI: 10.18653/v1/W16-2003
Robert Östling
We present a system for morphological reinflection based on an encoder-decoder neural network model with extra convolutional layers. In spite of its simplicity, the method performs reasonably well on all the languages of the SIGMORPHON 2016 shared task, particularly for the most challenging problem of limited-resources reinflection (track 2, task 3). We also find that using only convolution achieves surprisingly good results in this task, surpassing the accuracy of our encoder-decoder model for several languages.
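A convolutional character encoder of the kind the abstract describes can be sketched as stacked 1-D convolutions over character embeddings; the layer sizes below are arbitrary assumptions, not the paper's configuration.

```python
import torch.nn as nn

class ConvCharEncoder(nn.Module):
    """Character encoder built from stacked convolutional layers (illustrative)."""
    def __init__(self, vocab_size, emb_dim=64, hidden=128, kernel=3, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        blocks, in_channels = [], emb_dim
        for _ in range(layers):
            blocks += [nn.Conv1d(in_channels, hidden, kernel, padding=kernel // 2),
                       nn.ReLU()]
            in_channels = hidden
        self.convs = nn.Sequential(*blocks)

    def forward(self, char_ids):                   # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)   # (batch, emb_dim, seq_len)
        return self.convs(x).transpose(1, 2)       # (batch, seq_len, hidden)
```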
Citations: 23
Automatic Detection of Intra-Word Code-Switching
Pub Date : 2016-08-11 DOI: 10.18653/v1/W16-2013
Dong Nguyen, L. Cornips
Many people are multilingual and they may draw from multiple language varieties when writing their messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. Then, words are identified that are composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.
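The two-step pipeline (segment each word into subunits, then identify the language of each subunit) can be outlined as below; `segment` and `unit_language` are hypothetical placeholders for the segmenter and the per-unit language classifier.

```python
def detect_intra_word_switch(word, segment, unit_language):
    """Flag a word as intra-word code-switched if its subunits
    are associated with different languages (schematic sketch)."""
    units = segment(word)                        # e.g. a stem plus a dialect suffix
    labels = [unit_language(u) for u in units]
    is_switched = len(set(labels)) > 1
    return is_switched, list(zip(units, labels))
```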
Citations: 15
Towards a Formal Representation of Components of German Compounds
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2017
Thierry Declerck, P. Lendvai
This paper presents an approach for the formal representation of components in German compounds. We assume that such a formal representation will support the segmentation and analysis of unseen compounds that feature components already seen in other compounds. An extensive language resource that explicitly codes components of compounds is GermaNet, a lexical semantic network for German. We summarize the GermaNet approach to the description of compounds, discussing some of its shortcomings. Our proposed extension of this representation builds on the lemon lexicon model for ontologies, established by the W3C Ontology Lexicon Community Group.
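As a rough illustration of representing a compound and its components in the spirit of the lemon/OntoLex decomposition module, the snippet below builds a small RDF graph with rdflib. The decomp property and class names are recalled from the OntoLex-lemon specification and should be verified against it, and the example entries are invented for illustration.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
DECOMP = Namespace("http://www.w3.org/ns/lemon/decomp#")
EX = Namespace("http://example.org/lexicon/")      # hypothetical lexicon namespace

g = Graph()
g.bind("ontolex", ONTOLEX)
g.bind("decomp", DECOMP)

# invented example: compound "Haustuer" with components "Haus" and "Tuer"
compound = EX["Haustuer"]
g.add((compound, RDF.type, ONTOLEX.LexicalEntry))
for i, part in enumerate(["Haus", "Tuer"], start=1):
    component = EX[f"Haustuer_comp{i}"]
    g.add((component, RDF.type, DECOMP.Component))
    g.add((compound, DECOMP.constituent, component))    # component slot of the compound
    g.add((component, DECOMP.correspondsTo, EX[part]))  # link to the standalone entry
    g.add((EX[part], RDF.type, ONTOLEX.LexicalEntry))

print(g.serialize(format="turtle"))
```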
Citations: 3
MED: The LMU System for the SIGMORPHON 2016 Shared Task on Morphological Reinflection
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2010
Katharina Kann, Hinrich Schütze
This paper presents MED, the main system of the LMU team for the SIGMORPHON 2016 Shared Task on Morphological Reinflection as well as an extended analysis of how different design choices contribute to the final performance. We model the task of morphological reinflection using neural encoder-decoder models together with an encoding of the input as a single sequence of the morphological tags of the source and target form as well as the sequence of letters of the source form. The Shared Task consists of three subtasks, three different tracks and covers 10 different languages to encourage the use of language-independent approaches. MED was the system with the overall best performance, demonstrating our method generalizes well for the low-resource setting of the SIGMORPHON 2016 Shared Task.
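The single-sequence input encoding described in the abstract (the morphological tags of the source and target forms followed by the letters of the source form) can be written as a small helper; the delimiter symbols here are illustrative assumptions.

```python
def med_style_input(source_tags, target_tags, source_form):
    """Flatten tags and source characters into one input symbol sequence."""
    return (["<S>"] + list(source_tags) +
            ["<T>"] + list(target_tags) +
            ["<W>"] + list(source_form) + ["</W>"])

# med_style_input(["N", "NOM", "SG"], ["N", "DAT", "PL"], "Haus")
# -> ['<S>', 'N', 'NOM', 'SG', '<T>', 'N', 'DAT', 'PL', '<W>', 'H', 'a', 'u', 's', '</W>']
```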
Citations: 97
Evaluating Sequence Alignment for Learning Inflectional Morphology
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2008
David L. King
This work examines CRF-based sequence alignment models for learning natural language morphology. Although these systems have performed well for a limited number of languages, this work, as part of the SIGMORPHON 2016 shared task, specifically sets out to determine whether these models handle non-concatenative morphology as well as previous work might suggest. Results, however, indicate a strong preference for simpler, concatenative morphological systems.
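As a simplified stand-in for the character-level alignments such systems learn over, the snippet below derives edit operations between a lemma and an inflected form with difflib; the paper's models score alignments with a CRF rather than this heuristic.

```python
from difflib import SequenceMatcher

def align_lemma_form(lemma, form):
    """Character-level edit operations between a lemma and an inflected form."""
    ops = SequenceMatcher(a=lemma, b=form, autojunk=False).get_opcodes()
    return [(tag, lemma[i1:i2], form[j1:j2]) for tag, i1, i2, j1, j2 in ops]

# align_lemma_form("machen", "gemacht")
# -> [('insert', '', 'ge'), ('equal', 'mach', 'mach'), ('replace', 'en', 't')]
```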
Citations: 6