
Special Interest Group on Computational Morphology and Phonology Workshop: Latest Publications

Colexifications for Bootstrapping Cross-lingual Datasets: The Case of Phonology, Concreteness, and Affectiveness
Pub Date: 2023-06-05 DOI: 10.48550/arXiv.2306.02646
Yiyi Chen, Johannes Bjerva
Colexification refers to the linguistic phenomenon where a single lexical form is used to convey multiple meanings. By studying cross-lingual colexifications, researchers have gained valuable insights into fields such as psycholinguistics and cognitive sciences (Jackson et al., 2019; Xu et al., 2020; Karjus et al., 2021; Schapper and Koptjevskaja-Tamm, 2022; François, 2022). While several multilingual colexification datasets exist, there is untapped potential in using this information to bootstrap datasets across such semantic features. In this paper, we aim to demonstrate how colexifications can be leveraged to create such cross-lingual datasets. We showcase curation procedures which result in a dataset covering 142 languages across 21 language families across the world. The dataset includes ratings of concreteness and affectiveness, mapped with phonemes and phonological features. We further analyze the dataset along different dimensions to demonstrate the potential of the proposed procedures in facilitating further interdisciplinary research in psychology, cognitive science, and multilingual natural language processing (NLP). Based on initial investigations, we observe that i) colexifications that are closer in concreteness/affectiveness are more likely to colexify; ii) certain initial/last phonemes are significantly correlated with concreteness/affectiveness within language families, such as /k/ as the initial phoneme in both Turkic and Tai-Kadai correlated with concreteness, and /p/ in Dravidian and Sino-Tibetan correlated with valence; iii) the type-to-token ratio (TTR) of phonemes is positively correlated with concreteness across several language families, while the length of phoneme segments is negatively correlated with concreteness; iv) certain phonological features are negatively correlated with concreteness across languages. The dataset is made public online for further research.
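The type-to-token ratio in finding (iii) is simple to compute from phoneme transcriptions. Below is a minimal, hypothetical Python sketch of correlating phoneme-level TTR and segment length with concreteness ratings; the toy entries and variable names are invented for illustration and are not taken from the released dataset.

```python
from statistics import mean

# Hypothetical toy entries: phoneme segments paired with a concreteness
# rating. Real values would come from the released dataset.
entries = [
    (["k", "a", "t", "a"], 4.6),
    (["p", "i", "p", "i"], 3.1),
    (["s", "o", "l"], 4.9),
    (["m", "a", "m", "a", "m"], 2.2),
]

def type_token_ratio(phonemes):
    """Number of distinct phonemes divided by the total phoneme count."""
    return len(set(phonemes)) / len(phonemes)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / den

ttrs = [type_token_ratio(p) for p, _ in entries]
lengths = [len(p) for p, _ in entries]
concreteness = [c for _, c in entries]

print("TTR vs. concreteness:   ", round(pearson(ttrs, concreteness), 3))
print("Length vs. concreteness:", round(pearson(lengths, concreteness), 3))
```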
Citations: 1
Transliteration for Cross-Lingual Morphological Inflection
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.22
Nikitha Murikinati, Antonios Anastasopoulos, Graham Neubig
Cross-lingual transfer between typologically related languages has been proven successful for the task of morphological inflection. However, if the languages do not share the same script, current methods yield more modest improvements. We explore the use of transliteration between related languages, as well as grapheme-to-phoneme conversion, as data preprocessing methods in order to alleviate this issue. We experimented with several diverse language pairs, finding that in most cases transliterating the transfer language data into the target one leads to accuracy improvements, even up to 9 percentage points. Converting both languages into a shared space like the International Phonetic Alphabet or the Latin alphabet is also beneficial, leading to improvements of up to 16 percentage points.
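As a concrete illustration of the preprocessing idea, the sketch below rewrites a Cyrillic-script transfer language into Latin script with a hand-written character table before the data is mixed with target-language data. The table, language pair, and example word are invented and far smaller than anything a real system would use; a production setup would use a complete mapping or grapheme-to-phoneme conversion into IPA instead.

```python
# Minimal illustrative transliteration table (Cyrillic -> Latin).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d",
    "е": "e", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
}

def transliterate(word: str) -> str:
    """Map each character through the table, keeping unknown characters as-is."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in word.lower())

# Transfer-language training pairs would be rewritten like this before
# being concatenated with the target-language training data.
print(transliterate("балкон"))  # -> "balkon"
```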
Citations: 15
CLUZH at SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.19
Peter Makarov, S. Clematide
This paper describes the submission by the team from the Institute of Computational Linguistics, University of Zurich, to the Multilingual Grapheme-to-Phoneme Conversion (G2P) Task of the SIGMORPHON 2020 challenge. The submission adapts our system from the 2018 edition of the SIGMORPHON shared task. Our system is a neural transducer that operates over explicit edit actions and is trained with imitation learning. It is well-suited for morphological string transduction partly because it exploits the fact that the input and output character alphabets overlap. The challenge posed by G2P has been to adapt the model and the training procedure to work with disjoint alphabets. We adapt the model to use substitution edits and train it with a weighted finite-state transducer acting as the expert policy. An ensemble of such models produces competitive results on G2P. Our submission ranks second out of 23 submissions by a total of nine teams.
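The idea of "explicit edit actions" can be made concrete with a standard Levenshtein alignment: the sketch below derives a COPY/SUB/INS/DEL action sequence between a grapheme string and a phoneme string. This is a generic illustration of edit-action transduction, not the authors' transducer or its imitation-learning training procedure.

```python
def edit_actions(src: str, tgt: str):
    """Derive a minimal sequence of edit actions turning src into tgt,
    using standard Levenshtein dynamic programming with a backtrace."""
    n, m = len(src), len(tgt)
    # dp[i][j] = minimal number of edits turning src[:i] into tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (src[i - 1] != tgt[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Walk back from the bottom-right corner to recover the actions.
    actions, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            actions.append("COPY" if src[i - 1] == tgt[j - 1] else f"SUB({tgt[j - 1]})")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            actions.append("DEL")
            i -= 1
        else:
            actions.append(f"INS({tgt[j - 1]})")
            j -= 1
    return list(reversed(actions))

# English graphemes vs. a rough phoneme string, purely illustrative.
print(edit_actions("night", "naɪt"))
```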
Citations: 12
Frustratingly Easy Multilingual Grapheme-to-Phoneme Conversion
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.13
Nikhil Prabhu, Katharina Kann
In this paper, we describe two CU-Boulder submissions to the SIGMORPHON 2020 Task 1 on multilingual grapheme-to-phoneme conversion (G2P). Inspired by the high performance of a standard transformer model (Vaswani et al., 2017) on the task, we improve over this approach by adding two modifications: (i) Instead of training exclusively on G2P, we additionally create examples for the opposite direction, phoneme-to-grapheme conversion (P2G). We then perform multi-task training on both tasks. (ii) We produce ensembles of our models via majority voting. Our approaches, though being conceptually simple, result in systems that place 6th and 8th amongst 23 submitted systems, and obtain the best results out of all systems on Lithuanian and Modern Greek, respectively.
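Majority voting over whole output strings, as used for the ensembles here, takes only a few lines. The sketch below is a generic version with invented model outputs; it is not the authors' code, and ties are resolved arbitrarily.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent prediction (ties resolved by Counter's ordering)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical phoneme-sequence outputs from five ensemble members
# for a single input word.
model_outputs = ["n aɪ t", "n aɪ t", "n i g t", "n aɪ t", "n ɪ t"]
print(majority_vote(model_outputs))  # -> "n aɪ t"
```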
Citations: 4
Joint learning of constraint weights and gradient inputs in Gradient Symbolic Computation with constrained optimization
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.27
Max Nelson
This paper proposes a method for the joint optimization of constraint weights and symbol activations within the Gradient Symbolic Computation (GSC) framework. The set of grammars representable in GSC is proven to be a subset of those representable with lexically-scaled faithfulness constraints. This fact is then used to recast the problem of learning constraint weights and symbol activations in GSC as a quadratically-constrained version of learning lexically-scaled faithfulness grammars. This results in an optimization problem that can be solved using Sequential Quadratic Programming.
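For readers unfamiliar with Sequential Quadratic Programming, the sketch below solves a small quadratically-constrained problem with SciPy's SLSQP solver. The objective and constraint are toy stand-ins for illustration only and do not reproduce the paper's GSC formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize a quadratic-plus-linear objective over x subject
# to the quadratic equality constraint ||x||^2 = 1. In the paper the
# variables would be constraint weights and symbol activations; here
# they are just a generic three-dimensional vector.
W = np.diag([1.0, 2.0, 0.5])
b = np.array([0.3, -0.1, 0.2])

def objective(x):
    return float(x @ W @ x + b @ x)

constraints = [{"type": "eq", "fun": lambda x: float(x @ x) - 1.0}]

result = minimize(objective, x0=np.array([0.5, 0.5, 0.5]),
                  method="SLSQP", constraints=constraints)
print(result.x, round(result.fun, 4))
```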
Citations: 2
The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.2
Kyle Gorman, Lucas F. E. Ashby, Aaron Goyzueta, Arya D. McCarthy, Shijie Wu, Daniel You
We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. Participants were asked to submit systems which take in a sequence of graphemes in a given language as input, then output a sequence of phonemes representing the pronunciation of that grapheme sequence. Nine teams submitted a total of 23 systems, at best achieving an 18% relative reduction in word error rate (macro-averaged over languages), versus strong neural sequence-to-sequence baselines. To facilitate error analysis, we publicly release the complete outputs for all systems—a first for the SIGMORPHON workshop.
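Word error rate for G2P is typically the fraction of test words whose predicted pronunciation is not exactly correct, and macro-averaging simply averages that rate over languages. The sketch below illustrates the computation with invented predictions and language codes; it is not the task's official evaluation script.

```python
def word_error_rate(gold, predicted):
    """Fraction of items whose predicted pronunciation differs from the gold one."""
    errors = sum(g != p for g, p in zip(gold, predicted))
    return errors / len(gold)

# Hypothetical per-language (gold, predicted) phoneme strings.
results_by_language = {
    "fra": (["b ɔ̃ ʒ u ʁ", "ʃ a"], ["b ɔ̃ ʒ u ʁ", "ʃ ɑ"]),
    "hun": (["h aː z", "k uː t"], ["h aː z", "k uː t"]),
}

per_language = {
    lang: word_error_rate(gold, pred)
    for lang, (gold, pred) in results_by_language.items()
}
macro_wer = sum(per_language.values()) / len(per_language)
print(per_language, round(macro_wer, 3))
```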
Citations: 48
University of Illinois Submission to the SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.15
Marc E. Canby, A. Karipbayeva, B. Lunt, Sahand Mozaffari, Charlotte Yoder, J. Hockenmaier
The objective of this shared task is to produce an inflected form of a word, given its lemma and a set of tags describing the attributes of the desired form. In this paper, we describe a transformer-based model that uses a bidirectional decoder to perform this task, and evaluate its performance on the 90 languages and 18 language families used in this task.
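The input to such an inflection model is usually the lemma spelled out character by character together with the morphological tags as extra symbols. The sketch below shows one plausible way to build that source sequence; the exact formatting and the tag set are assumptions for illustration, not the authors' configuration.

```python
def build_source_sequence(lemma, tags):
    """Concatenate morphological tag symbols with the lemma's characters,
    a common source-side encoding for neural inflection models."""
    return list(tags) + list(lemma)

# Hypothetical UniMorph-style request: an inflected form of "sehen"
# described by the tags V;V.PTCP;PST.
print(build_source_sequence("sehen", ["V", "V.PTCP", "PST"]))
# -> ['V', 'V.PTCP', 'PST', 's', 'e', 'h', 'e', 'n']
```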
Citations: 4
The CMU-LTI submission to the SIGMORPHON 2020 Shared Task 0: Language-Specific Cross-Lingual Transfer
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.6
Nikitha Murikinati, Antonios Anastasopoulos
This paper describes the CMU-LTI submission to the SIGMORPHON 2020 Shared Task 0 on typologically diverse morphological inflection. The (unrestricted) submission uses the cross-lingual approach of our last year’s winning submission (Anastasopoulos and Neubig, 2019), but adapted to use specific transfer languages for each test language. Our system, with fixed non-tuned hyperparameters, achieved a macro-averaged accuracy of 80.65, ranking 20th among 31 systems, but it was still tied for best system in 25 of the 90 total languages.
Citations: 3
Grapheme-to-Phoneme Conversion with a Multilingual Transformer Model
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.7
Omnia S. ElSaadany, Benjamin Suter
In this paper, we describe our three submissions to the SIGMORPHON 2020 shared task 1 on grapheme-to-phoneme conversion for 15 languages. We experimented with a single multilingual transformer model. We observed that the multilingual model achieves results on par with our separately trained monolingual models and is even able to avoid a few of the errors made by the monolingual models.
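The claim that the multilingual model avoids some of the monolingual models' errors can be verified by comparing the sets of test items each system gets wrong. Below is a small, hypothetical sketch of that comparison; the words and transcriptions are invented.

```python
# Hypothetical gold pronunciations and system outputs on the same test set.
gold         = {"night": "n aɪ t", "knee": "n iː", "gnome": "n oʊ m"}
monolingual  = {"night": "n ɪ g t", "knee": "n iː", "gnome": "g n oʊ m"}
multilingual = {"night": "n aɪ t", "knee": "n iː", "gnome": "g n oʊ m"}

mono_errors  = {w for w in gold if monolingual[w] != gold[w]}
multi_errors = {w for w in gold if multilingual[w] != gold[w]}

print("fixed by the multilingual model:", mono_errors - multi_errors)
print("errors shared by both systems:  ", mono_errors & multi_errors)
```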
Citations: 6
Low-Resource G2P and P2G Conversion with Synthetic Training Data
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.sigmorphon-1.12
B. Hauer, Amir Ahmad Habibi, Yixing Luan, Arnob Mallik, Grzegorz Kondrak
This paper presents the University of Alberta systems and results in the SIGMORPHON 2020 Task 1: Multilingual Grapheme-to-Phoneme Conversion. Following previous SIGMORPHON shared tasks, we define a low-resource setting with 100 training instances. We experiment with three transduction approaches in both standard and low-resource settings, as well as on the related task of phoneme-to-grapheme conversion. We propose a method for synthesizing training data using a combination of diverse models.
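One generic way to synthesize training data from a combination of models is to label unannotated words with several systems and keep only the pronunciations that enough of them agree on. The sketch below illustrates that agreement-based filtering with invented predictions; it is an assumption for illustration and may differ from the combination strategy actually used in the paper.

```python
from collections import Counter

# Hypothetical predictions from three different G2P models on unlabeled words.
candidates = {
    "model_a": {"cat": "k æ t", "gnat": "n æ t", "pint": "p ɪ n t"},
    "model_b": {"cat": "k æ t", "gnat": "n æ t", "pint": "p aɪ n t"},
    "model_c": {"cat": "k æ t", "gnat": "g n æ t", "pint": "p aɪ n t"},
}

def agreed_pairs(predictions, min_agreeing=2):
    """Keep (word, pronunciation) pairs that at least `min_agreeing` models
    produced identically; these become synthetic training examples."""
    words = next(iter(predictions.values())).keys()  # assume same word list
    synthetic = {}
    for word in words:
        votes = Counter(model[word] for model in predictions.values())
        best, count = votes.most_common(1)[0]
        if count >= min_agreeing:
            synthetic[word] = best
    return synthetic

print(agreed_pairs(candidates))
# -> {'cat': 'k æ t', 'gnat': 'n æ t', 'pint': 'p aɪ n t'}
```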
Citations: 6