Special Interest Group on Computational Morphology and Phonology Workshop: Latest Articles

Unsupervised Word Segmentation for Sesotho Using Adaptor Grammars

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2008-06-19 DOI: 10.3115/1626324.1626328

Mark Johnson

This paper describes a variety of non-parametric Bayesian models of word segmentation based on Adaptor Grammars that model different aspects of the input and incorporate different kinds of prior knowledge, and applies them to the Bantu language Sesotho. While we find overall word segmentation accuracies lower than these models achieve on English, we also find some interesting differences in which factors contribute to better word segmentation. Specifically, we found little improvement to word segmentation accuracy when we modeled contextual dependencies, while modeling morphological structure did improve segmentation accuracy.

Citations: 83

Evaluating an Agglutinative Segmentation Model for ParaMor

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2008-06-19 DOI: 10.3115/1626324.1626332

Christian Monson, A. Lavie, J. Carbonell, Lori S. Levin

This paper describes and evaluates a modification to the segmentation model used in the unsupervised morphology induction system ParaMor. Our improved segmentation model permits multiple morpheme boundaries in a single word. To prepare ParaMor to effectively apply the new agglutinative segmentation model, two heuristics improve ParaMor's precision. These precision-enhancing heuristics are adaptations of those used in other unsupervised morphology induction systems, including work by Hafer and Weiss (1974) and Goldsmith (2006). By reformulating the segmentation model used in ParaMor, we significantly improve ParaMor's performance in all language tracks, in both the linguistic evaluation and the task-based information retrieval (IR) evaluation of the peer-operated competition Morpho Challenge 2007. ParaMor's improved morpheme recall in the linguistic evaluations of German, Finnish, and Turkish is higher than that of any system which competed in the Challenge. In the three languages of the IR evaluation, our enhanced ParaMor significantly outperforms, at average precision over newswire queries, a morphologically naive baseline, scoring just behind the leading system from Morpho Challenge 2007 in English and ahead of the first-place system in German.

Citations: 11

A Bayesian Model of Natural Language Phonology: Generating Alternations from Underlying Forms

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2008-06-19 DOI: 10.3115/1626324.1626327

David Ellis

A stochastic approach to learning phonology. The model presented captures 7–15% more phonologically plausible underlying forms than a simple majority solution, because it prefers "pure" alternations. It could be useful in cases where an approximate solution is needed, or as a seed for more complex models. A similar process could be involved in some stages of child language acquisition, in particular early learning of phonotactics.

Citations: 1

Phonotactic Probability and the Maori Passive: A Computational Approach

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2008-06-19 DOI: 10.3115/1626324.1626331

Oiwi Parker Jones

Two analyses of Maori passives and gerunds have been debated in the literature. Both assume that the thematic consonants in these forms are unpredictable. This paper reports on three computational experiments designed to test whether this assumption is sound. The results suggest that thematic consonants are predictable from the phonotactic probabilities of their active counterparts. This study has potential implications for allomorphy in other Polynesian languages. It also exemplifies the benefits of using computational methods in linguistic analyses.

Citations: 5

Bayesian Learning over Conflicting Data: Predictions for Language Change

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2008-06-19 DOI: 10.3115/1626324.1626326

Rebecca L Morley

This paper is an analysis of the claim that a universal ban on certain ('anti-markedness') grammars is necessary in order to explain their non-occurrence in the languages of the world. To assess the validity of this hypothesis I examine the implications of one sound change (a > ə) for learning in a specific phonological domain (stress assignment), making explicit assumptions about the type of data that results, and the learning function that computes over that data. The preliminary conclusion is that restrictions on possible end-point languages are unneeded, and that the most likely outcome of change is a lexicon that is inconsistent with respect to a single generating rule.

Citations: 1

Word Similarity Metrics and Multilateral Comparison

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626518

Brett Kessler

Phylogenetic analyses of languages need to explicitly address whether the languages under consideration are related to each other at all. Recently developed permutation tests allow this question to be explored by testing whether words in one set of languages are significantly more similar to those in another set of languages when paired up by semantics than when paired up at random. Seven different phonetic similarity metrics are implemented and evaluated on their effectiveness within such multilateral comparison systems when deployed to detect genetic relations among the Indo-European and Uralic language families.
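
The permutation test described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `similarity` function (a shared first letter) is a hypothetical stand-in for the seven phonetic metrics the paper evaluates, and the two word lists are invented toy data, aligned by meaning slot.

```python
import random


def similarity(a: str, b: str) -> float:
    """Hypothetical toy metric: 1.0 if two words share a first letter, else 0.0."""
    return 1.0 if a[:1] == b[:1] else 0.0


def permutation_test(words_a, words_b, metric=similarity, trials=10_000, seed=0):
    """Compare semantically paired words against randomly re-paired words.

    words_a[i] and words_b[i] are assumed to express the same meaning.
    Returns (observed_score, p_value), where p_value estimates how often a
    random pairing is at least as similar as the semantic pairing.
    """
    rng = random.Random(seed)
    observed = sum(metric(a, b) for a, b in zip(words_a, words_b))
    shuffled = list(words_b)
    at_least_as_similar = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the semantic alignment
        score = sum(metric(a, b) for a, b in zip(words_a, shuffled))
        if score >= observed:
            at_least_as_similar += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (at_least_as_similar + 1) / (trials + 1)


# Invented toy data: five meaning slots in two made-up languages.
lang_a = ["pa", "ma", "na", "ta", "ka"]
lang_b = ["pe", "me", "ne", "te", "ke"]
observed, p_value = permutation_test(lang_a, lang_b)
```

A small p-value suggests that the semantically paired words are more similar than chance pairing would explain, which is the evidence of relatedness the abstract refers to.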

Citations: 13

Inducing Sound Segment Differences Using Pair Hidden Markov Models

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626523

Martijn Wieling, Therese Leinonen, J. Nerbonne

Pair Hidden Markov Models (PairHMMs) are trained to align the pronunciation transcriptions of a large contemporary collection of Dutch dialect material, the Goeman-Taeldeman-Van Reenen-Project (GTRP, collected 1980–1995). We focus on the question of how to incorporate information about sound segment distances to improve sequence distance measures for use in dialect comparison. PairHMMs induce segment distances via expectation maximisation (EM). Our analysis uses a phonologically comparable subset of 562 items for all 424 localities in the Netherlands. We evaluate the work first by comparison with analyses obtained using the Levenshtein distance on the same dataset, and second by comparing the quality of the induced vowel distances to acoustic differences.
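
For orientation, the Levenshtein distance used as the comparison baseline can be sketched as below. This is the standard unit-cost dynamic program, not the PairHMM and not the authors' code; the function name and the example strings are illustrative assumptions. The unit substitution cost is where induced segment distances could in principle be plugged in instead.

```python
def levenshtein(source, target):
    """Unit-cost edit distance between two sequences of segments.

    Insertions, deletions, and substitutions each cost 1; identical
    segments substitute at cost 0.
    """
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j segments of `target`.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # deleting the first i segments of source
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (0 if s == t else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Illustrative strings only; real input would be phonetic transcriptions.
distance = levenshtein("kitten", "sitting")
```

Because the function only iterates over its arguments and compares elements, it also accepts lists of multi-character phonetic symbols rather than plain strings.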

Citations: 27

Can Corpus Based Measures be Used for Comparative Study of Languages?

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626522

Anil Kumar Singh, H. Surana

Quantitative measurement of inter-language distance is a useful technique for studying diachronic and synchronic relations between languages. Such measures have been used successfully for purposes like deriving language taxonomies and language reconstruction, but they have mostly been applied to handcrafted word lists. Can we instead use corpus based measures for comparative study of languages? In this paper we try to answer this question. We use three corpus based measures and present the results obtained from them and show how these results relate to linguistic and historical knowledge. We argue that the answer is yes and that such studies can provide or validate linguistic and computational insights.

Citations: 30

Visualizing the Evaluation of Distance Measures

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626527

T. Pilz, Axel Philipsenburg, W. Luther

This paper describes the development and use of an interface for visually evaluating distance measures. The combination of multidimensional scaling plots, histograms and tables allows for different stages of overview and detail. The interdisciplinary project "Rule-based search in text databases with nonstandard orthography" develops a fuzzy full text search engine and uses distance measures for historical text document retrieval. This engine should provide easier text access for experts as well as interested amateurs.

Citations: 2

Bayesian Identification of Cognates and Correspondences

Special Interest Group on Computational Morphology and Phonology Workshop

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626519

T. M. Ellison

This paper presents a Bayesian approach to comparing languages: identifying cognates and the regular correspondences that compose them. A simple model of language is extended to include these notions in an account of parent languages. An expression is developed for the posterior probability of child language forms given a parent language. Bayes' Theorem offers a schema for evaluating choices of cognates and correspondences to explain semantically matched data. An implementation optimising this value with gradient descent is shown to distinguish cognates from non-cognates in data from Polish and Russian.
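
As a reminder of the schema the abstract invokes, Bayes' Theorem in its general form is given below. The symbols are assumptions chosen for illustration and are not the paper's notation: $H$ stands for a hypothesis about which forms are cognate and which correspondences relate them, and $D$ for the semantically matched word data.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Since $P(D)$ does not depend on $H$, competing hypotheses can be compared through the product $P(D \mid H)\,P(H)$ alone.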

Citations: 12
