Special Interest Group on Computational Morphology and Phonology Workshop: Latest Papers


Cognate Identification and Alignment Using Practical Orthographies

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626530

Michael Cysouw, H. Jung

We use an iterative process of multi-gram alignment between associated words in different languages in an attempt to identify cognates. To maximise the amount of data, we use practical orthographies instead of consistently coded phonetic transcriptions. First results indicate that using practical orthographies can be useful, the more so when dealing with large amounts of data.

Citations: 11

Emergence of Community Structures in Vowel Inventories: An Analysis Based on Complex Networks

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626529

Animesh Mukherjee, M. Choudhury, A. Basu, Niloy Ganguly

In this work, we attempt to capture patterns of co-occurrence across vowel systems and, at the same time, determine the nature of the force leading to the emergence of such patterns. For this purpose we define a weighted network in which the vowels are the nodes and an edge between two nodes (read vowels) signifies their co-occurrence likelihood over the vowel inventories. Through this network we identify communities of vowels, which essentially reflect their patterns of co-occurrence across languages. We observe that in the assortative vowel communities the constituent nodes (read vowels) are largely uncorrelated in terms of their features, indicating that they are formed based on the principle of maximal perceptual contrast. However, in the rest of the communities, strong correlations are reflected among the constituent vowels with respect to their features, indicating that it is the principle of feature economy that binds them together.

Citations: 5
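
The weighted network described in this abstract can be illustrated with a small sketch. The abstract does not give the exact definition of "co-occurrence likelihood", so the code below assumes the simplest possible weight, the raw number of inventories in which two vowels appear together; the function name `cooccurrence_network` and the toy inventories are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(inventories):
    """Build a weighted vowel network from a list of vowel inventories.

    Nodes are vowels; the weight of an edge (u, v) is the number of
    inventories containing both u and v. This raw count is an assumed
    stand-in for the paper's co-occurrence likelihood.
    """
    weights = Counter()
    for inventory in inventories:
        # Sort so that each undirected edge has one canonical key (u < v).
        for u, v in combinations(sorted(set(inventory)), 2):
            weights[(u, v)] += 1
    return weights


# Toy data (invented for illustration, not taken from the paper).
toy_inventories = [
    {"i", "a", "u"},
    {"i", "a", "u", "e", "o"},
    {"i", "a"},
]
network = cooccurrence_network(toy_inventories)
for edge, weight in sorted(network.items()):
    print(edge, weight)
```

Community detection would then run on these edge weights; that step is omitted here because the abstract does not say which algorithm the authors used.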

ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626531

Christian Monson, J. Carbonell, A. Lavie, Lori S. Levin

Paradigms provide an inherent organizational structure to natural language morphology. ParaMor, our minimally supervised morphology induction algorithm, retrusses the word forms of raw text corpora back onto their paradigmatic skeletons, performing on par with state-of-the-art minimally supervised morphology induction algorithms at morphological analysis of English and German. ParaMor consists of two phases. Our algorithm first constructs sets of affixes closely mimicking the paradigms of a language. With these structures in hand, ParaMor then annotates word forms with morpheme boundaries. To set ParaMor's few free parameters we analyze a training corpus of Spanish. Without adjusting parameters, we induce the morphological structure of English and German. Adopting the evaluation methodology of Morpho Challenge 2007 (Kurimo et al., 2007), we compare ParaMor's morphological analyses with Morfessor (Creutz, 2006), a modern minimally supervised morphology induction system. ParaMor consistently achieves competitive F1 measures.

Citations: 27

Data Nonlinearity in Exploratory Multivariate Analysis of Language Corpora

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626528

H. Moisl

Data nonlinearity has historically not been and currently is not an issue in work on exploratory multivariate analysis of language corpora. However, the presence of nonlinearity in data has a fundamental bearing on the conduct of exploratory analysis. The first part of the discussion explains why this is so in principle, and the second exemplifies the explanation via exploratory analysis of the Newcastle Electronic Corpus of Tyneside English (NECTE), an historical speech corpus. The conclusion is that data should be screened for nonlinearity prior to analysis and, if a substantial degree of it is found, a nonlinear analytical method should be used.

Citations: 3

The Relative Divergence of Dutch Dialect Pronunciations from their Common Source: An Exploratory Study

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626521

W. Heeringa, B. Joseph

In this paper we use the Reeks Nederlandse Dialectatlassen as a source for the reconstruction of a 'proto-language' of Dutch dialects. We used 360 dialects from locations in the Netherlands, the northern part of Belgium and French-Flanders. The density of dialect locations is about the same everywhere. For each dialect we reconstructed 85 words. For the reconstruction of vowels we used knowledge of Dutch history, and for the reconstruction of consonants we used well-known tendencies found in most textbooks about historical linguistics. We validated results by comparing the reconstructed forms with pronunciations according to a proto-Germanic dictionary (Kobler, 2003). For 46% of the words we reconstructed the same vowel or the closest possible vowel when the vowel to be reconstructed was not found in the dialect material. For 52% of the words all consonants we reconstructed were the same. For 42% of the words, only one consonant was differently reconstructed. We measured the divergence of Dutch dialects from their 'proto-language'. We measured pronunciation distances to the proto-language we reconstructed ourselves and correlated them with pronunciation distances we measured to proto-Germanic based on the dictionary. Pronunciation distances were measured using Levenshtein distance, a string edit distance measure. We found a relatively strong correlation (r=0.87).

Citations: 7
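
For readers unfamiliar with the Levenshtein distance mentioned in this abstract, the following is a minimal sketch of the standard dynamic-programming formulation. It assumes unit costs for insertion, deletion and substitution and treats each character as one segment; the abstract does not state which cost scheme or segment representation the authors used, so this should not be read as their implementation.

```python
def levenshtein(source, target):
    """Return the edit distance between two sequences.

    Assumes unit cost for insertions, deletions and substitutions.
    Only the previous row of the DP table is kept in memory.
    """
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # cost of deleting the first i symbols of source
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (0 if s == t else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Generic example strings, not data from the paper.
print(levenshtein("kitten", "sitting"))
```

In a dialect comparison, `source` and `target` would be two transcriptions of the same word, for example a dialect form and a reconstructed form, and distances would be aggregated over all words.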

Creating a Comparative Dictionary of Totonac-Tepehua

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626533

Grzegorz Kondrak, D. Beck, Philip Dilts

We apply algorithms for the identification of cognates and recurrent sound correspondences proposed by Kondrak (2002) to the Totonac-Tepehua family of indigenous languages in Mexico. We show that by combining expert linguistic knowledge with computational analysis, it is possible to quickly identify a large number of cognate sets within the family. Our objective is to provide tools for rapid construction of comparative dictionaries for relatively unfamiliar language families.

Citations: 10

Computing and Historical Phonology

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626517

J. Nerbonne, T. M. Ellison, Grzegorz Kondrak

We introduce the proceedings from the workshop 'Computing and Historical Phonology: 9th Meeting of the ACL Special Interest Group for Computational Morphology and Phonology'.

Citations: 5

Phonological Reconstruction of a Dead Language Using the Gradual Learning Algorithm

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626524

Eric Smith

This paper discusses the reconstruction of the Elamite language's phonology from its orthography using the Gradual Learning Algorithm, which was re-purposed to "learn" underlying phonological forms from surface orthography. Practical issues are raised regarding the difficulty of mapping between orthography and phonology, and Optimality Theory's neglected Lexicon Optimization module is highlighted.

Citations: 2

On the Geolinguistic Change in Northern France between 1300 and 1900: A Dialectometrical Inquiry

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626526

H. Goebl

Drawing on 8 closely interpreted dialectometrical maps, this paper analyses the linguistic change of the geolinguistic deep structures in Northern France (Domaine d'Oïl) between 1300 and 1900. As a matter of fact, the result will show -- with one exception -- the great stability of these deep structures.

Citations: 4

Evolution, Optimization, and Language Change: The Case of Bengali Verb Inflections

Pub Date : 2007-06-28 DOI: 10.3115/1626516.1626525

M. Choudhury, Vaibhav Jalan, S. Sarkar, A. Basu

The verb inflections of Bengali underwent a series of phonological changes between the 10th and 18th centuries, which gave rise to several modern dialects of the language. In this paper, we offer a functional explanation for this change by quantifying the functional pressures of ease of articulation, perceptual contrast and learnability through objective functions or constraints, or both. The multi-objective and multi-constraint optimization problem has been solved through a genetic algorithm, whereby we have observed the emergence of Pareto-optimal dialects in the system that closely resemble some of the real ones.

Citations: 8
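
The notion of Pareto-optimality used in this abstract can be sketched as follows. The code assumes every objective is a cost to be minimised (for instance, hypothetical scores for articulatory effort and perceptual confusability) and filters a list of candidates down to the non-dominated ones. It illustrates only the optimality criterion, not the genetic algorithm the authors ran; the candidate scores are invented.

```python
def dominates(p, q):
    """True if p is at least as good as q on every objective and
    strictly better on at least one (all objectives minimised)."""
    return (all(x <= y for x, y in zip(p, q))
            and any(x < y for x, y in zip(p, q)))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Invented (effort, confusability) scores for five candidate dialects.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
front = pareto_front(candidates)
print(front)
```

A candidate such as `(3, 3)` is excluded because `(2, 2)` is better on both objectives, whereas `(1, 5)` and `(5, 1)` survive as different trade-offs between the two pressures.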
