
Proceedings of the COLING/ACL on Main conference poster sessions - Latest Publications

Soft Syntactic Constraints for Word Alignment through Discriminative Training
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273087
Colin Cherry, Dekang Lin
Word alignment methods can gain valuable guidance by ensuring that their alignments maintain cohesion with respect to the phrases specified by a monolingual dependency tree. However, this hard constraint can also rule out correct alignments, and its utility decreases as alignment models become more complex. We use a publicly available structured output SVM to create a max-margin syntactic aligner with a soft cohesion constraint. The resulting aligner is the first, to our knowledge, to use a discriminative learning method to train an ITG bitext parser.
Citations: 53
Adding Syntax to Dynamic Programming for Aligning Comparable Texts for the Generation of Paraphrases
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273169
Siwei Shen, Dragomir R. Radev, Agam Patel, Günes Erkan
Multiple sequence alignment techniques have recently gained popularity in the Natural Language community, especially for tasks such as machine translation, text generation, and paraphrase identification. Prior work falls into two categories, depending on the type of input used: (a) parallel corpora (e.g., multiple translations of the same text) or (b) comparable texts (non-parallel but on the same topic). So far, only techniques based on parallel texts have successfully used syntactic information to guide alignments. In this paper, we describe an algorithm for incorporating syntactic features in the alignment process for non-parallel texts with the goal of generating novel paraphrases of existing texts. Our method uses dynamic programming with alignment decision based on the local syntactic similarity between two sentences. Our results show that syntactic alignment outrivals syntax-free methods by 20% in both grammaticality and fidelity when computed over the novel sentences generated by alignment-induced finite state automata.
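The dynamic-programming alignment the abstract describes can be sketched as a standard global alignment in which the match score is delegated to a similarity callback. The `sim` argument below is a hypothetical stand-in for the paper's local syntactic similarity measure, not its actual implementation:

```python
def align(sent_a, sent_b, sim, gap=-1.0):
    """Score a global alignment of two token sequences by dynamic programming.

    sim(a, b) returns a similarity score for a pair of tokens; in the
    paper's setting this would be a local syntactic similarity.
    """
    n, m = len(sent_a), len(sent_b)
    # score[i][j] = best score aligning sent_a[:i] with sent_b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(sent_a[i - 1], sent_b[j - 1]),
                score[i - 1][j] + gap,   # leave a token of sent_a unaligned
                score[i][j - 1] + gap,   # leave a token of sent_b unaligned
            )
    return score[n][m]
```

Backtracking through the same table recovers the alignment itself; only the score is computed here for brevity.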
Citations: 27
N Semantic Classes are Harder than Two
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273080
Ben Carterette, R. Jones, W. Greiner, C. Barr
We show that we can automatically classify semantically related phrases into 10 classes. Classification robustness is improved by training with multiple sources of evidence, including within-document cooccurrence, HTML markup, syntactic relationships in sentences, substitutability in query logs, and string similarity. Our work provides a benchmark for automatic n-way classification into WordNet's semantic classes, both on a TREC news corpus and on a corpus of substitutable search query phrases.
Citations: 3
Detection of Quotations and Inserted Clauses and Its Application to Dependency Structure Analysis in Spontaneous Japanese
Pub Date : 2006-07-17 DOI: 10.21437/Interspeech.2006-251
Ryoji Hamabe, Kiyotaka Uchimoto, Tatsuya Kawahara, H. Isahara
Japanese dependency structure is usually represented by relationships between phrasal units called bunsetsus. One of the biggest problems with dependency structure analysis in spontaneous speech is that clause boundaries are ambiguous. This paper describes a method for detecting the boundaries of quotations and inserted clauses and that for improving the dependency accuracy by applying the detected boundaries to dependency structure analysis. The quotations and inserted clauses are determined by using an SVM-based text chunking method that considers information on morphemes, pauses, fillers, etc. The information on automatically analyzed dependency structure is also used to detect the beginning of the clauses. Our evaluation experiment using Corpus of Spontaneous Japanese (CSJ) showed that the automatically estimated boundaries of quotations and inserted clauses helped to improve the accuracy of dependency structure analysis.
Citations: 1
Automatically Extracting Nominal Mentions of Events with a Bootstrapped Probabilistic Classifier
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273095
C. Creswell, Matthew J. Beal, John Chen, T. Cornell, L. Nilsson, R. Srihari
Most approaches to event extraction focus on mentions anchored in verbs. However, many mentions of events surface as noun phrases. Detecting them can increase the recall of event extraction and provide the foundation for detecting relations between events. This paper describes a weakly-supervised method for detecting nominal event mentions that combines techniques from word sense disambiguation (WSD) and lexical acquisition to create a classifier that labels noun phrases as denoting events or non-events. The classifier uses boot-strapped probabilistic generative models of the contexts of events and non-events. The contexts are the lexically-anchored semantic dependency relations that the NPs appear in. Our method dramatically improves with bootstrapping, and comfortably outperforms lexical lookup methods which are based on very much larger hand-crafted resources.
Citations: 17
Using Machine Learning to Explore Human Multimodal Clarification Strategies
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273158
Verena Rieser, Oliver Lemon
We investigate the use of machine learning in combination with feature engineering techniques to explore human multimodal clarification strategies and the use of those strategies for dialogue systems. We learn from data collected in a Wizard-of-Oz study where different wizards could decide whether to ask a clarification request in a multimodal manner or else use speech alone. We show that there is a uniform strategy across wizards which is based on multiple features in the context. These are generic runtime features which can be implemented in dialogue systems. Our prediction models achieve a weighted f-score of 85.3% (which is a 25.5% improvement over a one-rule baseline). To assess the effects of models, feature discretisation, and selection, we also conduct a regression analysis. We then interpret and discuss the use of the learnt strategy for dialogue systems. Throughout the investigation we discuss the issues arising from using small initial Wizard-of-Oz data sets, and we show that feature engineering is an essential step when learning from such limited data.
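The weighted f-score reported above is, on one common reading, per-class F1 averaged with class-frequency weights; the abstract does not specify the exact weighting, so the sketch below is illustrative only:

```python
def weighted_f1(gold, pred):
    """Per-class F1, averaged with weights proportional to each class's
    frequency in the gold labels (one common 'weighted f-score')."""
    n = len(gold)
    total = 0.0
    for c in set(gold):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += (gold.count(c) / n) * f1
    return total
```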
Citations: 37
A Phrase-Based Statistical Model for SMS Text Normalization
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273078
AiTi Aw, Min Zhang, Juan Xiao, Jian Su
Short Messaging Service (SMS) texts behave quite differently from normal written texts and have some very special phenomena. To translate SMS texts, traditional approaches model such irregularities directly in Machine Translation (MT). However, such approaches suffer from customization problem as tremendous effort is required to adapt the language model of the existing translation system to handle SMS text style. We offer an alternative approach to resolve such irregularities by normalizing SMS texts before MT. In this paper, we view the task of SMS normalization as a translation problem from the SMS language to the English language and we propose to adapt a phrase-based statistical MT model for the task. Evaluation by 5-fold cross validation on a parallel SMS normalized corpus of 5000 sentences shows that our method can achieve 0.80702 in BLEU score against the baseline BLEU score 0.6958. Another experiment of translating SMS texts from English to Chinese on a separate SMS text corpus shows that, using SMS normalization as MT preprocessing can largely boost SMS translation performance from 0.1926 to 0.3770 in BLEU score.
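BLEU, the metric both experiments report, is the geometric mean of clipped n-gram precisions scaled by a brevity penalty. A minimal single-reference, sentence-level sketch follows; real implementations differ in smoothing and in multi-reference clipping, so this is an illustration of the formula rather than the scorer the paper used:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        log_prec_sum += math.log(max(overlap, 1e-9) / total)  # floor avoids log(0)
    # Penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_prec_sum / max_n)
```

A perfect match scores 1.0, and any missing n-gram order drives the geometric mean sharply toward zero, which is why sentence-level BLEU is usually smoothed.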
Citations: 284
An Empirical Study of Chinese Chunking
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273086
Wenliang Chen, Yujie Zhang, H. Isahara
In this paper, we describe an empirical study of Chinese chunking on a corpus, which is extracted from UPENN Chinese Treebank-4 (CTB4). First, we compare the performance of the state-of-the-art machine learning models. Then we propose two approaches in order to improve the performance of Chinese chunking. 1) We propose an approach to resolve the special problems of Chinese chunking. This approach extends the chunk tags for every problem by a tag-extension function. 2) We propose two novel voting methods based on the characteristics of chunking task. Compared with traditional voting methods, the proposed voting methods consider long distance information. The experimental results show that the SVMs model outperforms the other models and that our proposed approaches can improve performance significantly.
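For contrast with the paper's long-distance voting methods, the traditional baseline they compare against is per-token majority voting over several chunkers' tag sequences. A generic sketch (not the paper's proposed variants):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-token tag sequences from several chunkers.

    predictions: list of tag sequences, one per model, all equal length.
    Returns the tag chosen by most models at each position; ties go to
    the earliest model, since Counter preserves insertion order.
    """
    voted = []
    for position_tags in zip(*predictions):
        counts = Counter(position_tags)
        voted.append(counts.most_common(1)[0][0])
    return voted
```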
Citations: 55
Aligning Features with Sense Distinction Dimensions
Pub Date : 2006-07-17 DOI: 10.3115/1273073.1273191
Nianwen Xue, Jinying Chen, Martha Palmer
In this paper we present word sense disambiguation (WSD) experiments on ten highly polysemous verbs in Chinese, where significant performance improvements are achieved using rich linguistic features. Our system performs significantly better, and in some cases substantially better, than the baseline on all ten verbs. Our results also demonstrate that features extracted from the output of an automatic Chinese semantic role labeling system in general benefited the WSD system, even though the amount of improvement was not consistent across the verbs. For a few verbs, semantic role information actually hurt WSD performance. The inconsistency of feature performance is a general characteristic of the WSD task, as has been observed by others. We argue that this result can be explained by the fact that word senses are partitioned along different dimensions for different verbs and the features therefore need to be tailored to particular verbs in order to achieve adequate accuracy on verb sense disambiguation.
Citations: 12