Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL '03: Latest Papers

Weakly Supervised Natural Language Learning Without Redundant Views

Pub Date: 2003-05-27; DOI: 10.3115/1073445.1073468

Vincent Ng, Claire Cardie

We investigate single-view algorithms as an alternative to multi-view algorithms for weakly supervised learning for natural language processing tasks without a natural feature split. In particular, we apply co-training, self-training, and EM to one such task and find that both self-training and FS-EM, a new variation of EM that incorporates feature selection, outperform co-training and are comparatively less sensitive to parameter changes.

Citations: 140
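The Ng and Cardie abstract above names self-training as one of the single-view alternatives to co-training. As a rough illustration only, the general self-training loop can be sketched as follows. This is not the authors' implementation: the nearest-centroid classifier, the margin-based confidence score, and the names `train_centroids`, `predict`, `self_train`, `per_round`, and `max_rounds` are all assumptions made for this example.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, defaultdict(int)
    for x, y in examples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest centroid (an assumption)."""
    dists = sorted((math.dist(c, x), y) for y, c in centroids.items())
    best_d, best_y = dists[0]
    margin = dists[1][0] - best_d if len(dists) > 1 else float("inf")
    return best_y, margin


def self_train(labeled, unlabeled, per_round=2, max_rounds=10):
    """Repeatedly train on the labeled set, label the unlabeled pool,
    and move the most confidently labeled points into the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train_centroids(labeled)
        scored = sorted(
            ((*predict(model, x), x) for x in pool),
            key=lambda t: t[1],
            reverse=True,
        )
        chosen = scored[:per_round]
        pool = [x for _, _, x in scored[per_round:]]
        labeled.extend((x, y) for y, _, x in chosen)
    return train_centroids(labeled), labeled


# Hypothetical toy data: two labeled seeds and four unlabeled points.
seed = [((0.0, 0.0), "neg"), ((10.0, 10.0), "pos")]
pool_points = [(1.0, 1.0), (9.0, 9.0), (2.0, 1.0), (8.0, 9.0)]
model, final_labeled = self_train(seed, pool_points)
print(len(final_labeled), predict(model, (1.5, 1.5))[0])
```

Unlike co-training, a loop of this kind uses a single classifier over a single feature view, so no feature split is required. The number of points added per round is the kind of parameter whose sensitivity the abstract refers to.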

A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation

Pub Date: 2003-05-27; DOI: 10.3115/1073445.1073464

Shankar Kumar, W. Byrne

We present a derivation of the alignment template model for statistical machine translation and an implementation of the model using weighted finite state transducers. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard FSM operations involving these transducers. One of the benefits of using this framework is that it obviates the need to develop specialized search procedures, even for the generation of lattices or N-Best lists of bitext word alignments and translation hypotheses. We evaluate the implementation of the model on the French-to-English Hansards task and report alignment and translation performance.

Citations: 104

Statistical Sentence Condensation using Ambiguity Packing and Stochastic Disambiguation Methods for Lexical-Functional Grammar

Pub Date: 2003-05-27; DOI: 10.3115/1073445.1073471

S. Riezler, Tracy Holloway King, Dick Crouch, A. Zaenen

We present an application of ambiguity packing and stochastic disambiguation techniques for Lexical-Functional Grammars (LFG) to the domain of sentence condensation. Our system incorporates a linguistic parser/generator for LFG, a transfer component for parse reduction operating on packed parse forests, and a maximum-entropy model for stochastic output selection. Furthermore, we propose the use of standard parser evaluation methods for automatically evaluating the summarization quality of sentence condensation systems. An experimental evaluation of summarization quality shows a close correlation between the automatic parse-based evaluation and a manual evaluation of generated strings. Overall summarization quality of the proposed system is state-of-the-art, with guaranteed grammaticality of the system output due to the use of a constraint-based parser/generator.

Citations: 100

Inducing History Representations for Broad Coverage Statistical Parsing

Pub Date: 2003-05-27; DOI: 10.3115/1073445.1073459

James Henderson

We present a neural network method for inducing representations of parse histories and using these history representations to estimate the probabilities needed by a statistical left-corner parser. The resulting statistical parser achieves performance (89.1% F-measure) on the Penn Treebank which is only 0.6% below the best current parser for this task, despite using a smaller vocabulary size and less prior linguistic knowledge. Crucial to this success is the use of structurally determined soft biases in inducing the representation of the parse history, and no use of hard independence assumptions.

Citations: 109

Minimally Supervised Induction of Grammatical Gender

Pub Date: 2003-05-27; DOI: 10.3115/1073445.1073451

Silviu Cucerzan, David Yarowsky

This paper investigates the problem of determining grammatical gender for the nouns of a language starting with minimal resources: a very small list of seed nouns for which gender is known or via translingual projection of natural gender. We show that through a bootstrapping process that uses contextual clues from an unannotated corpus and morphological clues modeled with suffix tries, accurate gender predictions can be induced for five diverse test languages.

Citations: 38
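The Cucerzan and Yarowsky abstract above mentions morphological clues modeled with suffix tries. A minimal sketch of that idea, under stated assumptions, might look like the following. It is not the authors' model: it covers only the morphological side (no contextual clues and no bootstrapping), uses raw counts with no smoothing, and the class name `SuffixTrie`, its methods, and the Spanish seed nouns are hypothetical choices for illustration.

```python
from collections import Counter


class SuffixTrie:
    """Trie over reversed words; each node counts the genders of the
    seed nouns that share the suffix spelled out by the path to it."""

    def __init__(self):
        self.counts = Counter()
        self.children = {}

    def add(self, word, gender):
        """Record a seed noun, updating counts along its suffix path."""
        node = self
        node.counts[gender] += 1
        for ch in reversed(word):
            node = node.children.setdefault(ch, SuffixTrie())
            node.counts[gender] += 1

    def predict(self, word):
        """Follow the longest suffix of `word` seen in the seeds and
        return the majority gender at that node (None if no seeds)."""
        node = self
        for ch in reversed(word):
            nxt = node.children.get(ch)
            if nxt is None:
                break
            node = nxt
        return node.counts.most_common(1)[0][0] if node.counts else None


# Hypothetical seed list: four Spanish nouns with known gender.
trie = SuffixTrie()
for noun, gender in [("casa", "f"), ("mesa", "f"), ("libro", "m"), ("perro", "m")]:
    trie.add(noun, gender)

print(trie.predict("gato"), trie.predict("silla"))
```

Backing off to the longest known suffix lets a handful of seeds generalize to unseen nouns. A fuller system would presumably combine this signal with corpus context, as the abstract describes.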