
Proceedings of the conference. Association for Computational Linguistics. Meeting: Latest Publications

Aggregating and Predicting Sequence Labels from Crowd Annotations.
An T Nguyen, Byron C Wallace, Junyi Jessy Li, Ani Nenkova, Matthew Lease

Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short Term Memory. We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online.
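As context for task (1), here is a minimal sketch of the simplest aggregation strategy, per-token majority voting, a natural baseline against which an annotator-aware HMM variant like the paper's would be compared. The function and BIO tags are illustrative, not the authors' code.

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate sequence labels from several annotators by per-token
    majority vote. `annotations` is a list of label sequences, one per
    annotator, all over the same tokens."""
    n_tokens = len(annotations[0])
    consensus = []
    for i in range(n_tokens):
        votes = Counter(seq[i] for seq in annotations)
        consensus.append(votes.most_common(1)[0][0])
    return consensus

# Three annotators label the same five tokens with BIO tags.
crowd = [
    ["B-PER", "I-PER", "O", "O",     "O"],
    ["B-PER", "I-PER", "O", "B-LOC", "O"],
    ["B-PER", "O",     "O", "B-LOC", "O"],
]
print(majority_vote(crowd))  # ['B-PER', 'I-PER', 'O', 'B-LOC', 'O']
```

Unlike this baseline, the paper's HMM variant can weight annotators by inferred reliability and respect label-transition structure.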

{"title":"Aggregating and Predicting Sequence Labels from Crowd Annotations.","authors":"An T Nguyen,&nbsp;Byron C Wallace,&nbsp;Junyi Jessy Li,&nbsp;Ani Nenkova,&nbsp;Matthew Lease","doi":"10.18653/v1/P17-1028","DOIUrl":"https://doi.org/10.18653/v1/P17-1028","url":null,"abstract":"<p><p>Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short Term Memory. We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2017 ","pages":"299-309"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.18653/v1/P17-1028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35568131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 75
Understanding Discourse on Work and Job-Related Well-Being in Public Social Media
Tong Liu, Christopher Homan, Cecilia Ovesdotter Alm, Megan C. Lytle-Flint, Ann Marie White, Henry A. Kautz
We construct a humans-in-the-loop supervised learning framework that integrates crowdsourcing feedback and local knowledge to detect job-related tweets from individual and business accounts. Using data-driven ethnography, we examine discourse about work by fusing language-based analysis with temporal, geospatial, and labor statistics information.
{"title":"Understanding Discourse on Work and Job-Related Well-Being in Public Social Media","authors":"Tong Liu, Christopher Homan, Cecilia Ovesdotter Alm, Megan C. Lytle-Flint, Ann Marie White, Henry A. Kautz","doi":"10.18653/v1/P16-1099","DOIUrl":"https://doi.org/10.18653/v1/P16-1099","url":null,"abstract":"We construct a humans-in-the-loop supervised learning framework that integrates crowdsourcing feedback and local knowledge to detect job-related tweets from individual and business accounts. Using data-driven ethnography, we examine discourse about work by fusing language-based analysis with temporal, geospational, and labor statistics information.","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"251 1","pages":"1044-1053"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75760239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
Nonparametric Spherical Topic Modeling with Word Embeddings.
Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, Sam Gershman

Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.
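For concreteness, here is a hedged sketch of the von Mises-Fisher log-density that such a model places on a unit-normalized word vector given a topic direction mu and concentration kappa. The Bessel-function normalizer is the standard one for the vMF distribution; all variable names are illustrative.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel I_v

def vmf_log_density(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on the unit
    sphere in R^p: f(x; mu, kappa) = C_p(kappa) * exp(kappa * mu^T x)."""
    p = x.shape[0]
    # log C_p(kappa); ive = iv * exp(-kappa), used for numerical stability
    log_c = ((p / 2 - 1) * np.log(kappa)
             - (p / 2) * np.log(2 * np.pi)
             - (np.log(ive(p / 2 - 1, kappa)) + kappa))
    return log_c + kappa * mu @ x

# Unit-normalized word embeddings; a word near the topic's mean
# direction receives higher density.
rng = np.random.default_rng(0)
mu = rng.normal(size=50); mu /= np.linalg.norm(mu)
word = mu + 0.1 * rng.normal(size=50); word /= np.linalg.norm(word)
print(vmf_log_density(word, mu, kappa=20.0))
```

Because the density depends on the data only through the inner product mu^T x, it scores words by cosine-style directional similarity, which is exactly the property the paper exploits.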

{"title":"Nonparametric Spherical Topic Modeling with Word Embeddings.","authors":"Kayhan Batmanghelich,&nbsp;Ardavan Saeedi,&nbsp;Karthik Narasimhan,&nbsp;Sam Gershman","doi":"10.18653/v1/P16-2087","DOIUrl":"https://doi.org/10.18653/v1/P16-2087","url":null,"abstract":"<p><p>Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2016 ","pages":"537-542"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6327958/pdf/nihms-999400.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36858644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 78
Neural Tree Indexers for Text Understanding
Tsendsuren Munkhdalai, Hong Yu
Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic trees. In this paper, we introduce a robust, syntactic parsing-independent tree-structured model, Neural Tree Indexers (NTI), that provides a middle ground between sequential RNNs and syntactic tree-based recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. An attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binary-tree model of NTI, showing that the model achieved state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.
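A minimal sketch of the bottom-up composition NTI performs, assuming a plain one-layer tanh node function in place of the paper's learned (LSTM-style) node functions, and one simple choice for handling unpaired nodes:

```python
import numpy as np

def compose(left, right, W, b):
    """Node function: map two child vectors to a parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def nti_binary_tree(tokens, W, b):
    """Build a binary tree bottom-up over token vectors, applying the
    node function to adjacent pairs until one root vector remains.
    Unpaired nodes are carried up unchanged (one simple padding choice)."""
    level = list(tokens)
    while len(level) > 1:
        nxt = [compose(level[i], level[i + 1], W, b)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # carry the unpaired node upward
            nxt.append(level[-1])
        level = nxt
    return level[0]

d = 8
rng = np.random.default_rng(1)
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
tokens = [rng.normal(size=d) for _ in range(5)]
root = nti_binary_tree(tokens, W, b)   # sentence representation
print(root.shape)  # (8,)
```

The point of the sketch is the parsing independence: the tree shape is fixed by position alone, so no syntactic parser is needed before composition.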
{"title":"Neural Tree Indexers for Text Understanding","authors":"Tsendsuren Munkhdalai, Hong Yu","doi":"10.18653/V1/E17-1002","DOIUrl":"https://doi.org/10.18653/V1/E17-1002","url":null,"abstract":"Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic tree. In this paper, we introduce a robust syntactic parsing-independent tree structured model, Neural Tree Indexers (NTI) that provides a middle ground between the sequential RNNs and the syntactic treebased recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. Attention mechanism can then be applied to both structure and node function. We implemented and evaluated a binary tree model of NTI, showing the model achieved the state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"1 1","pages":"11-21"},"PeriodicalIF":0.0,"publicationDate":"2016-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78981638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 103
Neural Semantic Encoders
Tsendsuren Munkhdalai, Hong Yu
We present a memory-augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable-sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can also access multiple and shared memories. In this paper, we demonstrate the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation, where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.
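A rough numpy sketch of one read-compose-write step in the spirit of NSE. The paper parameterizes these operations with learned recurrent networks, so the single tanh layer and the soft-write rule below are simplifications, and all names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def nse_step(x, memory, Wc):
    """One NSE-style step on token vector x (illustrative, not the
    paper's exact parameterization).
    read:    attend over memory slots with x as the query;
    compose: combine x with the retrieved vector;
    write:   blend the composed vector back into the attended slots."""
    attn = softmax(memory @ x)            # (slots,) attention weights
    read = attn @ memory                  # retrieved memory vector
    composed = np.tanh(Wc @ np.concatenate([x, read]))
    memory = (1 - attn)[:, None] * memory + attn[:, None] * composed
    return composed, memory

d, slots = 8, 5
rng = np.random.default_rng(2)
memory = rng.normal(size=(slots, d))
Wc = rng.normal(size=(d, 2 * d))
for x in rng.normal(size=(3, d)):         # process a 3-token sequence
    h, memory = nse_step(x, memory, Wc)
```

The write step is what makes the encoding memory "evolve over time": slots that were attended to are overwritten in proportion to their attention weight, so the memory tracks the sequence read so far.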
{"title":"Neural Semantic Encoders","authors":"Tsendsuren Munkhdalai, Hong Yu","doi":"10.18653/V1/E17-1038","DOIUrl":"https://doi.org/10.18653/V1/E17-1038","url":null,"abstract":"We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders. NSE is equipped with a novel memory update rule and has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can also access 1 multiple and shared memories. In this paper, we demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation where NSE achieved state-of-the-art performance when evaluated on publically available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"91 1","pages":"397-407"},"PeriodicalIF":0.0,"publicationDate":"2016-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73677189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 133
Exploring Autism Spectrum Disorders Using HLT.
Julia Parish-Morris, Mark Liberman, Neville Ryant, Christopher Cieri, Leila Bateman, Emily Ferguson, Robert T Schultz

The phenotypic complexity of Autism Spectrum Disorder motivates the application of modern computational methods to large collections of observational data, both for improved clinical diagnosis and for better scientific understanding. We have begun to create a corpus of annotated language samples relevant to this research, and we plan to join with other researchers in pooling and publishing such resources on a large scale. The goal of this paper is to present some initial explorations to illustrate the opportunities that such datasets will afford.

{"title":"Exploring Autism Spectrum Disorders Using HLT.","authors":"Julia Parish-Morris, Mark Liberman, Neville Ryant, Christopher Cieri, Leila Bateman, Emily Ferguson, Robert T Schultz","doi":"10.18653/v1/w16-0308","DOIUrl":"10.18653/v1/w16-0308","url":null,"abstract":"<p><p>The phenotypic complexity of Autism Spectrum Disorder motivates the application of modern computational methods to large collections of observational data, both for improved clinical diagnosis and for better scientific understanding. We have begun to create a corpus of annotated language samples relevant to this research, and we plan to join with other researchers in pooling and publishing such resources on a large scale. The goal of this paper is to present some initial explorations to illustrate the opportunities that such datasets will afford.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2016 ","pages":"74-84"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7558465/pdf/nihms-985179.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38604333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Measuring idiosyncratic interests in children with autism.
Masoud Rouhizadeh, Emily Prud'hommeaux, Jan van Santen, Richard Sproat
A defining symptom of autism spectrum disorder (ASD) is the presence of restricted and repetitive activities and interests, which can surface in language as a perseverative focus on idiosyncratic topics. In this paper, we use semantic similarity measures to identify such idiosyncratic topics in narratives produced by children with and without ASD. We find that neurotypical children tend to use the same words and semantic concepts when retelling the same narrative, while children with ASD, even when producing accurate retellings, use different words and concepts relative not only to neurotypical children but also to other children with ASD. Our results indicate that children with ASD not only stray from the target topic but do so in idiosyncratic ways according to their own restricted interests.
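One common instantiation of such a semantic similarity measure, offered as a hedged sketch rather than the paper's exact method: cosine similarity between the mean word vectors of two retellings, with low similarity to the target narrative (or to other retellings) flagging idiosyncratic content. The toy embeddings below are placeholders.

```python
import numpy as np

def retelling_similarity(tokens_a, tokens_b, embeddings):
    """Cosine similarity between the mean word vectors of two retellings."""
    def mean_vec(tokens):
        vecs = [embeddings[t] for t in tokens if t in embeddings]
        return np.mean(vecs, axis=0)
    a, b = mean_vec(tokens_a), mean_vec(tokens_b)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-d "embeddings"; real use would load distributional word vectors.
emb = {"dog":       np.array([1.0, 0.2, 0.0]),
       "puppy":     np.array([0.9, 0.3, 0.1]),
       "ran":       np.array([0.1, 1.0, 0.0]),
       "spaceship": np.array([0.0, 0.1, 1.0])}
print(retelling_similarity(["dog", "ran"], ["puppy", "ran"], emb))  # high
print(retelling_similarity(["dog", "ran"], ["spaceship"], emb))     # low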
{"title":"Measuring idiosyncratic interests in children with autism.","authors":"Masoud Rouhizadeh,&nbsp;Emily Prud'hommeaux,&nbsp;Jan van Santen,&nbsp;Richard Sproat","doi":"10.3115/v1/p15-2035","DOIUrl":"https://doi.org/10.3115/v1/p15-2035","url":null,"abstract":"A defining symptom of autism spectrum disorder (ASD) is the presence of restricted and repetitive activities and interests, which can surface in language as a perseverative focus on idiosyncratic topics. In this paper, we use semantic similarity measures to identify such idiosyncratic topics in narratives produced by children with and without ASD. We find that neurotypical children tend to use the same words and semantic concepts when retelling the same narrative, while children with ASD, even when producing accurate retellings, use different words and concepts relative not only to neurotypical children but also to other children with ASD. Our results indicate that children with ASD not only stray from the target topic but do so in idiosyncratic ways according to their own restricted interests.","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2015 ","pages":"212-217"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5715463/pdf/nihms792406.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35626981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
An Extension of BLANC to System Mentions.
Xiaoqiang Luo, Sameer Pradhan, Marta Recasens, Eduard Hovy

BLANC is a link-based coreference evaluation metric for measuring the quality of coreference systems on gold mentions. This paper extends the original BLANC ("BLANC-gold" henceforth) to system mentions, removing the gold mention assumption. The proposed BLANC falls back seamlessly to the original one if system mentions are identical to gold mentions, and it is shown to strongly correlate with existing metrics on the 2011 and 2012 CoNLL data.
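For reference, a sketch of the original BLANC-gold computation over a shared mention set: the average of the F-scores on coreference links and on non-coreference links. The system-mention extension proposed in the paper adds handling for mentions present in only one partition, which this sketch omits, and the degenerate-case conventions here are simplified.

```python
from itertools import combinations

def blanc(key_partition, response_partition):
    """BLANC over gold mentions. Each partition is a list of entity
    sets over the same mentions."""
    def links(partition):
        coref = set()
        for entity in partition:
            coref |= {frozenset(p) for p in combinations(sorted(entity), 2)}
        mentions = sorted(set().union(*partition))
        all_pairs = {frozenset(p) for p in combinations(mentions, 2)}
        return coref, all_pairs - coref

    ck, nk = links(key_partition)        # key coref / non-coref links
    cr, nr = links(response_partition)   # response coref / non-coref links

    def f1(a, b):
        p = len(a & b) / len(b) if b else 0.0
        r = len(a & b) / len(a) if a else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    return (f1(ck, cr) + f1(nk, nr)) / 2

key = [{"m1", "m2"}, {"m3"}]
sys = [{"m1", "m2", "m3"}]
print(blanc(key, sys))  # 0.25: coref F = 0.5, non-coref F = 0.0
```

Averaging over both link types is what distinguishes BLANC from purely coreference-link metrics: a system that over-merges entities is penalized on the non-coreference side, as the example shows.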

{"title":"An Extension of BLANC to System Mentions.","authors":"Xiaoqiang Luo,&nbsp;Sameer Pradhan,&nbsp;Marta Recasens,&nbsp;Eduard Hovy","doi":"10.3115/v1/P14-2005","DOIUrl":"https://doi.org/10.3115/v1/P14-2005","url":null,"abstract":"<p><p>BLANC is a link-based coreference evaluation metric for measuring the quality of coreference systems on gold mentions. This paper extends the original BLANC (\"BLANC-gold\" henceforth) to system mentions, removing the gold mention assumption. The proposed BLANC falls back seamlessly to the original one if system mentions are identical to gold mentions, and it is shown to strongly correlate with existing metrics on the 2011 and 2012 CoNLL data.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2014 ","pages":"24-29"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3115/v1/P14-2005","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35225786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 36
Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation.
Sameer Pradhan, Xiaoqiang Luo, Marta Recasens, Eduard Hovy, Vincent Ng, Michael Strube

The definitions of two coreference scoring metrics, B³ and CEAF, are underspecified with respect to predicted, as opposed to key (or gold), mentions. Several variations have been proposed that manipulate either, or both, the key and predicted mentions in order to get a one-to-one mapping. On the other hand, the metric BLANC was, until recently, limited to scoring partitions of key mentions. In this paper, we (i) argue that mention manipulation for scoring predicted mentions is unnecessary, and potentially harmful as it could produce unintuitive results; (ii) illustrate the application of all these measures to scoring predicted mentions; (iii) make available an open-source, thoroughly-tested reference implementation of the main coreference evaluation measures; and (iv) rescore the results of the CoNLL-2011/2012 shared task systems with this implementation. This will help the community accurately measure and compare new end-to-end coreference resolution algorithms.
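To ground the discussion, here is a sketch of B³ in the setting where it is unambiguously defined, i.e. key and response partitions cover the same mentions; the predicted-mention case this paper addresses is exactly where that assumption breaks. Names and example clusters are illustrative.

```python
def b_cubed(key_partition, response_partition):
    """B^3 over a shared mention set: per-mention precision and recall
    from the overlap of that mention's key and response clusters."""
    def cluster_of(partition):
        return {m: entity for entity in partition for m in entity}

    key_c = cluster_of(key_partition)
    resp_c = cluster_of(response_partition)
    mentions = list(key_c)  # assumes response covers the same mentions

    p = sum(len(key_c[m] & resp_c[m]) / len(resp_c[m]) for m in mentions)
    r = sum(len(key_c[m] & resp_c[m]) / len(key_c[m]) for m in mentions)
    p, r = p / len(mentions), r / len(mentions)
    return p, r, 2 * p * r / (p + r)

key = [{"m1", "m2", "m3"}, {"m4"}]
sys = [{"m1", "m2"}, {"m3", "m4"}]
print(b_cubed(key, sys))  # (0.75, 0.667, 0.706), approximately
```

When the response contains a mention absent from the key (or vice versa), `key_c[m]` or `resp_c[m]` is undefined; the competing metric variants the paper surveys differ precisely in how they patch this.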

{"title":"Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation.","authors":"Sameer Pradhan,&nbsp;Xiaoqiang Luo,&nbsp;Marta Recasens,&nbsp;Eduard Hovy,&nbsp;Vincent Ng,&nbsp;Michael Strube","doi":"10.3115/v1/P14-2006","DOIUrl":"https://doi.org/10.3115/v1/P14-2006","url":null,"abstract":"<p><p>The definitions of two coreference scoring metrics- B<sup>3</sup> and CEAF-are underspecified with respect to <i>predicted</i>, as opposed to <i>key</i> (or <i>gold</i>) mentions. Several variations have been proposed that manipulate either, or both, the key and predicted mentions in order to get a one-to-one mapping. On the other hand, the metric BLANC was, until recently, limited to scoring partitions of key mentions. In this paper, we (i) argue that mention manipulation for scoring predicted mentions is unnecessary, and potentially harmful as it could produce unintuitive results; (ii) illustrate the application of all these measures to scoring predicted mentions; (iii) make available an open-source, thoroughly-tested reference implementation of the main coreference evaluation measures; and (iv) rescore the results of the CoNLL-2011/2012 shared task systems with this implementation. This will help the community accurately measure and compare new end-to-end coreference resolution algorithms.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2014 ","pages":"30-35"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.3115/v1/P14-2006","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35225788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 185
Interpretable Semantic Vectors from a Joint Model of Brain- and Text-Based Meaning.
Alona Fyshe, Partha P Talukdar, Brian Murphy, Tom M Mitchell

Vector space models (VSMs) represent word meanings as points in a high-dimensional space. VSMs are typically created from large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.
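As a crude illustration of fusing the two data sources (not JNNSE itself, which uses joint non-negative sparse coding), one could concatenate corpus-derived and brain-derived feature matrices for the same words and factor them jointly, so both sources shape the latent space. All dimensions and names below are placeholders.

```python
import numpy as np

def joint_embedding(corpus_X, brain_Y, dim):
    """Naive joint embedding: concatenate the two feature matrices
    (rows = words) and take a truncated SVD of the result."""
    joint = np.hstack([corpus_X, brain_Y])
    U, s, _ = np.linalg.svd(joint, full_matrices=False)
    return U[:, :dim] * s[:dim]          # one latent vector per word

rng = np.random.default_rng(3)
n_words = 100
corpus_X = rng.normal(size=(n_words, 300))   # e.g. co-occurrence features
brain_Y = rng.normal(size=(n_words, 40))     # e.g. fMRI/MEG features
Z = joint_embedding(corpus_X, brain_Y, dim=20)
print(Z.shape)  # (100, 20)
```

Unlike this baseline, JNNSE's non-negativity and sparsity constraints are what make the resulting dimensions interpretable, which is the paper's stated goal.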

{"title":"Interpretable Semantic Vectors from a Joint Model of Brain- and Text-Based Meaning.","authors":"Alona Fyshe, Partha P Talukdar, Brian Murphy, Tom M Mitchell","doi":"10.3115/v1/p14-1046","DOIUrl":"10.3115/v1/p14-1046","url":null,"abstract":"<p><p>Vector space models (VSMs) represent word meanings as points in a high dimensional space. VSMs are typically created using a large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2014 ","pages":"489-499"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4497373/pdf/nihms589902.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34282421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0