ANSWER: An unsupervised attractor network method for detecting salient words in text corpora

2015 International Joint Conference on Neural Networks (IJCNN). Published: 2015-07-12. DOI: 10.1109/IJCNN.2015.7280773

Madhavun Candadai, A. Vanarase, M. Mei, A. Minai


Citations: 3

Abstract

The availability of unstructured text as a source of data has increased by orders of magnitude in the last few years, triggering extensive research in the automated processing and analysis of electronic texts. An especially important and difficult problem is the identification of salient words in a corpus, so that further processing can focus on these words without distraction by uninformative words. Standard lists of stop words are used to remove common words such as articles, pronouns and prepositions, but many other words that should be removed are much harder to identify because word salience is highly context-dependent. In this paper, we describe a neurodynamical approach for the context-dependent identification of salient words in large text corpora. The method, termed the Attractor Network-based Salient Word Extraction Rule (ANSWER), is modeled as a cognitive mechanism that identifies salient words based on their participation in coherent multi-word ideas. These ideas are, in turn, extracted via attractor dynamics in a recurrent neural network modeling the associative semantic graph of the corpus. The corpus used in this paper comprises the abstracts of all papers published in the proceedings of IJCNN 2009, 2011 and 2013. The list of salient words that the system generates is compared with those generated by other standard metrics, and is found to outperform all of them in almost all cases.
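
The abstract does not give the network equations, so the following is only a minimal sketch of the general idea, not the authors' ANSWER algorithm. It assumes document-level co-occurrence counts as the associative graph, a k-winners-take-all update as the attractor dynamics, and a small persistence bonus for words that are already active. A word's salience is taken to be the fraction of random starting cues whose settled state contains it. All names (`build_graph`, `settle`, `salience`, `TOY_DOCS`, `stay_bonus`) and the toy corpus are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: two topics plus filler words spread across both.
TOY_DOCS = [
    ["attractor", "network", "dynamics", "paper"],
    ["attractor", "network", "dynamics", "result"],
    ["attractor", "network", "dynamics", "show"],
    ["kernel", "svm", "margin", "paper"],
    ["kernel", "svm", "margin", "result"],
    ["kernel", "svm", "margin", "show"],
]


def build_graph(docs):
    """Symmetric association weights: number of documents where two words co-occur."""
    weights = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
            weights[(b, a)] += 1
    return weights


def settle(active, vocab, weights, k, stay_bonus=0.5, max_steps=20):
    """Repeat k-winners-take-all updates until the active set stops changing."""
    for _ in range(max_steps):
        drive = {}
        for w in vocab:
            # Input to a word: its total association with the other active words.
            total = sum(weights[(w, a)] for a in active if a != w)
            # Assumed persistence term: active words get a small head start.
            drive[w] = total + (stay_bonus if w in active else 0.0)
        # Keep the k most strongly driven words; ties break alphabetically.
        ranked = sorted(vocab, key=lambda w: (-drive[w], w))
        new_active = frozenset(ranked[:k])
        if new_active == active:
            break
        active = new_active
    return active


def salience(docs, k=3, trials=200, seed=0):
    """Score each word by how often it appears in a settled state."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    weights = build_graph(docs)
    counts = Counter()
    for _ in range(trials):
        cue = frozenset(rng.sample(vocab, k))  # random starting pattern
        counts.update(settle(cue, vocab, weights, k))
    return {w: counts[w] / trials for w in vocab}


if __name__ == "__main__":
    ranked = sorted(salience(TOY_DOCS).items(), key=lambda kv: (-kv[1], kv[0]))
    for word, score in ranked:
        print(f"{word:10s} {score:.2f}")
```

On this toy corpus every starting cue settles into one of the two topic clusters, so the filler words (paper, result, show) score 0 even though each appears in as many documents as some topic words would in a larger mix. That is the behavior the abstract motivates: salience comes from participating in a coherent group of words, not from raw frequency.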



