ANSWER: An unsupervised attractor network method for detecting salient words in text corpora

Madhavun Candadai, A. Vanarase, M. Mei, A. Minai
Published in: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1-8
Publication date: 2015-07-12
DOI: 10.1109/IJCNN.2015.7280773 (https://doi.org/10.1109/IJCNN.2015.7280773)
Citations: 3

Abstract

The availability of unstructured text as a source of data has increased by orders of magnitude in the last few years, triggering extensive research in the automated processing and analysis of electronic texts. An especially important and difficult problem is the identification of salient words in a corpus, so that further processing can focus on these words without distraction by uninformative words. Standard lists of stop words are used to remove common words such as articles, pronouns and prepositions, but many other words that should be removed are much harder to identify because word salience is highly context-dependent. In this paper, we describe a neurodynamical approach for the context-dependent identification of salient words in large text corpora. The method, termed the Attractor Network-based Salient Word Extraction Rule (ANSWER), is modeled as a cognitive mechanism that identifies salient words based on their participation in coherent multi-word ideas. These ideas are, in turn, extracted via attractor dynamics in a recurrent neural network modeling the associative semantic graph of the corpus. The corpus used in this paper comprises the abstracts of all papers published in the proceedings of IJCNN 2009, 2011 and 2013. The list of salient words that the system generates is compared with lists generated by other standard metrics, and is found to outperform all of them in almost all cases.
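The abstract does not give the network equations, so the following is only an illustrative sketch of the general idea, not the ANSWER rule itself. It assumes a symmetric co-occurrence graph as a stand-in for the associative semantic graph, a k-winners-take-all update as a stand-in for the attractor dynamics, and a salience score equal to the number of distinct attractors a word belongs to. The function names (`cooccurrence`, `settle`, `salience`), the parameter `k`, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Symmetric weights: number of documents in which two words co-occur."""
    weights = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
            weights[(b, a)] += 1
    return weights


def settle(seed, vocab, weights, k=3, max_steps=20):
    """Run k-winners-take-all dynamics from a seed word.

    Returns the fixed point reached, or the last state if the step
    limit is hit first (assumed update rule, not the paper's).
    """
    active = frozenset([seed])
    for _ in range(max_steps):
        # Net input to each word from the currently active words.
        drive = {
            w: sum(weights[(w, a)] for a in active if a != w) for w in vocab
        }
        # Keep the k most strongly driven words; ties break alphabetically.
        ranked = sorted(vocab, key=lambda w: (-drive[w], w))
        new_active = frozenset(ranked[:k])
        if new_active == active:
            break
        active = new_active
    return active


def salience(docs, k=3):
    """Score each word by the number of distinct attractors it appears in."""
    vocab = sorted({w for doc in docs for w in doc})
    weights = cooccurrence(docs)
    attractors = {settle(seed, vocab, weights, k) for seed in vocab}
    scores = Counter()
    for attractor in attractors:
        scores.update(attractor)
    return scores, attractors


# Toy corpus (hypothetical): two coherent word groups plus a filler word.
toy_docs = [
    ["neural", "network", "training", "show"],
    ["neural", "network", "training"],
    ["fuzzy", "logic", "control", "show"],
    ["fuzzy", "logic", "control"],
]
toy_scores, toy_attractors = salience(toy_docs)
for word in sorted({w for d in toy_docs for w in d}):
    print(word, toy_scores[word])
```

In this toy setting the filler word "show" co-occurs weakly with both groups and ends up in no attractor, while the words of each coherent group settle into a shared one. A real corpus would need a larger `k`, a normalized weighting scheme, and many random seeds; those choices are left open here.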