ANSWER: An unsupervised attractor network method for detecting salient words in text corpora
Madhavun Candadai, A. Vanarase, M. Mei, A. Minai
2015 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. Published 2015-07-12.
DOI: https://doi.org/10.1109/IJCNN.2015.7280773
Citations: 3
Abstract
The availability of unstructured text as a source of data has increased by orders of magnitude in the last few years, triggering extensive research in the automated processing and analysis of electronic texts. An especially important and difficult problem is the identification of salient words in a corpus, so that further processing can focus on these words without distraction by uninformative words. Standard lists of stop words are used to remove common words such as articles, pronouns and prepositions, but many other words that should be removed are much harder to identify because word salience is highly context-dependent. In this paper, we describe a neurodynamical approach for the context-dependent identification of salient words in large text corpora. The method, termed the Attractor Network-based Salient Word Extraction Rule (ANSWER), is modeled as a cognitive mechanism that identifies salient words based on their participation in coherent multi-word ideas. These ideas are, in turn, extracted via attractor dynamics in a recurrent neural network modeling the associative semantic graph of the corpus. The corpus used in this paper comprises the abstracts of all papers published in the proceedings of IJCNN 2009, 2011 and 2013. The list of salient words that the system generates is compared with the lists generated by other standard metrics and is found to outperform them in almost all cases.
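The general idea of finding salient words through attractor dynamics can be illustrated with a minimal sketch. This is not the ANSWER rule itself, whose details the abstract does not give. It is a generic Hopfield-style network with global inhibition, built on assumptions introduced here for illustration: document-level co-occurrence counts as connection weights, a hypothetical `inhibition` constant, a toy four-document corpus, and the convention that a word is salient if it appears in any attractor of two or more words.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Symmetric weights: number of documents in which two words co-occur."""
    w = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            w[(a, b)] += 1
            w[(b, a)] += 1
    return w


def settle(seed, vocab, w, inhibition=1.5, max_sweeps=50):
    """Asynchronous binary updates from a one-word seed until nothing changes.

    Each active word excites its co-occurring neighbours and inhibits every
    other word by a constant amount (an assumed, hand-picked value).
    """
    active = {seed}
    for _ in range(max_sweeps):
        changed = False
        for v in vocab:
            # Net input from the currently active words, excluding v itself.
            h = sum(w[(u, v)] - inhibition for u in active if u != v)
            if h > 0 and v not in active:
                active.add(v)
                changed = True
            elif h < 0 and v in active:
                active.discard(v)
                changed = True
            # h == 0 leaves the unit unchanged, so a lone seed can persist.
        if not changed:
            break
    return frozenset(active)


def salient_words(docs, inhibition=1.5):
    """Words that take part in at least one multi-word attractor."""
    vocab = sorted({word for doc in docs for word in doc})
    w = cooccurrence(docs)
    attractors = {settle(seed, vocab, w, inhibition) for seed in vocab}
    ideas = [a for a in attractors if len(a) >= 2]
    return sorted({word for idea in ideas for word in idea})


# Toy corpus: two topical clusters plus two words spread across both.
docs = [
    ["neural", "network", "learning", "paper"],
    ["neural", "network", "learning", "results"],
    ["text", "corpus", "words", "paper"],
    ["text", "corpus", "words", "results"],
]

print(salient_words(docs))
```

On this toy corpus the network settles into two multi-word attractors, one per topical cluster. The words "paper" and "results" occur in half the documents but are spread evenly across both clusters, so they never join a multi-word attractor and are not reported as salient. The symmetric weights and one-unit-at-a-time updates are what make the dynamics settle into a fixed point instead of oscillating.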