Opinion subset selection via submodular maximization
Authors: Yang Zhao, Tommy W.S. Chow
Journal: Information Sciences, Volume 560, Pages 283-306
Publication date: 2021-06-01
DOI: 10.1016/j.ins.2020.12.083
URL: https://www.sciencedirect.com/science/article/pii/S0020025521000141
Citations: 5
Abstract
Current research on subset selection for opinion analysis assumes that the opinions expressed in documents can be retrieved from general text features. However, such a relaxed assumption can hardly maintain analysis performance in opinion mining, especially under strict limits on the subset size. In this paper, we propose a framework for opinion subset selection. The framework selects a small set of instances from the original data that conveys a subjective representation for opinion classification and regression. In contrast to our framework, the conventional submodular-based subset selection approach cannot capture the fine-grained opinion features expressed in the corpus. Specifically, we propose a monotone non-decreasing score function and a framework based on topic modeling and submodular maximization for filtering irrelevant information and selecting the subsets. We further introduce an opinion-sensitive algorithm that optimizes the proposed function for opinion subset construction. We perform extensive experiments and a comparative analysis of different subset selection methods. The experimental results show that the proposed opinion subset selection framework can compress the original text training set while preserving classification and regression performance on the test set.
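For readers unfamiliar with the setting, the following is a minimal sketch of generic greedy maximization of a monotone non-decreasing submodular function under a subset-size budget. It is not the score function or the opinion-sensitive algorithm proposed in the paper, neither of which the abstract specifies. The set-coverage objective, the function names `coverage` and `greedy_select`, and the toy corpus are all assumptions chosen for illustration. For objectives of this kind, the standard greedy procedure is known to achieve a (1 - 1/e) approximation under a cardinality constraint.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], features: Dict[str, Set[str]]) -> int:
    """Number of distinct features covered by the selected documents.

    Set coverage is a textbook monotone non-decreasing submodular function;
    it stands in here for the paper's (unspecified) score function.
    """
    covered: Set[str] = set()
    for doc_id in selected:
        covered |= features[doc_id]
    return len(covered)


def greedy_select(features: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` documents, adding the largest marginal gain each round."""
    selected: List[str] = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < budget:
        base = coverage(selected, features)
        best_doc, best_gain = None, 0
        for doc_id in remaining:
            gain = coverage(selected + [doc_id], features) - base
            if gain > best_gain:
                best_doc, best_gain = doc_id, gain
        if best_doc is None:  # no remaining document adds anything new
            break
        selected.append(best_doc)
        remaining.remove(best_doc)
    return selected


# Hypothetical toy corpus: each review id maps to the opinion terms it contains.
toy_corpus = {
    "r1": {"great", "battery", "love"},
    "r2": {"great", "battery"},
    "r3": {"terrible", "screen"},
    "r4": {"love", "screen", "price"},
}
subset = greedy_select(toy_corpus, budget=2)
print(subset, coverage(subset, toy_corpus))
```

In this sketch the budget plays the role of the strict subset-size limit mentioned in the abstract; an opinion-aware objective would replace `coverage` while the greedy loop stays the same.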
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.