An Ensemble Classification Algorithm for Text Data Stream based on Feature Selection and Topic Model
Zhongxin Wang, Jianqiao Liu, Gang Sun, Jia Zhao, Zhengqi Ding, Xiaowen Guan
2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2020
DOI: 10.1109/ICAICA50127.2020.9181903
Text data stream classification has received widespread attention as a core technology for mining the information that users are interested in from continuous text data streams. This paper proposes a text data stream ensemble classification algorithm that combines feature selection with a topic model. First, mutual information feature selection is used to remove features that are unrelated to classification. Second, a latent Dirichlet allocation (LDA) topic model is used to build the document-topic distribution. Finally, the preprocessed text data stream is classified by an ensemble classification model. Experimental results show that the proposed algorithm can improve classification performance on text data streams.
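The feature selection and ensemble steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. Documents are assumed to be token lists. Mutual information is computed between the binary presence of a term T and the class label C as I(T;C) = sum over t, c of P(t,c) log(P(t,c) / (P(t) P(c))). The LDA step is omitted, so members use counts of the selected terms directly rather than document-topic distributions. The ensemble is an assumed chunk-based design: a multinomial naive Bayes member is trained on each incoming labeled chunk, and the most recent members vote by simple majority. The names `mutual_information`, `select_features`, `NaiveBayes`, and `ChunkEnsemble` are hypothetical.

```python
import math
from collections import Counter, deque


def mutual_information(doc_sets, labels, term):
    """I(T;C) in nats between the presence of `term` and the class label C."""
    n = len(doc_sets)
    joint = Counter((term in d, y) for d, y in zip(doc_sets, labels))
    presence = Counter(term in d for d in doc_sets)
    classes = Counter(labels)
    mi = 0.0
    # Cells with zero count are absent from `joint`; they contribute 0 (0 log 0 = 0).
    for (present, y), count in joint.items():
        # P(t,c) * log(P(t,c) / (P(t) P(c))) with all probabilities as count / n.
        mi += (count / n) * math.log(count * n / (presence[present] * classes[y]))
    return mi


def select_features(docs, labels, k):
    """Keep the k terms with the highest mutual information with the label."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda t: (-mutual_information(doc_sets, labels, t), t))
    return set(ranked[:k])


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over the selected terms only."""

    def __init__(self, docs, labels, features):
        self.features = features
        self.n = len(labels)
        self.priors = Counter(labels)
        self.counts = {y: Counter() for y in self.priors}
        for doc, y in zip(docs, labels):
            self.counts[y].update(t for t in doc if t in features)
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}

    def log_posterior(self, doc, y):
        score = math.log(self.priors[y] / self.n)
        for t in doc:
            if t in self.features:  # terms removed by feature selection are ignored
                score += math.log((self.counts[y][t] + 1) / (self.totals[y] + len(self.features)))
        return score

    def predict(self, doc):
        return max(sorted(self.priors), key=lambda y: self.log_posterior(doc, y))


class ChunkEnsemble:
    """Assumed design (not from the paper): one member per chunk, newest `size` vote."""

    def __init__(self, k, size=3):
        self.k = k
        self.members = deque(maxlen=size)  # the oldest member is dropped automatically

    def partial_fit(self, docs, labels):
        # Each chunk gets its own mutual-information feature subset and classifier.
        features = select_features(docs, labels, self.k)
        self.members.append(NaiveBayes(docs, labels, features))

    def predict(self, doc):
        votes = Counter(m.predict(doc) for m in self.members)
        return max(sorted(votes), key=lambda y: votes[y])
```

In use, `partial_fit` is called once per labeled chunk and `predict` is called on each new token list. Because `deque(maxlen=size)` discards the oldest member when a new one is added, the ensemble favors recent chunks. This is one simple way for a chunk-based ensemble to respond to concept drift.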