Enhanced analysis of large-scale news text data using the bidirectional-Kmeans-LSTM-CNN model

IF 3.5 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · PeerJ Computer Science · Pub Date: 2024-08-01 · DOI: 10.7717/peerj-cs.2213
Qingxiang Zeng
{"title":"Enhanced analysis of large-scale news text data using the bidirectional-Kmeans-LSTM-CNN model","authors":"Qingxiang Zeng","doi":"10.7717/peerj-cs.2213","DOIUrl":null,"url":null,"abstract":"Traditional methods may be inefficient when processing large-scale data in the field of text mining, often struggling to identify and cluster relevant information accurately and efficiently. Additionally, capturing nuanced sentiment and emotional context within news text is challenging with conventional techniques. To address these issues, this article introduces an improved bidirectional-Kmeans-long short-term memory network-convolutional neural network (BiK-LSTM-CNN) model that incorporates emotional semantic analysis for high-dimensional news text visual extraction and media hotspot mining. The BiK-LSTM-CNN model comprises four modules: news text preprocessing, news text clustering, sentiment semantic analysis, and the BiK-LSTM-CNN model itself. By combining these components, the model effectively identifies common features within the input data, clusters similar news articles, and accurately analyzes the emotional semantics of the text. This comprehensive approach enhances both the accuracy and efficiency of visual extraction and hotspot mining. Experimental results demonstrate that compared to models such as Transformer, AdvLSTM, and NewRNN, BiK-LSTM-CNN achieves improvements in macro accuracy by 0.50%, 0.91%, and 1.34%, respectively. Similarly, macro recall rates increase by 0.51%, 1.24%, and 1.26%, while macro F1 scores improve by 0.52%, 1.23%, and 1.92%. Additionally, the BiK-LSTM-CNN model shows significant improvements in time efficiency, further establishing its potential as a more effective approach for processing and analyzing large-scale text data","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":null,"pages":null},"PeriodicalIF":3.5000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2213","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Traditional methods may be inefficient when processing large-scale data in the field of text mining, often struggling to identify and cluster relevant information accurately and efficiently. Additionally, capturing nuanced sentiment and emotional context within news text is challenging with conventional techniques. To address these issues, this article introduces an improved bidirectional-Kmeans-long short-term memory network-convolutional neural network (BiK-LSTM-CNN) model that incorporates emotional semantic analysis for high-dimensional news text visual extraction and media hotspot mining. The BiK-LSTM-CNN model comprises four modules: news text preprocessing, news text clustering, sentiment semantic analysis, and the BiK-LSTM-CNN model itself. By combining these components, the model effectively identifies common features within the input data, clusters similar news articles, and accurately analyzes the emotional semantics of the text. This comprehensive approach enhances both the accuracy and efficiency of visual extraction and hotspot mining. Experimental results demonstrate that compared to models such as Transformer, AdvLSTM, and NewRNN, BiK-LSTM-CNN achieves improvements in macro accuracy by 0.50%, 0.91%, and 1.34%, respectively. Similarly, macro recall rates increase by 0.51%, 1.24%, and 1.26%, while macro F1 scores improve by 0.52%, 1.23%, and 1.92%. Additionally, the BiK-LSTM-CNN model shows significant improvements in time efficiency, further establishing its potential as a more effective approach for processing and analyzing large-scale text data.
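The abstract only names the model's components, not their exact wiring. As a rough illustration of how a bidirectional LSTM followed by a convolutional layer can be combined for text classification, here is a minimal PyTorch sketch; the class name, layer sizes, and three-class output are assumptions made for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    """Hypothetical sketch: BiLSTM for context, 1-D CNN for local features."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=64,
                 num_filters=64, kernel_size=3, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Bidirectional LSTM captures forward and backward context per token.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # 1-D convolution over the BiLSTM outputs extracts local n-gram features.
        self.conv = nn.Conv1d(2 * hidden_dim, num_filters, kernel_size, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):              # (batch, seq_len)
        x = self.embedding(token_ids)          # (batch, seq_len, embed_dim)
        x, _ = self.bilstm(x)                  # (batch, seq_len, 2*hidden_dim)
        x = x.transpose(1, 2)                  # (batch, 2*hidden_dim, seq_len)
        x = torch.relu(self.conv(x))           # (batch, num_filters, seq_len)
        x = self.pool(x).squeeze(-1)           # (batch, num_filters)
        return self.fc(x)                      # (batch, num_classes)

# Example usage with dummy token IDs (vocabulary size is an assumption).
model = BiLSTMCNN(vocab_size=20000)
dummy_batch = torch.randint(1, 20000, (4, 50))   # 4 sequences of length 50
logits = model(dummy_batch)                      # shape: (4, 3)
```

The K-means clustering and sentiment-analysis modules described in the abstract would sit upstream of such a classifier; their details are not specified here.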
Source Journal

PeerJ Computer Science (Computer Science - General Computer Science)

CiteScore: 6.10
Self-citation rate: 5.30%
Articles published: 332
Review time: 10 weeks

Journal description: PeerJ Computer Science is an open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.