{"title":"Reducing Data Volume in News Topic Classification: Deep Learning Framework and Dataset","authors":"Luigi Serreli;Claudio Marche;Michele Nitti","doi":"10.1109/OJCS.2024.3519747","DOIUrl":null,"url":null,"abstract":"Withthe rise of smart devices and technological advancements, accessing vast amounts of information has become easier than ever before. However, sorting and categorising such an overwhelming volume of content has become increasingly challenging. This article introduces a new framework for classifying news articles based on a Bidirectional LSTM (BiLSTM) network and an attention mechanism. The article also presents a new dataset of 60 000 news articles from various global sources. Furthermore, it proposes a methodology for reducing data volume by extracting key sentences using an algorithm resulting in inference times that are, on average, 50% shorter than the original document without compromising the system's accuracy. Experimental evaluations demonstrate that our framework outperforms existing methodologies in terms of accuracy. Our system's accuracy has been compared with various works using two popular datasets, AG News and BBC News, and has achieved excellent results of 99.7% and 94.55%, respectively.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"152-163"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10806791","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10806791/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Withthe rise of smart devices and technological advancements, accessing vast amounts of information has become easier than ever before. However, sorting and categorising such an overwhelming volume of content has become increasingly challenging. This article introduces a new framework for classifying news articles based on a Bidirectional LSTM (BiLSTM) network and an attention mechanism. The article also presents a new dataset of 60 000 news articles from various global sources. Furthermore, it proposes a methodology for reducing data volume by extracting key sentences using an algorithm resulting in inference times that are, on average, 50% shorter than the original document without compromising the system's accuracy. Experimental evaluations demonstrate that our framework outperforms existing methodologies in terms of accuracy. Our system's accuracy has been compared with various works using two popular datasets, AG News and BBC News, and has achieved excellent results of 99.7% and 94.55%, respectively.