用于在线新闻分类的 X-News 数据集

IF 2.2 Q3 COMPUTER SCIENCE, CYBERNETICS International Journal of Intelligent Computing and Cybernetics Pub Date : 2024-08-13 DOI:10.1108/ijicc-04-2024-0184

Samia Nawaz Yousafzai, Hooria Shahbaz, Armughan Ali, Amreen Qamar, Inzamam Mashood Nasir, Sara Tehsin, R. Damaševičius

{"title":"用于在线新闻分类的 X-News 数据集","authors":"Samia Nawaz Yousafzai, Hooria Shahbaz, Armughan Ali, Amreen Qamar, Inzamam Mashood Nasir, Sara Tehsin, R. Damaševičius","doi":"10.1108/ijicc-04-2024-0184","DOIUrl":null,"url":null,"abstract":"PurposeThe objective is to develop a more effective model that simplifies and accelerates the news classification process using advanced text mining and deep learning (DL) techniques. A distributed framework utilizing Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. This approach leverages various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods.Design/methodology/approachThis study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications.FindingsThe framework’s effectiveness was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results of this study demonstrate that the proposed strategy significantly outperforms other approaches in terms of accuracy and execution time. This indicates that the distributed framework, coupled with the use of BERT for text analysis, provides a robust solution for analyzing large volumes of data efficiently. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification, suggesting its potential to facilitate advancements in these areas.Originality/valueThis research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.","PeriodicalId":45291,"journal":{"name":"International Journal of Intelligent Computing and Cybernetics","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"X-News dataset for online news categorization\",\"authors\":\"Samia Nawaz Yousafzai, Hooria Shahbaz, Armughan Ali, Amreen Qamar, Inzamam Mashood Nasir, Sara Tehsin, R. Damaševičius\",\"doi\":\"10.1108/ijicc-04-2024-0184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"PurposeThe objective is to develop a more effective model that simplifies and accelerates the news classification process using advanced text mining and deep learning (DL) techniques. A distributed framework utilizing Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. This approach leverages various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods.Design/methodology/approachThis study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications.FindingsThe framework’s effectiveness was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results of this study demonstrate that the proposed strategy significantly outperforms other approaches in terms of accuracy and execution time. This indicates that the distributed framework, coupled with the use of BERT for text analysis, provides a robust solution for analyzing large volumes of data efficiently. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification, suggesting its potential to facilitate advancements in these areas.Originality/valueThis research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.\",\"PeriodicalId\":45291,\"journal\":{\"name\":\"International Journal of Intelligent Computing and Cybernetics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Intelligent Computing and Cybernetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1108/ijicc-04-2024-0184\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Computing and Cybernetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/ijicc-04-2024-0184","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}

引用次数: 0

摘要

目的目标是利用先进的文本挖掘和深度学习（DL）技术开发一种更有效的模型，以简化和加速新闻分类过程。我们开发了一个分布式框架，利用来自变换器的双向编码器表示（BERT）对新闻标题进行分类。这种方法在分布式基础架构上利用了各种文本挖掘和深度学习技术，旨在为传统的新闻分类方法提供一种替代方案。研究结果在新提出的数据集和来自 Kaggle 数据库的另外两个基准数据集上评估了该框架的有效性，评估了每种文本挖掘和分类方法在这些数据集上的性能。研究结果表明，所提出的策略在准确性和执行时间方面明显优于其他方法。这表明，分布式框架加上使用 BERT 进行文本分析，为高效分析海量数据提供了强大的解决方案。研究结果还凸显了新发布的语料库对于进一步研究新闻分类和情感分类的价值，表明它具有促进这些领域进步的潜力。原创性/价值这项研究为新闻分类引入了一个创新的分布式框架，解决了在基准数据集上训练的模型的缺点。通过利用前沿技术和新颖的数据集，该研究在准确性和处理速度方面都有显著提高。该语料库的发布是对该领域的宝贵贡献，有助于进一步探索新闻和情感分类。这项工作为新闻数据分析设定了新标准，为开发更有效、更高效的新闻分类系统提供了实际意义。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

X-News dataset for online news categorization

PurposeThe objective is to develop a more effective model that simplifies and accelerates the news classification process using advanced text mining and deep learning (DL) techniques. A distributed framework utilizing Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. This approach leverages various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods.Design/methodology/approachThis study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications.FindingsThe framework’s effectiveness was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results of this study demonstrate that the proposed strategy significantly outperforms other approaches in terms of accuracy and execution time. This indicates that the distributed framework, coupled with the use of BERT for text analysis, provides a robust solution for analyzing large volumes of data efficiently. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification, suggesting its potential to facilitate advancements in these areas.Originality/valueThis research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Intelligent Computing and Cybernetics COMPUTER SCIENCE, CYBERNETICS-

CiteScore

6.80

自引率

4.70%

发文量