On the use of text augmentation for stance and fake news detection

Journal of Information and Telecommunication (IF 2.7, Q2, Computer Science, Information Systems). Pub Date: 2023-04-19. DOI: 10.1080/24751839.2023.2198820

Ilhem Salah, Khaled Jouini, O. Korbaa

Volume: 15 6. Pages: 359-375. Publication type: Journal Article. Open access: no. URL: https://doi.org/10.1080/24751839.2023.2198820

Citations: 2

Abstract

Data Augmentation (DA) aims at synthesizing new training instances by applying transformations to available ones. DA has several well-known benefits, such as (i) increasing generalization ability; (ii) mitigating data scarcity; and (iii) helping resolve class imbalance issues. In this work, we investigate the use of DA for stance and fake news detection. In the first part of our work, we explore the effect of various DA techniques on the performance of common classification algorithms. Our study reveals that the motto 'the more, the better' is the wrong approach to text augmentation and that there is no one-size-fits-all text augmentation technique. The second part of our work leverages the results of our study to propose a novel augmentation-based ensemble learning approach. The proposed approach uses text augmentation to enhance the diversity and accuracy of the base learners, and hence the predictive performance of the ensemble. The third part of our work experimentally investigates the use of DA to cope with the class imbalance problem. Class imbalance is very common in stance and fake news detection and often results in biased models. We show how, and to what extent, text augmentation can help resolve moderate and severe imbalance.
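The abstract does not name the specific augmentation techniques studied, so the following is only a minimal sketch of the general idea behind using text augmentation against class imbalance. It assumes two simple token-level transformations (random swap and random deletion) as stand-ins, and the function names `random_swap`, `random_deletion`, `augment`, and `rebalance` are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, rng):
    """Apply one randomly chosen transformation (illustrative stand-ins only)."""
    tokens = text.split()
    if not tokens:
        return text
    op = rng.choice([random_swap, random_deletion])
    return " ".join(op(tokens, rng))


def rebalance(texts, labels, seed=0):
    """Add augmented copies of minority-class texts until every class
    has as many instances as the largest class. Originals are kept."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            out_texts.append(augment(rng.choice(pool), rng))
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    texts = ["the claim is true", "sources confirm it", "officials agree",
             "report verified today", "this story is fabricated"]
    labels = ["real", "real", "real", "real", "fake"]
    new_texts, new_labels = rebalance(texts, labels)
    print(Counter(new_labels))
```

Balancing all classes to the majority count is a design choice made here for simplicity. Given the abstract's finding that more augmentation is not always better, a real pipeline would likely treat the amount of synthetic data as a tunable quantity and augment only the training split.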
