On the use of text augmentation for stance and fake news detection

Journal of Information and Telecommunication (IF 2.7, Q2, Computer Science, Information Systems) Pub Date: 2023-04-19 DOI: 10.1080/24751839.2023.2198820
Ilhem Salah, Khaled Jouini, O. Korbaa
Citations: 2

Abstract

Data Augmentation (DA) aims at synthesizing new training instances by applying transformations to available ones. DA has several well-known benefits, including: (i) increasing generalization ability; (ii) mitigating data scarcity; and (iii) helping resolve class imbalance issues. In this work, we investigate the use of DA for stance and fake news detection. In the first part of our work, we explore the effect of various DA techniques on the performance of common classification algorithms. Our study reveals that the motto ‘the more, the better’ is the wrong approach to text augmentation and that there is no one-size-fits-all text augmentation technique. The second part of our work builds on these results to propose a novel augmentation-based ensemble learning approach. The proposed approach uses text augmentation to enhance the diversity and accuracy of the base learners, and hence the predictive performance of the ensemble. The third part of our work experimentally investigates the use of DA to cope with the class imbalance problem. Class imbalance is very common in stance and fake news detection and often results in biased models. We show how and to what extent text augmentation can help resolve moderate and severe imbalance.
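As a rough illustration of how text augmentation can be used against class imbalance, the sketch below oversamples minority classes with transformed copies of their own instances. It is not the authors' method: the abstract does not name the DA techniques studied, so the two transformations (random word swap and random word deletion), the function names, and the toy data are all assumptions chosen for brevity.

```python
import random
from collections import Counter


def random_swap(text, rng, n_swaps=1):
    """Return a copy of text with n_swaps random pairs of words exchanged."""
    words = text.split()
    if len(words) < 2:
        return text
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def random_deletion(text, rng, p=0.2):
    """Drop each word with probability p, always keeping at least one word."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept) if kept else rng.choice(words)


def balance_with_augmentation(texts, labels, seed=0):
    """Add augmented copies of minority-class texts until every class
    has as many instances as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            source = rng.choice(pool)
            transform = rng.choice([random_swap, random_deletion])
            out_texts.append(transform(source, rng))
            out_labels.append(label)
    return out_texts, out_labels


# Toy data, invented for illustration only.
texts = [
    "the mayor denied the report",
    "officials confirmed the new budget",
    "the court upheld the ruling",
    "aliens secretly run the city council",
]
labels = ["real", "real", "real", "fake"]
aug_texts, aug_labels = balance_with_augmentation(texts, labels)
print(Counter(aug_labels))
```

Balancing every class up to the majority size is only one possible policy. The abstract's remark that "the more, the better" is the wrong approach suggests that the amount of augmented data is itself worth tuning rather than maximizing.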
Source journal metrics: CiteScore 7.50; self-citation rate 0.00%; articles published 18; review time 27 weeks