Fake news detection: deep semantic representation with enhanced feature engineering.

IF 3.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal of Data Science and Analytics Pub Date : 2023-03-09 DOI:10.1007/s41060-023-00387-8
Mohammadreza Samadi, Saeedeh Momtazi
{"title":"Fake news detection: deep semantic representation with enhanced feature engineering.","authors":"Mohammadreza Samadi, Saeedeh Momtazi","doi":"10.1007/s41060-023-00387-8","DOIUrl":null,"url":null,"abstract":"<p><p>Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features. To substantiate the effectiveness of feature engineering besides semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).</p>","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":" ","pages":"1-12"},"PeriodicalIF":3.4000,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998010/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Data Science and Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41060-023-00387-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features. To substantiate the effectiveness of feature engineering besides semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).

Abstract Image

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
假新闻检测:具有增强特征工程的深层语义表示。
由于社交媒体的广泛使用,人们接触到了假新闻和错误信息。传播假新闻对公众和政府都有不利影响。这一问题促使研究人员利用先进的自然语言处理概念来检测社交媒体中的此类错误信息。尽管最近的研究只关注深度上下文化文本表示模型提取的语义特征,但我们的目的是表明基于内容的特征工程可以在假新闻检测等复杂任务中增强语义模型。这些特征可以从输入文本的不同方面提供有价值的信息,并帮助我们的神经分类器比使用语义特征更准确地检测假新闻和真新闻。为了证明除了语义特征之外,特征工程的有效性,我们提出了一种深度神经架构,其中三个并行卷积神经网络(CNN)层从上下文表示向量中提取语义特征。然后,语义和基于内容的特征被馈送到完全连接的层。我们在关于新冠肺炎大流行的英文数据集和依赖领域的波斯假新闻数据集(TAJ)上评估了我们的模型。我们在英文新冠肺炎数据集上的实验显示,与基线模型相比,准确度和f1-score分别提高了4.16%和4.02%,基线模型没有从基于内容的特征中获益。与Shifath等人报告的最新结果相比,我们的准确度和f1-score分别提高了2.01%和0.69%。(一种基于变压器的抗击新冠肺炎假新闻的方法,arXiv预打印arXiv:2101.120272021)。我们的模型在TAJ数据集上的表现优于基线,准确率和f1得分指标分别提高了1.89%和1.74%。该模型还显示,与Samadi等人提出的最先进的模型相比,准确度和f1得分分别提高了2.13%和1.6%。(ACM Trans-Asian Low Resour Lang-Inf Process,https://doi.org/10.1145/3472620,2021)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
6.40
自引率
8.30%
发文量
72
期刊介绍: Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social sci­ence, and lifestyle. The field encompasses the larger ar­eas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new sci­entific chal­lenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and vis­ualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation.The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations. The jour­nal is composed of three streams: Regular, to communicate original and reproducible theoretical and experimental findings on data science and analytics; Applications, to report the significant data science applications to real-life situations; and Trends, to report expert opinion and comprehensive surveys and reviews of relevant areas and topics in data science and analytics.Topics of relevance include all aspects of the trends, scientific foundations, techniques, and applica­tions of data science and analytics, with a primary focus on:statistical and mathematical foundations for data science and analytics;understanding and analytics of complex data, human, domain, network, organizational, social, behavior, and system characteristics, complexities and intelligences;creation and extraction, processing, representation and modelling, learning and discovery, fusion and integration, presentation and visualization of complex data, behavior, knowledge and intelligence;data analytics, pattern recognition, knowledge discovery, machine learning, deep analytics and deep learning, and intelligent processing of various data (including transaction, text, image, video, graph and network), behaviors and systems;active, real-time, personalized, actionable and automated analytics, learning, computation, optimization, presentation and recommendation; big data architecture, infrastructure, computing, matching, indexing, query processing, mapping, search, retrieval, interopera­bility, exchange, and recommendation;in-memory, distributed, parallel, scalable and high-performance computing, analytics and optimization for big data;review, surveys, trends, prospects and opportunities of data science research, innovation and applications;data science applications, intelligent devices and services in scientific, business, governmental, cultural, behavioral, social and economic, health and medical, human, natural and artificial (including online/Web, cloud, IoT, mobile and social media) domains; andethics, quality, privacy, safety and security, trust, and risk of data science and analytics
期刊最新文献
Power Analysis for Causal Discovery. Discrete double factors of a family of odd Weibull-G distributions: features and modeling Artificial intelligence trend analysis in German business and politics: a web mining approach Speech-based detection of multi-class Alzheimer’s disease classification using machine learning Implementation of air pollution traceability method based on IF-GNN-FC model with multiple-source data
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1