假新闻检测：具有增强特征工程的深层语义表示。

IF 3.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal of Data Science and Analytics Pub Date : 2023-03-09 DOI:10.1007/s41060-023-00387-8

Mohammadreza Samadi, Saeedeh Momtazi

{"title":"假新闻检测：具有增强特征工程的深层语义表示。","authors":"Mohammadreza Samadi, Saeedeh Momtazi","doi":"10.1007/s41060-023-00387-8","DOIUrl":null,"url":null,"abstract":"Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features. To substantiate the effectiveness of feature engineering besides semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":" ","pages":"1-12"},"PeriodicalIF":3.4000,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998010/pdf/","citationCount":"0","resultStr":"{\"title\":\"Fake news detection: deep semantic representation with enhanced feature engineering.\",\"authors\":\"Mohammadreza Samadi, Saeedeh Momtazi\",\"doi\":\"10.1007/s41060-023-00387-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features. To substantiate the effectiveness of feature engineering besides semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).\",\"PeriodicalId\":45667,\"journal\":{\"name\":\"International Journal of Data Science and Analytics\",\"volume\":\" \",\"pages\":\"1-12\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2023-03-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998010/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Data Science and Analytics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s41060-023-00387-8\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Data Science and Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41060-023-00387-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

由于社交媒体的广泛使用，人们接触到了假新闻和错误信息。传播假新闻对公众和政府都有不利影响。这一问题促使研究人员利用先进的自然语言处理概念来检测社交媒体中的此类错误信息。尽管最近的研究只关注深度上下文化文本表示模型提取的语义特征，但我们的目的是表明基于内容的特征工程可以在假新闻检测等复杂任务中增强语义模型。这些特征可以从输入文本的不同方面提供有价值的信息，并帮助我们的神经分类器比使用语义特征更准确地检测假新闻和真新闻。为了证明除了语义特征之外，特征工程的有效性，我们提出了一种深度神经架构，其中三个并行卷积神经网络（CNN）层从上下文表示向量中提取语义特征。然后，语义和基于内容的特征被馈送到完全连接的层。我们在关于新冠肺炎大流行的英文数据集和依赖领域的波斯假新闻数据集（TAJ）上评估了我们的模型。我们在英文新冠肺炎数据集上的实验显示，与基线模型相比，准确度和f1-score分别提高了4.16%和4.02%，基线模型没有从基于内容的特征中获益。与Shifath等人报告的最新结果相比，我们的准确度和f1-score分别提高了2.01%和0.69%。（一种基于变压器的抗击新冠肺炎假新闻的方法，arXiv预打印arXiv:2101.120272021）。我们的模型在TAJ数据集上的表现优于基线，准确率和f1得分指标分别提高了1.89%和1.74%。该模型还显示，与Samadi等人提出的最先进的模型相比，准确度和f1得分分别提高了2.13%和1.6%。（ACM Trans-Asian Low Resour Lang-Inf Process，https://doi.org/10.1145/3472620，2021）。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fake news detection: deep semantic representation with enhanced feature engineering.

Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features. To substantiate the effectiveness of feature engineering besides semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Data Science and Analytics Multiple-

CiteScore

6.40

自引率

8.30%

发文量

期刊介绍： Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation.The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations. The journal is composed of three streams: Regular, to communicate original and reproducible theoretical and experimental findings on data science and analytics; Applications, to report the significant data science applications to real-life situations; and Trends, to report expert opinion and comprehensive surveys and reviews of relevant areas and topics in data science and analytics.Topics of relevance include all aspects of the trends, scientific foundations, techniques, and applications of data science and analytics, with a primary focus on:statistical and mathematical foundations for data science and analytics;understanding and analytics of complex data, human, domain, network, organizational, social, behavior, and system characteristics, complexities and intelligences;creation and extraction, processing, representation and modelling, learning and discovery, fusion and integration, presentation and visualization of complex data, behavior, knowledge and intelligence;data analytics, pattern recognition, knowledge discovery, machine learning, deep analytics and deep learning, and intelligent processing of various data (including transaction, text, image, video, graph and network), behaviors and systems;active, real-time, personalized, actionable and automated analytics, learning, computation, optimization, presentation and recommendation; big data architecture, infrastructure, computing, matching, indexing, query processing, mapping, search, retrieval, interoperability, exchange, and recommendation;in-memory, distributed, parallel, scalable and high-performance computing, analytics and optimization for big data;review, surveys, trends, prospects and opportunities of data science research, innovation and applications;data science applications, intelligent devices and services in scientific, business, governmental, cultural, behavioral, social and economic, health and medical, human, natural and artificial (including online/Web, cloud, IoT, mobile and social media) domains; andethics, quality, privacy, safety and security, trust, and risk of data science and analytics