Linguistic features based framework for automatic fake news detection

IF 6.7 | Region 1, Engineering & Technology | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Computers & Industrial Engineering | Pub Date: 2022-10-01 | DOI: 10.1016/j.cie.2022.108432
Sonal Garg, Dilip Kumar Sharma
URL: https://www.sciencedirect.com/science/article/pii/S0360835222004697
Citations: 13

Abstract

Nowadays, social media platforms are mainly used for news consumption. Political groups use these platforms to attract users and turn their votes in their favor. Due to the large volume of data on social media, it is essential to verify the authenticity of the content. Combating misinformation requires artificial intelligence techniques, including the development of embeddings and the deployment of machine-learning algorithms. This paper focuses on several categories of linguistic features, covering complexity features, readability indices, psycholinguistic features, and stylometric features, for effective fake news identification. The linguistic model helps compute language-driven features by learning the properties of news content. In this work, we selected twenty-six significant features and applied various machine learning models. For feature extraction, three techniques are applied: term frequency-inverse document frequency (tf-idf), count vectorizer (CV), and hash vectorizer (HV). We then tested the models with different training dataset sizes to obtain and compare the accuracy of each model. We used four existing datasets for the experiments. The proposed framework achieved 90.8% accuracy on the Reuter dataset and a highest accuracy of 90% on the Buzzfeed dataset. On the Random Political and Mc_Intire datasets, it achieved accuracies of 93.8% and 86.9%, respectively.
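
The abstract names three feature-extraction techniques (tf-idf, count vectorizer, hash vectorizer) without showing how they differ. The sketch below illustrates the general idea of each in plain Python. It is not the authors' implementation: the tokenizer, the smoothed idf formula, the MD5-based bucket hash, the `n_features` value, and the two sample headlines are all assumptions made for illustration. In practice, scikit-learn's `CountVectorizer`, `TfidfVectorizer`, and `HashingVectorizer` are the usual choices.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def count_vectors(docs):
    """Count vectorizer: one column per vocabulary term, raw term counts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, n in Counter(toks).items():
            row[index[term]] = n
        rows.append(row)
    return vocab, rows


def tfidf_vectors(docs):
    """tf-idf: raw counts reweighted by a smoothed inverse document frequency."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed idf (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])  # L2-normalize each document
    return vocab, rows


def hash_vectors(docs, n_features=16):
    """Hash vectorizer: no stored vocabulary; tokens map to fixed buckets."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            # MD5 is a deterministic stand-in for the hash function here.
            digest = hashlib.md5(tok.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % n_features
            row[bucket] += 1
        rows.append(row)
    return rows


# Two made-up headlines, used only to exercise the functions.
docs = [
    "Senate passes the budget bill",
    "Shocking secret the senate hides",
]
vocab, cv = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
hv = hash_vectors(docs, n_features=16)
```

The count and tf-idf variants need a vocabulary built from the training data, so their vector length grows with the corpus. The hashing variant keeps a fixed length and needs no fitted vocabulary, at the cost of possible collisions between tokens that land in the same bucket.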

Source journal
Computers & Industrial Engineering (Engineering & Technology: Industrial Engineering)
CiteScore: 12.70
Self-citation rate: 12.70%
Articles published: 794
Review time: 10.6 months
Journal description: Computers & Industrial Engineering (CAIE) is dedicated to researchers, educators, and practitioners in industrial engineering and related fields. Pioneering the integration of computers in research, education, and practice, industrial engineering has evolved to make computers and electronic communication integral to its domain. CAIE publishes original contributions focusing on the development of novel computerized methodologies to address industrial engineering problems. It also highlights the applications of these methodologies to issues within the broader industrial engineering and associated communities. The journal actively encourages submissions that push the boundaries of fundamental theories and concepts in industrial engineering techniques.