Linguistic features based framework for automatic fake news detection

IF 6.5 | CAS Zone 1 (Engineering & Technology) | JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Computers & Industrial Engineering | Pub Date: 2022-10-01 | Epub Date: 2022-07-09 | DOI: 10.1016/j.cie.2022.108432

Sonal Garg, Dilip Kumar Sharma

Computers & Industrial Engineering, Volume 172, Article 108432. Available at: https://www.sciencedirect.com/science/article/pii/S0360835222004697

Citations: 13

Abstract

Social media platforms are now a major channel for news consumption. Political groups use these platforms to attract users and win their votes. Because of the large volume of data on social media, it is essential to verify the authenticity of the content. Combating misinformation requires artificial intelligence techniques, including the development of embeddings and the deployment of machine-learning algorithms. This paper focuses on several categories of linguistic features, namely complexity features, readability indices, psycholinguistic features, and stylometric features, for effective fake news identification. The linguistic model computes language-driven features by learning the properties of news content. In this work, we select twenty-six significant features and apply various machine learning models to them. For feature extraction, three techniques are applied: term frequency-inverse document frequency (tf-idf), count vectorizer (CV), and hash vectorizer (HV). We then test the models with different training dataset sizes and compare the accuracy obtained by each. Four existing datasets are used in the experiments. The proposed framework achieves 90.8% accuracy on the Reuter dataset, and its highest accuracy on the Buzzfeed dataset is 90%. On the Random Political and Mc_Intire datasets, it achieves accuracies of 93.8% and 86.9%, respectively.
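The abstract names three feature-extraction techniques without giving implementation details. The following is a minimal, dependency-free Python sketch of what each one computes; it is not the authors' code. The regex tokenizer, the smoothed idf formula (which mirrors scikit-learn's default `TfidfVectorizer` setting), the bucket count `n_features=16`, the use of `zlib.crc32` as the hash, and the toy headlines are all assumptions chosen for illustration. scikit-learn's `HashingVectorizer`, for instance, uses a signed MurmurHash3 rather than CRC32.

```python
import math
import re
import zlib


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_vectorize(docs):
    """Count vectorizer (CV): build a sorted vocabulary and return raw term counts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for t in toks:
            row[index[t]] += 1
        rows.append(row)
    return vocab, rows


def tfidf_vectorize(docs):
    """tf-idf: raw counts weighted by a smoothed inverse document frequency,
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, then L2-normalized per document."""
    vocab, counts = count_vectorize(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0  # avoid division by zero
        rows.append([v / norm for v in weighted])
    return vocab, rows


def hash_vectorize(docs, n_features=16):
    """Hash vectorizer (HV): map each token to one of n_features buckets via a hash,
    so no vocabulary needs to be stored. crc32 is used here because it is deterministic."""
    rows = []
    for d in docs:
        row = [0] * n_features
        for t in tokenize(d):
            row[zlib.crc32(t.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical toy headlines, not taken from any of the paper's datasets.
    docs = [
        "Breaking: shocking claim goes viral",
        "Officials confirm the report",
        "Shocking report goes viral",
    ]
    vocab, cv = count_vectorize(docs)
    _, tf = tfidf_vectorize(docs)
    hv = hash_vectorize(docs)
    print(vocab)
    print(cv[2])  # → [0, 0, 0, 1, 0, 1, 1, 0, 1]
```

In a pipeline like the one described, matrices of this kind would be passed to the machine learning classifiers. The design trade-off is that tf-idf down-weights terms shared by many articles, while hashing avoids storing a vocabulary at the cost of occasional bucket collisions.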


Source journal

Computers & Industrial Engineering (Engineering & Technology: Engineering, Industrial)

CiteScore: 12.70
Self-citation rate: 12.70%
Articles published: 794
Review time: 10.6 months

Journal description: Computers & Industrial Engineering (CAIE) is dedicated to researchers, educators, and practitioners in industrial engineering and related fields. Having pioneered the integration of computers into research, education, and practice, industrial engineering has evolved to make computers and electronic communication integral to its domain. CAIE publishes original contributions focusing on the development of novel computerized methodologies to address industrial engineering problems. It also highlights the applications of these methodologies to issues within the broader industrial engineering and associated communities. The journal actively encourages submissions that push the boundaries of fundamental theories and concepts in industrial engineering techniques.