Linguistic features based framework for automatic fake news detection
Authors: Sonal Garg, Dilip Kumar Sharma
Journal: Computers & Industrial Engineering
Publication date: 2022-10-01
DOI: 10.1016/j.cie.2022.108432
URL: https://www.sciencedirect.com/science/article/pii/S0360835222004697
Social media platforms are nowadays a primary channel for news consumption. Political groups use these platforms to attract users and win their votes. Given the large volume of data on social media, it is essential to verify the authenticity of the content. Artificial intelligence techniques, including the development of embeddings and the deployment of machine-learning algorithms, are required to combat misinformation. This paper focuses on several categories of linguistic features for fake news identification: complexity features, readability indices, psycholinguistic features, and stylometric features. The linguistic model computes language-driven features by learning the properties of news content. In this work, we selected twenty-six significant features and applied various machine learning models. For feature extraction, three techniques are applied: term frequency-inverse document frequency (tf-idf), count vectorizer (CV), and hash-vectorizer (HV). We then tested the models on different training dataset sizes and compared the accuracy obtained by each. Four existing datasets were used for the experiments. The proposed framework achieved 90.8% accuracy on the Reuter dataset and a highest accuracy of 90% on the Buzzfeed dataset. On the Random Political and Mc_Intire datasets it achieved accuracies of 93.8% and 86.9%, respectively.
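To make the feature categories named in the abstract more concrete, here is a minimal Python sketch of how a few readability and stylometric features could be computed from raw text. The abstract does not list the twenty-six features the authors selected, so the feature names, the `linguistic_features` function, and the vowel-group syllable heuristic are all assumptions chosen for illustration; only the Flesch Reading Ease formula is a standard published readability index.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def linguistic_features(text: str) -> dict:
    """Compute a few illustrative readability and stylometric features.

    These are hypothetical examples of the feature categories mentioned in
    the abstract, not the paper's actual feature set.
    """
    # Split into sentences on terminal punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic runs, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_words = len(words)
    words_per_sentence = n_words / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / n_words

    return {
        # Complexity features
        "avg_sentence_length": words_per_sentence,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Readability index (Flesch Reading Ease)
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Stylometric features
        "exclamation_count": text.count("!"),
        "uppercase_word_ratio": sum(
            1 for w in words if len(w) > 1 and w.isupper()
        ) / n_words,
    }


if __name__ == "__main__":
    sample = "SHOCKING news! You will not believe this. It is true."
    for name, value in linguistic_features(sample).items():
        print(f"{name}: {value:.3f}")
```

A feature dictionary like this could be turned into a numeric vector and combined with tf-idf, count, or hashing representations before being passed to a classifier. How the authors combined them is not stated in the abstract.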
About the journal:
Computers & Industrial Engineering (CAIE) is dedicated to researchers, educators, and practitioners in industrial engineering and related fields. Pioneering the integration of computers in research, education, and practice, industrial engineering has evolved to make computers and electronic communication integral to its domain. CAIE publishes original contributions focusing on the development of novel computerized methodologies to address industrial engineering problems. It also highlights the applications of these methodologies to issues within the broader industrial engineering and associated communities. The journal actively encourages submissions that push the boundaries of fundamental theories and concepts in industrial engineering techniques.