A Tool for Fake News Detection

2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). Pub Date: 2018-09-01. DOI: 10.1109/SYNASC.2018.00064

Bashar Al Asaad, Madalina Erascu


Citations: 32

Abstract




Post-truth was named Oxford Dictionaries' Word of the Year 2016. It is an adjective relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief. Such circumstances foster misinformation and cause problems in society, so it is important to detect false news and prevent it from spreading. In this paper we propose machine learning techniques, in particular supervised learning, for fake news detection. More precisely, we used a dataset of fake and real news to train a machine learning model with the scikit-learn library in Python. We extracted features from the dataset using three text representation models: Bag-of-Words, Term Frequency-Inverse Document Frequency (TF-IDF), and bi-gram frequency. We tested two classification approaches, probabilistic classification and linear classification, on the title and on the content, checking whether the title is clickbait or non-clickbait and whether the content is fake or real. In our experiments, linear classification worked best with the TF-IDF model for content classification. The bi-gram frequency model gave the lowest accuracy for title classification compared with Bag-of-Words and TF-IDF.
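The three text representations named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code: the whitespace tokenizer, the textbook weighting tf(t, d) * log(N / df(t)), and the toy headlines are all assumptions made for illustration. In scikit-learn, comparable features come from `CountVectorizer`, `TfidfVectorizer`, and `CountVectorizer(ngram_range=(2, 2))`, although `TfidfVectorizer` by default uses a smoothed IDF and L2 normalization, so its values differ from the ones computed here. The abstract does not say which classes or settings were used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(tokens):
    """Bag-of-Words: raw term counts for one document."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Bi-gram frequency: counts of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def tf_idf(docs_tokens):
    """Textbook TF-IDF: relative term frequency times log(N / document frequency)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy headlines, invented for illustration only.
titles = [
    "you will not believe this trick",
    "parliament passes budget bill",
    "this trick will shock you",
]
tokens = [tokenize(t) for t in titles]
bow = [bag_of_words(t) for t in tokens]
bigrams = [bigram_counts(t) for t in tokens]
tfidf = tf_idf(tokens)
```

Each document becomes a sparse mapping from terms (or term pairs) to weights. A probabilistic or linear classifier would then be trained on these vectors together with the clickbait/non-clickbait or fake/real labels.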
