Cross-Language Fake News Detection
Authors: Samuel Kai Wah Chu, Runbin Xie, Yanshu Wang
Journal: Data and Information Management, vol. 5, no. 1, pp. 100-109 (published 2021-01-01)
DOI: 10.2478/dim-2020-0025
URL: https://www.sciencedirect.com/science/article/pii/S2543925122000250
Citations: 18
Abstract
With increasing globalization, news from different countries, even in different languages, has become readily available and is now a way for many people to learn about other cultures. As people around the world grow more reliant on social media, the impact of fake news on society also increases. However, most fake news detection research focuses only on English. In this work, we compared the textual features of two languages (Chinese and English) and their effect on detecting fake news. We also explored the cross-language transmissibility of fake news detection models. We found that Chinese textual features in fake news are more complex than their English counterparts. Our results also showed that the bidirectional encoder representations from transformers (BERT) model outperformed the other algorithms on within-language data sets. For cross-language data sets, our findings demonstrated that monitoring fake news across languages is potentially feasible, and that models trained with data from a more inclusive language would perform better in cross-language detection.