{"title":"深度迁移学习用于波斯语中的COVID-19假新闻检测","authors":"Masood Ghayoomi, Maryam Mousavian","doi":"10.1111/exsy.13008","DOIUrl":null,"url":null,"abstract":"<p><p>The spread of fake news on social media has increased dramatically in recent years. Hence, fake news detection systems have received researchers' attention globally. During the COVID-19 outbreak in 2019 and the worldwide epidemic, the importance of this issue becomes more apparent. Due to the importance of the issue, a large number of researchers have begun to collect English datasets and to study COVID-19 fake news detection. However, there are a large number of low-resource languages, including Persian, that cannot develop accurate tools for automatic COVID-19 fake news detection due to the lack of annotated data for the task. In this article, we aim to develop a corpus for Persian in the domain of COVID-19 where the fake news is annotated and to provide a model for detecting Persian COVID-19 fake news. With the impressive advancement of multilingual pre-trained language models, the idea of cross-lingual transfer learning can be proposed to improve the generalization of models trained with low-resource language datasets. Accordingly, we use the state-of-the-art deep cross-lingual contextualized language model, XLM-RoBERTa, and the parallel convolutional neural networks to detect Persian COVID-19 fake news. Moreover, we use the idea of knowledge transferring across-domains to improve the results by using both the English COVID-19 dataset and the general domain Persian fake news dataset. The combination of both cross-lingual and cross-domain transfer learning has outperformed the models and it has beaten the baseline by 2.39% significantly.</p>","PeriodicalId":58,"journal":{"name":"The Journal of Physical Chemistry ","volume":"98 28","pages":"e13008"},"PeriodicalIF":2.7810,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9111484/pdf/","citationCount":"0","resultStr":"{\"title\":\"Deep transfer learning for COVID-19 fake news detection in Persian.\",\"authors\":\"Masood Ghayoomi, Maryam Mousavian\",\"doi\":\"10.1111/exsy.13008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The spread of fake news on social media has increased dramatically in recent years. Hence, fake news detection systems have received researchers' attention globally. During the COVID-19 outbreak in 2019 and the worldwide epidemic, the importance of this issue becomes more apparent. Due to the importance of the issue, a large number of researchers have begun to collect English datasets and to study COVID-19 fake news detection. However, there are a large number of low-resource languages, including Persian, that cannot develop accurate tools for automatic COVID-19 fake news detection due to the lack of annotated data for the task. In this article, we aim to develop a corpus for Persian in the domain of COVID-19 where the fake news is annotated and to provide a model for detecting Persian COVID-19 fake news. With the impressive advancement of multilingual pre-trained language models, the idea of cross-lingual transfer learning can be proposed to improve the generalization of models trained with low-resource language datasets. Accordingly, we use the state-of-the-art deep cross-lingual contextualized language model, XLM-RoBERTa, and the parallel convolutional neural networks to detect Persian COVID-19 fake news. Moreover, we use the idea of knowledge transferring across-domains to improve the results by using both the English COVID-19 dataset and the general domain Persian fake news dataset. The combination of both cross-lingual and cross-domain transfer learning has outperformed the models and it has beaten the baseline by 2.39% significantly.</p>\",\"PeriodicalId\":58,\"journal\":{\"name\":\"The Journal of Physical Chemistry \",\"volume\":\"98 28\",\"pages\":\"e13008\"},\"PeriodicalIF\":2.7810,\"publicationDate\":\"2022-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9111484/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Journal of Physical Chemistry \",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1111/exsy.13008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Journal of Physical Chemistry ","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1111/exsy.13008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Deep transfer learning for COVID-19 fake news detection in Persian.
The spread of fake news on social media has increased dramatically in recent years. Hence, fake news detection systems have received researchers' attention globally. During the COVID-19 outbreak in 2019 and the worldwide epidemic, the importance of this issue becomes more apparent. Due to the importance of the issue, a large number of researchers have begun to collect English datasets and to study COVID-19 fake news detection. However, there are a large number of low-resource languages, including Persian, that cannot develop accurate tools for automatic COVID-19 fake news detection due to the lack of annotated data for the task. In this article, we aim to develop a corpus for Persian in the domain of COVID-19 where the fake news is annotated and to provide a model for detecting Persian COVID-19 fake news. With the impressive advancement of multilingual pre-trained language models, the idea of cross-lingual transfer learning can be proposed to improve the generalization of models trained with low-resource language datasets. Accordingly, we use the state-of-the-art deep cross-lingual contextualized language model, XLM-RoBERTa, and the parallel convolutional neural networks to detect Persian COVID-19 fake news. Moreover, we use the idea of knowledge transferring across-domains to improve the results by using both the English COVID-19 dataset and the general domain Persian fake news dataset. The combination of both cross-lingual and cross-domain transfer learning has outperformed the models and it has beaten the baseline by 2.39% significantly.