Analyzing common lexical features of fake news using multi-head attention weights
Mamoru Mimura, Takayuki Ishimaru
Internet of Things (journal article, JCR Q1, Computer Science, Information Systems; impact factor 6.0), published 2024-10-24
DOI: 10.1016/j.iot.2024.101409
https://www.sciencedirect.com/science/article/pii/S2542660524003500
Numerous machine-learning approaches have been developed to identify fake news; however, they are predominantly evaluated on single, domain-specific datasets, so little research has examined versatile models that adapt across domains. This study evaluates the adaptability of a fake news detection model across diverse fields using three distinct datasets. It also uses the multi-head attention mechanism of bidirectional encoder representations from transformers (BERT) to examine how the model extracts features. Our analysis focused on the words that the model commonly emphasizes when detecting fake news. The data comprised 27,442 genuine and 28,359 fabricated news articles, each labeled accordingly. To identify these focal words, we used BERT's multi-head attention, which assigns each token a weight reflecting how strongly the model attends to it, and determined which words received the highest weights in each article. The findings indicate that a characteristic common to fake news, although it accounts for only a small percentage, is heightened attention to words that affect the article's credibility. To assess the model's versatility, we applied a model trained on one dataset to classify the other datasets. Accuracy declined notably, which is attributable to the distinctive characteristics of the training data. These observations suggest that the features common to fake news that the fine-tuned BERT model can extract are limited.
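The attention analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a token's importance is the attention the [CLS] position pays to it in one layer, averaged over that layer's heads, which the abstract does not specify. Each head computes scaled dot-product attention weights, softmax(QK^T / sqrt(d_k)). In Hugging Face `transformers`, calling a BERT model with `output_attentions=True` returns one tensor per layer with shape (batch, num_heads, seq_len, seq_len); the hypothetical functions `attention_weights` and `rank_tokens` below take plain nested lists with the same per-head layout so they run without a model.

```python
import math
from typing import List, Sequence, Tuple

# One attention matrix: rows are query positions, columns are key positions.
Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """Scaled dot-product attention weights softmax(Q K^T / sqrt(d_k)) for ONE head.

    q and k hold one vector per token; each returned row sums to 1.
    """
    scale = math.sqrt(len(k[0]))
    return [
        softmax([sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k])
        for q_row in q
    ]


def rank_tokens(
    tokens: Sequence[str],
    heads: Sequence[Matrix],
    query_index: int = 0,
    skip: Tuple[str, ...] = ("[CLS]", "[SEP]", "[PAD]"),
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank tokens by the attention the query position (default: [CLS]) pays to them.

    ASSUMPTION: weights are averaged over all heads of a single layer; the
    paper's exact aggregation is not given in the abstract. BERT operates on
    WordPiece subwords; merging subwords back into words is omitted here.
    """
    ranked = []
    for j, tok in enumerate(tokens):
        if tok in skip:
            continue  # special tokens are not words of the article
        avg = sum(head[query_index][j] for head in heads) / len(heads)
        ranked.append((tok, avg))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    tokens = ["[CLS]", "sources", "claim", "shocking", "cure", "[SEP]"]
    # Made-up 2-d vectors for two heads (queries equal keys for brevity);
    # these are NOT taken from a trained BERT model.
    qk_head1 = [[1.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.5, 0.2], [0.9, 0.4], [0.0, 0.5]]
    qk_head2 = [[0.0, 1.0], [0.3, 0.9], [0.2, 0.1], [0.4, 0.8], [0.1, 0.2], [0.5, 0.0]]
    heads = [attention_weights(qk_head1, qk_head1), attention_weights(qk_head2, qk_head2)]
    for tok, weight in rank_tokens(tokens, heads):
        print(f"{tok:>10s} {weight:.3f}")
```

In a real run, `heads` would be the per-head slices of one layer from `outputs.attentions` for a single article. Inspecting heads individually or pooling across layers are equally plausible choices, and the abstract does not say which one the authors used.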
About the journal:
Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal that encourages collaboration among researchers, engineers, and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT.
The journal will place a high priority on timely publication and provide a home for high-quality research.
Furthermore, the journal is interested in publishing topical special issues on any aspect of the IoT.