Guofan Liu; Jinghao Zhang; Qiang Liu; Junfei Wu; Shu Wu; Liang Wang
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 9490-9503. Published 2024-10-10. DOI: 10.1109/TKDE.2024.3477977. Impact factor: 8.9; JCR: Q1 (Computer Science, Artificial Intelligence). Source: https://ieeexplore.ieee.org/document/10713273/
Uni-Modal Event-Agnostic Knowledge Distillation for Multimodal Fake News Detection
With the rapid expansion of multimodal content on online social media, automatic detection of multimodal fake news has received much attention. The multimodal joint training commonly used in existing methods is expected to benefit from thoroughly leveraging cross-modal features, yet these methods still suffer from insufficient learning of uni-modal features. Because multimodal networks are heterogeneous, optimizing a single objective inevitably makes the models prone to rely on a specific modality while leaving the other modalities under-optimized. On the other hand, simply expecting every modality to play a significant role in identifying all rumors is also inappropriate, as multimodal fake news often involves tampering in only one modality. Therefore, finding the genuine tampering on a per-sample basis becomes the key to unlocking the full power of each modality in a collaborative manner. To address these issues, we propose a Uni-modal Event-agnostic Knowledge Distillation framework (UEKD), which aims to transfer the knowledge contained in the fine-grained predictions of uni-modal teachers to the multimodal student model through modality-specific distillation. Specifically, we find that uni-modal teachers simply trained on the whole training set tend to memorize event-specific noise and make correct but biased predictions, failing to reflect the genuine degree of tampering in each modality. To tackle this problem, we propose to train and validate the teacher models on different domains of the training dataset in a cross-validation manner, as the predictions of out-of-domain teachers can be regarded as event-agnostic knowledge without spurious connections to event-specific information. Finally, to balance the convergence speeds across modalities, we dynamically monitor the involvement of each modality during training, which lets us identify the more under-optimized modalities and re-weight the distillation loss accordingly. Our method can serve as a plug-and-play module for existing multimodal fake news detection backbones. Extensive experiments on three public datasets and four state-of-the-art fake news detection backbones show that our proposed method can improve the performance by a large margin.
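The abstract gives no equations, so the following Python sketch only illustrates the three ideas it names: out-of-event teacher predictions, modality-specific distillation, and re-weighting toward under-optimized modalities. All function names, the inverse-involvement weighting rule, the temperature value, and the use of KL divergence are assumptions made for illustration; they are not taken from the paper and should not be read as the authors' formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def out_of_event_folds(event_ids):
    """Yield (held_out_event, train_idx, predict_idx).

    A teacher for each fold would be trained on train_idx (all other events)
    and would only produce predictions for predict_idx (the held-out event),
    so every sample gets a prediction from a teacher that never saw its event.
    """
    for event in sorted(set(event_ids)):
        train_idx = [i for i, e in enumerate(event_ids) if e != event]
        predict_idx = [i for i, e in enumerate(event_ids) if e == event]
        yield event, train_idx, predict_idx


def modality_weights(involvement):
    """Hypothetical rule: weight each modality by inverse involvement.

    `involvement` maps modality name -> positive score; a lower score
    (more under-optimized) gives a larger weight. Weights sum to 1.
    """
    inverse = {m: 1.0 / max(v, 1e-8) for m, v in involvement.items()}
    total = sum(inverse.values())
    return {m: v / total for m, v in inverse.items()}


def distillation_loss(student_logits, teacher_logits, weights, temperature=2.0):
    """Weighted sum over modalities of KL(teacher || student branch)."""
    loss = 0.0
    for modality, weight in weights.items():
        p_teacher = softmax(teacher_logits[modality], temperature)
        p_student = softmax(student_logits[modality], temperature)
        loss += weight * kl_divergence(p_teacher, p_student)
    return loss


if __name__ == "__main__":
    folds = list(out_of_event_folds(["a", "a", "b", "c"]))
    weights = modality_weights({"text": 0.8, "image": 0.2})
    teacher = {"text": [2.0, 0.0], "image": [0.0, 2.0]}
    student = {"text": [1.0, 1.0], "image": [1.0, 1.0]}
    print(folds)
    print(weights)
    print(distillation_loss(student, teacher, weights))
```

In this sketch the image branch, having the lower involvement score, receives the larger share of the distillation loss. How involvement is actually measured during training is left open here because the abstract does not specify it.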
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.