Huimin Zeng;Zhenrui Yue;Lanyu Shang;Yang Zhang;Dong Wang
IEEE Transactions on Emerging Topics in Computing, vol. 12, no. 4, pp. 1105-1116. Published 2024-01-26. DOI: 10.1109/TETC.2024.3354419. https://ieeexplore.ieee.org/document/10415352/
Unsupervised Domain Adaptation via Contrastive Adversarial Domain Mixup: A Case Study on COVID-19
Training high-performing large deep learning (DL) models for downstream natural language tasks usually requires abundant labeled data. However, in real-world COVID-19 information services (e.g., misinformation detection, question answering), a fundamental challenge is the lack of labeled COVID-19 data for supervised end-to-end training of models for different downstream tasks, especially in the early stage of the pandemic. To address this challenge, we propose an unsupervised domain adaptation framework that uses contrastive learning and adversarial domain mixup to transfer knowledge from an existing source data domain to the target COVID-19 data domain. In particular, to bridge the gap between the source domain and the target domain, our method reduces a radial basis function (RBF) based discrepancy between the two domains. Moreover, we use domain adversarial examples to establish an intermediate domain mixup, in which the latent representations of input text from both domains can be mixed during training. In this paper, we focus on two prevailing downstream tasks in mining COVID-19 text data: COVID-19 misinformation detection and COVID-19 news question answering. Extensive domain adaptation experiments on multiple real-world datasets suggest that our method can effectively adapt misinformation detection and question answering systems to the unseen COVID-19 target domain, with significant improvements over state-of-the-art baselines.
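The abstract does not give the exact form of the RBF-based discrepancy or of the mixup step, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the discrepancy is a (biased) squared maximum mean discrepancy with a Gaussian RBF kernel, and it substitutes plain linear interpolation of latent vectors for the adversarial domain mixup described in the paper. The function names, the `gamma` bandwidth, and the mixing weight `lam` are all hypothetical.

```python
import numpy as np


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel matrix k(x_i, y_j) = exp(-gamma * ||x_i - y_j||^2)."""
    # Pairwise squared Euclidean distances via the expansion ||a||^2 + ||b||^2 - 2ab.
    sq_dist = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def rbf_mmd2(source, target, gamma):
    """Biased estimate of squared MMD between two sets of latent vectors.

    Assumption: this stands in for the paper's "RBF based discrepancy".
    """
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st


def mix_latents(source, target, lam):
    """Convex combination of paired source and target latent vectors.

    Assumption: plain mixup, without the adversarial perturbation the paper uses.
    """
    return lam * source + (1.0 - lam) * target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(0.0, 1.0, size=(64, 8))      # hypothetical source-domain latents
    tgt_near = rng.normal(0.0, 1.0, size=(64, 8))  # target drawn from the same distribution
    tgt_far = rng.normal(3.0, 1.0, size=(64, 8))   # target with a shifted mean
    print("MMD^2 (same distribution):", rbf_mmd2(src, tgt_near, gamma=0.05))
    print("MMD^2 (shifted target):   ", rbf_mmd2(src, tgt_far, gamma=0.05))
    mixed = mix_latents(src, tgt_far, lam=0.5)
    print("MMD^2 (source vs. mixed): ", rbf_mmd2(src, mixed, gamma=0.05))
```

In a training loop, a term like `rbf_mmd2` would typically be added to the task loss so that the encoder is pushed to produce source and target representations that are hard to tell apart, while mixed latents provide intermediate-domain samples between the two.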
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, & Health-care IT.