{"title":"利用双阈值筛选和相似性学习促进半监督学习","authors":"Zechen Liang, Yuan-Gen Wang, Wei Lu, Xiaochun Cao","doi":"10.1145/3672563","DOIUrl":null,"url":null,"abstract":"<p>How to effectively utilize unlabeled data for training is a key problem in Semi-Supervised Learning (SSL). Existing SSL methods often consider the unlabeled data whose predictions are beyond a fixed threshold (e.g., 0.95), and discard those less than 0.95. We argue that these discarded data have a large proportion, are of hard sample, and will benefit the model training if used properly. In this paper, we propose a novel method to take full advantage of the unlabeled data, termed DTS-SimL, which includes two core designs: Dual-Threshold Screening and Similarity Learning. Except for the fixed threshold, DTS-SimL extracts another class-adaptive threshold from the labeled data. Such a class-adaptive threshold can screen many unlabeled data whose predictions are lower than 0.95 but over the extracted one for model training. On the other hand, we design a new similar loss to perform similarity learning for all the highly-similar unlabeled data, which can further mine the valuable information from the unlabeled data. Finally, for more effective training of DTS-SimL, we construct an overall loss function by assigning four different losses to four different types of data. Extensive experiments are conducted on five benchmark datasets, including CIFAR-10, CIFAR-100, SVHN, Mini-ImageNet, and DomainNet-Real. Experimental results show that the proposed DTS-SimL achieves state-of-the-art classification accuracy. The code is publicly available at <i> https://github.com/GZHU-DVL/DTS-SimL.</i></p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"44 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Boosting Semi-Supervised Learning with Dual-Threshold Screening and Similarity Learning\",\"authors\":\"Zechen Liang, Yuan-Gen Wang, Wei Lu, Xiaochun Cao\",\"doi\":\"10.1145/3672563\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>How to effectively utilize unlabeled data for training is a key problem in Semi-Supervised Learning (SSL). Existing SSL methods often consider the unlabeled data whose predictions are beyond a fixed threshold (e.g., 0.95), and discard those less than 0.95. We argue that these discarded data have a large proportion, are of hard sample, and will benefit the model training if used properly. In this paper, we propose a novel method to take full advantage of the unlabeled data, termed DTS-SimL, which includes two core designs: Dual-Threshold Screening and Similarity Learning. Except for the fixed threshold, DTS-SimL extracts another class-adaptive threshold from the labeled data. Such a class-adaptive threshold can screen many unlabeled data whose predictions are lower than 0.95 but over the extracted one for model training. On the other hand, we design a new similar loss to perform similarity learning for all the highly-similar unlabeled data, which can further mine the valuable information from the unlabeled data. Finally, for more effective training of DTS-SimL, we construct an overall loss function by assigning four different losses to four different types of data. Extensive experiments are conducted on five benchmark datasets, including CIFAR-10, CIFAR-100, SVHN, Mini-ImageNet, and DomainNet-Real. 
Experimental results show that the proposed DTS-SimL achieves state-of-the-art classification accuracy. The code is publicly available at <i> https://github.com/GZHU-DVL/DTS-SimL.</i></p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"44 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-06-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3672563\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3672563","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Boosting Semi-Supervised Learning with Dual-Threshold Screening and Similarity Learning
How to effectively utilize unlabeled data for training is a key problem in Semi-Supervised Learning (SSL). Existing SSL methods often keep only the unlabeled data whose predictions exceed a fixed threshold (e.g., 0.95) and discard the rest. We argue that these discarded data account for a large proportion of the unlabeled set, consist of hard samples, and will benefit model training if used properly. In this paper, we propose a novel method, termed DTS-SimL, to take full advantage of the unlabeled data. It includes two core designs: Dual-Threshold Screening and Similarity Learning. In addition to the fixed threshold, DTS-SimL extracts a class-adaptive threshold from the labeled data. This class-adaptive threshold admits for model training many unlabeled data whose predictions fall below 0.95 but above the extracted threshold. On the other hand, we design a new similarity loss to perform similarity learning on all highly similar unlabeled data, which further mines valuable information from the unlabeled data. Finally, for more effective training of DTS-SimL, we construct an overall loss function that assigns four different losses to four different types of data. Extensive experiments are conducted on five benchmark datasets: CIFAR-10, CIFAR-100, SVHN, Mini-ImageNet, and DomainNet-Real. Experimental results show that the proposed DTS-SimL achieves state-of-the-art classification accuracy. The code is publicly available at https://github.com/GZHU-DVL/DTS-SimL.
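Since the abstract only outlines the method, the following is a minimal PyTorch sketch of what dual-threshold screening could look like. Only the fixed threshold of 0.95 comes from the abstract; everything else (estimating the class-adaptive threshold as the per-class mean confidence on correctly classified labeled samples, the names `estimate_class_thresholds` and `dual_threshold_screen`, and the placeholder `similarity_loss`) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch of dual-threshold pseudo-label screening.
# NOT the authors' exact DTS-SimL implementation; see the lead-in for
# which parts are assumptions.
import torch
import torch.nn.functional as F

FIXED_THRESHOLD = 0.95  # the fixed threshold named in the abstract


@torch.no_grad()
def estimate_class_thresholds(model, labeled_loader, num_classes, device):
    """One plausible way to extract a class-adaptive threshold from the
    labeled data: average the model's confidence on correctly classified
    labeled samples, per class (an assumption for exposition)."""
    conf_sum = torch.zeros(num_classes, device=device)
    count = torch.zeros(num_classes, device=device)
    for x, y in labeled_loader:
        x, y = x.to(device), y.to(device)
        probs = F.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        correct = pred == y
        conf_sum.index_add_(0, y[correct], conf[correct])
        count.index_add_(0, y[correct], torch.ones_like(conf[correct]))
    # Fall back to the fixed threshold for classes with no correct samples.
    return torch.where(count > 0, conf_sum / count.clamp(min=1),
                       torch.full_like(conf_sum, FIXED_THRESHOLD))


def dual_threshold_screen(probs, class_thresholds):
    """Split a batch of unlabeled predictions into three groups:
    (1) confident: above the fixed threshold,
    (2) adaptive: below the fixed threshold but above the class-adaptive
        threshold of the predicted class,
    (3) rest: low-confidence samples, candidates for similarity learning."""
    conf, pred = probs.max(dim=1)
    confident = conf >= FIXED_THRESHOLD
    adaptive = (~confident) & (conf >= class_thresholds[pred])
    rest = ~(confident | adaptive)
    return confident, adaptive, rest, pred


def similarity_loss(probs_a, probs_b):
    """Generic placeholder for the paper's similarity loss: pull the
    predicted distributions of two views of the same highly similar
    unlabeled data together. The actual DTS-SimL loss may differ."""
    return F.mse_loss(probs_a, probs_b)
```

In a FixMatch-style training loop, groups (1) and (2) would receive pseudo-label cross-entropy terms (possibly weighted differently) while group (3) would feed the similarity-learning branch; together with the supervised loss on labeled data, this would be consistent with the abstract's description of four losses for four types of data.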
Journal introduction:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all special issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. As an archival journal, it is intended that the papers have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.