An Active Transfer Learning framework for image classification based on Maximum Differentiation Classifier
Peng Zan, Yuerong Wang, Haohao Hu, Wanjun Zhong, Tianyu Han, Jingwei Yue
Image and Vision Computing, Volume 154, Article 105401. Published 2025-02-01. DOI: 10.1016/j.imavis.2024.105401
Impact factor: 4.2. JCR: Q2 (Computer Science, Artificial Intelligence).
https://www.sciencedirect.com/science/article/pii/S0262885624005067
Citation count: 0
Abstract
Deep learning has been extensively adopted across various domains, yielding satisfactory outcomes. However, it relies heavily on large labeled datasets, and collecting data labels is expensive and time-consuming. We propose a novel framework called Active Transfer Learning (ATL) to address this issue. The ATL framework consists of Active Learning (AL) and Transfer Learning (TL). AL queries the unlabeled samples with high inconsistency using a Maximum Differentiation Classifier (MDC). The MDC uses the discrepancy between the labeled data and their augmentations to select informative samples for annotation. We also explore the potential of incorporating TL techniques. TL comprises pre-training and fine-tuning: the former learns knowledge from the origin-augmentation domain to pre-train the model, while the latter applies the acquired knowledge to the downstream tasks. The results indicate that TL and AL have complementary effects, and that the proposed ATL framework outperforms state-of-the-art methods in terms of accuracy, precision, recall, and F1-score.
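The abstract does not give the MDC's exact formulation, so the following is only a minimal sketch of the general idea of inconsistency-based querying, not the authors' method. It assumes a model that already produces logits for each unlabeled image and for one augmented view of it, and it uses the L1 distance between the two softmax outputs as a stand-in inconsistency score. The function names (`softmax`, `inconsistency`, `select_queries`), the `budget` parameter, and the logit values are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inconsistency(p, q):
    """L1 distance between two class-probability vectors.

    This is an assumed measure chosen for illustration; the paper's
    MDC may define discrepancy differently.
    """
    return sum(abs(a - b) for a, b in zip(p, q))


def select_queries(logits_orig, logits_aug, budget):
    """Return indices of the `budget` samples whose predictions change
    most between the original input and its augmented view."""
    scores = [
        inconsistency(softmax(lo), softmax(la))
        for lo, la in zip(logits_orig, logits_aug)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical logits for four unlabeled images over three classes.
logits_orig = [[2.0, 0.1, 0.1], [0.5, 0.4, 0.6], [3.0, 0.0, 0.0], [0.2, 1.5, 0.1]]
logits_aug = [[2.0, 0.1, 0.1], [0.1, 2.0, 0.3], [2.9, 0.1, 0.0], [0.9, 0.8, 0.1]]

# Indices of the two samples to send for annotation.
queries = select_queries(logits_orig, logits_aug, budget=2)
print(queries)
```

In an active learning loop, the returned indices would be passed to an annotator, the newly labeled samples added to the training set, and the model retrained or fine-tuned before the next query round.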
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.