Self-supervised learning (SSL) has recently achieved remarkable success in computer vision, primarily through joint embedding architectures. These models train dual networks by aligning different augmentations of the same image while preventing collapse of the feature space. Building on this, previous work establishes a mathematical connection between joint-embedding SSL and the co-occurrence of image patches. Moreover, several efforts have scaled patch-based SSL to vast numbers of image patches, demonstrating rapid convergence and notable performance. However, the efficiency of these methods is hindered by the excessive use of cropped patches. To address this issue, we propose a novel framework, Past-to-Present (P2P) smoothing, that leverages the model's previous outputs as a supervisory signal. Specifically, we divide the patch augmentations of a single image into two portions. One portion is used to update the model at the previous iteration and is retained as past information for iteration t. The other portion is used for comparison at iteration t, serving as present information complementary to the past. This design allows us to spread the patches of the same image across different batches, thereby increasing the utilization rate of patch-based learning in our model. Through extensive experiments, our method achieves strong accuracy: 94.2% on CIFAR-10, 74.2% on CIFAR-100, 49.5% on TinyImageNet, and 78.2% on ImageNet-100. Additional experiments demonstrate its improved transferability to out-of-domain datasets compared to other SSL baselines.
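The past/present mechanism described above can be illustrated with a minimal sketch. Note that the function and class names (`p2p_split`, `P2PBuffer`), the buffer keyed by image id, and the cosine alignment rule are assumptions for illustration only, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def p2p_split(patches: torch.Tensor, n_past: int):
    """Divide the patch augmentations of one image into two portions:
    a 'past' portion (used for the previous update and retained) and a
    'present' portion (compared against the past at iteration t)."""
    return patches[:n_past], patches[n_past:]

class P2PBuffer:
    """Hypothetical cache of past embeddings, allowing patches of the
    same image to be spread across different batches."""
    def __init__(self):
        self.cache = {}  # image id -> detached past embeddings

    def store(self, img_id, past_emb: torch.Tensor):
        # Detach so past outputs act as a fixed supervisory signal.
        self.cache[img_id] = past_emb.detach()

    def alignment_loss(self, img_id, present_emb: torch.Tensor) -> torch.Tensor:
        # Illustrative smoothing target: align present embeddings with the
        # mean of the cached past embeddings (cosine-distance form).
        past_emb = self.cache[img_id]
        target = F.normalize(past_emb.mean(dim=0, keepdim=True), dim=-1)
        pred = F.normalize(present_emb, dim=-1)
        return (1.0 - (pred * target).sum(dim=-1)).mean()
```

In this sketch, only the present portion contributes gradients at iteration t, while the cached past portion supplies the supervisory signal, so each image's patches are consumed across two iterations rather than one.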
