Transferring informative dark knowledge from other modalities has become a common approach to learning tasks that are difficult to accomplish independently due to limitations in data quality. However, the question of why the transferred knowledge works has not been extensively explored. To address this issue, in this paper we uncover a correlation between feature discriminability and dimensional structure (DS) by observing the features extracted from modalities of high and low data quality within the same learning task. On this basis, we characterize DS through the spatial distribution of intermediate features and the channel-wise correlation of output features. We empirically find that the DS of high-quality features is better than that of low-quality ones. This inspires us to propose a novel DS-based knowledge distillation method for better supervised cross-modal learning (CML) performance. Instead of merely mimicking the logits or features of the high-quality modality, the proposed method leverages its structural knowledge to guide the low-quality modality. Specifically, it enforces a uniform distribution of intermediate features and channel-wise independence of deep features in the low-quality modality, thereby enhancing semantic learning and improving performance. This is especially useful when the performance gap between the two modalities is relatively large. Furthermore, this paper introduces a new CML dataset for marine target recognition, named IIS-ISCAS, to promote community development. The dataset includes more than 10,000 paired samples of 8 distinct marine targets in optical and radar modalities and is continuously being updated. Experimental results on six transformed visual benchmark datasets and six CML datasets validate the effectiveness of the proposed method.
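The two structural objectives named above, uniformity of intermediate features and channel-wise independence of output features, can be sketched as simple loss functions. The sketch below is illustrative only and is not the paper's implementation: `uniformity_loss` follows the common Gaussian-kernel formulation on the unit hypersphere, and `channel_independence_loss` penalizes off-diagonal entries of the channel correlation matrix; both function names and the temperature `t` are hypothetical.

```python
import numpy as np

def uniformity_loss(feats, t=2.0):
    """Illustrative uniformity penalty: lower when (N, D) features
    spread evenly over the unit hypersphere."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances between normalized features.
    sq_dists = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
    mask = ~np.eye(f.shape[0], dtype=bool)  # exclude self-distances
    # Log of the mean Gaussian potential; clustered features give a
    # larger (less negative) value than well-spread ones.
    return float(np.log(np.exp(-t * sq_dists[mask]).mean()))

def channel_independence_loss(feats):
    """Illustrative decorrelation penalty: sum of squared off-diagonal
    entries of the (C, C) channel correlation matrix of (N, C) features."""
    z = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
    corr = (z.T @ z) / feats.shape[0]
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum())
```

Under this reading, the student (low-quality modality) would minimize its task loss plus weighted sums of these two terms, rather than a plain logit-matching distillation loss.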
