A gradual approach to knowledge distillation in deep supervised hashing for large-scale image retrieval
Abid Hussain, Heng-Chao Li, Mehboob Hussain, Muqadar Ali, Shaheen Abbas, Danish Ali, Amir Rehman
Computers & Electrical Engineering, Volume 120, Article 109799 (published 2024-11-06). DOI: 10.1016/j.compeleceng.2024.109799
https://www.sciencedirect.com/science/article/pii/S0045790624007262
Abstract
Deep learning-based hashing methods have emerged as superior techniques for large-scale image retrieval, surpassing non-deep and unsupervised algorithms. However, most hashing models do not account for memory usage and computational cost, which hinders their use on resource-constrained devices. This paper proposes an Optimized Knowledge Distillation (OKD) approach for training compact deep supervised hashing models to address this issue. OKD uses a growing teacher-student training strategy in which an evolving teacher continuously imparts enriched knowledge to the student. The teacher and student networks are divided into blocks, with auxiliary training modules placed between corresponding blocks. These modules extract knowledge from intermediate layers to capture multifaceted relationships in the data and enhance distillation. In addition, a noise- and background-reduction mask (NBRM) filters noise from the transferred knowledge, focusing the student on discriminative features. During training, the student draws on multiple sources of supervision: the predictions of the dynamically improving teacher, ground-truth labels, and hash code matching. This helps the student closely replicate the teacher's abilities while using far fewer parameters. Experimental evaluation on four benchmark datasets (CIFAR-10, CIFAR-100, NUS-WIDE, and ImageNet) demonstrates that OKD outperforms existing hashing methods. OKD achieves 92.98%, 88.72%, and 75.88% mean average precision on CIFAR-10, NUS-WIDE, and ImageNet, respectively, up to 1.83%, 1.69%, and 0.80% higher than the previous best methods across different hash code lengths. By matching the teacher's ability through distilled knowledge, OKD removes a key barrier to deploying powerful models on resource-constrained mobile and embedded platforms.
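To make the training signal concrete, the following is a minimal sketch of how the three sources of supervision described in the abstract could be combined, assuming a PyTorch-style setup. The function name, the temperature, the loss weights, and the tanh relaxation of the hash codes are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of a combined distillation objective for supervised hashing.
# The weighting scheme, temperature, and tanh relaxation are assumptions for
# demonstration only; they are not the paper's exact formulation.

def okd_style_loss(student_logits, teacher_logits, student_codes, teacher_codes,
                   labels, temperature=4.0, alpha=0.5, beta=0.1):
    """Combine ground-truth supervision, teacher-prediction distillation,
    and hash code matching into a single training loss."""
    # 1) Supervision from ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # 2) Supervision from the (dynamically improving) teacher's softened predictions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # 3) Hash code matching: push the student's relaxed binary codes
    #    (tanh outputs in [-1, 1]) toward the teacher's codes.
    code_match = F.mse_loss(torch.tanh(student_codes), torch.tanh(teacher_codes))

    return (1 - alpha) * ce + alpha * kd + beta * code_match
```

In a full training loop, `student_logits` and `student_codes` would come from the student's classification and hashing heads, while the teacher's outputs would be refreshed as the teacher itself continues to improve, consistent with the growing teacher-student strategy described above.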
Journal introduction:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.