A gradual approach to knowledge distillation in deep supervised hashing for large-scale image retrieval
Abid Hussain, Heng-Chao Li, Mehboob Hussain, Muqadar Ali, Shaheen Abbas, Danish Ali, Amir Rehman
Computers & Electrical Engineering, Volume 120, Article 109799 (published 2024-11-06). DOI: 10.1016/j.compeleceng.2024.109799
https://www.sciencedirect.com/science/article/pii/S0045790624007262
Abstract
Deep learning-based hashing methods have emerged as superior techniques for large-scale image retrieval, surpassing non-deep and unsupervised algorithms. However, most hashing models do not account for memory usage and computational cost, which hinders their use on resource-constrained devices. This paper proposes an Optimized Knowledge Distillation (OKD) approach for training compact deep supervised hashing models to address this issue. OKD uses a growing teacher-student training strategy in which an evolving teacher continuously imparts enriched knowledge to the student. The teacher and student networks are divided into blocks, with auxiliary training modules placed between corresponding blocks. These modules extract knowledge from intermediate layers to capture multifaceted relationships in the data and enhance distillation. In addition, a noise- and background-reduction mask (NBRM) filters noise from the transferred knowledge, focusing the student on discriminative features. During training, the student draws on multiple sources of supervision: the predictions of the dynamically improving teacher, ground-truth labels, and hash code matching. This helps the student closely replicate the teacher's abilities while using far fewer parameters. Experimental evaluation on four benchmark datasets (CIFAR-10, CIFAR-100, NUS-WIDE, and ImageNet) demonstrates that OKD outperforms existing hashing methods. OKD achieves 92.98%, 88.72%, and 75.88% mean average precision on CIFAR-10, NUS-WIDE, and ImageNet, respectively, up to 1.83%, 1.69%, and 0.80% higher than the previous best methods across different hash code lengths. By matching the teacher's ability through distilled knowledge, OKD removes a key barrier to deploying powerful models on resource-constrained mobile and embedded platforms.
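To make the training signal concrete, the following is a minimal sketch of how the three sources of supervision described in the abstract could be combined, assuming a PyTorch-style setup. The function name, the temperature, the loss weights, and the tanh relaxation of the hash codes are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of a combined distillation objective for supervised hashing.
# The weighting scheme, temperature, and tanh relaxation are assumptions for
# demonstration only; they are not the paper's exact formulation.

def okd_style_loss(student_logits, teacher_logits, student_codes, teacher_codes,
                   labels, temperature=4.0, alpha=0.5, beta=0.1):
    """Combine ground-truth supervision, teacher-prediction distillation,
    and hash code matching into a single training loss."""
    # 1) Supervision from ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # 2) Supervision from the (dynamically improving) teacher's softened predictions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # 3) Hash code matching: push the student's relaxed binary codes
    #    (tanh outputs in [-1, 1]) toward the teacher's codes.
    code_match = F.mse_loss(torch.tanh(student_codes), torch.tanh(teacher_codes))

    return (1 - alpha) * ce + alpha * kd + beta * code_match
```

In a full training loop, `student_logits` and `student_codes` would come from the student's classification and hashing heads, while the teacher's outputs would be refreshed as the teacher itself continues to improve, consistent with the growing teacher-student strategy described above.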
Journal introduction:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.