Guest Editorial: Special issue on advances in representation learning for computer vision

CAAI Transactions on Intelligence Technology, vol. 9, no. 1, pp. 1–3 | Impact Factor 8.4, JCR Q1 (Computer Science, Artificial Intelligence) | Published: 2024-02-01 | DOI: 10.1049/cit2.12290
Andrew Beng Jin Teoh, Thian Song Ong, Kian Ming Lim, Chin Poo Lee
{"title":"Guest Editorial: Special issue on advances in representation learning for computer vision","authors":"Andrew Beng Jin Teoh,&nbsp;Thian Song Ong,&nbsp;Kian Ming Lim,&nbsp;Chin Poo Lee","doi":"10.1049/cit2.12290","DOIUrl":null,"url":null,"abstract":"<p>Deep learning has been a catalyst for a transformative revolution in machine learning and computer vision in the past decade. Within these research domains, methods grounded in deep learning have exhibited exceptional performance across a spectrum of tasks. The success of deep learning methods can be attributed to their capability to derive potent representations from data, integral for a myriad of downstream applications. These representations encapsulate the intrinsic structure, features, or latent variables characterising the underlying statistics of visual data. Despite these achievements, the challenge persists in effectively conducting representation learning of visual data with deep models, particularly when confronted with vast and noisy datasets. This special issue is a dedicated platform for researchers worldwide to disseminate their latest, high-quality articles, aiming to enhance readers' comprehension of the principles, limitations, and diverse applications of representation learning in computer vision.</p><p>Wencheng Yang et al. present the first paper in this special issue. The authors thoroughly review feature extraction and learning methods in their work, specifically focusing on cancellable biometrics, a topic not addressed in previous survey articles. While preserving user data privacy, they emphasise the significance of cancellable biometrics in the capacity of feature representation for achieving good recognition accuracy. The paper states that selecting appropriate feature extraction and learning methods relies on individual applications' specific needs and restrictions. Deep learning-based feature learning has significantly improved cancellable biometrics in recent years, while hand-crafted feature extraction has matured. In addition, the research also discusses the problems and potential research areas in this field, providing valuable insights for future studies in cancellable biometrics, which attempts to strike a balance between privacy protection and recognition efficiency.</p><p>The second paper by Mecheter et al. delves into the intricate realm of medical image analysis, specifically focusing on the segmentation of Magnetic Resonance images. The challenge lies in achieving precise segmentation, particularly with incorporating deep learning networks and the scarcity of sufficient medical images. Mecheter et al. tackle this challenge by proposing a novel approach—transfer learning from T1-weighted to T2-weighted MR sequences. Their work aims to enhance bone segmentation while minimising computational resources. The paper introduces an innovative excitation-based convolutional neural network and explores four transfer learning mechanisms. The hybrid transfer learning approach is particularly interesting, addressing overfitting concerns, and preserving features from both modalities with minimal computation time. Evaluating 14 clinical 3D brain MR and CT images demonstrates the superior performance and efficiency of hybrid transfer learning for bone segmentation, marking a significant advancement in the field.</p><p>In the third paper, Yiyi Yuan et al. propose a robust watermarking system specifically designed for the secure transmission of medical images. 
They introduced a three-step process with image encryption through DWT-DCT and Logistic mapping, feature extraction utilising Daisy descriptors, and watermark generation employing perceptual hashing. This approach satisfies the basic requirements of medical image watermarking by utilising cryptographic knowledge and the zero-watermarking technique to embed watermarks in test images without modifying the original images. It also facilitates quick watermark insertion and removal with minimal computational loads. Besides, it demonstrates exceptional resilience to geometric and conventional attacks with a notable performance against rotational attacks.</p><p>The fourth paper by Liping Zhang et al. aims to investigate a new method for describing human facial patterns, moving away from conventional pixel-based techniques to a continuous surface representation called the EMFace model. The proposed model focused on explicitly representing human faces using a mathematical model instead of typical approaches that rely on hand-crafted features and data-driven deep neural network learning methods. Specifically, EmFace has been effectively implemented in various face image processing applications, such as transformation, restoration, and denoising, demonstrating its adaptability and efficiency in managing intricate facial image data with simple parameter computations.</p><p>The fifth paper by Jaekwon Lee et al. proposes an innovative multi-biometric strategies that bridge feature level fusion with score fusion. This method leverages the geometric intersection of cardinal directions to extract features from palm vein and palmprint biometrics, which are then utilised for identity verification. In order to secure the fused template, a feature transformation would be employed, which would produce a non-invertible template and potentially increase the resolution of the template to improve the accuracy of verification. The proposed system is assessed through extensive tests using three public palm databases to examine the four properties of template protection, including a novel stolen template scenario that has not been studied.</p><p>The sixth paper by Wu et al. navigates the evolving landscape of semantic segmentation, building upon the remarkable success of fully convolutional networks in extracting discriminative pixel representations. In their insightful exploration, the authors identify persistent challenges in existing methods, specifically the substantial intra-class feature variation between different scenes and the constrained inter-class feature distinction within the same scene. Wu et al. present a paradigm shift by reimagining semantic segmentation through the lens of pixel-to-class centre similarity. Each weight vector in the segmentation head is an embedding for the corresponding semantic class across the entire dataset, facilitating the computation of similarity in the final feature space. Introducing the innovative Class Centre Similarity (CCS) layer, the authors propose adaptive class centres conditioned on each scene, effectively addressing intra-class variation. The CCS layer incorporates the Adaptive Class Centre Module and a specially designed Class Distance Loss to control inter-class and intra-class distances, resulting in a refined segmentation prediction. Extensive experiments showcase the superior performance of the proposed model against state-of-the-art methods, marking a significant advancement in the field of semantic segmentation.</p><p>The seventh paper by Kim et al. 
presents a compelling contribution to image enhancement, focusing on the growing interest in global transformation functions learnt through deep neural networks. While recent approaches have shown promise, a common limitation resides in oversimplifying transformation functions, struggling to emulate intricate colour transformations between low-quality and manually retouched high-quality images. In addressing this challenge, the proposed algorithm takes a simple yet effective route by applying a channel-wise intensity transformation to the learnt embedding space. Departing from traditional colour spaces, the continuous intensity transformation maps input and output intensities, enhancing features before restoring them to colours. The enhancement network, developed for this purpose, generates multi-scale feature maps, derives a set of transformation functions, and applies continuous intensity transformations to produce enhanced images. The proposed approach, outperforming state-of-the-art alternatives across various datasets, represents a noteworthy advancement in image enhancement.</p><p>The eighth paper by Zhang et al. presents an innovative approach to address the challenges posed by sparse representation in image classification, particularly focusing on deformable objects like human faces. Sparse representation, a powerful data classification algorithm, relies on known training samples to categorise test samples. However, deformable images, characterised by varying pixel intensities at the same location across different images of the same subject, pose a significant hurdle. Factors such as lighting, attitude, and occlusion further complicate feature extraction and correct classification. The authors propose a novel image representation and classification algorithm in response to these challenges. Their method generates virtual samples through a non-linear variation approach, effectively extracting low-frequency information crucial for representing deformable objects. Combining original and virtual samples enhances the algorithm's classification performance and robustness. By calculating expression coefficients separately for original and virtual samples using sparse representation principles and employing an efficient score fusion scheme, the proposed algorithm achieves superior classification results compared to conventional sparse representation algorithms, as demonstrated through compelling experimental results.</p><p>The ninth paper by Zhe Liu et al. introduces a groundbreaking approach to address challenges in reconstructing high-resolution hyperspectral (HR-HS) images, focusing on the hyperspectral image super-resolution (HSI SR) paradigm. Deep learning-based algorithms have made substantial strides in this field by automatically learning image priors, typically through full supervision using large amounts of externally collected data. However, the construction of a database for merging low-resolution hyperspectral (LR-HS) and high-resolution multispectral (HR-MS) or RGB images, crucial for HSI SR research, encounters difficulties in obtaining corresponding training triplets simultaneously. To overcome these limitations, the authors propose a novel method leveraging deep internal learning (DIL) and deep self-supervised learning (DSL). DIL involves training a specific CNN model at test time by online preparation of training triplet samples from observed LR-HS/HR-MS images and downsampled LR-HS versions. 
Additionally, DSL exploits observations as unlabelled training samples, enhancing the model's adaptability to diverse environments. The authors present a robust HSI SR method without prior training through a unified deep framework consolidating DIL and DSL. They demonstrate its efficacy through extensive experiments on benchmark hyperspectral datasets, including CAVE and Harvard, showcasing significant performance gains over state-of-the-art methods.</p><p>The collection of selected papers encompasses captivating topics and highlights key directions in this crucial realm of research and development. We aspire that these chosen papers enhance the community's comprehension of current trends and guide future focus areas. We extend our sincere gratitude to all the authors for choosing this special section as a platform to disseminate their research findings. Special thanks to the reviewers whose valuable and thoughtful feedback greatly benefited the authors. Additionally, we thank the IET staff members for their generous support and advice throughout preparing this special issue.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 1","pages":"1-3"},"PeriodicalIF":8.4000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12290","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12290","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Deep learning has been a catalyst for a transformative revolution in machine learning and computer vision in the past decade. Within these research domains, methods grounded in deep learning have exhibited exceptional performance across a spectrum of tasks. The success of deep learning methods can be attributed to their capability to derive potent representations from data, integral for a myriad of downstream applications. These representations encapsulate the intrinsic structure, features, or latent variables characterising the underlying statistics of visual data. Despite these achievements, the challenge persists in effectively conducting representation learning of visual data with deep models, particularly when confronted with vast and noisy datasets. This special issue is a dedicated platform for researchers worldwide to disseminate their latest, high-quality articles, aiming to enhance readers' comprehension of the principles, limitations, and diverse applications of representation learning in computer vision.

Wencheng Yang et al. present the first paper in this special issue. The authors thoroughly review feature extraction and learning methods, focusing specifically on cancellable biometrics, a topic not addressed in previous survey articles. They emphasise that the feature representation capacity of cancellable biometrics is key to achieving good recognition accuracy while preserving user data privacy. The paper observes that selecting appropriate feature extraction and learning methods depends on the specific needs and constraints of individual applications: hand-crafted feature extraction has matured, while deep learning-based feature learning has significantly improved cancellable biometrics in recent years. The research also discusses open problems and potential research directions in this field, providing valuable insights for future studies in cancellable biometrics, which strives to strike a balance between privacy protection and recognition performance.

The second paper by Mecheter et al. delves into the intricate realm of medical image analysis, specifically the segmentation of Magnetic Resonance (MR) images. The challenge lies in achieving precise segmentation with deep learning networks when sufficient medical images are scarce. Mecheter et al. tackle this challenge by proposing a novel approach: transfer learning from T1-weighted to T2-weighted MR sequences. Their work aims to enhance bone segmentation while minimising computational resources. The paper introduces an innovative excitation-based convolutional neural network and explores four transfer learning mechanisms. The hybrid transfer learning approach is particularly interesting, addressing overfitting concerns and preserving features from both modalities with minimal computation time. Evaluation on 14 clinical 3D brain MR and CT images demonstrates the superior performance and efficiency of hybrid transfer learning for bone segmentation, marking a significant advancement in the field.
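
To make the cross-sequence idea concrete, here is a minimal sketch of one common transfer-learning mechanism, freezing early (modality-generic) layers of a network trained on T1-weighted data and fine-tuning the rest on scarce T2-weighted data. The paper compares four such mechanisms; this sketch is an illustration under assumed names (`transfer_for_t2`, `n_frozen`), not the authors' exact setup.

```python
# Sketch: reuse a T1-trained segmentation CNN for T2-weighted MR by freezing
# the first n_frozen parameter groups and fine-tuning the remainder.
import torch
import torch.nn as nn

def transfer_for_t2(model_t1: nn.Module, n_frozen: int) -> nn.Module:
    for i, (_, param) in enumerate(model_t1.named_parameters()):
        # Early layers stay fixed; later, task-specific layers adapt to T2.
        param.requires_grad = i >= n_frozen
    return model_t1

# Fine-tune only the unfrozen parameters on the (scarce) T2 data:
# model = transfer_for_t2(pretrained_t1_model, n_frozen=20)
# optimiser = torch.optim.Adam(
#     [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```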

In the third paper, Yiyi Yuan et al. propose a robust watermarking system specifically designed for the secure transmission of medical images. They introduce a three-step process comprising image encryption through DWT-DCT and Logistic mapping, feature extraction utilising Daisy descriptors, and watermark generation employing perceptual hashing. This approach satisfies the basic requirements of medical image watermarking by combining cryptographic techniques with zero-watermarking, which embeds watermarks without modifying the original images. It also facilitates quick watermark insertion and extraction with minimal computational load. Moreover, it demonstrates exceptional resilience to both geometric and conventional attacks, with notable performance against rotational attacks.
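
The zero-watermarking principle is worth a brief sketch: a perceptual hash derived from stable DWT-DCT features is XORed with the watermark to form an ownership key, so the image itself is never altered. The sketch below illustrates that principle only; the paper's Daisy descriptors and Logistic-map encryption are omitted, and all function names here are assumptions.

```python
# Minimal zero-watermarking sketch: a DWT-DCT perceptual hash XORed with the
# watermark bits yields a stored key; the original image is left untouched.
import numpy as np
import pywt
from scipy.fft import dctn

def perceptual_hash(img: np.ndarray, n_bits: int = 64) -> np.ndarray:
    """Sign bits of low-frequency DCT coefficients of the DWT approximation."""
    approx, _ = pywt.dwt2(img.astype(float), "haar")  # low-frequency sub-band
    coeffs = dctn(approx, norm="ortho").ravel()[:n_bits]
    return (coeffs > np.median(coeffs)).astype(np.uint8)

def make_zero_watermark(img: np.ndarray, watermark_bits: np.ndarray) -> np.ndarray:
    # Only this key is stored or transmitted, never a modified image.
    return np.bitwise_xor(perceptual_hash(img, watermark_bits.size), watermark_bits)

def extract_watermark(received: np.ndarray, key: np.ndarray) -> np.ndarray:
    # Verification recomputes the hash from the received image and XORs it
    # with the key; a robust hash survives geometric and conventional attacks.
    return np.bitwise_xor(perceptual_hash(received, key.size), key)
```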

The fourth paper by Liping Zhang et al. investigates a new method for describing human facial patterns, moving away from conventional pixel-based techniques to a continuous surface representation called EmFace. The proposed model represents human faces explicitly with a mathematical model, in contrast to typical approaches that rely on hand-crafted features or data-driven deep neural network learning. EmFace has been effectively applied to various face image processing tasks, such as transformation, restoration, and denoising, demonstrating its adaptability and efficiency in handling intricate facial image data with simple parameter computations.
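
To convey the flavour of an explicit parametric image model, here is an illustrative sketch that approximates a face image by a weighted sum of fixed 2D Gaussian basis functions fitted by least squares, yielding a continuous surface from a small parameter vector. This is a generic illustration of the idea only; EmFace's actual formulation and fitting procedure may differ.

```python
# Sketch: represent an image as a continuous surface sum_k w_k * G_k(x, y),
# where G_k are fixed 2D Gaussians on a grid and w_k are solved by least squares.
import numpy as np

def gaussian_basis(h: int, w: int, grid: int = 8, sigma: float = 6.0) -> np.ndarray:
    ys, xs = np.mgrid[0:h, 0:w]
    basis = [np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma**2))
             for cy in np.linspace(0, h - 1, grid)
             for cx in np.linspace(0, w - 1, grid)]
    return np.stack([b.ravel() for b in basis], axis=1)  # (h*w, grid*grid)

def fit_face(img: np.ndarray, grid: int = 8, sigma: float = 6.0):
    """Return model weights and the continuous reconstruction of the image."""
    A = gaussian_basis(*img.shape, grid, sigma)
    weights, *_ = np.linalg.lstsq(A, img.ravel().astype(float), rcond=None)
    return weights, (A @ weights).reshape(img.shape)
```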

The fifth paper by Jaekwon Lee et al. proposes an innovative multi-biometric strategy that bridges feature-level fusion with score-level fusion. The method leverages the geometric intersection of cardinal directions to extract features from palm vein and palmprint biometrics, which are then utilised for identity verification. To secure the fused template, a feature transformation is employed that produces a non-invertible template and can increase the template's resolution to improve verification accuracy. The proposed system is assessed through extensive tests on three public palm databases, examining the four properties of template protection and including a novel stolen-template scenario that has not been studied previously.
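
For readers unfamiliar with non-invertible template protection, the sketch below shows one classic transformation of this kind, a user-keyed random projection followed by binarisation. The many-to-one mapping prevents recovery of the fused biometric feature, and changing the key re-issues the template. This is a textbook illustration, not the authors' specific transform.

```python
# Sketch of a cancellable, non-invertible template transform via keyed
# random projection and binarisation.
import numpy as np

def protect_template(feature: np.ndarray, user_key: int, out_dim: int) -> np.ndarray:
    rng = np.random.default_rng(user_key)              # re-issuable: change the key
    projection = rng.standard_normal((out_dim, feature.size))
    return (projection @ feature > 0).astype(np.uint8)  # many-to-one, non-invertible

def verify(probe: np.ndarray, stored: np.ndarray, user_key: int) -> float:
    probe_t = protect_template(probe, user_key, stored.size)
    return 1.0 - np.mean(probe_t ^ stored)             # normalised Hamming similarity
```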

The sixth paper by Wu et al. navigates the evolving landscape of semantic segmentation, building upon the remarkable success of fully convolutional networks in extracting discriminative pixel representations. In their insightful exploration, the authors identify persistent challenges in existing methods, specifically the substantial intra-class feature variation between different scenes and the constrained inter-class feature distinction within the same scene. Wu et al. present a paradigm shift by reimagining semantic segmentation through the lens of pixel-to-class centre similarity. Each weight vector in the segmentation head is an embedding for the corresponding semantic class across the entire dataset, facilitating the computation of similarity in the final feature space. Introducing the innovative Class Centre Similarity (CCS) layer, the authors propose adaptive class centres conditioned on each scene, effectively addressing intra-class variation. The CCS layer incorporates the Adaptive Class Centre Module and a specially designed Class Distance Loss to control inter-class and intra-class distances, resulting in a refined segmentation prediction. Extensive experiments showcase the superior performance of the proposed model against state-of-the-art methods, marking a significant advancement in the field of semantic segmentation.
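
The pixel-to-class-centre view admits a compact formulation, sketched below in PyTorch: class logits are cosine similarities between per-pixel embeddings and class centre vectors, with the centres perturbed by a scene-level context vector. The paper's CCS layer uses a dedicated Adaptive Class Centre Module and Class Distance Loss; here the adaptation is reduced to a single linear update as an assumption for illustration.

```python
# Sketch: segmentation logits as cosine similarity between pixel embeddings
# and (scene-conditioned) class centres.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilaritySegHead(nn.Module):
    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.centres = nn.Parameter(torch.randn(n_classes, feat_dim))  # dataset-level centres
        self.adapt = nn.Linear(feat_dim, feat_dim)                     # toy scene adaptation

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) pixel embeddings from the backbone.
        scene = feats.mean(dim=(2, 3))                                 # (B, C) scene context
        centres = self.centres.unsqueeze(0) + self.adapt(scene).unsqueeze(1)  # (B, K, C)
        feats = F.normalize(feats, dim=1)
        centres = F.normalize(centres, dim=2)
        # Cosine similarity of every pixel to every scene-conditioned centre.
        return torch.einsum("bchw,bkc->bkhw", feats, centres)          # (B, K, H, W) logits
```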

The seventh paper by Kim et al. presents a compelling contribution to image enhancement, focusing on the growing interest in global transformation functions learnt through deep neural networks. While recent approaches have shown promise, a common limitation resides in oversimplifying transformation functions, struggling to emulate intricate colour transformations between low-quality and manually retouched high-quality images. In addressing this challenge, the proposed algorithm takes a simple yet effective route by applying a channel-wise intensity transformation to the learnt embedding space. Departing from traditional colour spaces, the continuous intensity transformation maps input and output intensities, enhancing features before restoring them to colours. The enhancement network, developed for this purpose, generates multi-scale feature maps, derives a set of transformation functions, and applies continuous intensity transformations to produce enhanced images. The proposed approach, outperforming state-of-the-art alternatives across various datasets, represents a noteworthy advancement in image enhancement.
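
The transformation step itself can be sketched simply: each feature channel's intensities are remapped through a per-channel function specified at a few control points, applied here by piecewise-linear interpolation. The paper learns these functions with an enhancement network over multi-scale feature maps; the sketch below shows only the application of a given transformation, with the interpolation scheme being an assumption.

```python
# Sketch: apply a learnt channel-wise intensity transformation given as a
# per-channel lookup table at N uniform knots.
import torch

def apply_intensity_transform(x: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) in [0, 1]; lut: (B, C, N) output values at N uniform knots."""
    b, c, h, w = x.shape
    n = lut.shape[-1]
    pos = x.clamp(0, 1) * (n - 1)             # fractional index into the LUT
    lo = pos.floor().long().clamp(max=n - 2)  # left knot for each pixel
    frac = pos - lo.float()
    flat = lut.view(b * c, n)
    idx = lo.view(b * c, h * w)
    left = flat.gather(1, idx).view(b, c, h, w)
    right = flat.gather(1, idx + 1).view(b, c, h, w)
    return left + frac * (right - left)       # piecewise-linear mapping per channel
```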

The eighth paper by Zhang et al. presents an innovative approach to address the challenges posed by sparse representation in image classification, particularly focusing on deformable objects like human faces. Sparse representation, a powerful data classification algorithm, relies on known training samples to categorise test samples. However, deformable images, characterised by varying pixel intensities at the same location across different images of the same subject, pose a significant hurdle. Factors such as lighting, pose, and occlusion further complicate feature extraction and correct classification. The authors propose a novel image representation and classification algorithm in response to these challenges. Their method generates virtual samples through a non-linear variation approach, effectively extracting low-frequency information crucial for representing deformable objects. Combining original and virtual samples enhances the algorithm's classification performance and robustness. By calculating expression coefficients separately for original and virtual samples using sparse representation principles and employing an efficient score fusion scheme, the proposed algorithm achieves superior classification results compared to conventional sparse representation algorithms, as demonstrated through compelling experimental results.
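
The virtual-sample-plus-fusion pipeline can be sketched as follows: every image also gets a non-linearly transformed "virtual" version, per-class reconstruction residuals are computed for both views, and the two residual scores are fused before the minimum-residual decision. As assumptions for brevity, a pixel-wise square root stands in for the paper's non-linear variation, and a ridge-regularised representation stands in for true L1 sparse coding.

```python
# Sketch: representation-based classification with virtual samples and
# score-level fusion of original and virtual residuals.
import numpy as np

def virtual(x: np.ndarray) -> np.ndarray:
    # Illustrative non-linear variation: compresses intensity range,
    # emphasising low-frequency content (not necessarily the paper's choice).
    return np.sqrt(x / 255.0)

def class_residuals(D: np.ndarray, labels: np.ndarray, y: np.ndarray, lam=0.01):
    """Per-class reconstruction residuals of test vector y over dictionary D
    (columns are training samples); ridge regression replaces L1 coding."""
    coef = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    return np.array([np.linalg.norm(y - D[:, labels == c] @ coef[labels == c])
                     for c in np.unique(labels)])

def classify(train: np.ndarray, labels: np.ndarray, test: np.ndarray, w=0.5):
    r_orig = class_residuals(train, labels, test)
    r_virt = class_residuals(virtual(train), labels, virtual(test))
    return np.argmin(w * r_orig + (1 - w) * r_virt)  # index into np.unique(labels)
```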

The ninth paper by Zhe Liu et al. introduces a groundbreaking approach to address challenges in reconstructing high-resolution hyperspectral (HR-HS) images, focusing on the hyperspectral image super-resolution (HSI SR) paradigm. Deep learning-based algorithms have made substantial strides in this field by automatically learning image priors, typically through full supervision using large amounts of externally collected data. However, the construction of a database for merging low-resolution hyperspectral (LR-HS) and high-resolution multispectral (HR-MS) or RGB images, crucial for HSI SR research, encounters difficulties in obtaining corresponding training triplets simultaneously. To overcome these limitations, the authors propose a novel method leveraging deep internal learning (DIL) and deep self-supervised learning (DSL). DIL involves training a specific CNN model at test time by online preparation of training triplet samples from observed LR-HS/HR-MS images and downsampled LR-HS versions. Additionally, DSL exploits observations as unlabelled training samples, enhancing the model's adaptability to diverse environments. The authors present a robust HSI SR method without prior training through a unified deep framework consolidating DIL and DSL. They demonstrate its efficacy through extensive experiments on benchmark hyperspectral datasets, including CAVE and Harvard, showcasing significant performance gains over state-of-the-art methods.
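
The key data-preparation trick of deep internal learning is easy to show: the observed LR-HS and HR-MS images are themselves downsampled by the fusion scale factor, so the original LR-HS image can serve as the high-resolution training target. The sketch below illustrates this step under assumed names; it is not the authors' code, and the fitted fusion CNN itself is elided.

```python
# Sketch: build internal training triplets at test time for HSI SR by
# consistently downsampling the two observations.
import torch
import torch.nn.functional as F

def make_internal_triplet(lr_hs: torch.Tensor, hr_ms: torch.Tensor, scale: int):
    """lr_hs: (1, S, h, w) hyperspectral; hr_ms: (1, 3, H, W) with H = scale * h."""
    # Downsample both observations by the same factor ...
    llr_hs = F.interpolate(lr_hs, scale_factor=1 / scale, mode="bicubic")
    lr_ms = F.interpolate(hr_ms, scale_factor=1 / scale, mode="bicubic")
    # ... so the original LR-HS image becomes the HR-HS training target.
    return llr_hs, lr_ms, lr_hs

# At test time: fit a small fusion CNN on (llr_hs, lr_ms) -> lr_hs, then run
# it on (lr_hs, hr_ms) to predict the unobserved HR-HS image.
```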

The collection of selected papers encompasses captivating topics and highlights key directions in this crucial realm of research and development. We hope that these papers enhance the community's comprehension of current trends and help guide future focus areas. We extend our sincere gratitude to all the authors for choosing this special issue as a platform to disseminate their research findings. Special thanks go to the reviewers, whose valuable and thoughtful feedback greatly benefited the authors. Additionally, we thank the IET staff members for their generous support and advice throughout the preparation of this special issue.
