Guest Editorial: Spectral imaging powered computer vision

IET Computer Vision, vol. 17, no. 7, pp. 723–725. Published: 3 October 2023. DOI: 10.1049/cvi2.12242
Jun Zhou, Fengchao Xiong, Lei Tong, Naoto Yokoya, Pedram Ghamisi
{"title":"Guest Editorial: Spectral imaging powered computer vision","authors":"Jun Zhou,&nbsp;Fengchao Xiong,&nbsp;Lei Tong,&nbsp;Naoto Yokoya,&nbsp;Pedram Ghamisi","doi":"10.1049/cvi2.12242","DOIUrl":null,"url":null,"abstract":"<p>The increasing accessibility and affordability of spectral imaging technology have revolutionised computer vision, allowing for data capture across various wavelengths beyond the visual spectrum. This advancement has greatly enhanced the capabilities of computers and AI systems in observing, understanding, and interacting with the world. Consequently, new datasets in various modalities, such as infrared, ultraviolet, fluorescent, multispectral, and hyperspectral, have been constructed, presenting fresh opportunities for computer vision research and applications.</p><p>Although significant progress has been made in processing, learning, and utilising data obtained through spectral imaging technology, several challenges persist in the field of computer vision. These challenges include the presence of low-quality images, sparse input, high-dimensional data, expensive data labelling processes, and a lack of methods to effectively analyse and utilise data considering their unique properties. Many mid-level and high-level computer vision tasks, such as object segmentation, detection and recognition, image retrieval and classification, and video tracking and understanding, still have not leveraged the advantages offered by spectral information. Additionally, the problem of effectively and efficiently fusing data in different modalities to create robust vision systems remains unresolved. Therefore, there is a pressing need for novel computer vision methods and applications to advance this research area. This special issue aims to provide a venue for researchers to present innovative computer vision methods driven by the spectral imaging technology.</p><p>This special issue has received 11 submissions. Among them, five papers have been accepted for publication, indicating their high quality and contribution to spectral imaging powered computer vision. Four papers have been rejected and sent to a transfer service for consideration in other journals or invited for re-submission after revision based on reviewers’ feedback.</p><p>The accepted papers can be categorised into three main groups based on the type of adopted data, that is, hyperspectral, multispectral, and X-ray images. Hyperspectral images provide material information about the scene and enable fine-grained object class classification. Multispectral images provide high spatial context and information beyond visible spectrum, such as infrared, providing enriched clues for visual computation. X-ray images can penetrate the surface of objects and provide internal structural information of targets, empowering medical applications, such as rib detection as exemplified by Tsai et al. Below is a brief summary of each paper in this special issue.</p><p>Zhong et al. proposed a lightweight criss-cross large kernel (CCLK) convolutional neural network for hyperspectral classification. The key component of this network is a CCLK module, which incorporates large kernels within the 1D convolutional layers and computes self-attention in orthogonal directions. Due to the large kernels and multiple stacks of the CCLK modules, the network can effectively capture long-range contextual features with a compact model size. 
The experimental results show that the network achieves enhanced classification performance and generalisation capability compared to alternative lightweight deep learning methods. Fewer parameters also make it suitable for deployment on devices with limited resources.</p><p>Ye et al. developed a domain-invariant attention network to address heterogeneous transfer learning in cross-scene hyperspectral classification. The network includes a feature-alignment convolutional neural networks (FACNN) and domain-invariant attention block (DIAB). FACNN extracts features from source and target scenes and projects the heterogeneous features from two scenes into a shared low-dimensional subspace, guaranteeing the class consistency between scenes. DIAB gains cross-domain consistency with a specially designed class-specific domain-invariance loss to obtain domain-invariant and discriminative attention weights for samples, reducing the domain shift. In this way, the knowledge of source scene is successfully transferred to the target scene, alleviating the small training samples in hyperspectral classification. The experiments prove that the network achieves promising hyperspectral classification.</p><p>Zuo et al. developed a method for multispectral pedestrian detection, focusing on scale-aware permutation attention and adjacent feature aggregation. The scale-aware permutated attention module uses both local and global attention to enhance pedestrian features of different scales in the feature pyramid, improving the quality of feature fusion. The adjacent-branch feature aggregation module considers both semantic context and spatial resolution, leading to improved detection accuracy for small-sized pedestrians. Extensive experimental evaluations showcase notable improvements in both efficiency and accuracy compared to several existing methods.</p><p>Guo et al. introduced a model called spatial-temporal-meteorological/long short-term memory network (STM-LSTM) to predict photovoltaic power generation. The proposed method integrates satellite image, historical meteorological data and historical power generation data, and uses cloud motion-aware learning to account for cloud movement and an attention mechanism to weigh the images in different bands from satellite cloud maps. The LSTM model combines the historical power generation sequence and meteorological change information for better accuracy. Experimental results show that the STM-LSTM model outperforms the baseline model to a certain margin, indicating its effectiveness in photovoltaic power generation prediction.</p><p>Tsai et al. created a fully annotated EDARib-CXR dataset for the identification and localization of fractured ribs in frontal and oblique chest X-ray images. The dataset consists of 369 frontal and 829 oblique chest X rays, providing valuable resources for research in this field. Based on YOLOv5, two detection models, namely AB-YOLOv5 and PB-YOLOv5, were introduced. AB-YOLOv5 incorporates an auxiliary branch that enhances the resolution of extracted feature maps in the final convolutional network layer, facilitating the determination of fracture location when relevant characteristics are identified in the data. On the other hand, PB-YOLOv5 employs image patches instead of the entire image to preserve the features of small objects in downsampled images during training, enabling the detection of subtle lesion features. 
Moreover, the researchers implemented a two-stage cascade detector that effectively integrates these two models to further improve the detection performance. Experimental results demonstrated superior performance of the introduced methods, providing an applicability in reducing diagnostic time and alleviating the heavy workload faced by clinicians.</p><p>Spectral imaging powered computer vision is still an emerging research area with great potential of creating new knowledge and methods. All of the accepted papers in this special issue highlight the crucial need for techniques that leverage information beyond the visual spectrum to help understand the world through spectral imaging devices. The rapid advancements in spectral imaging technology have paved the way for new opportunities and tasks in computer vision research and applications. We expect that more researchers will join this exciting area and develop solutions to handle tasks that cannot be solved well by traditional computer vision.</p><p>Jun Zhou and Fengchao Xiong led the organization of this special issue, including compiling the potential author’s list, calling for papers, handling paper reviews, and drafting the editorial. Lei Tong, Naoto Yokoya and Pedram Ghamisi provided valuable input on the scope of this special issue, promoted this special issue to potential authors, and gave feedback to the editorial.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 7","pages":"723-725"},"PeriodicalIF":1.5000,"publicationDate":"2023-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12242","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12242","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

The increasing accessibility and affordability of spectral imaging technology have revolutionised computer vision, allowing data capture across wavelengths beyond the visible spectrum. This advancement has greatly enhanced the capabilities of computers and AI systems in observing, understanding, and interacting with the world. Consequently, new datasets in various modalities, such as infrared, ultraviolet, fluorescent, multispectral, and hyperspectral, have been constructed, presenting fresh opportunities for computer vision research and applications.

Although significant progress has been made in processing, learning from, and utilising data obtained through spectral imaging, several challenges persist in the field of computer vision. These include low-quality images, sparse input, high-dimensional data, expensive data-labelling processes, and a lack of methods that analyse and exploit such data in light of their unique properties. Many mid-level and high-level computer vision tasks, such as object segmentation, detection and recognition, image retrieval and classification, and video tracking and understanding, have yet to leverage the advantages offered by spectral information. Additionally, the problem of effectively and efficiently fusing data from different modalities to create robust vision systems remains unresolved. There is therefore a pressing need for novel computer vision methods and applications to advance this research area. This special issue aims to provide a venue for researchers to present innovative computer vision methods driven by spectral imaging technology.

This special issue received 11 submissions. Five papers were accepted for publication, reflecting their quality and contribution to spectral imaging powered computer vision. Four papers were rejected and either referred to a transfer service for consideration by other journals or invited for re-submission after revision based on the reviewers' feedback.

The accepted papers fall into three main groups based on the type of data they use: hyperspectral, multispectral, and X-ray images. Hyperspectral images capture material information about the scene and enable fine-grained object classification. Multispectral images offer rich spatial context and information beyond the visible spectrum, such as infrared, providing enriched cues for visual computation. X-ray imaging penetrates the surface of objects and reveals their internal structure, enabling medical applications such as the rib fracture detection exemplified by Tsai et al. Below is a brief summary of each paper in this special issue.

Zhong et al. proposed a lightweight criss-cross large kernel (CCLK) convolutional neural network for hyperspectral classification. The key component of this network is the CCLK module, which incorporates large kernels within 1D convolutional layers and computes self-attention in orthogonal directions. Thanks to the large kernels and multiple stacked CCLK modules, the network effectively captures long-range contextual features with a compact model size. The experimental results show that the network achieves better classification performance and generalisation capability than alternative lightweight deep learning methods. Its small parameter count also makes it suitable for deployment on resource-constrained devices.
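
As a rough illustration of the criss-cross idea, the following PyTorch sketch applies large depthwise 1D convolutions along the two orthogonal spatial axes, so each position aggregates long-range context from its row and column; the module and parameter names are ours, and the authors' actual block, including its self-attention computation, differs in detail.

```python
# Minimal sketch of a criss-cross large-kernel block: large depthwise 1D
# convolutions along width and height gather long-range row/column context,
# which then reweights the input in an attention-like manner.
# Names and the gating form are illustrative, not the authors' code.
import torch
import torch.nn as nn

class CrissCrossLargeKernel(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 21):
        super().__init__()
        pad = kernel_size // 2
        # Depthwise 1 x k conv: long-range context along the width axis.
        self.conv_w = nn.Conv2d(channels, channels, (1, kernel_size),
                                padding=(0, pad), groups=channels)
        # Depthwise k x 1 conv: long-range context along the height axis.
        self.conv_h = nn.Conv2d(channels, channels, (kernel_size, 1),
                                padding=(pad, 0), groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate the input with the criss-cross context.
        context = self.conv_h(self.conv_w(x))
        return self.proj(x * torch.sigmoid(context))

# Toy usage: a hyperspectral feature map with 64 channels.
x = torch.randn(2, 64, 32, 32)
print(CrissCrossLargeKernel(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```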

Ye et al. developed a domain-invariant attention network to address heterogeneous transfer learning in cross-scene hyperspectral classification. The network comprises a feature-alignment convolutional neural network (FACNN) and a domain-invariant attention block (DIAB). The FACNN extracts features from the source and target scenes and projects the heterogeneous features of the two scenes into a shared low-dimensional subspace, guaranteeing class consistency between scenes. The DIAB enforces cross-domain consistency with a specially designed class-specific domain-invariance loss, yielding domain-invariant and discriminative attention weights for samples and thereby reducing domain shift. In this way, knowledge of the source scene is successfully transferred to the target scene, alleviating the shortage of training samples in hyperspectral classification. The experiments show that the network achieves promising hyperspectral classification performance.
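
To make the class-specific domain-invariance idea concrete, here is a minimal sketch of one simple form such a loss can take: for each class, the centroids of the source and target features are pulled together. This is our illustration of the general principle only; the paper's DIAB loss and its attention weighting differ in detail.

```python
# Minimal sketch of a class-specific domain-invariance loss: align the
# per-class feature centroids of the source and target domains.
import torch

def class_domain_invariance_loss(src_feat, src_lab, tgt_feat, tgt_lab,
                                 num_classes: int):
    """src_feat/tgt_feat: (N, D) features; src_lab/tgt_lab: (N,) int labels."""
    loss, matched = src_feat.new_zeros(()), 0
    for c in range(num_classes):
        s_mask, t_mask = src_lab == c, tgt_lab == c
        if s_mask.any() and t_mask.any():
            # Squared distance between the class-c centroids of both domains.
            diff = src_feat[s_mask].mean(0) - tgt_feat[t_mask].mean(0)
            loss = loss + diff.pow(2).sum()
            matched += 1
    return loss / max(matched, 1)

# Toy usage with random features and labels for 4 classes.
src, tgt = torch.randn(20, 8), torch.randn(16, 8)
print(class_domain_invariance_loss(src, torch.randint(0, 4, (20,)),
                                   tgt, torch.randint(0, 4, (16,)), 4))
```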

Zuo et al. developed a method for multispectral pedestrian detection built around scale-aware permuted attention and adjacent feature aggregation. The scale-aware permuted attention module uses both local and global attention to enhance pedestrian features at different scales in the feature pyramid, improving the quality of feature fusion. The adjacent-branch feature aggregation module considers both semantic context and spatial resolution, improving detection accuracy for small pedestrians. Extensive experimental evaluations show notable gains in both efficiency and accuracy over several existing methods.
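
The adjacent-aggregation idea can be sketched as follows: each pyramid level is fused with its upsampled coarser neighbour (semantic context) and its downsampled finer neighbour (spatial resolution). This is a generic sketch under our own naming; the paper's module additionally applies the permuted local/global attention described above.

```python
# Minimal sketch of adjacent-branch feature aggregation on a feature pyramid:
# every level absorbs its two neighbouring levels before a fusing convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentAggregation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, pyramid):
        out = []
        for i, feat in enumerate(pyramid):
            merged = feat
            if i > 0:  # finer neighbour: downsample to this level's size
                merged = merged + F.adaptive_avg_pool2d(pyramid[i - 1],
                                                        feat.shape[-2:])
            if i < len(pyramid) - 1:  # coarser neighbour: upsample
                merged = merged + F.interpolate(pyramid[i + 1],
                                                size=feat.shape[-2:],
                                                mode="nearest")
            out.append(self.fuse(merged))
        return out

# Toy usage: a three-level pyramid with 64 channels per level.
levels = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
print([f.shape for f in AdjacentAggregation(64)(levels)])
```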

Guo et al. introduced a spatial-temporal-meteorological long short-term memory network (STM-LSTM) to predict photovoltaic power generation. The method integrates satellite imagery, historical meteorological data, and historical power generation data; it uses cloud-motion-aware learning to account for cloud movement and an attention mechanism to weight the different spectral bands of satellite cloud maps. The LSTM combines the historical power generation sequence with meteorological change information for better accuracy. Experimental results show that STM-LSTM outperforms the baseline model by a clear margin, indicating its effectiveness for photovoltaic power generation prediction.
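
The fusion scheme can be illustrated with a minimal sketch: attention weights pool the per-band satellite features, which are concatenated with meteorological features and the historical power sequence before an LSTM predicts the next value. All dimensions, names, and the exact attention form here are our assumptions for illustration; they are not the authors' implementation.

```python
# Minimal sketch of band-attention fusion feeding an LSTM forecaster.
import torch
import torch.nn as nn

class BandAttentionLSTM(nn.Module):
    def __init__(self, n_bands=8, band_dim=16, met_dim=4, hidden=32):
        super().__init__()
        self.band_score = nn.Linear(band_dim, 1)  # one score per band
        self.lstm = nn.LSTM(band_dim + met_dim + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, bands, met, power):
        # bands: (B, T, n_bands, band_dim); met: (B, T, met_dim); power: (B, T, 1)
        w = torch.softmax(self.band_score(bands), dim=2)  # weight each band
        img = (w * bands).sum(dim=2)                      # attention-pooled bands
        h, _ = self.lstm(torch.cat([img, met, power], dim=-1))
        return self.head(h[:, -1])                        # next-step prediction

# Toy usage: 24 time steps, 8 satellite bands.
model = BandAttentionLSTM()
print(model(torch.randn(2, 24, 8, 16), torch.randn(2, 24, 4),
            torch.randn(2, 24, 1)).shape)  # torch.Size([2, 1])
```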

Tsai et al. created EDARib-CXR, a fully annotated dataset for identifying and localising fractured ribs in frontal and oblique chest X-ray images. The dataset consists of 369 frontal and 829 oblique chest X-rays, a valuable resource for research in this field. Building on YOLOv5, two detection models, AB-YOLOv5 and PB-YOLOv5, were introduced. AB-YOLOv5 incorporates an auxiliary branch that enhances the resolution of the feature maps extracted by the final convolutional layer, helping to determine fracture locations once the relevant characteristics are identified. PB-YOLOv5, in contrast, operates on image patches rather than whole images so that the features of small objects survive downsampling during training, enabling the detection of subtle lesion features. The researchers also implemented a two-stage cascade detector that integrates the two models to further improve detection performance. Experimental results demonstrated the superior performance of the introduced methods and their applicability in reducing diagnostic time and alleviating the heavy workload faced by clinicians.
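
One common way to realise the patch-based idea, sketched below, is to tile the X-ray into overlapping crops, run the detector on each crop, and shift the resulting boxes back to full-image coordinates. The paper applies patching during training, so this sketch only illustrates the coordinate bookkeeping involved; the `detector` callable and the (x1, y1, x2, y2, score) box format are our assumptions, not the YOLOv5 API.

```python
# Minimal sketch of tiled, patch-based detection with coordinate remapping.
import torch

def detect_on_patches(image, detector, patch=640, overlap=128):
    """image: (C, H, W) tensor; detector: crop -> (N, 5) boxes tensor."""
    _, H, W = image.shape
    stride = patch - overlap
    all_boxes = []
    for top in range(0, max(H - overlap, 1), stride):
        for left in range(0, max(W - overlap, 1), stride):
            crop = image[:, top:top + patch, left:left + patch]
            boxes = detector(crop)
            if len(boxes):
                shifted = boxes.clone()
                shifted[:, [0, 2]] += left  # shift x back to full image
                shifted[:, [1, 3]] += top   # shift y back to full image
                all_boxes.append(shifted)
    return torch.cat(all_boxes) if all_boxes else torch.empty(0, 5)

# Toy usage with a dummy detector returning one fixed box per crop.
dummy = lambda crop: torch.tensor([[10.0, 10.0, 50.0, 50.0, 0.9]])
print(detect_on_patches(torch.zeros(1, 1024, 1024), dummy).shape)
```

In practice, duplicate boxes from overlapping tiles would be merged (e.g. by non-maximum suppression) before any downstream stage such as the cascade's second model.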

Spectral imaging powered computer vision remains an emerging research area with great potential for creating new knowledge and methods. All of the accepted papers in this special issue highlight the need for techniques that leverage information beyond the visible spectrum to help understand the world through spectral imaging devices. The rapid advancement of spectral imaging technology has opened up new opportunities and tasks in computer vision research and applications. We expect that more researchers will join this exciting area and develop solutions to tasks that traditional computer vision cannot solve well.

Jun Zhou and Fengchao Xiong led the organisation of this special issue, including compiling the list of potential authors, issuing the call for papers, handling paper reviews, and drafting this editorial. Lei Tong, Naoto Yokoya and Pedram Ghamisi provided valuable input on the scope of the special issue, promoted it to potential authors, and gave feedback on the editorial.
