
IET Computer Vision: Latest Publications

Improving neural ordinary differential equations via knowledge distillation
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-11-06 | DOI: 10.1049/cvi2.12248
Haoyu Chu, Shikui Wei, Qiming Lu, Yao Zhao

Neural ordinary differential equations (Neural ODEs) construct the continuous dynamics of hidden units using ODEs specified by a neural network, demonstrating promising results on many tasks. However, Neural ODEs still do not perform well on image recognition tasks. A possible reason is that the one-hot encoding vectors commonly used in Neural ODEs cannot provide enough supervised information. A new training method based on knowledge distillation is proposed to construct more powerful and robust Neural ODEs suited to image recognition tasks. Specifically, the training of Neural ODEs is modelled as a teacher-student learning process, in which ResNets serve as the teacher model to provide richer supervised information. The experimental results show that the new training manner improves the classification accuracy of Neural ODEs by 5.17%, 24.75%, 7.20%, and 8.99% on Street View House Numbers, CIFAR10, CIFAR100, and Food-101, respectively. In addition, the effect of knowledge distillation on the robustness of Neural ODEs against adversarial examples is evaluated. The authors find that incorporating knowledge distillation, coupled with an increase of the time horizon, can significantly enhance the robustness of Neural ODEs. The performance improvement is analysed from the perspective of the underlying dynamical system.

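To make the teacher-student training described above concrete, here is a minimal PyTorch sketch of a generic knowledge-distillation objective, where a pretrained teacher (for example, a ResNet) provides softened targets for a Neural-ODE-style student classifier. The temperature, the weight `alpha`, and the `student`/`teacher` names in the usage comment are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Blend hard-label cross-entropy with soft-label KL divergence.

    The soft targets come from a teacher network (e.g. a ResNet); the student
    would be a Neural ODE classifier. Temperature and alpha are placeholders.
    """
    # Supervised term on the one-hot labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft targets from the teacher carry richer inter-class information.
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits.detach() / temperature, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * temperature ** 2  # keep gradient scale comparable

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage inside a training step (logits of shape [batch, num_classes]):
# loss = distillation_loss(student(x), teacher(x), y)
# loss.backward()
```

Blending a hard cross-entropy term with a temperature-scaled KL term towards the teacher is the standard distillation recipe; the abstract does not specify which exact variant the authors adopt.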
Citations: 0
Improved triplet loss for domain adaptation
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-11-03 | DOI: 10.1049/cvi2.12226
Xiaoshun Wang, Yunhan Li, Xiangliang Zhang

A technique known as domain adaptation is utilised to address classification challenges in an unlabelled target domain by leveraging labelled source domains. Previous domain adaptation approaches have predominantly focussed on global domain adaptation, neglecting class-level information and resulting in suboptimal transfer performance. In recent years, a considerable number of researchers have explored class-level domain adaptation, aiming to precisely align the distributions of diverse domains. Nevertheless, existing research on class-level alignment tends to align domain features either on or in proximity to classification boundaries, which introduces ambiguous samples that can impact classification accuracy. In this study, the authors propose a novel strategy called class guided constraints (CGC) to tackle this issue. Specifically, CGC is employed to preserve the compactness within classes and the separability between classes of domain features prior to class-level alignment. Furthermore, the authors incorporate CGC in conjunction with a similarity guided constraint. Comprehensive evaluations conducted on four public datasets demonstrate that the approach significantly outperforms numerous state-of-the-art domain adaptation methods and achieves greater improvements compared to the baseline approach.

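The abstract does not give the precise form of the class guided constraints, so the snippet below is only a rough sketch of one common way to encourage intra-class compactness and inter-class separability of feature embeddings before alignment: pull each sample towards its class centroid and keep different centroids at least a margin apart. The margin value and the centroid-based formulation are assumptions, not the paper's CGC loss.

```python
import torch
import torch.nn.functional as F

def compact_separate_loss(features, labels, margin=1.0):
    """Toy class-guided constraint on a batch of embeddings.

    features: [batch, dim] embeddings; labels: [batch] integer class ids.
    Illustrative stand-in for intra-class compactness + inter-class separability.
    """
    classes = labels.unique()  # sorted unique class ids present in the batch
    centroids = torch.stack([features[labels == c].mean(dim=0) for c in classes])

    # Intra-class compactness: squared distance of each sample to its centroid.
    assigned = centroids[torch.searchsorted(classes, labels)]
    compact = (features - assigned).pow(2).sum(dim=1).mean()

    # Inter-class separability: hinge on pairwise centroid distances.
    separate = features.new_zeros(())
    if len(classes) > 1:
        dists = torch.cdist(centroids, centroids)
        off_diag = ~torch.eye(len(classes), dtype=torch.bool, device=features.device)
        separate = F.relu(margin - dists[off_diag]).mean()

    return compact + separate
```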
Citations: 0
A survey on weakly supervised 3D point cloud semantic segmentation
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-11-02 | DOI: 10.1049/cvi2.12250
Jingyi Wang, Yu Liu, Hanlin Tan, Maojun Zhang

With the popularity and advancement of 3D point cloud data acquisition technologies and sensors, research into 3D point clouds has made considerable strides based on deep learning. The semantic segmentation of point clouds, a crucial step in comprehending 3D scenes, has drawn much attention. The accuracy and effectiveness of fully supervised semantic segmentation tasks have greatly improved with the increase in the number of accessible datasets. However, these achievements rely on time-consuming and expensive full labelling. To address this issue, research on weakly supervised learning has recently surged. These methods train neural networks to tackle 3D semantic segmentation tasks with fewer point labels. In addition to a thorough overview of the history and current state of the art in weakly supervised semantic segmentation of 3D point clouds, a detailed description of the most widely used data acquisition sensors, a list of publicly accessible benchmark datasets, and a look ahead at potential future development directions are provided.

Citations: 0
StableNet: Distinguishing the hard samples to overcome language priors in visual question answering
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-10-28 | DOI: 10.1049/cvi2.12249
Zhengtao Yu, Jia Zhao, Chenliang Guo, Ying Yang

With the booming fields of computer vision and natural language processing, cross-modal intersections such as visual question answering (VQA) have become very popular. However, several studies have shown that many VQA models suffer from severe language prior problems. After a series of experiments, the authors found that previous VQA models are in an unstable state: when training is repeated several times on the same dataset, there are significant differences between the distributions of the predicted answers given by the models each time, and these models also perform unsatisfactorily in terms of accuracy. The reason for this instability is that some of the difficult samples bring serious interference to model training, so the authors design a method to measure model stability quantitatively and further propose a method that can alleviate both the imbalance and instability phenomena. Specifically, question types are classified into simple and difficult ones, and different weighting measures are applied to each. By imposing constraints on the training process for both types of questions, the stability and accuracy of the model improve. Experimental results demonstrate the effectiveness of the method, which achieves 63.11% on VQA-CP v2 and 75.49% with the addition of the pre-trained model.

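The abstract reports that repeated training runs produce noticeably different answer distributions but does not state how stability is measured. Purely as an illustration, the sketch below computes one plausible instability score: the mean symmetrised KL divergence between the per-question answer distributions of independently trained runs. The function name and the choice of divergence are assumptions, not the paper's metric.

```python
import itertools
import torch
import torch.nn.functional as F

def instability_score(run_logits):
    """Hypothetical stability probe for a VQA model.

    run_logits: list of tensors, one per training run, each of shape
    [num_questions, num_answers]. Lower scores mean the runs agree more.
    """
    probs = [F.softmax(logits, dim=1) for logits in run_logits]
    divergences = []
    for p, q in itertools.combinations(probs, 2):
        # Symmetrised KL between the answer distributions of two runs.
        kl_pq = F.kl_div(q.clamp_min(1e-8).log(), p, reduction="batchmean")
        kl_qp = F.kl_div(p.clamp_min(1e-8).log(), q, reduction="batchmean")
        divergences.append(0.5 * (kl_pq + kl_qp))
    return torch.stack(divergences).mean()

# Example with three runs of a 10-question, 100-answer toy setup:
score = instability_score([torch.randn(10, 100) for _ in range(3)])
```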
Citations: 0
RGB depth salient object detection via cross-modal attention and boundary feature guidance
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-10-19 | DOI: 10.1049/cvi2.12244
Lingbing Meng, Mengya Yuan, Xuehan Shi, Le Zhang, Qingqing Liu, Dai Ping, Jinhua Wu, Fei Cheng

RGB depth (RGB-D) salient object detection (SOD) is a meaningful and challenging task. Convolutional neural networks achieve good detection performance on simple scenes, but they cannot effectively handle scenes with complex contours of salient objects, or scenes in which the salient objects and the background are similarly coloured. A novel end-to-end framework is proposed for RGB-D SOD, which comprises four main components: the cross-modal attention feature enhancement (CMAFE) module, the multi-level contextual feature interaction (MLCFI) module, the boundary feature extraction (BFE) module, and the multi-level boundary attention guidance (MLBAG) module. The CMAFE module retains the more effective salient features by employing a dual-attention mechanism to filter noise from the two modalities. In the MLCFI module, a shuffle operation is applied to high-level and low-level channels to promote cross-channel information communication and extract rich semantic information. The BFE module converts salient features into boundary features to generate boundary maps. The MLBAG module produces saliency maps by aggregating multi-level boundary saliency maps to guide cross-modal features in the decoding stage. Extensive experiments are conducted on six public benchmark datasets, with the results demonstrating that the proposed model significantly outperforms 23 state-of-the-art RGB-D SOD models on multiple evaluation metrics.

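The shuffle operation mentioned for the MLCFI module is a standard channel shuffle (popularised by ShuffleNet); a minimal PyTorch version is sketched below, applied here to a concatenation of a low-level and a high-level feature map. The group count and the concatenation step are illustrative assumptions about how it might be wired in.

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups so information mixes between them."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Example: mix a low-level and a high-level feature map before fusion.
low = torch.randn(2, 32, 64, 64)
high = torch.randn(2, 32, 64, 64)
mixed = channel_shuffle(torch.cat([low, high], dim=1), groups=2)
```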
Citations: 0
A dense multi-scale context and asymmetric pooling embedding network for smoke segmentation
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-10-17 | DOI: 10.1049/cvi2.12246
Gang Wen, Fangrong Zhou, Yutang Ma, Hao Pan, Hao Geng, Jun Cao, Kang Li, Feiniu Yuan

It is very challenging to accurately segment smoke images because smoke has some adverse visual characteristics, such as anomalous shapes, blurred edges, and translucency. Existing methods cannot fully focus on the texture details of anomalous shapes and blurred edges simultaneously. To solve these problems, a Dense Multi-scale context and Asymmetric pooling Embedding Network (DMAENet) is proposed to model smoke edge details and anomalous shapes for smoke segmentation. To capture feature information at different scales, a Dense Multi-scale Context Module (DMCM) is proposed to further enhance the feature representation capability of the network with the help of asymmetric convolutions. To efficiently extract features for long-shaped objects, the authors use asymmetric pooling to propose an Asymmetric Pooling Enhancement Module (APEM), in which vertical and horizontal pooling are responsible for enhancing the features of irregular objects. Finally, a Feature Fusion Module (FFM) is designed, which accepts three inputs to improve performance. Low-level and high-level features are fused by pixel-wise summation, and the summed feature maps are further enhanced in an attention manner. Experimental results on synthetic and real smoke datasets validate that all these modules improve performance, and the proposed DMAENet clearly outperforms existing state-of-the-art methods.

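As a rough sketch of the vertical/horizontal pooling idea behind the APEM, the block below applies strip pooling along each axis, refines the strips with asymmetric convolutions, and uses their broadcast sum as an attention map over the input. The 3x1/1x3 kernels, the sigmoid gating, and the multiplicative fusion are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    """Toy asymmetric pooling block: pool along W and along H separately,
    then broadcast the pooled strips back over the feature map."""

    def __init__(self, channels):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # keep H, squeeze W
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # keep W, squeeze H
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.gate = nn.Sigmoid()

    def forward(self, x):
        h_strip = self.conv_h(self.pool_h(x))      # [N, C, H, 1]
        w_strip = self.conv_w(self.pool_w(x))      # [N, C, 1, W]
        attention = self.gate(h_strip + w_strip)   # broadcasts to [N, C, H, W]
        return x * attention

# Example: enhance features of long, thin structures such as smoke plumes.
out = StripPooling(64)(torch.randn(1, 64, 56, 56))
```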
Citations: 0
IoUNet++: Spatial cross-layer interaction-based bounding box regression for visual tracking
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-10-16 | DOI: 10.1049/cvi2.12235
Shilei Wang, Yamin Han, Baozhen Sun, Jifeng Ning

Accurate target prediction, especially bounding box estimation, is a key problem in visual tracking. Many recently proposed trackers adopt a refinement module called the IoU predictor, which designs a high-level modulation vector to achieve bounding box estimation. However, because this simple one-dimensional modulation vector lacks the spatial information that is important for precise box estimation, it has limited refinement representation capability. In this study, a novel IoU predictor (IoUNet++) is designed to achieve more accurate bounding box estimation by investigating spatial matching with a spatial cross-layer interaction model. Rather than using a one-dimensional modulation vector to generate representations of the candidate bounding box for overlap prediction, this paper first extracts and fuses multi-level features of the target to generate a template kernel with spatial description capability. Then, when aggregating the features of the template and the search region, depthwise separable convolution correlation is adopted to preserve the spatial matching between the target feature and the candidate feature, which gives the IoUNet++ network better template representation and better fusion than the original network. The proposed IoUNet++ method is plug-and-play and is applied to a series of strengthened trackers, including DiMP++, SuperDiMP++ and SuperDIMP_AR++, which achieve consistent performance gains. Finally, experiments conducted on six popular tracking benchmarks show that these trackers outperform state-of-the-art trackers with significantly fewer training epochs.

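Depthwise correlation between a template kernel and search-region features is a common building block in Siamese-style trackers; the sketch below shows one batched formulation using grouped convolution. Shapes and naming are illustrative, and this covers only the correlation step, not the full IoUNet++ module.

```python
import torch
import torch.nn.functional as F

def depthwise_correlation(search, kernel):
    """Channel-wise cross-correlation of a template kernel over search features.

    search: [N, C, Hs, Ws] search-region features
    kernel: [N, C, Hk, Wk] template features used as per-channel filters
    returns: [N, C, Hs-Hk+1, Ws-Wk+1] response map, one channel per feature channel
    """
    n, c, hk, wk = kernel.shape
    # Fold the batch into the channel dimension so each sample keeps its own kernel.
    search = search.reshape(1, n * c, *search.shape[2:])
    kernel = kernel.reshape(n * c, 1, hk, wk)
    response = F.conv2d(search, kernel, groups=n * c)
    return response.reshape(n, c, *response.shape[2:])

# Example: correlate an 8x8 template against a 24x24 search region.
resp = depthwise_correlation(torch.randn(2, 256, 24, 24), torch.randn(2, 256, 8, 8))
```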
Citations: 0
Representation constraint-based dual-channel network for face antispoofing
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-10-10 | DOI: 10.1049/cvi2.12245
Zuhe Li, Yuhao Cui, Fengqin Wang, Weihua Liu, Yongshuang Yang, Zeqi Yu, Bin Jiang, Hui Chen

Although multimodal face data have obvious advantages in describing live and spoofed features, single-modality face antispoofing technologies are still widely used when it is difficult to obtain multimodal face images or inconvenient to integrate and deploy multimodal sensors. Since the live/spoofed representations in visible light facial images are subject to considerable interference from face identity information, existing deep learning-based face antispoofing models perform poorly when only the visible light modality is used. To address these problems, the authors design a dual-channel network structure and a constrained representation learning method for face antispoofing. First, they design a dual-channel attention mechanism-based grouped convolutional neural network (CNN) to learn important deceptive cues in live and spoofed faces. Second, they design inner contrastive estimation-based representation constraints for both live and spoofed samples to minimise the sample similarity loss, preventing the CNN from learning excessive facial appearance information. This increases the distance between live and spoofed faces and enhances the network's ability to identify deceptive cues. The evaluation results indicate that the proposed framework achieves an average classification error rate (ACER) of 2.37% on the visible light modality subset of the CASIA-SURF dataset and an ACER of 2.4% on the CASIA-SURF CeFA dataset, outperforming existing methods. The proposed method also achieves low ACER scores in cross-dataset testing, demonstrating its advantage in domain generalisation.

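The exact form of the inner contrastive estimation constraint is not given in the abstract; the snippet below shows one possible reading, in which pairwise cosine similarity among embeddings of the same class (live or spoofed) is penalised so the network cannot lean on identity-specific appearance cues. Treat the formulation and names as hypothetical.

```python
import torch
import torch.nn.functional as F

def within_class_similarity_penalty(embeddings, labels):
    """Hypothetical 'sample similarity' constraint for antispoofing embeddings.

    embeddings: [batch, dim]; labels: [batch] with 0 = live, 1 = spoofed.
    Penalises high cosine similarity between samples of the same class.
    """
    z = F.normalize(embeddings, dim=1)
    penalty = z.new_zeros(())
    for c in labels.unique():
        group = z[labels == c]
        if group.shape[0] < 2:
            continue
        sim = group @ group.t()  # pairwise cosine similarities within the class
        off_diag = ~torch.eye(group.shape[0], dtype=torch.bool, device=z.device)
        penalty = penalty + sim[off_diag].mean()
    return penalty
```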
Citations: 0
GFRNet: Rethinking the global contexts extraction in medical images segmentation through matrix factorization and self-attention
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-10-08 | DOI: 10.1049/cvi2.12243
Lifang Chen, Shanglai Wang, Li Wan, Jianghu Su, Shunfeng Wang

Due to the large boundary fluctuations and internal variations of lesion regions in medical image segmentation, current methods may have difficulty capturing sufficient global contexts to deal with these inherent challenges, which can lead to fragmented segmentation masks that undermine segmentation performance. Although self-attention can be used to capture long-distance dependencies between pixels, it has the disadvantage of computational complexity, and the global contexts extracted by self-attention are still insufficient. To this end, the authors propose GFRNet, which resorts to the idea of low-rank matrix factorization, forming global contexts locally to obtain global contexts that are distinct from those extracted by self-attention. The authors effectively integrate the different global contexts extracted by self-attention and low-rank matrix factorization to obtain versatile global contexts. Also, to recover the spatial contexts lost during the matrix factorization process and to enhance boundary contexts, the authors propose a Modified Matrix Decomposition module, which employs depth-wise separable convolution and spatial augmentation in the low-rank matrix factorization process. Comprehensive experiments performed on four benchmark datasets show that GFRNet performs better than the relevant CNN- and transformer-based approaches.

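To make the low-rank idea concrete, the sketch below treats each feature map as a C x (H*W) matrix, computes a rank-r approximation with `torch.svd_lowrank`, and reads the reconstruction back as a global-context map. The rank, the SVD-based solver, and the additive fusion in the example are assumptions; GFRNet's actual factorisation module may differ (for instance, an iterative NMF-style solver).

```python
import torch

def low_rank_global_context(features, rank=8):
    """Approximate global context as a rank-`rank` reconstruction of the
    flattened feature map. Illustrative sketch, not GFRNet's exact module.

    features: [N, C, H, W] -> tensor of the same shape.
    """
    n, c, h, w = features.shape
    flat = features.reshape(n, c, h * w)            # one C x (H*W) matrix per sample
    context = []
    for x in flat:
        u, s, v = torch.svd_lowrank(x, q=rank)      # truncated SVD
        context.append(u @ torch.diag(s) @ v.t())   # rank-`rank` reconstruction
    return torch.stack(context).reshape(n, c, h, w)

# Example: fuse the low-rank global context back into the features by addition.
feats = torch.randn(2, 64, 32, 32)
fused = feats + low_rank_global_context(feats, rank=8)
```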
Citations: 0
Guest Editorial: Spectral imaging powered computer vision
IF 1.7 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-10-03 | DOI: 10.1049/cvi2.12242
Jun Zhou, Fengchao Xiong, Lei Tong, Naoto Yokoya, Pedram Ghamisi
The increasing accessibility and affordability of spectral imaging technology have revolutionised computer vision, allowing for data capture across various wavelengths beyond the visual spectrum. This advancement has greatly enhanced the capabilities of computers and AI systems in observing, understanding, and interacting with the world. Consequently, new datasets in various modalities, such as infrared, ultraviolet, fluorescent, multispectral, and hyperspectral, have been constructed, presenting fresh opportunities for computer vision research and applications.

Although significant progress has been made in processing, learning, and utilising data obtained through spectral imaging technology, several challenges persist in the field of computer vision. These challenges include the presence of low-quality images, sparse input, high-dimensional data, expensive data labelling processes, and a lack of methods to effectively analyse and utilise data considering their unique properties. Many mid-level and high-level computer vision tasks, such as object segmentation, detection and recognition, image retrieval and classification, and video tracking and understanding, still have not leveraged the advantages offered by spectral information. Additionally, the problem of effectively and efficiently fusing data in different modalities to create robust vision systems remains unresolved. Therefore, there is a pressing need for novel computer vision methods and applications to advance this research area. This special issue aims to provide a venue for researchers to present innovative computer vision methods driven by spectral imaging technology.

This special issue received 11 submissions. Among them, five papers have been accepted for publication, indicating their high quality and contribution to spectral imaging powered computer vision. Four papers have been rejected and sent to a transfer service for consideration in other journals or invited for re-submission after revision based on reviewers' feedback.

The accepted papers can be categorised into three main groups based on the type of data adopted, that is, hyperspectral, multispectral, and X-ray images. Hyperspectral images provide material information about the scene and enable fine-grained object class classification. Multispectral images provide high spatial context and information beyond the visible spectrum, such as infrared, providing enriched clues for visual computation. X-ray images can penetrate the surface of objects and provide internal structural information about targets, empowering medical applications such as rib detection, as exemplified by Tsai et al. Below is a brief summary of each paper in this special issue.

Zhong et al. proposed a lightweight criss-cross large kernel (CCLK) convolutional neural network for hyperspectral classification. The key component of this network is the CCLK module, which incorporates large kernels within the 1D convolutional layers and computes self-attention along orthogonal directions. Owing to the large kernels and multiple stacks of the CCLK module, the network can effectively capture long-range contextual features with a compact model size. Experimental results show that the network achieves stronger classification performance and generalisation ability than other lightweight deep learning methods. The small number of parameters also makes it suitable for deployment on resource-limited devices.

Ye et al. developed a domain-invariant attention network to address the heterogeneous transfer learning problem in cross-scene hyperspectral classification. The network consists of a feature alignment convolutional neural network (FACNN) and a domain-invariant attention block (DIAB). The FACNN extracts features from the source and target scenes and projects the heterogeneous features of the two scenes into a shared low-dimensional subspace, guaranteeing class consistency between scenes. The DIAB achieves cross-domain consistency through a specially designed class-specific domain-invariance loss, obtaining domain-invariant and discriminative attention weights for samples and thereby reducing domain shift. In this way, knowledge from the source scene is successfully transferred to the target scene, alleviating the problem of small training sets in hyperspectral classification. Experiments demonstrate that the network achieves promising hyperspectral classification.

Zuo et al. developed a multispectral pedestrian detection method focusing on scale-aware permutation attention and adjacent feature aggregation. The scale-aware permutation attention module uses local and global attention to enhance pedestrian features at different scales in the feature pyramid, improving the quality of feature fusion. The adjacent-branch feature aggregation module takes semantic context and spatial resolution into account, improving the detection accuracy for small pedestrians.
Citations: 0