
Latest publications in IET Computer Vision

Guest Editorial: Anomaly detection and open-set recognition applications for computer vision
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-19 | DOI: 10.1049/cvi2.12329 | IET Computer Vision 18(8), pp. 1069-1071
Hakan Cevikalp, Robi Polikar, Ömer Nezih Gerek, Songcan Chen, Chuanxing Geng

Anomaly detection is a method employed to identify data points or patterns that significantly deviate from expected or normal behaviour within a dataset. This approach aims to detect observations regarded as unusual, erroneous, anomalous, rare, or potentially indicative of fraudulent or malicious activity. Open-set recognition, also referred to as open-set identification or open-set classification, is a pattern recognition task that extends traditional classification by addressing the presence of unknown or novel classes during the testing phase. This approach highlights a strong connection between anomaly detection and open-set recognition, as both seek to identify samples originating from unknown classes or distributions. Open-set recognition methods frequently involve modelling both known and unknown classes during training, allowing for the capture of the distribution of known classes while explicitly addressing the space of unknown classes. Techniques in open-set recognition may include outlier detection, density estimation, or configuring decision boundaries to better differentiate between known and unknown classes. This special issue calls for original contributions introducing novel datasets, innovative architectures, and advanced training methods for tasks related to visual anomaly detection and open-set recognition.
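The rejection behaviour the editorial describes can be made concrete with a minimal decision rule: classify a sample against known-class models, but reject it as unknown when no model claims it confidently. The sketch below is purely illustrative (a nearest-centroid rule with a distance threshold is just one of the many techniques the editorial mentions, alongside density estimation and learnt decision boundaries):

```python
import numpy as np

def open_set_predict(x, centroids, tau):
    """Nearest-centroid classification with rejection: return the index of
    the closest known-class centroid, or -1 (unknown) when the distance
    exceeds the threshold tau."""
    d = np.linalg.norm(centroids - x, axis=1)
    k = int(np.argmin(d))
    return k if d[k] <= tau else -1

# Two known classes; a far-away sample is rejected as a novel class.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
known = open_set_predict(np.array([0.2, -0.1]), centroids, tau=1.0)   # -> 0
novel = open_set_predict(np.array([9.0, -9.0]), centroids, tau=1.0)   # -> -1
```

Swapping the distance for a density estimate or a learnt outlier score gives the other families of open-set methods the editorial lists.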

Citations: 0
Autoencoder-based unsupervised one-class learning for abnormal activity detection in egocentric videos
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-19 | DOI: 10.1049/cvi2.12333 | IET Computer Vision 19(1)
Haowen Hu, Ryo Hachiuma, Hideo Saito

In recent years, abnormal human activity detection has become an important research topic. However, most existing methods focus on detecting abnormal activities of pedestrians in surveillance videos; even those methods that use egocentric videos deal with the activities of pedestrians around the camera wearer. In this paper, the authors present an unsupervised autoencoder-based network, trained by one-class learning, that takes RGB image sequences recorded by egocentric cameras as input to detect abnormal activities of the camera wearers themselves. To improve the performance of the network, the authors introduce a 're-encoding' architecture and a regularisation loss term that minimises the KL divergence between the distributions of features extracted by the first and second encoders. Unlike the common use of a KL divergence loss to match a feature distribution to an already-known distribution, the aim here is to encourage the features extracted by the second encoder to follow a distribution close to that of the first encoder. The authors evaluate the proposed method on the Epic-Kitchens-55 dataset and conduct an ablation study to analyse the contributions of different components. Experimental results show that the method outperforms the comparison methods in all cases and confirm the effectiveness of the proposed re-encoding architecture and regularisation term.
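The regularisation term can be illustrated by fitting a diagonal Gaussian to each encoder's feature batch and computing the KL divergence between the two fits; the paper's exact formulation may differ, so treat this as an assumed sketch:

```python
import numpy as np

def feature_kl(f1, f2, eps=1e-6):
    """KL(N1 || N2) between diagonal Gaussians fitted to the feature batches
    of the first and second encoders (rows = samples, columns = dimensions).
    Minimising this pulls the second encoder's feature distribution towards
    the first encoder's."""
    mu1, var1 = f1.mean(axis=0), f1.var(axis=0) + eps
    mu2, var2 = f2.mean(axis=0), f2.var(axis=0) + eps
    kl = 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
    return float(kl.sum())

rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 8))  # stand-in for a batch of encoder features
```

Identical batches give zero divergence; the further the second encoder's features drift, the larger the penalty becomes.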

Citations: 0
Metric-guided class-level alignment for domain adaptation
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-15 | DOI: 10.1049/cvi2.12322 | IET Computer Vision 19(1)
Xiaoshun Wang, Yunhan Li

Domain adaptation methods address classification in an unlabelled target domain by capitalising on labelled information from source domains. Unfortunately, previous domain adaptation methods have focused mostly on global domain adaptation and have not taken class-specific data into account, which leads to poor knowledge transfer. The study of class-level domain adaptation, which aims to precisely match the distributions of different domains, has recently garnered attention. However, existing investigations into class-level alignment frequently align domain features either directly on or in close proximity to classification boundaries, creating uncertain samples that can impair classification accuracy. To address this problem, we propose a new approach called metric-guided class-level alignment (MCA). Specifically, we employ different metrics to enable the network to acquire supplementary information, thereby enhancing class-level alignment. Moreover, MCA can be effectively combined with existing domain-level alignment methods to mitigate the challenges posed by domain shift. Extensive testing on commonly used public datasets shows that our method outperforms many other cutting-edge domain adaptation methods, with significant gains over baseline performance.
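Class-level alignment, in its simplest form, pulls per-class statistics of the two domains together rather than matching the domains globally. The sketch below uses Euclidean distance between class centroids; MCA's actual metrics are richer, so both the helper name and the choice of metric here are assumptions:

```python
import numpy as np

def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_labels):
    """Sum of distances between per-class source and target feature centroids.
    Classes are matched by (pseudo-)labels; minimising the sum aligns the two
    domains class by class instead of only at the global level."""
    loss = 0.0
    for c in np.unique(src_labels):
        mu_s = src_feats[src_labels == c].mean(axis=0)
        mu_t = tgt_feats[tgt_labels == c].mean(axis=0)
        loss += float(np.linalg.norm(mu_s - mu_t))
    return loss

src = np.array([[0.0, 0.0], [2.0, 2.0]])  # one feature per class, toy example
labels = np.array([0, 1])
```

In practice the target labels would be pseudo-labels predicted by the current model, and the metric could be swapped for cosine or a learnt distance.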

Citations: 0
Representation alignment contrastive regularisation for multi-object tracking
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-15 | DOI: 10.1049/cvi2.12331 | IET Computer Vision 19(1)
Shujie Chen, Zhonglin Liu, Jianfeng Dong, Xun Wang, Di Zhou

Achieving high performance in multi-object tracking relies heavily on modelling spatial-temporal relationships during the data association stage. Mainstream approaches to spatial-temporal relationship modelling are either rule-based or deep learning-based. The former rely on physical motion laws, offering wider applicability but yielding suboptimal results for complex object movements; the latter, though high-performing, lack interpretability and involve complex module designs. This work aims to simplify deep learning-based spatial-temporal relationship models and to introduce interpretability into the features used for data association. Specifically, a lightweight single-layer transformer encoder is used to model spatial-temporal relationships. To make the features more interpretable, two contrastive regularisation losses based on representation alignment are proposed, derived from spatial-temporal consistency rules. By applying a weighted summation to the affinity matrices, the aligned features integrate seamlessly into the data association stage of the original tracking workflow. Experimental results show that the model enhances the performance of most existing tracking networks without excessive complexity, with minimal additional training overhead and nearly negligible computational and storage costs.
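The weighted-summation step before association can be sketched as follows; the weight `alpha` and the greedy matcher are illustrative stand-ins (trackers typically use the Hungarian algorithm for the assignment):

```python
import numpy as np

def fuse_affinities(motion_aff, feature_aff, alpha=0.5):
    """Weighted sum of a motion-based affinity matrix and an (aligned)
    appearance-feature affinity matrix of the same tracks-by-detections shape."""
    return alpha * motion_aff + (1.0 - alpha) * feature_aff

def greedy_match(aff):
    """Repeatedly pick the highest remaining affinity; each track and each
    detection is used at most once. A simple stand-in for Hungarian assignment."""
    aff = aff.astype(float).copy()
    pairs = []
    while np.isfinite(aff).any() and aff.max() > 0:
        i, j = np.unravel_index(np.argmax(aff), aff.shape)
        pairs.append((int(i), int(j)))
        aff[i, :] = -np.inf
        aff[:, j] = -np.inf
    return pairs

motion = np.array([[0.9, 0.1], [0.2, 0.8]])  # e.g. IoU of predicted boxes
feats  = np.array([[0.8, 0.2], [0.1, 0.9]])  # e.g. cosine feature similarity
fused = fuse_affinities(motion, feats)
```

Because the fused matrix has the same shape and meaning as the original affinity, the aligned features drop into the existing association stage without changing the rest of the tracker.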

Citations: 0
Hybrid feature-based moving cast shadow detection
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-13 | DOI: 10.1049/cvi2.12328 | IET Computer Vision 19(1)
Jiangyan Dai, Huihui Zhang, Jin Gao, Chunlei Chen, Yugen Yi

The accurate detection of moving objects is essential in many applications of artificial intelligence, particularly in intelligent surveillance systems. However, moving cast shadows significantly decrease the precision of moving object detection because they share the motion characteristics of the objects that cast them. To address this issue, the authors propose an innovative approach that detects moving cast shadows by combining a hybrid feature with a broad learning system (BLS). The approach extracts low-level features from the input and background images based on colour-constancy and texture-consistency principles, which are shown to be highly effective for moving cast shadow detection. The authors then use the BLS to create a hybrid feature, with the BLS taking the extracted low-level features as input instead of the original data. BLS is an innovative learning paradigm that maps input to feature nodes and further enhances them with enhancement nodes, producing more compact features for classification. Finally, the authors develop an efficient and straightforward post-processing technique to improve the accuracy of moving object detection. To evaluate effectiveness and generalisation ability, the authors conduct extensive experiments on the public ATON-CVRR and CDnet datasets, verifying the superior performance of the method against representative approaches.
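The colour-constancy cue underlying those low-level features is that a cast shadow darkens a surface while leaving its chromaticity almost unchanged. A minimal per-pixel test of that cue (the thresholds here are assumptions, not the paper's values):

```python
import numpy as np

def shadow_candidate_mask(frame, background, lo=0.4, hi=0.95, chroma_tol=0.05):
    """True where brightness drops to between lo and hi of the background
    while normalised-rgb chromaticity stays within chroma_tol -- i.e. the
    pixel got darker without changing colour, as a cast shadow would."""
    f = frame.astype(np.float64) + 1e-6
    b = background.astype(np.float64) + 1e-6
    ratio = f.sum(-1) / b.sum(-1)                       # luminance attenuation
    chroma_gap = np.abs(f / f.sum(-1, keepdims=True)
                        - b / b.sum(-1, keepdims=True)).max(-1)
    return (ratio > lo) & (ratio < hi) & (chroma_gap < chroma_tol)

bg = np.full((1, 2, 3), 100.0)                  # grey background pixels
fr = np.array([[[60.0, 60.0, 60.0],             # darkened, same colour: shadow
                [30.0, 120.0, 150.0]]])         # colour changed: an object
mask = shadow_candidate_mask(fr, bg)
```

A texture-consistency check plays an analogous role for surfaces whose colour statistics alone are ambiguous.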

Citations: 0
High precision light field image depth estimation via multi-region attention enhanced network
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-10 | DOI: 10.1049/cvi2.12326 | IET Computer Vision 18(8), pp. 1390-1406
Jie Li, Wenxuan Yang, Chuanlun Zhang, Heng Li, Xinjia Li, Lin Wang, Yanling Wang, Xiaoyan Wang

Light field (LF) depth estimation is a key task with numerous practical applications. However, achieving high-precision depth estimation in challenging scenarios, such as occlusions and detailed regions (e.g. fine structures and edges), remains difficult. To address this problem, the authors propose an LF depth estimation network based on multi-region selection and guided optimisation. First, a multi-region disparity selection module based on angular patches selects specific regions for generating angular patches, obtaining representative sub-angular patches by balancing different regions. Second, unlike traditional guided deformable convolution, the guided optimisation leverages colour prior information to learn the aggregation of sampling points, enhancing the deformable convolution by learning deformation parameters and fitting irregular windows. Finally, to achieve high-precision LF depth estimation, the authors develop a network architecture built on the proposed multi-region disparity selection and guided optimisation modules. Experiments demonstrate the effectiveness of the network on the HCInew dataset, especially in handling occlusions and detailed regions.
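An angular patch gathers, for one spatial position, the samples seen from every (or a selected subset of) angular views. A sketch of extraction from a 4-D light field with region selection; the region list is an assumed input, whereas the paper selects and balances regions inside the network:

```python
import numpy as np

def angular_patch(lf, y, x, region):
    """Stack the samples at spatial position (y, x) across the angular views
    listed in region, where lf is indexed as lf[u, v, Y, X]."""
    return np.stack([lf[u, v, y, x] for (u, v) in region])

# Toy 2x2-view light field with 3x3 spatial resolution.
lf = np.arange(2 * 2 * 3 * 3).reshape(2, 2, 3, 3)
patch = angular_patch(lf, 1, 1, region=[(0, 0), (1, 1)])
```

Restricting the patch to a chosen angular region is what lets occluded views be down-weighted instead of corrupting the disparity estimate.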

Citations: 0
DPANet: Position-aware feature encoding and decoding for accurate large-scale point cloud semantic segmentation
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-05 | DOI: 10.1049/cvi2.12325 | IET Computer Vision 18(8), pp. 1376-1389
Haoying Zhao, Aimin Zhou

Due to the scattered, unordered, and unstructured nature of point clouds, extracting local features is challenging. Existing methods tend to design redundant and weakly discriminative spatial feature extraction in the encoder while neglecting the uneven point distribution in the decoder. In this paper, the authors fully exploit the characteristics of the imbalanced distribution in point clouds and design a Position-aware Encoder (PAE) module and a Position-aware Decoder (PAD) module. In the PAE module, position relationships are extracted in both the Cartesian and polar coordinate systems to enhance the distinctiveness of features. In the PAD module, the inherent positional disparities between each point and its corresponding upsampled point are exploited to enrich features and mitigate information loss. The authors conduct extensive experiments and compare the proposed DPANet with existing methods on the S3DIS and Semantic3D benchmarks. The experimental results demonstrate that the method outperforms state-of-the-art approaches.
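The dual-coordinate idea in the PAE module can be sketched by concatenating each point's Cartesian coordinates with their polar form; the exact encoding the paper feeds to the network may differ:

```python
import numpy as np

def position_encoding(points):
    """Append radius and azimuth (the polar form of x, y) to each point's
    Cartesian (x, y, z) coordinates, giving the network two complementary
    views of the same position."""
    x, y = points[:, 0], points[:, 1]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    return np.concatenate([points, np.stack([r, theta], axis=1)], axis=1)

pts = np.array([[1.0, 0.0, 2.0]])   # a single toy point
enc = position_encoding(pts)        # shape (1, 5): x, y, z, r, theta
```

Angular coordinates are rotation-equivariant in a way raw offsets are not, which is one motivation for mixing the two systems.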

Citations: 0
Reducing overfitting in vehicle recognition by decorrelated sparse representation regularisation
IF 1.3 | CAS Zone 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-30 | DOI: 10.1049/cvi2.12320 | IET Computer Vision 18(8), pp. 1351-1361
Wanyu Wei, Xinsha Fu, Siqi Ma, Yaqiao Zhu, Ning Lu

Most state-of-the-art vehicle recognition methods benefit from the excellent feature extraction capabilities of convolutional neural networks (CNNs), which allow the models to perform well intra-dataset. However, they often generalise poorly across datasets due to overfitting. On this issue, numerous studies have shown that models fail to generalise to new scenarios because of the high correlation between representations in CNNs; furthermore, over-parameterised CNNs contain a large number of redundant representations. Therefore, we propose a novel Decorrelated Sparse Representation (DSR) regularisation. (1) It minimises the correlation between feature maps to obtain decorrelated representations. (2) It forces the convolution kernels to extract meaningful features by granting the sparse kernels additional optimisation. The DSR regularisation encourages diverse representations to reduce overfitting. Meanwhile, DSR can be applied to a wide range of CNN-based vehicle recognition methods and requires no additional computation at test time. In the experiments, DSR performs better than the original model both intra-dataset and cross-dataset. Through ablation analysis, we find that DSR drives the model to focus on the essential differences among vehicle types.
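The decorrelation half of DSR can be sketched as a penalty on the pairwise correlation between flattened feature maps; the paper's actual loss may be weighted or structured differently, so this is a minimal assumed form:

```python
import numpy as np

def decorrelation_penalty(feats):
    """Mean squared off-diagonal correlation between the C feature maps in
    feats[C, H, W]: ~0 when the maps are uncorrelated, ~1 when identical."""
    c = feats.shape[0]
    flat = feats.reshape(c, -1)
    flat = flat - flat.mean(axis=1, keepdims=True)          # centre each map
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    corr = flat @ flat.T                                    # pairwise correlations
    off = corr - np.diag(np.diag(corr))                     # drop self-terms
    return float((off ** 2).sum() / (c * (c - 1)))

vertical = np.array([[1.0, -1.0], [1.0, -1.0]])     # varies along width
horizontal = np.array([[1.0, 1.0], [-1.0, -1.0]])   # varies along height
```

Added to the task loss with a small weight, such a term pushes the network towards diverse, non-redundant feature maps, which is the stated mechanism for reducing overfitting.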

Citations: 0
RGAM: A refined global attention mechanism for medical image segmentation
IF 1.3 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-29 | DOI: 10.1049/cvi2.12323
Gangjun Ning, Pingping Liu, Chuangye Dai, Mingsi Sun, Qiuzhan Zhou, Qingliang Li

Attention mechanisms are popular techniques in computer vision that mimic the human visual system's ability to analyse complex scenes, enhancing the performance of convolutional neural networks (CNNs). In this paper, the authors propose a refined global attention module (RGAM) to address known shortcomings of existing attention mechanisms: (1) traditional channel attention mechanisms are not refined enough when concentrating features, which may lead to important information being overlooked; (2) the 1-dimensional attention map generated by traditional spatial attention mechanisms makes it difficult to accurately summarise the weights of all channels in the original feature map at the same position. RGAM is composed of two parts: refined channel attention and refined spatial attention. In the channel attention part, the authors use multiple weight-shared dilated convolutions with varying dilation rates to perceive features with different receptive fields at the feature compression stage, and combine dilated convolutions with depth-wise convolution to reduce the number of parameters. In the spatial attention part, the authors group the feature maps and calculate the attention for each group independently, allowing a more accurate assessment of each spatial position's importance. Specifically, the attention weights are calculated separately for the width and height directions, similar to SENet, to obtain more refined attention weights. To validate the effectiveness and generality of the proposed method, the authors conducted extensive experiments on four distinct medical image segmentation datasets. The results demonstrate that RGAM achieves state-of-the-art performance compared to existing methods.
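The separate width- and height-direction weighting described above can be illustrated with a toy, parameter-free NumPy sketch (hypothetical names; the actual RGAM uses learned convolutions and grouped feature maps rather than plain pooling):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def directional_spatial_attention(x):
    """x: (C, H, W). Build separate height- and width-direction attention
    vectors from a channel-averaged map, combine them into an (H, W)
    attention map, and rescale the input. Simplified: no learned weights."""
    pooled = x.mean(axis=0)                  # (H, W) channel-average map
    h_att = sigmoid(pooled.mean(axis=1))     # (H,) height-direction weights
    w_att = sigmoid(pooled.mean(axis=0))     # (W,) width-direction weights
    att = np.outer(h_att, w_att)             # (H, W) combined attention map
    return x * att[None, :, :]               # broadcast over channels
```

Because each directional vector is pooled over only one axis, the combined map can assign a distinct weight to every (h, w) position, which is the refinement over a single 1-dimensional spatial map.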

{"title":"RGAM: A refined global attention mechanism for medical image segmentation","authors":"Gangjun Ning,&nbsp;Pingping Liu,&nbsp;Chuangye Dai,&nbsp;Mingsi Sun,&nbsp;Qiuzhan Zhou,&nbsp;Qingliang Li","doi":"10.1049/cvi2.12323","DOIUrl":"10.1049/cvi2.12323","url":null,"abstract":"<p>Attention mechanisms are popular techniques in computer vision that mimic the ability of the human visual system to analyse complex scenes, enhancing the performance of convolutional neural networks (CNN). In this paper, the authors propose a refined global attention module (RGAM) to address known shortcomings of existing attention mechanisms: (1) Traditional channel attention mechanisms are not refined enough when concentrating features, which may lead to overlooking important information. (2) The 1-dimensional attention map generated by traditional spatial attention mechanisms make it difficult to accurately summarise the weights of all channels in the original feature map at the same position. The RGAM is composed of two parts: refined channel attention and refined spatial attention. In the channel attention part, the authors used multiple weight-shared dilated convolutions with varying dilation rates to perceive features with different receptive fields at the feature compression stage. The authors also combined dilated convolutions with depth-wise convolution to reduce the number of parameters. In the spatial attention part, the authors grouped the feature maps and calculated the attention for each group independently, allowing for a more accurate assessment of each spatial position’s importance. Specifically, the authors calculated the attention weights separately for the width and height directions, similar to SENet, to obtain more refined attention weights. To validate the effectiveness and generality of the proposed method, the authors conducted extensive experiments on four distinct medical image segmentation datasets. 
The results demonstrate the effectiveness of RGAM in achieving state-of-the-art performance compared to existing methods.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1362-1375"},"PeriodicalIF":1.3,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12323","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient class-agnostic obstacle detection for UAV-assisted waterway inspection systems
IF 1.3 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-25 | DOI: 10.1049/cvi2.12319
Pablo Alonso, Jon Ander Íñiguez de Gordoa, Juan Diego Ortega, Marcos Nieto

Ensuring the safety of water airport runways is essential for the correct operation of seaplane flights. Among other tasks, airport operators must identify and remove various objects that may have drifted into the runway area. In this paper, the authors propose a complete, embedded-friendly waterway obstacle detection pipeline that runs on a camera-equipped drone. The system uses a class-agnostic version of the YOLOv7 detector, capable of detecting objects regardless of their class. Additionally, using the drone's GPS data and the camera parameters, object locations are pinpointed with a distance root mean square (DRMS) error of 0.58 m. On the authors' own annotated dataset, the system generates alerts for detected objects with a recall of 0.833 and a precision of 1.0.
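As an illustration of how a detection can be geolocated from drone GPS data and camera parameters, here is a minimal flat-ground, nadir-camera pinhole sketch (hypothetical interface; the paper does not specify its exact projection model, which would also need to handle camera tilt and geodetic coordinate conversion):

```python
def pixel_to_ground(u, v, alt_m, fx, fy, cx, cy, drone_east, drone_north):
    """Project pixel (u, v) onto the ground plane for a nadir-pointing camera
    at altitude alt_m, using a pinhole model with focal lengths (fx, fy) and
    principal point (cx, cy). Returns local east/north coordinates in metres,
    offset from the drone's own (drone_east, drone_north) position."""
    east = drone_east + alt_m * (u - cx) / fx
    north = drone_north - alt_m * (v - cy) / fy   # image v grows downward
    return east, north
```

Under this model, a pixel at the principal point maps to the point directly beneath the drone, and the ground offset of any other pixel scales linearly with altitude.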

{"title":"Efficient class-agnostic obstacle detection for UAV-assisted waterway inspection systems","authors":"Pablo Alonso,&nbsp;Jon Ander Íñiguez de Gordoa,&nbsp;Juan Diego Ortega,&nbsp;Marcos Nieto","doi":"10.1049/cvi2.12319","DOIUrl":"10.1049/cvi2.12319","url":null,"abstract":"<p>Ensuring the safety of water airport runways is essential for the correct operation of seaplane flights. Among other tasks, airport operators must identify and remove various objects that may have drifted into the runway area. In this paper, the authors propose a complete and embedded-friendly waterway obstacle detection pipeline that runs on a camera-equipped drone. This system uses a class-agnostic version of the YOLOv7 detector, which is capable of detecting objects regardless of its class. Additionally, through the usage of the GPS data of the drone and camera parameters, the location of the objects are pinpointed with 0.58 m Distance Root Mean Square. In our own annotated dataset, the system is capable of generating alerts for detected objects with a recall of 0.833 and a precision of 1.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1087-1096"},"PeriodicalIF":1.3,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12319","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0