Image and Vision Computing: Latest Articles

HEDehazeNet: Unpaired image dehazing via enhanced haze generation
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.imavis.2024.105236
Wentao Li, Deming Fan, Qi Zhu, Zhanjiang Gao, Hao Sun

Unpaired image dehazing models based on Cycle-Consistent Adversarial Networks (CycleGAN) typically consist of two cycle branches: a dehazing-rehazing branch and a hazing-dehazing branch. In these two branches, there is an asymmetry of information in the mutual transformation process between haze images and haze-free images. Previous models tended to focus more on the transformation from haze images to haze-free images within the dehazing-rehazing branch, overlooking the provision of effective information for the formation of haze images in the hazing-dehazing branch. This oversight results in haze patterns that are both monotonous and simplistic, ultimately impeding the overall performance and generalization capabilities of dehazing networks. In light of this, this paper proposes a novel model called HEDehazeNet (Dehazing Net based on Haze Generation Enhancement), which provides crucial information for the haze-image generation process through a dedicated haze generation enhancement module. This module can produce three distinct modes of transmission maps: a random transmission map, a simulated transmission map, and a mixed transmission map combining both. Employing these transmission maps to generate hazy images with varying density and patterns provides the dehazing network with a more diverse and dynamically complex set of training samples, thereby enhancing its capacity to handle intricate scenes. Additionally, we made minor modifications to the U-Net, replacing residual blocks with multi-scale parallel convolutional blocks and channel self-attention, to further enhance the network's performance. Experiments on both synthetic and real-world datasets substantiate the superiority of HEDehazeNet over current state-of-the-art unpaired dehazing models.
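For illustration only, the sketch below shows how a random transmission map could be used to synthesize a hazy training image under the standard atmospheric scattering model I = J * t + A * (1 - t); the smoothing scale, transmission range, and atmospheric light value are assumed here and are not parameters taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def random_transmission_map(h, w, sigma=32.0, t_min=0.2, t_max=0.9):
    """Draw a spatially smooth random transmission map t(x) with values in [t_min, t_max]."""
    noise = np.random.rand(h, w)
    smooth = gaussian_filter(noise, sigma=sigma)               # low-frequency haze pattern
    smooth = (smooth - smooth.min()) / (smooth.max() - smooth.min() + 1e-8)
    return t_min + (t_max - t_min) * smooth

def synthesize_haze(clear_img, t, atmospheric_light=0.9):
    """Atmospheric scattering model: I(x) = J(x) * t(x) + A * (1 - t(x))."""
    t = t[..., None]                                           # broadcast over RGB channels
    return clear_img * t + atmospheric_light * (1.0 - t)

# usage: clear_img is an HxWx3 float image in [0, 1]
clear_img = np.random.rand(256, 256, 3).astype(np.float32)
hazy_img = synthesize_haze(clear_img, random_transmission_map(256, 256))
```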

Citations: 0
Detecting adversarial samples by noise injection and denoising
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.imavis.2024.105238
Han Zhang, Xin Zhang, Yuan Sun, Lixia Ji

Deep learning models are highly vulnerable to adversarial examples, leading to significant attention on techniques for detecting them. However, current methods primarily rely on detecting image features for identifying adversarial examples, often failing to address the diverse types and intensities of such examples. We propose a novel adversarial example detection method based on perturbation estimation and denoising to overcome this limitation. We develop an autoencoder to predict the latent adversarial perturbations of samples and select appropriately sized noise based on these predictions to cover the perturbations. Subsequently, we employ a non-blind denoising autoencoder to remove noise and residual perturbations effectively. This approach allows us to eliminate adversarial perturbations while preserving the original information, thus altering the prediction results of adversarial examples without affecting predictions on benign samples. Inconsistencies in predictions before and after processing by the model identify adversarial examples. Our experiments on datasets such as MNIST, CIFAR-10, and ImageNet demonstrate that our method surpasses other advanced detection methods in accuracy.
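To make the detection rule concrete, here is a minimal sketch of the decision logic described above: estimate the perturbation strength, inject noise of that size, denoise, and flag any input whose prediction changes. The `perturb_estimator` and `denoiser(noisy, scale)` interfaces are assumptions for illustration, not the paper's exact modules.

```python
import torch

@torch.no_grad()
def flag_adversarial(x, classifier, perturb_estimator, denoiser):
    """Flag inputs whose prediction changes after noise injection and denoising."""
    pred_before = classifier(x).argmax(dim=1)
    scale = perturb_estimator(x).view(-1, 1, 1, 1)        # estimated perturbation strength per sample
    noisy = x + scale * torch.randn_like(x)               # noise sized to cover the perturbation
    cleaned = denoiser(noisy, scale)                      # non-blind denoising autoencoder
    pred_after = classifier(cleaned).argmax(dim=1)
    return pred_before != pred_after                      # inconsistency marks an adversarial sample
```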

Citations: 0
Enhanced human motion detection with hybrid RDA-WOA-based RNN and multiple hypothesis tracking for occlusion handling
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-21 | DOI: 10.1016/j.imavis.2024.105234
Jeba Nega Cheltha, Chirag Sharma, Deepak Prashar, Arfat Ahmad Khan, Seifedine Kadry

Human motion detection in complex scenarios poses challenges due to occlusions. This paper presents an integrated approach for accurate human motion detection that combines Adapted Canny Edge detection as a preprocessing step, a backbone-modified Mask R-CNN for precise segmentation, a Hybrid RDA-WOA-based RNN as the classifier, and a Multiple-hypothesis model for effective occlusion handling. Adapted Canny Edge detection is applied first to highlight significant edges in the input image. The resulting edge map enhances object boundaries and highlights structural features, simplifying subsequent processing steps. The improved image is then passed through the backbone-modified Mask R-CNN for pixel-level segmentation of humans. The backbone-modified Mask R-CNN, together with IoU, Euclidean Distance, and Z-Score, accurately recognizes moving objects in complex scenes. After recognizing moving objects, the optimized Hybrid RDA-WOA-based RNN classifies humans. To handle self-occlusion, Multiple Hypothesis Tracking (MHT) is used. Real-world situations frequently include occlusions where humans can be partially or completely hidden by objects; the proposed approach integrates a Multiple-hypothesis model into the detection pipeline to address this challenge. Moreover, the proposed approach includes an optimized Hybrid RDA-WOA-based RNN trained with 2D representations of 3D skeletal motion. The proposed work was evaluated on the IXMAS, KTH, Weizmann, NTU RGB + D, and UCF101 datasets. It achieved an accuracy of 98% on the IXMAS, KTH, Weizmann, and UCF101 datasets and 97.1% on the NTU RGB + D dataset. The simulation results demonstrate the superiority of the proposed methodology over existing approaches.
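As a rough illustration of the preprocessing step, the sketch below uses plain OpenCV Canny as a stand-in for the paper's Adapted Canny Edge detection and blends the edge map back onto the frame; the thresholds and blend weight are assumed values.

```python
import cv2

def edge_enhanced(frame_bgr, low=50, high=150, weight=0.3):
    """Highlight object boundaries by blending a Canny edge map onto the input frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                     # binary edge map
    edges_bgr = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
    return cv2.addWeighted(frame_bgr, 1.0, edges_bgr, weight, 0)
```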

Citations: 0
Dual temporal memory network with high-order spatio-temporal graph learning for video object segmentation
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-21 | DOI: 10.1016/j.imavis.2024.105208
Jiaqing Fan, Shenglong Hu, Long Wang, Kaihua Zhang, Bo Liu

Video Object Segmentation (VOS) is typically evaluated in a semi-supervised setting at test time: given only the ground-truth segmentation mask in the initial frame, VOS aims to track and segment one or several target objects in the subsequent frames of the sequence. A fundamental issue in VOS is how to best utilize temporal information to improve accuracy. To address this issue, we provide an end-to-end framework that simultaneously extracts long-term and short-term historical sequential information for the current frame as temporal memories. The integrated temporal architecture consists of a short-term and a long-term memory module. Specifically, the short-term memory module leverages a high-order graph-based learning framework to simulate the fine-grained spatial-temporal interactions between local regions across neighboring frames in a video, thereby maintaining spatio-temporal visual consistency on local regions. Meanwhile, to relieve occlusion and drift issues, the long-term memory module employs a Simplified Gated Recurrent Unit (S-GRU) to model long-term evolution in a video. Furthermore, we design a novel direction-aware attention module to complementarily augment the object representation for more robust segmentation. Our experiments on three mainstream VOS benchmarks, comprising DAVIS 2017, DAVIS 2016, and YouTube-VOS, demonstrate that our proposed solution provides a fair trade-off between speed and accuracy.
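For readers unfamiliar with gated recurrent memories, the sketch below shows a standard convolutional GRU cell as a generic stand-in for the long-term memory update; the abstract does not specify which simplifications S-GRU makes, so this is not the paper's exact module.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Standard convolutional GRU cell: gated update of a spatial hidden state."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        self.gates = nn.Conv2d(in_channels + hidden_channels, 2 * hidden_channels,
                               kernel_size, padding=padding)      # update and reset gates
        self.candidate = nn.Conv2d(in_channels + hidden_channels, hidden_channels,
                                   kernel_size, padding=padding)  # candidate hidden state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde                           # blended memory update
```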

Citations: 0
Reverse cross-refinement network for camouflaged object detection
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-17 | DOI: 10.1016/j.imavis.2024.105218
Qian Ye, Yaqin Zhou, Guanying Huo, Yan Liu, Yan Zhou, Qingwu Li

Due to the high intrinsic similarity between camouflaged objects and the background, camouflaged objects often exhibit blurred boundaries, making it challenging to distinguish object boundaries. Existing methods still focus on overall regional accuracy rather than boundary quality and struggle to separate camouflaged objects from the background in complex scenarios. Thus, we propose a novel reverse cross-refinement network called RCR-Net. Specifically, we design a diverse feature enhancement module that simulates the correspondingly expanded receptive fields of the human visual system by using convolutional kernels with different dilation rates in parallel. In addition, a boundary attention module is used to reduce the noise of the bottom features. Moreover, a multi-scale feature aggregation module is proposed to transmit the diverse features from pixel-level camouflaged edges to the entire camouflaged object region in a coarse-to-fine manner; it consists of reverse guidance, group guidance, and position guidance. Reverse guidance mines complementary regions and details by erasing already estimated object regions. Group guidance and position guidance integrate different features through simple and effective splitting and connecting operations. Extensive experiments show that RCR-Net outperforms 18 existing state-of-the-art methods on four widely used COD datasets. In particular, compared with the existing top-1 model HitNet, RCR-Net significantly improves performance by ∼16.4% (Mean Absolute Error) on the CAMO dataset, showing that RCR-Net can accurately detect camouflaged objects.
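The parallel-dilation idea behind the diverse feature enhancement module can be sketched as follows; the specific dilation rates (1, 2, 4, 8) are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ParallelDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, fused by a 1x1 convolution."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates]
        )
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        feats = [torch.relu(branch(x)) for branch in self.branches]  # multiple receptive fields
        return self.fuse(torch.cat(feats, dim=1))                    # same spatial size as input
```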

Citations: 0
A streamlined framework for BEV-based 3D object detection with prior masking
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-15 | DOI: 10.1016/j.imavis.2024.105229
Qinglin Tong, Junjie Zhang, Chenggang Yan, Dan Zeng

In the field of autonomous driving, perception tasks based on Bird's-Eye-View (BEV) have attracted considerable research attention due to their numerous benefits. Despite recent advancements in performance, efficiency remains a challenge for real-world implementation. In this study, we propose an efficient and effective framework that constructs a spatio-temporal BEV feature from multi-camera inputs and leverages it for 3D object detection. Specifically, the success of our network is primarily attributed to the design of the lifting strategy and a tailored BEV encoder. The lifting strategy is tasked with the conversion of 2D features into 3D representations. In the absence of depth information in the images, we innovatively introduce a prior mask for the BEV feature, which can assess the significance of the feature along the camera ray at a low cost. Moreover, we design a lightweight BEV encoder, which significantly boosts the capacity of this physical-interpretation representation. In the encoder, we investigate the spatial relationships of the BEV feature and retain rich residual information from upstream. To further enhance performance, we establish a 2D object detection auxiliary head to delve into insights offered by 2D object detection and leverage the 4D information to explore the cues within the sequence. Benefiting from all these designs, our network can capture abundant semantic information from 3D scenes and strikes a balanced trade-off between efficiency and performance.
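The abstract does not detail how the prior mask enters the lifting step, so the sketch below is only a loose interpretation: a cheap 1x1 convolution predicts a per-pixel significance weight for each depth bin along the camera ray, and the image feature is spread into the frustum scaled by that weight. All names, shapes, and the number of depth bins are assumptions.

```python
import torch
import torch.nn as nn

class PriorMaskLift(nn.Module):
    """Lift image features along camera rays, weighted by a cheaply predicted prior mask."""
    def __init__(self, channels, depth_bins=64):
        super().__init__()
        self.prior = nn.Conv2d(channels, depth_bins, 1)    # significance along each ray

    def forward(self, feat):                               # feat: (B, C, H, W)
        weights = torch.sigmoid(self.prior(feat))          # (B, D, H, W) per-bin weights
        # outer product: every depth bin receives the pixel feature scaled by its weight
        return weights.unsqueeze(2) * feat.unsqueeze(1)    # (B, D, C, H, W) frustum features
```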

Citations: 0
Black-box model adaptation for semantic segmentation
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-15 | DOI: 10.1016/j.imavis.2024.105233
Zhiheng Zhou, Wanlin Yue, Yinglie Cao, Shifu Shen

Model adaptation aims to transfer knowledge from pre-trained source models to a new unlabeled dataset. Despite impressive progress, prior methods always need to access the source model and develop data-reconstruction approaches to align the data distributions between target samples and the generated instances, which may raise privacy concerns for source individuals. To alleviate this problem, we propose a new method for black-box model adaptation in semantic segmentation, in which only pseudo-labels from multiple source domains are required during the adaptation process. Specifically, the proposed method structurally distills the knowledge with multiple classifiers to obtain a customized target model, and the predictions on target data are then refined to fit the target domain with co-regularization. We conduct extensive experiments on several standard datasets, and our method achieves promising results.
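As a minimal sketch of learning from black-box sources, the snippet below takes hard pseudo-label maps from several source models, fuses them by per-pixel majority vote, and trains the target model with cross-entropy; the voting scheme is an assumption for illustration, and the paper's structural distillation and co-regularization are not reproduced here.

```python
import torch
import torch.nn.functional as F

def multi_source_pseudo_label_loss(target_logits, pseudo_label_maps):
    """Cross-entropy against a per-pixel majority vote over source pseudo-label maps."""
    stacked = torch.stack(pseudo_label_maps)           # (S, B, H, W) integer class labels
    voted, _ = torch.mode(stacked, dim=0)              # per-pixel majority vote -> (B, H, W)
    return F.cross_entropy(target_logits, voted)       # target_logits: (B, C, H, W)
```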

Citations: 0
Multi-scale large kernel convolution and hybrid attention network for remote sensing image dehazing
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-14 | DOI: 10.1016/j.imavis.2024.105212
Hang Su, Lina Liu, Zenghui Wang, Mingliang Gao

Remote sensing (RS) image dehazing holds significant importance in enhancing the quality and information extraction capability of RS imagery. Image dehazing quality has progressively advanced alongside the evolution of convolutional neural networks (CNNs). However, due to the fixed receptive field of CNNs, contextual information about haze features in multi-scale RS images is insufficiently utilized, and such networks fail to adequately extract both local and global information of haze features. To address these problems, this paper proposes an RS image dehazing network based on multi-scale large kernel convolution and hybrid attention (MKHANet). The network is mainly composed of a multi-scale large kernel convolution (MSLKC) module, a hybrid attention (HA) module, and a feature fusion attention (FFA) module. The MSLKC module fully fuses the multi-scale information of features while enhancing the effective receptive field of the network through multiple parallel large-kernel convolutions. To alleviate the uneven distribution of haze and effectively extract the global and local information of haze features, the HA module is introduced to focus on the importance of haze pixels at the channel level. The FFA module aims to boost the interaction of feature information between the network's deep and shallow layers. Subjective and objective experimental results on multiple RS hazy image datasets illustrate that MKHANet surpasses existing state-of-the-art (SOTA) approaches. The source code is available at https://github.com/tohang98/MKHA_Net.
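For the channel-level part of the hybrid attention, a generic squeeze-and-excitation style block is sketched below as a stand-in; the reduction ratio is an assumed value and the paper's HA module may differ.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Reweight feature channels by their globally pooled importance (SE-style)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global context per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)                                 # channel-wise reweighting
```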

Citations: 0
Triplet-set feature proximity learning for video anomaly detection
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-14 | DOI: 10.1016/j.imavis.2024.105205
Kuldeep Marotirao Biradar, Murari Mandal, Sachin Dube, Santosh Kumar Vipparthi, Dinesh Kumar Tyagi

The identification of anomalies in videos is a particularly complex visual challenge, given the wide variety of potential real-world events. To address this issue, our paper introduces a unique approach for detecting divergent behavior in surveillance videos, utilizing a triplet loss for video anomaly detection. Our method involves selecting triplet sets of video segments from normal (n) and abnormal (a) data points for deep feature learning. We begin by creating a database of triplet sets of two types: a-a-n and n-n-a. By computing a triplet loss, we model the proximity between n-n chunks and the distance of ‘a’ chunks from the n-n ones. Additionally, we train the deep network to model the closeness of a-a chunks and the divergent behavior of ‘n’ from the a-a chunks.

The model acquired in the initial stage can be viewed as a prior, which is subsequently employed for modeling normality. As a result, our method can leverage the advantages of both straightforward classification and normality modeling-based techniques. We also present a data selection mechanism for the efficient generation of triplet sets. Furthermore, we introduce a novel video anomaly dataset, AnoVIL, designed for human-centric anomaly detection. Our proposed method is assessed using the UCF-Crime dataset encompassing all 13 categories, the IIT-H accident dataset, and AnoVIL. The experimental findings demonstrate that our method surpasses the current state-of-the-art approaches. We conduct further evaluations of the performance, considering various configurations such as cross-dataset evaluation, loss functions, siamese structure, and embedding size. Additionally, an ablation study is carried out across different settings to provide insights into our proposed method.
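The triplet objective itself is standard; a minimal sketch on embedded video chunks is given below, where the margin value is an assumed hyperparameter (PyTorch's nn.TripletMarginLoss implements the same rule).

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull same-class chunk embeddings together and push the divergent chunk away."""
    d_pos = F.pairwise_distance(anchor, positive)      # distance to the same-class chunk
    d_neg = F.pairwise_distance(anchor, negative)      # distance to the divergent chunk
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```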

Citations: 0
SDMNet: Spatially dilated multi-scale network for object detection for drone aerial imagery
IF 4.2 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-13 | DOI: 10.1016/j.imavis.2024.105232
Neeraj Battish, Dapinder Kaur, Moksh Chugh, Shashi Poddar

Multi-scale object detection is a preeminent challenge in computer vision and image processing. Many deep learning models designed to detect a wide range of objects lack the capability to detect small objects, which reduces their detection accuracy. To cover different scales, from extremely small to large objects, this work proposes a Spatially Dilated Multi-Scale Network (SDMNet) architecture for UAV-based ground object detection. It introduces a Multi-scale Enhanced Effective Channel Attention mechanism to preserve object details in the images. Additionally, the proposed model incorporates dilated convolution, sub-pixel convolution, and additional prediction heads to enhance object detection performance specifically for aerial imaging. It has been evaluated on two popular aerial image datasets, VisDrone 2019 and UAVDT, containing publicly available annotated images of ground objects captured from UAVs. Performance metrics such as precision, recall, mAP, and detection rate benchmark the proposed architecture against existing object detection approaches. The experimental results demonstrate the effectiveness of the proposed model for multi-scale object detection, with average precision scores of 54.2% and 98.4% on the VisDrone and UAVDT datasets, respectively.
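The sub-pixel convolution mentioned above can be sketched as a convolution followed by PixelShuffle; the 2x upscale factor here is an assumed choice, not necessarily SDMNet's setting.

```python
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Upsample feature maps by rearranging convolution outputs into space (PixelShuffle)."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)          # (B, C*s^2, H, W) -> (B, C, H*s, W*s)

    def forward(self, x):
        return self.shuffle(self.conv(x))
```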

Citations: 0