
Latest Publications in IET Computer Vision

Feature-Level Compensation and Alignment for Visible-Infrared Person Re-Identification
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-25 | DOI: 10.1049/cvi2.70005
Husheng Dong, Ping Lu, Yuanfeng Yang, Xun Sun

Visible-infrared person re-identification (VI-ReID) aims to match pedestrian images captured by non-overlapping visible and infrared cameras. Most existing compensation-based methods try to generate images of the missing modality from the other one. However, the generated images often lack sufficient quality due to the severe discrepancies between modalities. Moreover, it is generally assumed that person images are roughly aligned during the extraction of part-based local features. This does not always hold, particularly when the images are cropped by inaccurate pedestrian detectors. To alleviate these problems, the authors propose a novel feature-level compensation and alignment network (FCA-Net) for VI-ReID, which compensates for missing modality information at the channel level and aligns part-based local features. Specifically, the visible and infrared features from low-level subnetworks are first processed by a channel feature compensation (CFC) module, which forces the network to learn consistent distribution patterns of channel features, thereby narrowing the cross-modality discrepancy. To address spatial misalignment, a pairwise relation module (PRM) is introduced to incorporate human structural information into part-based local features, which significantly enhances their discriminative power. In addition, a cross-modality part alignment loss (CPAL) is designed on the basis of a dynamic part matching algorithm, which promotes more accurate local matching. Extensive experiments on three standard VI-ReID datasets validate the effectiveness of the proposed method and show that it achieves state-of-the-art performance.
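To make the channel-level compensation idea concrete, here is a minimal PyTorch sketch of one plausible channel re-weighting module; the class name, the squeeze-and-excitation-style design, and the weight sharing across modalities are illustrative assumptions, not the paper's actual CFC module.

```python
import torch
import torch.nn as nn

class ChannelCompensation(nn.Module):
    """Hypothetical channel-level compensation: a shared bottleneck
    re-weights each modality's channels so that their channel statistics
    follow a consistent distribution (illustrative, not the paper's CFC)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); global average pool -> per-channel descriptor
        w = self.fc(x.mean(dim=(2, 3)))           # (B, C) channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)  # re-weighted features

vis = torch.randn(4, 256, 64, 32)   # visible-branch feature maps
ir = torch.randn(4, 256, 64, 32)    # infrared-branch feature maps
cfc = ChannelCompensation(256)      # shared across both modalities
vis_c, ir_c = cfc(vis), cfc(ir)     # compensated features, same shapes
```

Sharing one module across both branches is what would push the two modalities toward a common channel distribution in this reading.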

Citations: 0
Advancements in smart agriculture: A systematic literature review on state-of-the-art plant disease detection with computer vision
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-14 | DOI: 10.1049/cvi2.70004
Esra Yilmaz, Sevim Ceylan Bocekci, Cengiz Safak, Kazim Yildiz

In an era of rapid digital transformation, ensuring sustainable and traceable food production is more crucial than ever. Plant diseases, a major threat to agriculture, lead to significant crop losses and financial damage. Standard disease-detection techniques, though widespread, are slow and labour-intensive, especially in large-scale agricultural settings. This systematic literature review examines the cutting-edge technologies in smart agriculture, specifically computer vision, robotics, deep learning (DL), and the Internet of Things (IoT), that are reshaping plant disease detection and management. By analysing 198 studies published between 2021 and 2023, selected from an initial pool of 19,838 papers, the authors reveal the dominance of DL, particularly with datasets such as PlantVillage, and highlight critical challenges, including dataset limitations, a lack of geographical diversity, and the scarcity of real-world field data. Moreover, the authors explore the promising role of IoT, robotics, and drones in enhancing early disease detection, although high costs and technological gaps present significant barriers for small-scale farmers, especially in developing countries. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, this review synthesises these findings, identifying key trends, uncovering research gaps, and offering actionable insights for the future of plant disease management in smart agriculture.

Citations: 0
Egocentric action anticipation from untrimmed videos
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-14 | DOI: 10.1049/cvi2.12342
Ivan Rodin, Antonino Furnari, Giovanni Maria Farinella

Egocentric action anticipation involves predicting future actions performed by the camera wearer from egocentric video. Although the task has recently gained attention in the research community, current approaches often assume that input videos are 'trimmed', meaning that a short video sequence is sampled a fixed time before the beginning of the action. However, trimmed action anticipation has limited applicability in real-world scenarios, where it is crucial to deal with 'untrimmed' video inputs and the exact moment of action initiation cannot be assumed at test time. To address these limitations, an untrimmed action anticipation task is proposed, which, akin to temporal action detection, assumes that the input video is untrimmed at test time, while still requiring predictions to be made before actions take place. The authors introduce a benchmark evaluation procedure for methods designed to address this novel task and compare several baselines on the EPIC-KITCHENS-100 dataset. Through an experimental evaluation of a variety of models, the authors aim to better understand their performance in untrimmed action anticipation. The results reveal that the performance of current models designed for trimmed action anticipation is limited, emphasising the need for further research in this area.
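The gap between the two protocols can be illustrated with a small sampling sketch: in the trimmed setting, observation windows are anchored a fixed anticipation time before known action starts, while in the untrimmed setting predictions must be issued at regular timestamps with no access to action boundaries. The function names, the 1-second anticipation time, and the window parameters below are assumptions for illustration, not the benchmark's specification.

```python
import numpy as np

def trimmed_clips(action_starts, tau=1.0, obs_len=2.0):
    """Trimmed protocol: one observation window per action, ending
    exactly tau seconds before the (known) action start time."""
    return [(s - tau - obs_len, s - tau) for s in action_starts]

def untrimmed_clips(video_len, step=0.5, obs_len=2.0):
    """Untrimmed protocol: anticipate at fixed timestamps over the whole
    stream; action start times are unknown at test time."""
    ts = np.arange(obs_len, video_len, step)
    return [(t - obs_len, t) for t in ts]

print(trimmed_clips([10.0, 42.5]))   # windows anchored to known actions
print(len(untrimmed_clips(60.0)))    # dense windows over the full video
```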

Citations: 0
Controlling semantics of diffusion-augmented data for unsupervised domain adaptation
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-07 | DOI: 10.1049/cvi2.70002
Henrietta Ridley, Roberto Alcover-Couso, Juan C. SanMiguel

Unsupervised domain adaptation (UDA) offers a compelling solution to bridge the gap between labelled synthetic data and unlabelled real-world data for training semantic segmentation models, given the high costs associated with manual annotation. However, the visual differences between the synthetic and real images pose significant challenges to their practical applications. This work addresses these challenges through synthetic-to-real style transfer leveraging diffusion models. The authors’ proposal incorporates semantic controllers to guide the diffusion process and low-rank adaptations (LoRAs) to ensure that style-transferred images align with real-world aesthetics while preserving semantic layout. Moreover, the authors introduce quality metrics to rank the utility of generated images, enabling the selective use of high-quality images for training. To further enhance reliability, the authors propose a novel loss function that mitigates artefacts from the style transfer process by incorporating only pixels aligned with the original semantic labels. Experimental results demonstrate that the authors’ proposal outperforms selected state-of-the-art methods for image generation and UDA training, achieving optimal performance even with a smaller set of high-quality generated images. The authors’ code and models are available at http://www-vpu.eps.uam.es/ControllingSem4UDA/.
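As one concrete reading of the artefact-masking loss described above (only pixels that remain aligned with the original semantic labels contribute), here is a hedged PyTorch sketch; the use of a frozen checker model's predictions to define alignment, and the function name, are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def artefact_masked_ce(logits, labels, styled_pred, ignore_index=255):
    """Hypothetical masked loss: supervise only pixels where a segmenter's
    prediction on the style-transferred image still agrees with the
    original synthetic label, masking out likely style-transfer artefacts.

    logits:      (B, C, H, W) student predictions on the styled image
    labels:      (B, H, W)    original synthetic ground-truth labels
    styled_pred: (B, H, W)    argmax prediction of a frozen checker model
    """
    keep = styled_pred.eq(labels)                # pixels still aligned
    masked = torch.where(keep, labels,
                         torch.full_like(labels, ignore_index))
    return F.cross_entropy(logits, masked, ignore_index=ignore_index)

logits = torch.randn(2, 19, 64, 64)              # 19 Cityscapes-style classes
labels = torch.randint(0, 19, (2, 64, 64))
checker = torch.randint(0, 19, (2, 64, 64))
print(artefact_masked_ce(logits, labels, checker).item())
```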

Citations: 0
TomoSAR 3D reconstruction: Cascading adversarial strategy with sparse observation trajectory
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-04 | DOI: 10.1049/cvi2.70001
Xian Zhu, Xiaoqin Zeng, Yuhua Cong, Yanhao Huang, Ziyan Zhu, Yantao Luo

Synthetic aperture radar tomography (TomoSAR) has shown significant potential for the 3D reconstruction of buildings, especially in critical areas such as topographic mapping, urban planning, and disaster monitoring. In practical applications, constraints on observation trajectories frequently limit acquisition to a small dataset of sparse SAR images, presenting challenges for TomoSAR 3D reconstruction and degrading its signal-to-noise ratio and elevation resolution. The study introduces a cascade adversarial strategy based on the conditional generative adversarial network (CGAN), explicitly optimised for sparse observation trajectories. In the preliminary stage of the CGAN, a U-Net architecture is employed to capture more global information and enhance image-detail recovery; its output is subsequently passed to the cascade refinement network. In the refinement stage, a ResNet34 residual network is adopted to further bolster feature extraction and image generation. Experimental validation on a curated TomoSAR 3D super-resolution dataset tailored to buildings shows that the methodology yields a notable improvement in image quality and accuracy compared to other techniques.
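A minimal sketch of the coarse-then-refine cascade wiring follows; the tiny convolutional stacks merely stand in for the U-Net and ResNet34 stages named in the abstract, and the residual connection between the two stages is an assumption for illustration.

```python
import torch
import torch.nn as nn

class CascadeGenerator(nn.Module):
    """Illustrative two-stage cascade (not the paper's exact networks):
    a coarse stage maps sparse TomoSAR observations to an initial
    reconstruction, and a refinement stage sharpens it residually."""
    def __init__(self, ch: int = 1):
        super().__init__()
        self.coarse = nn.Sequential(   # stand-in for the U-Net stage
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, ch, 3, padding=1),
        )
        self.refine = nn.Sequential(   # stand-in for the ResNet34 stage
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x):
        coarse = self.coarse(x)
        return coarse + self.refine(coarse)   # residual refinement

sparse_obs = torch.randn(1, 1, 128, 128)      # stacked sparse-baseline input
print(CascadeGenerator()(sparse_obs).shape)   # torch.Size([1, 1, 128, 128])
```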

Citations: 0
Human activity recognition: A review of deep learning-based methods
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | DOI: 10.1049/cvi2.70003
Sanjay Jyoti Dutta, Tossapon Boongoen, Reyer Zwiggelaar

Human activity recognition (HAR) covers methods for automatically identifying human activities from a stream of data. End-users of HAR methods span a range of sectors, including health, self-care, entertainment, safety, and monitoring. In this survey, the authors provide a thorough overview and detailed analysis of deep learning-based work performed between 2018 and 2023 in a variety of fields related to HAR, with a focus on device-free solutions. The survey also presents a categorisation and taxonomy of the covered publications and an overview of publicly available datasets. To complete the review, the limitations of existing approaches and potential future research directions are discussed.

Citations: 0
A principal direction-guided local voxelisation structural feature approach for point cloud registration
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-29 | DOI: 10.1049/cvi2.70000
Chenyang Li, Yansong Duan

Point cloud registration is a crucial aspect of computer vision and 3D reconstruction. Traditional registration methods often depend on global features or iterative optimisation, leading to inefficiency and imprecise results when processing point cloud data from complex scenes. To address these challenges, the authors introduce a principal direction-guided local voxelisation structural feature (PDLVSF) approach for point cloud registration, which reliably identifies feature points regardless of initial positioning. The approach begins with the 3D Harris algorithm to extract feature points, followed by determining the principal direction within each feature point's radius neighbourhood to ensure rotational invariance. For scale invariance, voxel grid normalisation is utilised to maximise the point cloud's geometric resolution and make it scale-independent. Cosine similarity is then employed for effective feature matching, identifying corresponding feature-point pairs and determining the transformation parameters between point clouds. Experimental validation on various datasets, including a real terrain dataset, demonstrates the effectiveness of the method. Results indicate superior root mean square error (RMSE) and registration accuracy compared to state-of-the-art methods, particularly in scenarios with high noise, limited overlap, and significant initial pose rotation. The real terrain dataset is publicly available at https://github.com/black-2000/Real-terrain-data.
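The cosine-similarity matching step can be sketched directly; the mutual nearest-neighbour check and the 0.9 threshold below are illustrative assumptions layered on top of the abstract's description, and the descriptors would, per the paper, come from the principal-direction-aligned voxelised neighbourhoods.

```python
import numpy as np

def match_by_cosine(desc_src, desc_dst, thresh=0.9):
    """Cosine-similarity feature matching between two descriptor sets.

    desc_src: (N, D) source descriptors, desc_dst: (M, D) target descriptors.
    Returns index pairs that are mutual nearest neighbours with
    similarity above `thresh` (both criteria are assumptions)."""
    a = desc_src / np.linalg.norm(desc_src, axis=1, keepdims=True)
    b = desc_dst / np.linalg.norm(desc_dst, axis=1, keepdims=True)
    sim = a @ b.T                      # (N, M) cosine similarity matrix
    nn_ab = sim.argmax(axis=1)         # best target for each source
    nn_ba = sim.argmax(axis=0)         # best source for each target
    return [(i, j) for i, j in enumerate(nn_ab)
            if nn_ba[j] == i and sim[i, j] > thresh]

src = np.random.randn(100, 32)
dst = np.vstack([src[:50] + 0.01 * np.random.randn(50, 32),  # near-copies
                 np.random.randn(60, 32)])                   # distractors
print(len(match_by_cosine(src, dst)))  # mostly the 50 perturbed copies
```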

Citations: 0
NBCDC-YOLOv8: A new framework to improve blood cell detection and classification based on YOLOv8
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-22 | DOI: 10.1049/cvi2.12341
Xuan Chen, Linxuan Li, Xiaoyu Liu, Fengjuan Yin, Xue Liu, Xiaoxiao Zhu, Yufeng Wang, Fanbin Meng

In recent years, computer technology has successfully permeated all areas of medicine and its management, and it now offers doctors an accurate and rapid means of diagnosis. Existing blood cell detection methods suffer from low accuracy, caused by the uneven distribution, high density, and mutual occlusion of different blood cell types in blood microscope images. To address this, this article introduces NBCDC-YOLOv8, a new framework to improve blood cell detection and classification based on YOLOv8. The framework innovates on several fronts: it uses Mosaic data augmentation to enrich the dataset and add small targets, incorporates a space-to-depth convolution (SPD-Conv) tailored to small, low-resolution cells, and introduces the Multi-Separated and Enhancement Attention Module (MultiSEAM) to enhance feature-map resolution. Additionally, it integrates a bidirectional feature pyramid network (BiFPN) for effective multi-scale feature fusion and includes four detection heads to improve recognition accuracy across cell sizes, especially small target platelets. Evaluated on the Blood Cell Classification Dataset (BCCD), NBCDC-YOLOv8 obtains a mean average precision (mAP) of 94.7%, surpassing the original YOLOv8n by 2.3%.
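SPD-Conv is a published building block (a space-to-depth rearrangement followed by a non-strided convolution, so small targets are downsampled without discarding pixels); a minimal sketch of that block alone follows, not the full NBCDC-YOLOv8 pipeline.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth convolution: rearrange each 2x2 spatial block into
    channels, then apply a stride-1 convolution. Halves spatial size
    while keeping all pixel information, which helps small objects."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, 3, stride=1, padding=1)

    def forward(self, x):
        # space-to-depth: (B, C, H, W) -> (B, 4C, H/2, W/2)
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

feat = torch.randn(1, 64, 160, 160)
print(SPDConv(64, 128)(feat).shape)   # torch.Size([1, 128, 80, 80])
```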

Citations: 0
Re-identification of patterned animals by multi-image feature aggregation and geometric similarity
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-08 | DOI: 10.1049/cvi2.12337
Ekaterina Nepovinnykh, Veikka Immonen, Tuomas Eerola, Charles V. Stewart, Heikki Kälviäinen

Image-based re-identification of individual animals enables gathering information such as population size and migration patterns over time. This, together with the large image volumes collected using camera traps and crowdsourcing, opens novel possibilities for studying animal populations. For many species, re-identification can be done by analysing the permanent fur, feather, or skin patterns that are unique to each individual. In this paper, the authors study pattern feature aggregation based re-identification and consider two ways of improving accuracy: (1) aggregating pattern image features over multiple images and (2) combining the pattern appearance similarity obtained by feature aggregation with geometric pattern similarity. Aggregation over multiple database images of the same individual yields more comprehensive and robust descriptors while reducing computation time. Combining the two similarity measures, in turn, efficiently utilises both local and global pattern features, providing a general re-identification approach applicable to a wide variety of pattern types. In the experimental part of the work, the authors demonstrate that the proposed method achieves promising re-identification accuracy for Saimaa ringed seals and whale sharks without species-specific training or fine-tuning.
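One simple form of the multi-image aggregation and similarity combination reads as follows; averaging L2-normalised descriptors and a linear blend with weight `alpha` are assumptions for illustration, as the abstract does not specify the exact aggregation or fusion rule.

```python
import numpy as np

def aggregate_descriptors(per_image_feats):
    """Average L2-normalised per-image pattern descriptors into a single
    individual-level descriptor (one plausible aggregation scheme)."""
    f = np.stack([v / np.linalg.norm(v) for v in per_image_feats])
    agg = f.mean(axis=0)
    return agg / np.linalg.norm(agg)

def combined_similarity(appearance_sim, geometric_sim, alpha=0.5):
    """Blend appearance and geometric pattern similarity; the weight
    `alpha` is an assumed hyperparameter, not from the paper."""
    return alpha * appearance_sim + (1 - alpha) * geometric_sim

db = [np.random.randn(128) for _ in range(5)]     # images of one individual
query = aggregate_descriptors([np.random.randn(128)])
individual = aggregate_descriptors(db)
appearance = float(query @ individual)            # cosine (unit vectors)
print(combined_similarity(appearance, geometric_sim=0.7))
```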

Citations: 0
MMF-Net: A novel multi-feature and multi-level fusion network for 3D human pose estimation
IF 1.3 | CAS Tier 4, Computer Science | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-07 | DOI: 10.1049/cvi2.12336
Qianxing Li, Dehui Kong, Jinghua Li, Baocai Yin

Human pose estimation from monocular video has long been a focus of research in the human-computer interaction community; it suffers mainly from depth ambiguity and self-occlusion. While recently proposed learning-based approaches have demonstrated promising performance, they do not fully explore the complementarity of features. In this paper, the authors propose a novel multi-feature and multi-level fusion network (MMF-Net), which extracts and combines joint features, bone features, and trajectory features at multiple levels to estimate 3D human pose. In MMF-Net, the bone length estimation module and the trajectory multi-level fusion module first extract the geometric size information of the human body and multi-level trajectory information of human motion, respectively. Then, the fusion attention-based combination (FABC) module extracts multi-level topological structure information of the human body and effectively fuses topological structure, geometric size, and trajectory information. Extensive experiments show that MMF-Net achieves competitive results on the Human3.6M, HumanEva-I, and MPI-INF-3DHP datasets.
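Since the bone branch consumes geometric quantities derived from joint positions, a small sketch of computing bone vectors and lengths is given below; the toy five-joint skeleton and function name are hypothetical, not MMF-Net's actual skeleton definition.

```python
import numpy as np

# Hypothetical five-joint toy skeleton: (parent, child) index pairs.
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]

def bone_features(joints):
    """Derive bone vectors and lengths from 3D joint positions, the kind
    of geometric quantity a bone-feature branch would consume.
    joints: (J, 3) array of per-joint 3D coordinates."""
    vecs = np.stack([joints[c] - joints[p] for p, c in EDGES])  # (B, 3)
    lengths = np.linalg.norm(vecs, axis=1)                      # (B,)
    return vecs, lengths

pose = np.random.randn(5, 3)
vecs, lengths = bone_features(pose)
print(vecs.shape, lengths.round(2))
```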

Citations: 0