Optimization-based meta-learning aims to learn a meta-initialization that can adapt quickly to a new, unseen task within a few gradient updates. Model-Agnostic Meta-Learning (MAML) is a benchmark meta-learning algorithm comprising two optimization loops: the outer loop produces the meta-initialization, and the inner loop adapts it to a new task quickly. The ANIL (Almost No Inner Loop) algorithm argued that adaptation to new tasks reuses the meta-initialization's features rather than rapidly learning new representations, obviating the need for rapid learning. In this work, we propose that, contrary to ANIL, learning new features may be needed during meta-testing: a new, unseen task drawn from a dissimilar distribution necessitates rapid learning in addition to the reuse and recombination of existing features. We invoke the width-depth duality of neural networks and increase the width of the network by adding additional connection units (ACUs). The ACUs enable the learning of new atomic features on the meta-testing task, and the increased width facilitates information propagation in the forward pass. The newly learned features combine with existing features in the last layer for meta-learning. Experimental results confirm our observations: the proposed MAC method outperformed the existing ANIL algorithm on a non-similar task distribution by approximately 12% in the 5-shot setting.
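The two-loop structure described above can be sketched in a few lines of numpy. This is a toy, first-order sketch on 1-D linear-regression tasks, not the paper's MAC method: the task distribution, step sizes, and the Reptile-style outer update are all illustrative assumptions.

```python
import numpy as np

# Toy sketch of the two optimization loops on 1-D linear-regression tasks
# y = w_t * x. The task distribution, step sizes, and the Reptile-style
# outer update are illustrative assumptions, not the paper's MAC method.

rng = np.random.default_rng(0)

def loss_grad(w, x, y):
    # gradient of the mean squared error 0.5 * mean((w*x - y)^2) w.r.t. w
    return np.mean((w * x - y) * x)

def inner_adapt(w_meta, x, y, alpha=0.1, steps=5):
    # inner loop: a few gradient updates starting from the meta-initialization
    w = w_meta
    for _ in range(steps):
        w -= alpha * loss_grad(w, x, y)
    return w

def meta_train(n_iters=200, beta=0.05):
    # outer loop: nudge the meta-initialization toward each adapted solution
    w_meta = 0.0
    for _ in range(n_iters):
        w_task = rng.uniform(1.0, 3.0)       # sample a task
        x = rng.normal(size=20)
        y = w_task * x
        w_adapted = inner_adapt(w_meta, x, y)
        w_meta += beta * (w_adapted - w_meta)
    return w_meta

w0 = meta_train()                            # meta-initialization
x_new = np.linspace(-1, 1, 20)
y_new = 2.5 * x_new                          # unseen task
w_new = inner_adapt(w0, x_new, y_new)        # few-step adaptation
```

After meta-training, a handful of inner-loop steps moves the parameter closer to the unseen task's solution than the meta-initialization alone.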
Title: MAC: a meta-learning approach for feature learning and recombination
Authors: Sambhavi Tiwari, Manas Gogoi, Shekhar Verma, Krishna Pratap Singh
Pub Date: 2024-05-14 | DOI: 10.1007/s10044-024-01271-2 | Pattern Analysis and Applications
Pub Date: 2024-05-13 | DOI: 10.1007/s10044-024-01274-z
Panpan Niu, Yinghong He, Wei Guo, Xiangyang Wang
Robustness, imperceptibility, and watermark capacity are three indispensable yet conflicting properties of any image watermarking system, and achieving a balance among them is challenging. In this paper, using the bivariate Birnbaum–Saunders (BRBS) distribution model, we present a statistical image watermarking scheme in the nonsubsampled shearlet transform (NSST)-pseudo-Zernike moments (PZMs) magnitude hybrid domain. The watermarking algorithm comprises two parts: watermark embedding and extraction. NSST is first applied to the host image to obtain frequency subbands, and the NSST subbands are divided into non-overlapping blocks. The significant, high-entropy NSST-domain blocks are then selected, and for each selected block, PZMs are calculated to obtain the NSST-PZMs magnitudes. Finally, watermark signals are inserted into the NSST-PZMs magnitude hybrid domain. To decode the watermark signal accurately, the statistical characteristics of the NSST-PZMs magnitudes are analyzed in detail and described by the BRBS distribution, which simultaneously captures their marginal distributions and strong dependencies. The BRBS model parameters are estimated accurately by a modified closed-form maximum likelihood (MML) estimator. Finally, a statistical watermark decoder based on the BRBS distribution and the maximum likelihood (ML) decision rule is developed in the NSST-PZMs magnitude hybrid domain. Extensive experimental results show the superiority of the proposed image watermark decoder over several state-of-the-art statistical watermarking methods and deep learning approaches.
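The ML decision rule at the heart of such a decoder can be sketched minimally. Here a plain Gaussian magnitude model stands in for the paper's bivariate BRBS distribution, and the multiplicative embedding strength, block size, and noise level are illustrative assumptions.

```python
import numpy as np

# Sketch of maximum-likelihood (ML) watermark bit decoding. A plain Gaussian
# magnitude model stands in for the paper's bivariate BRBS distribution, and
# the multiplicative embedding strength and block size are illustrative.

rng = np.random.default_rng(1)

def embed_bit(mags, bit, strength=0.15):
    # bit in {-1, +1}: scale a block of transform magnitudes up or down
    return mags * (1.0 + strength * bit)

def ml_decode(received, mu, sigma, strength=0.15):
    # pick the bit hypothesis with the higher Gaussian log-likelihood
    def loglik(b):
        m = mu * (1.0 + strength * b)
        return -np.sum((received - m) ** 2) / (2.0 * sigma ** 2)
    return 1 if loglik(+1) > loglik(-1) else -1

mu, sigma = 10.0, 1.0
mags = rng.normal(mu, sigma, size=64)        # toy "NSST-PZMs magnitudes"
bits = [1, -1, 1, 1, -1]
noisy = [embed_bit(mags, b) + rng.normal(0.0, 0.5, size=64) for b in bits]
decoded = [ml_decode(r, mu, sigma) for r in noisy]
```

The BRBS model's advantage over this Gaussian stand-in is precisely that it captures the heavy-tailed marginals and inter-magnitude dependencies that a Gaussian cannot.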
Title: BNPSIW: BRBS-based NSST-PZMs domain statistical image watermarking
Pub Date: 2024-05-13 | DOI: 10.1007/s10044-024-01269-w
Umang Patel, Shruti Bhilare, Avik Hati
A speaker recognition system (SRS) serves as a gatekeeper for secure access, using individuals' unique vocal characteristics for identification and verification. SRSs can be found in several biometric security applications, such as banking, autonomous cars, the military, and smart devices. However, as technology advances, so do the threats to these models: with the rise of adversarial attacks, these models have been put to the test. Adversarial machine learning (AML) techniques have been used to exploit vulnerabilities in SRSs, threatening their reliability and security. In this study, we concentrate on transferability in AML within the realm of SRS. Transferability refers to the capability of adversarial examples generated for one model to fool another model. Our research centers on enhancing the transferability of adversarial attacks on SRSs; to achieve this, our approach strategically skips non-linear activation functions during the backpropagation process. The proposed method yields promising results in enhancing the transferability of adversarial examples across diverse SRS architectures, parameters, features, and datasets. To validate its effectiveness, we conduct an evaluation using the state-of-the-art FoolHD attack, designed specifically for attacking SRSs. By applying our method in cross-architecture, cross-parameter, cross-feature, and cross-dataset settings, we demonstrate its resilience and versatility. To evaluate how well the proposed method improves transferability, we introduce three novel metrics: enhanced transferability, relative transferability, and effort in enhancing transferability. Our experiments demonstrate a significant boost in the transferability of adversarial examples in SRSs. This research contributes to the growing body of knowledge on AML for SRSs and emphasizes the urgency of developing robust defenses to safeguard these critical biometric systems.
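The core trick — skipping the non-linearity's derivative when backpropagating toward the input — can be illustrated on a toy two-layer network. The network, shapes, and the name `input_grad` are illustrative assumptions, not the paper's speaker-recognition models.

```python
import numpy as np

# Toy illustration of linearized backpropagation: when computing the input
# gradient for an adversarial perturbation, the ReLU derivative is skipped
# (treated as identity) in the backward pass. The 2-layer network and shapes
# are illustrative assumptions, not the paper's speaker-recognition models.

rng = np.random.default_rng(2)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(1, 8))

def input_grad(x, linearize=False):
    z = W1 @ x                                  # pre-activation
    # forward: y = W2 @ relu(z); backward pass from the scalar output y
    da = W2.flatten()                           # dy/da, where a = relu(z)
    mask = np.ones_like(z) if linearize else (z > 0).astype(float)
    dz = da * mask                              # linearize: skip ReLU derivative
    return W1.T @ dz                            # dy/dx

x = rng.normal(size=4)
g_std = input_grad(x)                           # standard gradient
g_lin = input_grad(x, linearize=True)           # linearized gradient
```

The linearized gradient ignores which units were inactive on this particular input, which is the intuition for why perturbations built from it are less tied to one model's activation pattern and transfer better.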
Title: Enhancing cross-domain transferability of black-box adversarial attacks on speaker recognition systems using linearized backpropagation
Pub Date: 2024-05-09 | DOI: 10.1007/s10044-024-01281-0
Shuming Cui, Hongwei Deng
The recently proposed DETR successfully applied the Transformer to object detection and achieved impressive results. However, the learned object queries often explore the entire image to match the corresponding regions, resulting in slow convergence of DETR. Additionally, DETR uses only single-scale features from the final stage of the backbone network, leading to poor performance on small objects. To address these issues, we propose an effective training strategy for improving the DETR framework, named PMG-DETR, achieved through position-sensitive multi-scale attention and grouped queries. First, to better fuse multi-scale features, we propose a position-sensitive multi-scale attention: by incorporating a spatial sampling strategy into deformable attention, we further improve small-object detection. Second, we extend the attention mechanism by introducing a novel positional encoding scheme. Finally, we propose a grouping strategy for object queries, where queries are grouped at the decoder side for more precise inclusion of regions of interest and faster DETR convergence. Extensive experiments on the COCO dataset show that PMG-DETR achieves better performance than DETR, e.g., an AP of 47.8% with a ResNet-50 backbone trained for 50 epochs. We also perform ablation studies on COCO to validate the effectiveness of the proposed PMG-DETR.
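A rough sketch of the grouped-queries idea: queries are split into groups, and each group attends over its own subset of keys/values instead of the whole set. The index-based grouping below is a simplification for illustration; the paper groups queries at the decoder side of DETR.

```python
import numpy as np

# Rough sketch of grouped object queries: queries are split into groups and
# each group attends over its own slice of the keys/values instead of the
# whole set. The index-based grouping is a simplification for illustration.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(queries, keys, values, n_groups):
    # queries: (Q, d); keys/values: (N, d)
    q_groups = np.array_split(queries, n_groups)
    kv_groups = np.array_split(np.arange(len(keys)), n_groups)
    outs = []
    for q, idx in zip(q_groups, kv_groups):
        attn = softmax(q @ keys[idx].T / np.sqrt(q.shape[1]))
        outs.append(attn @ values[idx])         # each group sees its own region
    return np.vstack(outs)

rng = np.random.default_rng(3)
Q, N, d = 12, 40, 16
out = grouped_attention(rng.normal(size=(Q, d)),
                        rng.normal(size=(N, d)),
                        rng.normal(size=(N, d)), n_groups=4)
```

Restricting each query group to a region shrinks the search space a query must cover, which is the mechanism behind the faster convergence claimed above.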
Title: PMG-DETR: fast convergence of DETR with position-sensitive multi-scale attention and grouped queries
Pub Date: 2024-05-09 | DOI: 10.1007/s10044-024-01280-1
Ambily Francis, S. Immanuel Alex Pandian, K. Martin Sagayam, Lam Dang, J. Anitha, Linh Dinh, Marc Pomplun, Hien Dang
Alzheimer's disease is a degenerative brain disease that impairs memory, thinking skills, and the ability to perform even the most basic tasks. The primary challenge in this domain is accurate early-stage disease detection. When the disease is detected at an early stage, medical professionals can prescribe medications to reduce brain shrinkage. Although the disease may not be curable, these interventions can extend the patient's life by slowing the rate of shrinkage. The four cognitive states of the human brain are cognitively normal (CN), mild cognitive impairment convertible (MCIc), mild cognitive impairment non-convertible (MCInc), and Alzheimer's disease (AD). MCIc is the early stage of Alzheimer's disease: individuals with MCIc will develop Alzheimer's disease within a few years, yet this state is difficult to detect through medical investigation. MCInc is the state immediately preceding MCIc and is a common condition in people of all ages, where minor memory issues arise as a result of normal aging. Early detection of AD can be claimed if and only if the transition from MCInc to MCIc is detected. Deep learning algorithms are promising techniques for identifying the progression stage of the disease from magnetic resonance imaging. In this study, a novel deep learning algorithm is proposed to improve MCIc vs. MCInc classification accuracy, utilizing local binary patterns together with squeeze-and-excitation networks (SENet). Without the squeeze-and-excitation network, the classification accuracy of MCIc versus MCInc was 82%; with SENet, it improved to 86%. The experimental results show that the proposed model achieves better performance for MCInc vs. MCIc classification in terms of accuracy, precision, recall, F1 score, and ROC.
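The squeeze-and-excitation recalibration the study builds on can be sketched directly. The reduction ratio, shapes, and random weights below are illustrative, not the trained model.

```python
import numpy as np

# Minimal sketch of a squeeze-and-excitation (SE) recalibration step, the
# channel-attention mechanism combined with LBP features in the paper. The
# reduction ratio, shapes, and random weights are illustrative.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, W1, W2):
    # feat: (C, H, W) feature map
    s = feat.mean(axis=(1, 2))                    # squeeze: global average pool
    e = sigmoid(W2 @ np.maximum(W1 @ s, 0.0))     # excitation: FC-ReLU-FC-sigmoid
    return feat * e[:, None, None]                # channel-wise rescaling

rng = np.random.default_rng(4)
C, H, W, r = 16, 8, 8, 4
feat = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))                 # reduction weights
W2 = rng.normal(size=(C, C // r))                 # expansion weights
out = se_block(feat, W1, W2)
```

Because the excitation gate lies in (0, 1), the block can only attenuate channels, reweighting informative ones relative to the rest.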
Title: Early detection of Alzheimer's disease using squeeze and excitation network with local binary pattern descriptor
Fingerprint identification is an important issue for person recognition using Automatic Fingerprint Identification Systems (AFIS). The size of fingerprint databases has increased with the growing use of AFIS for identification at border control, visa issuance, and other procedures around the world. Fingerprint indexing algorithms are used to reduce the fingerprint search space, speed up identification, and improve the accuracy of the identification result. In this paper, we propose a new binary fingerprint indexing method based on synthetic indexes to address this problem on large databases. Two fundamental properties are considered for these synthetic indexes: discriminancy and representativeness. A biometric database is then structured with synthetic indexes for each fingerprint template, which guarantees a fixed number of indexes per template during the enrollment and identification processes. We compare the proposed algorithm with the classical Minutiae Cylinder Code (MCC) indexing method, one of the best methods in the state of the art. To evaluate the proposed method, we use all Fingerprint Verification Competition (FVC) datasets from 2000 to 2006, both separately and combined, to confirm the accuracy of our algorithm for real applications. The proposed method achieves a high hit rate (more than 98%) at a low penetration rate (less than 5%) compared to existing methods in the literature.
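How a fixed-size binary index supports the hit-rate/penetration-rate trade-off can be sketched with toy binary indexes and Hamming-distance retrieval. The random indexes here stand in for the paper's synthetic-index construction.

```python
import numpy as np

# Toy sketch of fixed-size binary indexing with Hamming-distance retrieval:
# a probe keeps only a small fraction of the database (the penetration rate)
# as candidates. The random indexes stand in for the paper's synthetic ones.

rng = np.random.default_rng(5)
n_templates, n_bits = 200, 64
db = rng.integers(0, 2, size=(n_templates, n_bits))   # one index per template

def candidates(probe, db, penetration=0.05):
    # rank templates by Hamming distance and keep the top fraction
    dists = (db != probe).sum(axis=1)
    k = max(1, int(len(db) * penetration))
    return np.argsort(dists, kind="stable")[:k]

probe = db[42].copy()
probe[:3] ^= 1                       # flip a few bits: noisy re-acquisition
short_list = candidates(probe, db)   # a "hit" means template 42 is retained
```

The hit rate is the fraction of probes whose true template survives into the short list; lowering the penetration fraction shrinks the list and the downstream matching cost.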
Title: Digital fingerprint indexing using synthetic binary indexes
Authors: Joannes Falade, Sandra Cremer, Christophe Rosenberger
Pub Date: 2024-05-06 | DOI: 10.1007/s10044-024-01283-y
With the growing use of wireless surveillance cameras in Internet of Things (IoT) applications, addressing storage capacity and transmission bandwidth challenges becomes crucial. The majority of successive frames from surveillance cameras contain redundant and irrelevant information, increasing the transmission burden. Existing video pre-processing techniques often focus on reducing the number of frames without considering accuracy, and fail to handle spatial and temporal redundancies simultaneously. To address these issues, an anchor-free key action point network (AKA-Net) is proposed for video pre-processing in the IoT-edge computing environment. The Oriented FAST and Rotated BRIEF (ORB) feature descriptor, which combines Features from Accelerated Segment Test keypoints with Binary Robust Independent Elementary Features descriptors, is employed to remove duplicate frames, yielding a more compact and efficient video representation. AKA-Net's major contributions include its powerful representation capabilities, achieved through the bottleneck module in the information-transferring backbone network, which effectively captures multi-scale features. The information-transferring module improves the object detector's performance by fusing complementary information from different scales. This allows objects of different sizes to be detected more accurately, making the method highly effective for real-time video pre-processing tasks. A key action point selection module based on the self-attention mechanism is then introduced to accurately select informative key action points: it treats every pixel within the feature map as a temporal-spatial point and leverages self-attention to identify and select the most relevant keypoints, enabling efficient network transmission with lower bandwidth requirements while maintaining high accuracy and low latency. Experiments show that the proposed AKA-Net outperforms existing methods, achieving a compression ratio of 54.2% and an accuracy of 96.7%. By addressing spatial and temporal redundancies and optimizing key action point selection, AKA-Net offers a significant advancement in video pre-processing for smart surveillance systems, benefiting various IoT applications.
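The duplicate-frame removal step can be sketched with toy binary descriptors standing in for real ORB descriptors; the change threshold is an illustrative assumption.

```python
import numpy as np

# Sketch of duplicate-frame removal: a frame is kept only if its binary
# descriptor differs enough (Hamming distance) from the last kept frame.
# Toy binary vectors stand in for real ORB descriptors; the threshold is
# an illustrative assumption.

def dedup(descriptors, threshold=8):
    kept = [0]
    for i in range(1, len(descriptors)):
        dist = int((descriptors[kept[-1]] != descriptors[i]).sum())
        if dist > threshold:          # enough change: keep this frame
            kept.append(i)
    return kept

rng = np.random.default_rng(6)
base = rng.integers(0, 2, size=256)
frames = [base.copy() for _ in range(5)]   # five near-duplicate frames
frames[2][:30] ^= 1                        # frame 2 carries a real change
kept = dedup(frames)
```

Only the first frame and frames whose content actually changed survive, which is where the bandwidth saving comes from before any detection runs.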
Title: Aka-Net: anchor free-based object detection network for surveillance video transmission in the IOT edge computing environment
Authors: Preethi Sambandam Raju, Revathi Arumugam Rajendran, Murugan Mahalingam
Pub Date: 2024-05-05 | DOI: 10.1007/s10044-024-01272-1
Pub Date: 2024-05-03 | DOI: 10.1007/s10044-024-01279-8
Atefeh Ghorbanpour, Manoochehr Nahvi
Analysis of video sequences of public places is an important topic in video surveillance systems. Because abnormal behavior is highly likely to occur in crowded scenes, the main purpose of many surveillance systems is to monitor crowd movement and detect abnormalities. To speed up this process and reduce errors, it is highly important that surveillance systems use automated, intelligent tools as an alternative to a human operator. This study presents an unsupervised, online algorithm for analyzing dynamic crowd behavior that uses the proposed features, with the capability to analyze crowds over time and reveal the different behaviors of crowd groups. In the proposed algorithm, prominent points are first tracked. These key points are processed by the proposed system, which removes fixed points, computes the proposed features of the moving points, automatically determines neighborhoods, and measures the similarity of invariant neighbors. Group clustering is performed automatically, and the classification stage is conducted without a training phase. The dynamic behavior of the crowd is examined using the features and the extracted group properties, and different states in the scene are diagnosed by dynamic thresholding. Experimental evaluation of the proposed method on several databases shows that it performs properly on video sequences and is able to detect various abnormal behaviors in crowd scenes.
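The neighborhood-plus-motion grouping step can be sketched as follows; the distance and direction thresholds are illustrative stand-ins for the paper's automatically determined neighborhoods.

```python
import numpy as np

# Sketch of the grouping step: tracked key points are clustered into crowd
# groups when they are spatial neighbors moving in similar directions. The
# distance and direction thresholds are illustrative stand-ins for the
# paper's automatically determined neighborhoods.

def group_points(positions, velocities, pos_thr=2.0, cos_thr=0.9):
    labels = [-1] * len(positions)
    current = 0
    for i in range(len(positions)):
        if labels[i] != -1:
            continue
        labels[i] = current
        for j in range(i + 1, len(positions)):
            if labels[j] != -1:
                continue
            close = np.linalg.norm(positions[i] - positions[j]) < pos_thr
            vi, vj = velocities[i], velocities[j]
            # cosine similarity of motion directions
            aligned = vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)) > cos_thr
            if close and aligned:
                labels[j] = current
        current += 1
    return labels

pos = np.array([[0.0, 0.0], [0.5, 0.2], [10.0, 10.0], [10.3, 9.8]])
vel = np.array([[1.0, 0.0], [1.0, 0.05], [-1.0, 0.0], [-1.0, -0.1]])
labels = group_points(pos, vel)    # two groups moving in opposite directions
```

Once points carry group labels, per-group statistics (speed, spread, direction change) are what the dynamic thresholding step would then monitor over time.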
{"title":"Unsupervised group-based crowd dynamic behavior detection and tracking in online video sequences","authors":"Atefeh Ghorbanpour, Manoochehr Nahvi","doi":"10.1007/s10044-024-01279-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01279-8","url":null,"abstract":"<p>Analysis of video sequences of public places is an important topic in video surveillance systems. Due to the high probability of occurring abnormal behavior in crowded scene, the main purpose of many surveillance systems is to monitor the crowd movement, and detection of abnormalities. To speed up this process and also for error reduction, it is highly important to use automated and intelligent tools in surveillance systems, as an alternative to the human operator. This study presents an unsupervised and online algorithm for analysis of dynamic crowd behavior, which uses the proposed features, with the capability to analyze crowds over time and reveal different behaviors of the crowd groups. In the proposed algorithm, prominent points are initially tracked. These key points are processed by the proposed system that includes removing the fixed points, employing proposed features of the moving points, automated determination of neighborhood, the similarity of the invariant neighbors. Group clustering is done automatically and the classification stage is conducted without the training phase. The dynamic behavior of the crowd is examined using the features and the extracted group properties and different states in the scene are diagnosed by dynamic thresholding. 
Experimental evaluation of the proposed method on several databases shows that it is performed properly in video sequences and it is able to detect various abnormal behaviors in the crowd scenes.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"28 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140886692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
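The dynamic-thresholding step mentioned above can be sketched as a running baseline over a per-frame crowd-motion score; frames deviating by more than a few standard deviations are flagged. The window size, the factor `k`, and the use of a motion score are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def dynamic_threshold_anomaly(scores, window=30, k=3.0):
    """Flag frames whose motion score deviates from a running mean
    by more than k standard deviations (illustrative only)."""
    flags = []
    for t, s in enumerate(scores):
        history = scores[max(0, t - window):t]
        if len(history) < 5:            # not enough context yet
            flags.append(False)
            continue
        mu, sigma = np.mean(history), np.std(history)
        flags.append(abs(s - mu) > k * max(sigma, 1e-6))
    return flags

# A sudden spike in motion energy is flagged as abnormal.
scores = [1.0] * 40 + [10.0] + [1.0] * 9
print(dynamic_threshold_anomaly(scores)[40])  # → True
```

Because the threshold adapts to recent history, the same rule tolerates gradual changes in crowd density while still catching abrupt state changes.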
Title: Deep Bharatanatyam pose recognition: a wavelet multi head progressive attention
Authors: D. Anil Kumar, P. V. V. Kishore, K. Sravani
Pub Date: 2024-05-02 | DOI: 10.1007/s10044-024-01273-0
Human pose identification from 2D video sequences is extremely challenging under recording artifacts such as lighting, sensor motion, and unpredictable subject movements. In this work, the objective is to recognize rhythmic human poses from independently sourced online videos of an Indian classical dance form, Bharatanatyam. The data set (BOICDVD22) consists of internet-sourced video frames of 5 different songs from 10 dancers, labelled into the corresponding lyrical classes. Achieving decent inference accuracy with models trained on this multi-sourced online data is a challenging task. Past works focused on creating miniature, offline, non-shareable Indian classical dance (ICD) datasets for standard deep learning models, which resulted in unsatisfactory performance. Recently, attention-based feature learning has been driving the performance of deep learning models, and wavelet-based attention is the most suitable mechanism for online data. Though successful, wavelet-based feature learning is applied across a single layer and depends on global average pooling (GAP) in both the channel and spatial dimensions. The current generation of wavelet attention produces unbalanced spatial attention across video frames. To overcome this imbalance and induce human-like attention, this work proposes replacing the GAP wavelet channel or spatial attention at a particular layer of the backbone architecture with wavelet multi-head progressive attention (WMHPA). WMHPA enhances the attention mechanism and decreases information loss because no GAP is used. Progressiveness in attention enables the WMHPA to distribute attention features evenly across all video frames. The results show the highest accuracy on the dance data set due to multi-resolution attention across the entire network. The WMHPA is validated against the state of the art on the ICD dataset as well as on benchmark person re-identification and action datasets.
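The contrast the abstract draws — GAP collapsing spatial positions uniformly versus multi-head attention weighting them — can be illustrated with a minimal pooling comparison. This is a generic attention-pooling sketch, not the WMHPA module: the per-head channel split and norm-based attention scores are assumptions for demonstration.

```python
import numpy as np

def gap(feature_map):
    """Global average pooling: every spatial position weighted equally,
    the limitation the paper attributes to GAP-based wavelet attention."""
    return feature_map.mean(axis=(0, 1))

def multihead_attention_pool(feature_map, heads=4):
    """Attention-weighted pooling; each head attends over spatial
    positions within its own slice of the channels (illustrative only)."""
    h, w, c = feature_map.shape
    tokens = feature_map.reshape(h * w, c)
    pooled = []
    for chunk in np.array_split(tokens, heads, axis=1):
        scores = np.linalg.norm(chunk, axis=1)      # per-position salience
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # softmax over positions
        pooled.append(weights @ chunk)              # weighted sum, no GAP
    return np.concatenate(pooled)

fm = np.random.default_rng(0).normal(size=(7, 7, 64))
print(gap(fm).shape, multihead_attention_pool(fm).shape)  # (64,) (64,)
```

Both poolings produce a vector of the same size, but the attention variant lets salient positions dominate instead of averaging them away.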
Title: Complex event recognition and anomaly detection with event behavior model
Authors: Min-Chang Liu, Fang-Rong Hsu, Chua-Huang Huang
Pub Date: 2024-04-30 | DOI: 10.1007/s10044-024-01275-y
Complex event processing refers to tracking and analyzing a set of related events and drawing conclusions from them. For such systems, complex event recognition is essential: its object is to recognize meaningful events or patterns and to construct processing rules to respond to them. Researchers have conducted numerous studies on recognizing complex event patterns using recognition languages or models. However, the completeness of the complex event recognition process has rarely been discussed. Although the reality of an event is uncertain, the structure for modeling and explaining complex event interactions with contingent information remains unclear. In this study, we develop a general framework for addressing these problems and demonstrate the applicability of model-based approaches to representing spatio-temporal dimensions and causality in complex event recognition. We propose an event behavior model for complex event recognition from a process perspective. The developed model can detect and explain anomalies associated with complex events. An experiment evaluating the model's performance revealed that temporal operations within overlapping events are crucial to event pattern recognition.
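The temporal operations over overlapping events that the study highlights are commonly expressed as interval relations (in the style of Allen's interval algebra). The sketch below shows two such relations on hypothetical events; the event names and the choice of relations are illustrative, not taken from the paper's model.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    start: float
    end: float

def overlaps(a: Event, b: Event) -> bool:
    """Allen-style 'overlaps': a starts first, b begins before a ends,
    and b ends after a ends."""
    return a.start < b.start < a.end < b.end

def during(a: Event, b: Event) -> bool:
    """Allen-style 'during': a lies strictly inside b."""
    return b.start < a.start and a.end < b.end

# Hypothetical events: a door stays open while a person walks in.
door = Event("door_open", 0.0, 5.0)
walk = Event("person_enters", 3.0, 8.0)
print(overlaps(door, walk))  # → True
```

A recognition rule for a composite event (e.g. "entry through an open door") would then fire only when such an interval relation holds between its constituent events.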