Optimization-based meta-learning aims to learn a meta-initialization that can adapt quickly to a new, unseen task within a few gradient updates. Model-Agnostic Meta-Learning (MAML) is a benchmark meta-learning algorithm comprising two optimization loops: the outer loop produces the meta-initialization, and the inner loop adapts it to a new task quickly. The ANIL (Almost No Inner Loop) algorithm argued that adaptation to new tasks reuses the meta-initialization's features rather than rapidly learning new representations, obviating the need for rapid learning. In this work, we propose that, contrary to ANIL, learning new features may be needed during meta-testing: a new, unseen task drawn from a dissimilar distribution necessitates rapid learning in addition to the reuse and recombination of existing features. We invoke the width-depth duality of neural networks and increase the width of the network by adding additional connection units (ACUs). The ACUs enable the learning of new atomic features on the meta-testing task, and the increased width facilitates information propagation in the forward pass. The newly learned features combine with existing features in the last layer for meta-learning. Experimental results confirm our observations: the proposed MAC method outperformed the existing ANIL algorithm on a non-similar task distribution by approximately 12% in the 5-shot setting.
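The two-loop structure described above can be sketched in a few lines of numpy. This is a toy, first-order sketch on 1-D linear-regression tasks, not the paper's MAC method: the task distribution, step sizes, and the Reptile-style outer update are all illustrative assumptions.

```python
import numpy as np

# Toy sketch of the two optimization loops on 1-D linear-regression tasks
# y = w_t * x. The task distribution, step sizes, and the Reptile-style
# outer update are illustrative assumptions, not the paper's MAC method.

rng = np.random.default_rng(0)

def loss_grad(w, x, y):
    # gradient of the mean squared error 0.5 * mean((w*x - y)^2) w.r.t. w
    return np.mean((w * x - y) * x)

def inner_adapt(w_meta, x, y, alpha=0.1, steps=5):
    # inner loop: a few gradient updates starting from the meta-initialization
    w = w_meta
    for _ in range(steps):
        w -= alpha * loss_grad(w, x, y)
    return w

def meta_train(n_iters=200, beta=0.05):
    # outer loop: nudge the meta-initialization toward each adapted solution
    w_meta = 0.0
    for _ in range(n_iters):
        w_task = rng.uniform(1.0, 3.0)       # sample a task
        x = rng.normal(size=20)
        y = w_task * x
        w_adapted = inner_adapt(w_meta, x, y)
        w_meta += beta * (w_adapted - w_meta)
    return w_meta

w0 = meta_train()                            # meta-initialization
x_new = np.linspace(-1, 1, 20)
y_new = 2.5 * x_new                          # unseen task
w_new = inner_adapt(w0, x_new, y_new)        # few-step adaptation
```

After meta-training, a handful of inner-loop steps moves the parameter closer to the unseen task's solution than the meta-initialization alone.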
Title: MAC: a meta-learning approach for feature learning and recombination
Authors: Sambhavi Tiwari, Manas Gogoi, Shekhar Verma, Krishna Pratap Singh
Pub Date: 2024-05-14 | DOI: 10.1007/s10044-024-01271-2 | Pattern Analysis and Applications
Pub Date: 2024-05-13 | DOI: 10.1007/s10044-024-01274-z
Panpan Niu, Yinghong He, Wei Guo, Xiangyang Wang
Robustness, imperceptibility, and watermark capacity are three indispensable yet conflicting properties of any image watermarking system, and achieving a balance among them is challenging. In this paper, using the bivariate Birnbaum–Saunders (BRBS) distribution model, we present a statistical image watermarking scheme in the nonsubsampled shearlet transform (NSST)-pseudo-Zernike moments (PZMs) magnitude hybrid domain. The watermarking algorithm comprises two parts: watermark embedding and extraction. NSST is first applied to the host image to obtain frequency subbands, and the NSST subbands are divided into non-overlapping blocks. The significant, high-entropy NSST-domain blocks are then selected, and for each selected block, PZMs are calculated to obtain the NSST-PZMs magnitudes. Finally, watermark signals are inserted into the NSST-PZMs magnitude hybrid domain. To decode the watermark signal accurately, the statistical characteristics of the NSST-PZMs magnitudes are analyzed in detail and described by the BRBS distribution, which simultaneously captures their marginal distributions and strong dependencies. The BRBS model parameters are estimated accurately by a modified closed-form maximum likelihood (MML) estimator. Finally, a statistical watermark decoder based on the BRBS distribution and the maximum likelihood (ML) decision rule is developed in the NSST-PZMs magnitude hybrid domain. Extensive experimental results show the superiority of the proposed image watermark decoder over several state-of-the-art statistical watermarking methods and deep learning approaches.
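The ML decision rule at the heart of such a decoder can be sketched minimally. Here a plain Gaussian magnitude model stands in for the paper's bivariate BRBS distribution, and the multiplicative embedding strength, block size, and noise level are illustrative assumptions.

```python
import numpy as np

# Sketch of maximum-likelihood (ML) watermark bit decoding. A plain Gaussian
# magnitude model stands in for the paper's bivariate BRBS distribution, and
# the multiplicative embedding strength and block size are illustrative.

rng = np.random.default_rng(1)

def embed_bit(mags, bit, strength=0.15):
    # bit in {-1, +1}: scale a block of transform magnitudes up or down
    return mags * (1.0 + strength * bit)

def ml_decode(received, mu, sigma, strength=0.15):
    # pick the bit hypothesis with the higher Gaussian log-likelihood
    def loglik(b):
        m = mu * (1.0 + strength * b)
        return -np.sum((received - m) ** 2) / (2.0 * sigma ** 2)
    return 1 if loglik(+1) > loglik(-1) else -1

mu, sigma = 10.0, 1.0
mags = rng.normal(mu, sigma, size=64)        # toy "NSST-PZMs magnitudes"
bits = [1, -1, 1, 1, -1]
noisy = [embed_bit(mags, b) + rng.normal(0.0, 0.5, size=64) for b in bits]
decoded = [ml_decode(r, mu, sigma) for r in noisy]
```

The BRBS model's advantage over this Gaussian stand-in is precisely that it captures the heavy-tailed marginals and inter-magnitude dependencies that a Gaussian cannot.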
Title: BNPSIW: BRBS-based NSST-PZMs domain statistical image watermarking
Pub Date: 2024-05-13 | DOI: 10.1007/s10044-024-01269-w
Umang Patel, Shruti Bhilare, Avik Hati
A speaker recognition system (SRS) serves as a gatekeeper for secure access, using individuals' unique vocal characteristics for identification and verification. SRSs can be found in several biometric security applications, such as banking, autonomous cars, the military, and smart devices. However, as technology advances, so do the threats to these models: with the rise of adversarial attacks, these models have been put to the test. Adversarial machine learning (AML) techniques have been used to exploit vulnerabilities in SRSs, threatening their reliability and security. In this study, we concentrate on transferability in AML within the realm of SRS. Transferability refers to the capability of adversarial examples generated for one model to fool another model. Our research centers on enhancing the transferability of adversarial attacks on SRSs; to achieve this, our approach strategically skips non-linear activation functions during the backpropagation process. The proposed method yields promising results in enhancing the transferability of adversarial examples across diverse SRS architectures, parameters, features, and datasets. To validate its effectiveness, we conduct an evaluation using the state-of-the-art FoolHD attack, designed specifically for attacking SRSs. By applying our method in cross-architecture, cross-parameter, cross-feature, and cross-dataset settings, we demonstrate its resilience and versatility. To evaluate how well the proposed method improves transferability, we introduce three novel metrics: enhanced transferability, relative transferability, and effort in enhancing transferability. Our experiments demonstrate a significant boost in the transferability of adversarial examples in SRSs. This research contributes to the growing body of knowledge on AML for SRSs and emphasizes the urgency of developing robust defenses to safeguard these critical biometric systems.
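The core trick — skipping the non-linearity's derivative when backpropagating toward the input — can be illustrated on a toy two-layer network. The network, shapes, and the name `input_grad` are illustrative assumptions, not the paper's speaker-recognition models.

```python
import numpy as np

# Toy illustration of linearized backpropagation: when computing the input
# gradient for an adversarial perturbation, the ReLU derivative is skipped
# (treated as identity) in the backward pass. The 2-layer network and shapes
# are illustrative assumptions, not the paper's speaker-recognition models.

rng = np.random.default_rng(2)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(1, 8))

def input_grad(x, linearize=False):
    z = W1 @ x                                  # pre-activation
    # forward: y = W2 @ relu(z); backward pass from the scalar output y
    da = W2.flatten()                           # dy/da, where a = relu(z)
    mask = np.ones_like(z) if linearize else (z > 0).astype(float)
    dz = da * mask                              # linearize: skip ReLU derivative
    return W1.T @ dz                            # dy/dx

x = rng.normal(size=4)
g_std = input_grad(x)                           # standard gradient
g_lin = input_grad(x, linearize=True)           # linearized gradient
```

The linearized gradient ignores which units were inactive on this particular input, which is the intuition for why perturbations built from it are less tied to one model's activation pattern and transfer better.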
Title: Enhancing cross-domain transferability of black-box adversarial attacks on speaker recognition systems using linearized backpropagation
Pub Date: 2024-05-09 | DOI: 10.1007/s10044-024-01281-0
Shuming Cui, Hongwei Deng
The recently proposed DETR successfully applied the Transformer to object detection and achieved impressive results. However, the learned object queries often explore the entire image to match the corresponding regions, resulting in slow convergence of DETR. Additionally, DETR uses only single-scale features from the final stage of the backbone network, leading to poor performance on small objects. To address these issues, we propose an effective training strategy for improving the DETR framework, named PMG-DETR, achieved through position-sensitive multi-scale attention and grouped queries. First, to better fuse multi-scale features, we propose a position-sensitive multi-scale attention: by incorporating a spatial sampling strategy into deformable attention, we further improve small-object detection. Second, we extend the attention mechanism by introducing a novel positional encoding scheme. Finally, we propose a grouping strategy for object queries, where queries are grouped at the decoder side for more precise inclusion of regions of interest and faster DETR convergence. Extensive experiments on the COCO dataset show that PMG-DETR achieves better performance than DETR, e.g., an AP of 47.8% with a ResNet-50 backbone trained for 50 epochs. We also perform ablation studies on COCO to validate the effectiveness of the proposed PMG-DETR.
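A rough sketch of the grouped-queries idea: queries are split into groups, and each group attends over its own subset of keys/values instead of the whole set. The index-based grouping below is a simplification for illustration; the paper groups queries at the decoder side of DETR.

```python
import numpy as np

# Rough sketch of grouped object queries: queries are split into groups and
# each group attends over its own slice of the keys/values instead of the
# whole set. The index-based grouping is a simplification for illustration.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(queries, keys, values, n_groups):
    # queries: (Q, d); keys/values: (N, d)
    q_groups = np.array_split(queries, n_groups)
    kv_groups = np.array_split(np.arange(len(keys)), n_groups)
    outs = []
    for q, idx in zip(q_groups, kv_groups):
        attn = softmax(q @ keys[idx].T / np.sqrt(q.shape[1]))
        outs.append(attn @ values[idx])         # each group sees its own region
    return np.vstack(outs)

rng = np.random.default_rng(3)
Q, N, d = 12, 40, 16
out = grouped_attention(rng.normal(size=(Q, d)),
                        rng.normal(size=(N, d)),
                        rng.normal(size=(N, d)), n_groups=4)
```

Restricting each query group to a region shrinks the search space a query must cover, which is the mechanism behind the faster convergence claimed above.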
Title: PMG-DETR: fast convergence of DETR with position-sensitive multi-scale attention and grouped queries
Pub Date: 2024-05-09 | DOI: 10.1007/s10044-024-01280-1
Ambily Francis, S. Immanuel Alex Pandian, K. Martin Sagayam, Lam Dang, J. Anitha, Linh Dinh, Marc Pomplun, Hien Dang
Alzheimer's disease is a degenerative brain disease that impairs memory, thinking skills, and the ability to perform even the most basic tasks. The primary challenge in this domain is accurate early-stage disease detection. When the disease is detected at an early stage, medical professionals can prescribe medications to reduce brain shrinkage. Although the disease may not be curable, these interventions can extend the patient's life by slowing the rate of shrinkage. The four cognitive states of the human brain are cognitively normal (CN), mild cognitive impairment convertible (MCIc), mild cognitive impairment non-convertible (MCInc), and Alzheimer's disease (AD). MCIc is the early stage of Alzheimer's disease: individuals with MCIc will develop Alzheimer's disease within a few years, yet this state is difficult to detect through medical investigation. MCInc is the state immediately preceding MCIc and is a common condition in people of all ages, where minor memory issues arise as a result of normal aging. Early detection of AD can be claimed if and only if the transition from MCInc to MCIc is detected. Deep learning algorithms are promising techniques for identifying the progression stage of the disease from magnetic resonance imaging. In this study, a novel deep learning algorithm is proposed to improve MCIc vs. MCInc classification accuracy, utilizing local binary patterns together with squeeze-and-excitation networks (SENet). Without the squeeze-and-excitation network, the classification accuracy of MCIc versus MCInc was 82%; with SENet, it improved to 86%. The experimental results show that the proposed model achieves better performance for MCInc vs. MCIc classification in terms of accuracy, precision, recall, F1 score, and ROC.
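The squeeze-and-excitation recalibration the study builds on can be sketched directly. The reduction ratio, shapes, and random weights below are illustrative, not the trained model.

```python
import numpy as np

# Minimal sketch of a squeeze-and-excitation (SE) recalibration step, the
# channel-attention mechanism combined with LBP features in the paper. The
# reduction ratio, shapes, and random weights are illustrative.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, W1, W2):
    # feat: (C, H, W) feature map
    s = feat.mean(axis=(1, 2))                    # squeeze: global average pool
    e = sigmoid(W2 @ np.maximum(W1 @ s, 0.0))     # excitation: FC-ReLU-FC-sigmoid
    return feat * e[:, None, None]                # channel-wise rescaling

rng = np.random.default_rng(4)
C, H, W, r = 16, 8, 8, 4
feat = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))                 # reduction weights
W2 = rng.normal(size=(C, C // r))                 # expansion weights
out = se_block(feat, W1, W2)
```

Because the excitation gate lies in (0, 1), the block can only attenuate channels, reweighting informative ones relative to the rest.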
Title: Early detection of Alzheimer's disease using squeeze and excitation network with local binary pattern descriptor
Fingerprint identification is an important issue for person recognition using Automatic Fingerprint Identification Systems (AFIS). The size of fingerprint databases has increased with the growing use of AFIS for identification at border control, visa issuance, and other procedures around the world. Fingerprint indexing algorithms are used to reduce the fingerprint search space, speed up identification, and improve the accuracy of the identification result. In this paper, we propose a new binary fingerprint indexing method based on synthetic indexes to address this problem on large databases. Two fundamental properties are considered for these synthetic indexes: discriminancy and representativeness. A biometric database is then structured with synthetic indexes for each fingerprint template, which guarantees a fixed number of indexes per template during the enrollment and identification processes. We compare the proposed algorithm with the classical Minutiae Cylinder Code (MCC) indexing method, one of the best methods in the state of the art. To evaluate the proposed method, we use all Fingerprint Verification Competition (FVC) datasets from 2000 to 2006, both separately and combined, to confirm the accuracy of our algorithm for real applications. The proposed method achieves a high hit rate (more than 98%) at a low penetration rate (less than 5%) compared to existing methods in the literature.
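How a fixed-size binary index supports the hit-rate/penetration-rate trade-off can be sketched with toy binary indexes and Hamming-distance retrieval. The random indexes here stand in for the paper's synthetic-index construction.

```python
import numpy as np

# Toy sketch of fixed-size binary indexing with Hamming-distance retrieval:
# a probe keeps only a small fraction of the database (the penetration rate)
# as candidates. The random indexes stand in for the paper's synthetic ones.

rng = np.random.default_rng(5)
n_templates, n_bits = 200, 64
db = rng.integers(0, 2, size=(n_templates, n_bits))   # one index per template

def candidates(probe, db, penetration=0.05):
    # rank templates by Hamming distance and keep the top fraction
    dists = (db != probe).sum(axis=1)
    k = max(1, int(len(db) * penetration))
    return np.argsort(dists, kind="stable")[:k]

probe = db[42].copy()
probe[:3] ^= 1                       # flip a few bits: noisy re-acquisition
short_list = candidates(probe, db)   # a "hit" means template 42 is retained
```

The hit rate is the fraction of probes whose true template survives into the short list; lowering the penetration fraction shrinks the list and the downstream matching cost.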
Title: Digital fingerprint indexing using synthetic binary indexes
Authors: Joannes Falade, Sandra Cremer, Christophe Rosenberger
Pub Date: 2024-05-06 | DOI: 10.1007/s10044-024-01283-y
With the growing use of wireless surveillance cameras in Internet of Things (IoT) applications, addressing storage capacity and transmission bandwidth challenges becomes crucial. The majority of successive frames from surveillance cameras contain redundant and irrelevant information, increasing the transmission burden. Existing video pre-processing techniques often focus on reducing the number of frames without considering accuracy, and fail to handle spatial and temporal redundancies simultaneously. To address these issues, an anchor-free key action point network (AKA-Net) is proposed for video pre-processing in the IoT-edge computing environment. The Oriented FAST and Rotated BRIEF (ORB) feature descriptor, which combines Features from Accelerated Segment Test keypoints with Binary Robust Independent Elementary Features descriptors, is employed to remove duplicate frames, yielding a more compact and efficient video representation. AKA-Net's major contributions include its powerful representation capabilities, achieved through the bottleneck module in the information-transferring backbone network, which effectively captures multi-scale features. The information-transferring module improves the object detector's performance by fusing complementary information from different scales. This allows objects of different sizes to be detected more accurately, making the method highly effective for real-time video pre-processing tasks. A key action point selection module based on the self-attention mechanism is then introduced to accurately select informative key action points: it treats every pixel within the feature map as a temporal-spatial point and leverages self-attention to identify and select the most relevant keypoints, enabling efficient network transmission with lower bandwidth requirements while maintaining high accuracy and low latency. Experiments show that the proposed AKA-Net outperforms existing methods, achieving a compression ratio of 54.2% and an accuracy of 96.7%. By addressing spatial and temporal redundancies and optimizing key action point selection, AKA-Net offers a significant advancement in video pre-processing for smart surveillance systems, benefiting various IoT applications.
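The duplicate-frame removal step can be sketched with toy binary descriptors standing in for real ORB descriptors; the change threshold is an illustrative assumption.

```python
import numpy as np

# Sketch of duplicate-frame removal: a frame is kept only if its binary
# descriptor differs enough (Hamming distance) from the last kept frame.
# Toy binary vectors stand in for real ORB descriptors; the threshold is
# an illustrative assumption.

def dedup(descriptors, threshold=8):
    kept = [0]
    for i in range(1, len(descriptors)):
        dist = int((descriptors[kept[-1]] != descriptors[i]).sum())
        if dist > threshold:          # enough change: keep this frame
            kept.append(i)
    return kept

rng = np.random.default_rng(6)
base = rng.integers(0, 2, size=256)
frames = [base.copy() for _ in range(5)]   # five near-duplicate frames
frames[2][:30] ^= 1                        # frame 2 carries a real change
kept = dedup(frames)
```

Only the first frame and frames whose content actually changed survive, which is where the bandwidth saving comes from before any detection runs.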
Title: Aka-Net: anchor free-based object detection network for surveillance video transmission in the IOT edge computing environment
Authors: Preethi Sambandam Raju, Revathi Arumugam Rajendran, Murugan Mahalingam
Pub Date: 2024-05-05 | DOI: 10.1007/s10044-024-01272-1
Pub Date: 2024-05-03 | DOI: 10.1007/s10044-024-01279-8
Atefeh Ghorbanpour, Manoochehr Nahvi
Analysis of video sequences of public places is an important topic in video surveillance systems. Because abnormal behavior is highly likely to occur in crowded scenes, the main purpose of many surveillance systems is to monitor crowd movement and detect abnormalities. To speed up this process and reduce errors, it is highly important that surveillance systems use automated, intelligent tools as an alternative to a human operator. This study presents an unsupervised, online algorithm for analyzing dynamic crowd behavior that uses the proposed features, with the capability to analyze crowds over time and reveal the different behaviors of crowd groups. In the proposed algorithm, prominent points are first tracked. These key points are processed by the proposed system, which removes fixed points, computes the proposed features of the moving points, automatically determines neighborhoods, and measures the similarity of invariant neighbors. Group clustering is performed automatically, and the classification stage is conducted without a training phase. The dynamic behavior of the crowd is examined using the features and the extracted group properties, and different states in the scene are diagnosed by dynamic thresholding. Experimental evaluation of the proposed method on several databases shows that it performs properly on video sequences and is able to detect various abnormal behaviors in crowd scenes.
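The neighborhood-plus-motion grouping step can be sketched as follows; the distance and direction thresholds are illustrative stand-ins for the paper's automatically determined neighborhoods.

```python
import numpy as np

# Sketch of the grouping step: tracked key points are clustered into crowd
# groups when they are spatial neighbors moving in similar directions. The
# distance and direction thresholds are illustrative stand-ins for the
# paper's automatically determined neighborhoods.

def group_points(positions, velocities, pos_thr=2.0, cos_thr=0.9):
    labels = [-1] * len(positions)
    current = 0
    for i in range(len(positions)):
        if labels[i] != -1:
            continue
        labels[i] = current
        for j in range(i + 1, len(positions)):
            if labels[j] != -1:
                continue
            close = np.linalg.norm(positions[i] - positions[j]) < pos_thr
            vi, vj = velocities[i], velocities[j]
            # cosine similarity of motion directions
            aligned = vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj)) > cos_thr
            if close and aligned:
                labels[j] = current
        current += 1
    return labels

pos = np.array([[0.0, 0.0], [0.5, 0.2], [10.0, 10.0], [10.3, 9.8]])
vel = np.array([[1.0, 0.0], [1.0, 0.05], [-1.0, 0.0], [-1.0, -0.1]])
labels = group_points(pos, vel)    # two groups moving in opposite directions
```

Once points carry group labels, per-group statistics (speed, spread, direction change) are what the dynamic thresholding step would then monitor over time.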
{"title":"Unsupervised group-based crowd dynamic behavior detection and tracking in online video sequences","authors":"Atefeh Ghorbanpour, Manoochehr Nahvi","doi":"10.1007/s10044-024-01279-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01279-8","url":null,"abstract":"<p>Analysis of video sequences of public places is an important topic in video surveillance systems. Due to the high probability of occurring abnormal behavior in crowded scene, the main purpose of many surveillance systems is to monitor the crowd movement, and detection of abnormalities. To speed up this process and also for error reduction, it is highly important to use automated and intelligent tools in surveillance systems, as an alternative to the human operator. This study presents an unsupervised and online algorithm for analysis of dynamic crowd behavior, which uses the proposed features, with the capability to analyze crowds over time and reveal different behaviors of the crowd groups. In the proposed algorithm, prominent points are initially tracked. These key points are processed by the proposed system that includes removing the fixed points, employing proposed features of the moving points, automated determination of neighborhood, the similarity of the invariant neighbors. Group clustering is done automatically and the classification stage is conducted without the training phase. The dynamic behavior of the crowd is examined using the features and the extracted group properties and different states in the scene are diagnosed by dynamic thresholding. 
Experimental evaluation of the proposed method on several databases shows that it is performed properly in video sequences and it is able to detect various abnormal behaviors in the crowd scenes.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"28 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140886692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
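The dynamic-thresholding step mentioned above can be sketched as a running baseline over a per-frame crowd-motion score; frames deviating by more than a few standard deviations are flagged. The window size, the factor `k`, and the use of a motion score are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def dynamic_threshold_anomaly(scores, window=30, k=3.0):
    """Flag frames whose motion score deviates from a running mean
    by more than k standard deviations (illustrative only)."""
    flags = []
    for t, s in enumerate(scores):
        history = scores[max(0, t - window):t]
        if len(history) < 5:            # not enough context yet
            flags.append(False)
            continue
        mu, sigma = np.mean(history), np.std(history)
        flags.append(abs(s - mu) > k * max(sigma, 1e-6))
    return flags

# A sudden spike in motion energy is flagged as abnormal.
scores = [1.0] * 40 + [10.0] + [1.0] * 9
print(dynamic_threshold_anomaly(scores)[40])  # → True
```

Because the threshold adapts to recent history, the same rule tolerates gradual changes in crowd density while still catching abrupt state changes.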
Title: Deep Bharatanatyam pose recognition: a wavelet multi head progressive attention
Authors: D. Anil Kumar, P. V. V. Kishore, K. Sravani
Pub Date: 2024-05-02 | DOI: 10.1007/s10044-024-01273-0
Human pose identification from 2D video sequences is extremely challenging under recording artifacts such as lighting, sensor motion, and unpredictable subject movements. In this work, the objective is to recognize rhythmic human poses from independently sourced online videos of an Indian classical dance form, Bharatanatyam. The data set (BOICDVD22) consists of internet-sourced video frames of 5 different songs from 10 dancers, labelled into the corresponding lyrical classes. Achieving decent inference accuracy with models trained on this multi-sourced online data is a challenging task. Past works focused on creating miniature, offline, non-shareable Indian classical dance (ICD) datasets for standard deep learning models, which resulted in unsatisfactory performance. Recently, attention-based feature learning has been driving the performance of deep learning models, and wavelet-based attention is the most suitable mechanism for online data. Though successful, wavelet-based feature learning is applied across a single layer and depends on global average pooling (GAP) in both the channel and spatial dimensions. The current generation of wavelet attention produces unbalanced spatial attention across video frames. To overcome this imbalance and induce human-like attention, this work proposes replacing the GAP wavelet channel or spatial attention at a particular layer of the backbone architecture with wavelet multi-head progressive attention (WMHPA). WMHPA enhances the attention mechanism and decreases information loss because no GAP is used. Progressiveness in attention enables the WMHPA to distribute attention features evenly across all video frames. The results show the highest accuracy on the dance data set due to multi-resolution attention across the entire network. The WMHPA is validated against the state of the art on the ICD dataset as well as on benchmark person re-identification and action datasets.
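The contrast the abstract draws — GAP collapsing spatial positions uniformly versus multi-head attention weighting them — can be illustrated with a minimal pooling comparison. This is a generic attention-pooling sketch, not the WMHPA module: the per-head channel split and norm-based attention scores are assumptions for demonstration.

```python
import numpy as np

def gap(feature_map):
    """Global average pooling: every spatial position weighted equally,
    the limitation the paper attributes to GAP-based wavelet attention."""
    return feature_map.mean(axis=(0, 1))

def multihead_attention_pool(feature_map, heads=4):
    """Attention-weighted pooling; each head attends over spatial
    positions within its own slice of the channels (illustrative only)."""
    h, w, c = feature_map.shape
    tokens = feature_map.reshape(h * w, c)
    pooled = []
    for chunk in np.array_split(tokens, heads, axis=1):
        scores = np.linalg.norm(chunk, axis=1)      # per-position salience
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # softmax over positions
        pooled.append(weights @ chunk)              # weighted sum, no GAP
    return np.concatenate(pooled)

fm = np.random.default_rng(0).normal(size=(7, 7, 64))
print(gap(fm).shape, multihead_attention_pool(fm).shape)  # (64,) (64,)
```

Both poolings produce a vector of the same size, but the attention variant lets salient positions dominate instead of averaging them away.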
Title: Complex event recognition and anomaly detection with event behavior model
Authors: Min-Chang Liu, Fang-Rong Hsu, Chua-Huang Huang
Pub Date: 2024-04-30 | DOI: 10.1007/s10044-024-01275-y
Complex event processing refers to tracking and analyzing a set of related events and drawing conclusions from them. For such systems, complex event recognition is essential: its object is to recognize meaningful events or patterns and to construct processing rules to respond to them. Researchers have conducted numerous studies on recognizing complex event patterns using recognition languages or models. However, the completeness of the complex event recognition process has rarely been discussed. Although the reality of an event is uncertain, the structure for modeling and explaining complex event interactions with contingent information remains unclear. In this study, we develop a general framework for addressing these problems and demonstrate the applicability of model-based approaches to representing spatio-temporal dimensions and causality in complex event recognition. We propose an event behavior model for complex event recognition from a process perspective. The developed model can detect and explain anomalies associated with complex events. An experiment evaluating the model's performance revealed that temporal operations within overlapping events are crucial to event pattern recognition.
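The temporal operations over overlapping events that the study highlights are commonly expressed as interval relations (in the style of Allen's interval algebra). The sketch below shows two such relations on hypothetical events; the event names and the choice of relations are illustrative, not taken from the paper's model.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    start: float
    end: float

def overlaps(a: Event, b: Event) -> bool:
    """Allen-style 'overlaps': a starts first, b begins before a ends,
    and b ends after a ends."""
    return a.start < b.start < a.end < b.end

def during(a: Event, b: Event) -> bool:
    """Allen-style 'during': a lies strictly inside b."""
    return b.start < a.start and a.end < b.end

# Hypothetical events: a door stays open while a person walks in.
door = Event("door_open", 0.0, 5.0)
walk = Event("person_enters", 3.0, 8.0)
print(overlaps(door, walk))  # → True
```

A recognition rule for a composite event (e.g. "entry through an open door") would then fire only when such an interval relation holds between its constituent events.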