Neural Processing Letters: Latest Articles (Page 9)

Efficient Bayesian CNN Model Compression using Bayes by Backprop and L1-Norm Regularization

IF 3.1; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-04-04 DOI: 10.1007/s11063-024-11593-1

Ali Muhammad Shaikh, Yun-bo Zhao, Aakash Kumar, Munawar Ali, Yu Kang

The rapid adoption of convolutional neural networks (CNNs) in real-world applications has driven up both computational cost and model size. In response, many researchers have focused on compressing the original CNN models by pruning weights or filters. Filter pruning has an advantage over weight pruning because it does not introduce sparse connectivity patterns. In this work, we propose a Bayesian Convolutional Neural Network (BayesCNN) with variational inference, which places a probability distribution over the weights. To prune the Bayesian CNN, we combine the L1-norm with the capped L1-norm to characterize the amount of information that can be extracted by each filter and to control regularization. With this formulation, we prune unimportant filters directly without any loss of test accuracy and obtain a slimmer model with comparable accuracy. The pruning process is iterative. To validate the proposed method, we evaluate several CNN architectures on standard classification datasets and compare our results with non-Bayesian CNN models. In particular, for VGG-16 on CIFAR-10, we prune 75.8% of the parameters and reduce floating-point operations (FLOPs) by 51.3% without loss of accuracy, advancing the state of the art.
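As an illustration of the filter-ranking idea in this abstract, the following is a minimal sketch only, not the authors' method. It assumes each filter is given as a flat list of posterior-mean weights, and the names `select_filters_to_prune`, `cap`, and `alpha`, as well as the weighted-sum combination of the two norms, are hypothetical choices made for this example.

```python
def l1_norm(filter_weights):
    """Sum of absolute weight values in one filter (given as a flat list)."""
    return sum(abs(w) for w in filter_weights)


def capped_l1_norm(filter_weights, cap):
    """Capped L1-norm: each |w| contributes at most `cap` to the sum."""
    return sum(min(abs(w), cap) for w in filter_weights)


def select_filters_to_prune(filters, prune_ratio, cap=0.5, alpha=0.5):
    """Return the sorted indices of the lowest-scoring filters.

    Assumption for this sketch: the score is a weighted sum
    alpha * L1 + (1 - alpha) * capped L1. The abstract does not
    state how the two norms are actually combined.
    """
    scores = [
        alpha * l1_norm(f) + (1 - alpha) * capped_l1_norm(f, cap)
        for f in filters
    ]
    n_prune = int(len(filters) * prune_ratio)
    ranked = sorted(range(len(filters)), key=lambda i: scores[i])
    return sorted(ranked[:n_prune])


# Four toy filters; the two with the smallest scores are selected.
toy_filters = [[0.01, -0.02], [1.0, -2.0], [0.1, 0.1], [0.5, 0.5]]
to_prune = select_filters_to_prune(toy_filters, prune_ratio=0.5)
```

In an iterative scheme such as the one the abstract describes, a step like this would presumably alternate with retraining, but that loop is outside the scope of this sketch.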



Citations: 0

A New Adaptive Robust Modularized Semi-Supervised Community Detection Method Based on Non-negative Matrix Factorization

IF 3.1; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-04-02 DOI: 10.1007/s11063-024-11588-y

Abstract

Community detection methods are the most extensively used tools for categorizing complex networks. One of the most common methods for unsupervised and semi-supervised clustering is community detection based on Non-negative Matrix Factorization (NMF). Nonetheless, this approach encounters multiple challenges, including a lack of specificity for the data type and decreased efficiency when errors occur in each cluster's prior knowledge. As modularity is the basic and thorough criterion for evaluating and validating the performance of community detection methods, this paper proposes a new approach for modularity-based community detection that is similar to symmetric NMF. The proposed approach is a semi-supervised adaptive robust community detection model referred to as modularized robust semi-supervised adaptive symmetric NMF (MRASNMF). In this model, the modularity criterion is combined with the NMF model via a novel multi-view clustering method, and the tuning parameter is adjusted iteratively via an adaptive method. MRASNMF makes use of prior knowledge, the modularity criterion, and reinforcement of non-negative matrix factorization, and it has an iterative solution. The MRASNMF model was evaluated and validated using five real-world networks in comparison to existing semi-supervised community detection approaches. According to the findings of this study, the proposed strategy is the most effective for all types of networks considered.
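The modularity criterion this abstract relies on can be made concrete with a small sketch. The code below computes the standard Newman modularity Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j) for an undirected, unweighted graph. It is not the MRASNMF model itself; the function name and the toy graph are assumptions made for illustration.

```python
def modularity(adj, communities):
    """Newman modularity of a partition.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    communities: communities[i] is the community label of node i.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the graph along the bridge edge gives a clearly positive score, while placing all nodes in one community gives zero, which is why modularity is a natural objective to combine with a factorization-based clustering.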



Citations: 0

Detection and Classification of Brain Tumor Using Convolution Extreme Gradient Boosting Model and an Enhanced Salp Swarm Optimization

IF 3.1; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-04-02 DOI: 10.1007/s11063-024-11590-4

J. Jebastine

Some types of tumors in people with brain cancer grow so rapidly that their average size doubles in twenty-five days. Precisely determining the type of tumor enables physicians to conduct clinical planning and estimate dosage. However, accurate classification remains a challenging task due to the variable shape, size, and location of the tumors. The major objective of this paper is to detect and classify brain tumors. This paper introduces an effective Convolution Extreme Gradient Boosting model based on enhanced Salp Swarm Optimization (CEXGB-ESSO) for detecting brain tumors and their types. Initially, the MRI image is passed through bilateral filtering to remove noise. Then, the de-noised image is fed to the CEXGB model, in which Extreme Gradient Boosting (EXGB) replaces the fully connected layer of a CNN to detect and classify brain tumors. The model consists of numerous stacked convolutional neural networks (CNN) for efficient automatic learning of features, which avoids overfitting and time-consuming processes. Then, the tumor type is predicted using the EXGB in the last layer, where there is no need to bring the weight values from the fully connected layer. Enhanced Salp Swarm Optimization (ESSO) is utilized to find the optimal hyperparameters of EXGB, which enhances convergence speed and accuracy. Our proposed CEXGB-ESSO model gives high performance in terms of accuracy (99), sensitivity (97.52), precision (98.2), and specificity (97.7). Also, the convergence analysis reveals the efficient optimization process of ESSO, obtaining optimal hyperparameter values around iteration 25. Furthermore, the classification results showcase the CEXGB-ESSO model's capability to accurately detect and classify brain tumors.
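For readers unfamiliar with the four reported metrics, the sketch below shows how they are conventionally derived from confusion-matrix counts. It assumes a binary (tumor versus no tumor) setting for simplicity; the abstract does not say how the multi-class results were aggregated, and the counts used here are made-up illustrative values, not results from the paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary metrics from confusion-matrix counts.

    tp: true positives, fn: false negatives,
    fp: false positives, tn: true negatives.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # also called recall
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only (hypothetical, not taken from the paper).
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
```

Sensitivity and specificity condition on the true class, whereas precision conditions on the predicted class, so the four values can differ noticeably on imbalanced data.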



Citations: 0

Human Gait Recognition Based on Frontal-View Walking Sequences Using Multi-modal Feature Representations and Learning

IF 3.1; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-04-02 DOI: 10.1007/s11063-024-11554-8

Abstract

Although much progress has been reported in gait recognition, most existing works adopt lateral-view parameters as gait features, which requires a large data collection area and limits the application of gait recognition in real-world practice. In this paper, we adopt frontal-view walking sequences rather than lateral-view sequences and propose a new gait recognition method based on multi-modal feature representations and learning. Specifically, we characterize walking sequences with two different kinds of frontal-view gait feature representations: holistic silhouette and dense optical flow. Pedestrian regions are extracted by an improved YOLOv7 algorithm, called Gait-YOLO, to eliminate the effects of background interference. A multi-modal fusion module (MFM) is proposed to explore the intrinsic connections between silhouette and dense optical flow features by using squeeze and excitation operations at the channel and spatial levels. A gait feature encoder is further used to extract global walking characteristics, enabling efficient multi-modal information fusion. To validate the efficacy of the proposed method, we conduct experiments on the CASIA-B and OUMVLP gait databases and compare its performance with existing state-of-the-art gait recognition methods.
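The squeeze and excitation operation mentioned in this abstract can be sketched at the channel level as follows. This is the generic squeeze-and-excitation pattern (global average pooling, two small dense layers, sigmoid gating), not the paper's MFM; the function names, the weight matrices `w1` and `w2`, and the toy input are all hypothetical, and the spatial-level variant is not shown.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: one scalar per channel.

    feature_maps has shape [C][H][W] as nested lists.
    """
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]


def excite(z, w1, w2):
    """Two dense layers: ReLU bottleneck, then sigmoid gates per channel.

    w1 has shape [C_reduced][C]; w2 has shape [C][C_reduced].
    """
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    return [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]


def se_channel_reweight(feature_maps, w1, w2):
    """Scale every channel by its learned gate in (0, 1)."""
    gates = excite(squeeze(feature_maps), w1, w2)
    return [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_maps)
    ]


# Toy input: 2 channels of size 2x2, with hand-picked (untrained) weights.
x = [[[1.0, 3.0], [1.0, 3.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]      # reduces 2 channels to 1 hidden unit
w2 = [[0.0], [0.0]]    # zero weights give a gate of 0.5 for each channel
y = se_channel_reweight(x, w1, w2)
```

In a fusion setting, the channels of the two modalities could be concatenated before such a gate is applied, so that the gate learns which modality to emphasize; whether the paper does this is not stated in the abstract.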



Citations: 0

Self-Enhanced Attention for Image Captioning

IF 3.1; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-04-01 DOI: 10.1007/s11063-024-11527-x

Abstract

Image captioning, which involves automatically generating textual descriptions based on the content of images, has garnered increasing attention from researchers. Recently, Transformers have emerged as the preferred choice for the language model in image captioning models. Transformers leverage self-attention mechanisms to address gradient accumulation issues and eliminate the risk of gradient explosion commonly associated with RNN networks. However, a challenge arises when the input features of the self-attention mechanism belong to different categories, as it may result in ineffective highlighting of important features. To address this issue, our paper proposes a novel attention mechanism called Self-Enhanced Attention (SEA), which replaces the self-attention mechanism in the decoder part of the Transformer model. In our proposed SEA, after generating the attention weight matrix, it further adjusts the matrix based on its own distribution to effectively highlight important features. To evaluate the effectiveness of SEA, we conducted experiments on the COCO dataset, comparing the results with different visual models and training strategies. The experimental results demonstrate that when using SEA, the CIDEr score is significantly higher compared to the scores obtained without using SEA. This indicates the successful addressing of the challenge of effectively highlighting important features with our proposed mechanism.



Citations: 0

Multi-Model UNet: An Adversarial Defense Mechanism for Robust Visual Tracking

IF 3.1; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-04-01 DOI: 10.1007/s11063-024-11592-2

Wattanapong Suttapak, Jianfu Zhang, Haohuo Zhao, Liqing Zhang

Currently, state-of-the-art object-tracking algorithms are facing a severe threat from adversarial attacks, which can significantly undermine their performance. In this research, we introduce MUNet, a novel defensive model designed for visual tracking. This model is capable of generating defensive images that can effectively counter attacks while maintaining a low computational overhead. To achieve this, we experiment with various configurations of MUNet models, finding that even a minimal three-layer setup significantly improves tracking robustness when the target tracker is under attack. Each model undergoes end-to-end training on randomly paired images, which include both clean and adversarial noise images. This training separately utilizes pixel-wise denoiser and feature-wise defender. Our proposed models significantly enhance tracking performance even when the target tracker is attacked or the target frame is clean. Additionally, MUNet can simultaneously share its parameters on both template and search regions. In experimental results, the proposed models successfully defend against top attackers on six benchmark datasets, including OTB100, LaSOT, UAV123, VOT2018, VOT2019, and GOT-10k. Performance results on all datasets show a significant improvement over all attackers, with a decline of less than 4.6% for every benchmark metric compared to the original tracker. Notably, our model demonstrates the ability to enhance tracking robustness in other blackbox trackers.



Citations: 0

Deep Self-Supervised Attributed Graph Clustering for Social Network Analysis

IF 3.1; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-04-01 DOI: 10.1007/s11063-024-11596-y

Hu Lu, Haotian Hong, Xia Geng

Deep graph clustering is an unsupervised learning task that divides nodes in a graph into disjoint regions with the help of graph auto-encoders. Currently, such methods have several problems, as follows. (1) The deep graph clustering method does not effectively utilize the generated pseudo-labels, resulting in sub-optimal model training results. (2) Each cluster has a different confidence level, which affects the reliability of the pseudo-label. To address these problems, we propose a Deep Self-supervised Attribute Graph Clustering model (DSAGC) to fully leverage the information of the data itself. We divide the proposed model into two parts: an upstream model and a downstream model. In the upstream model, we use the pseudo-label information generated by spectral clustering to form a new high-confidence distribution with which to optimize the model for a higher performance. We also propose a new reliable sample selection mechanism to obtain more reliable samples for downstream tasks. In the downstream model, we only use the reliable samples and the pseudo-label for the semi-supervised classification task without the true label. We compare the proposed method with 17 related methods on four publicly available citation network datasets, and the proposed method generally outperforms most existing methods in three performance metrics. By conducting a large number of ablative experiments, we validate the effectiveness of the proposed method.
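One simple way to picture a reliable-sample selection step like the one this abstract mentions is a confidence threshold on soft cluster assignments. The sketch below is only that: a hypothetical stand-in, not the DSAGC mechanism, whose details the abstract does not give. The function name, the `threshold` parameter, and the toy probabilities are assumptions.

```python
def select_reliable_samples(soft_assignments, threshold=0.9):
    """Keep samples whose top cluster probability reaches the threshold.

    soft_assignments: one list of cluster probabilities per node.
    Returns (node_index, pseudo_label) pairs for the retained nodes.
    """
    reliable = []
    for idx, probs in enumerate(soft_assignments):
        confidence = max(probs)
        if confidence >= threshold:
            # The pseudo-label is the index of the most probable cluster.
            reliable.append((idx, probs.index(confidence)))
    return reliable


# Toy soft assignments for three nodes and two clusters (made-up values).
assignments = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
selected = select_reliable_samples(assignments, threshold=0.9)
```

Because the abstract notes that each cluster has a different confidence level, a single global threshold is likely too crude in practice; a per-cluster threshold would be one possible refinement of this sketch.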



Citations: 0

Power Optimization in Wireless Sensor Network Using VLSI Technique on FPGA Platform

IF 3.1 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-03-28 DOI: 10.1007/s11063-024-11495-2

Saranya Leelakrishnan, Arvind Chakrapani

Demand for high-performance wireless sensor networks (WSNs) is increasing, and their power requirements threaten their viability. The routing methods cannot optimize power consumption. To reduce power consumption, this article proposes a VLSI-based power optimization technique. Different elements of a WSN, such as sensor nodes, modulation schemes, and packet data transmission, influence energy usage, and a WSN power study found that lowering the energy usage of the sensor network is critical. The article proposes a power optimization model for wireless sensor networks (POM-WSN). The proposed system shows how to build and execute a power-saving strategy for WSNs using a customized collaborative unit with parallel processing capabilities on an FPGA (Field Programmable Gate Array) and a smart power component. The customizable collaborative unit applies specialized hardware to customize operating system (OS) speed and transfer it to a soft intel core. This unit decreases the OS central processing unit (CPU) overhead associated with installing processor-based IoT (Internet of Things) devices. The smart power unit controls the soft CPU’s clock and physical peripherals, putting them in the right state depending on the hardware requirements of the program (tasks) being executed. Furthermore, the amplitude and current must be adjusted according to the command signal received from the collaborative custom unit. The efficiency and energy usage of the FPGA-based energy-saving approach for sensor nodes are compared with those of processor-based WSN node implementations. Using the FPGA's programmable architecture, the research seeks to build effective power-saving approaches for WSNs.
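The smart power unit described in the abstract puts the clock and peripherals in a state that matches what the running task needs. The abstract gives no implementation details, so the following is only a minimal software sketch of that idea. The peripheral names, the state labels, and the function `plan_power_states` are all hypothetical and are not taken from the paper.

```python
# Hypothetical peripheral set for a sensor node; not taken from the paper.
PERIPHERALS = ("radio", "adc", "uart", "timer")


def plan_power_states(required, clock_needed):
    """Return a power state for each peripheral and for the CPU clock.

    required: set of peripheral names the current task uses.
    clock_needed: whether the task needs the soft CPU clock running.
    """
    # Switch on only the peripherals the task asks for.
    states = {name: ("on" if name in required else "off") for name in PERIPHERALS}
    # Gate the clock when the task does not need the CPU.
    states["cpu_clock"] = "running" if clock_needed else "gated"
    return states


# Example: a sampling task that needs only the ADC and a timer.
plan = plan_power_states(required={"adc", "timer"}, clock_needed=True)
print(plan)
```

In hardware this decision would be made by dedicated logic on the FPGA rather than by software. The sketch only illustrates the mapping from task requirements to power states.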



Citations: 0

Reconstruction-Aware Kernelized Fuzzy Clustering Framework Incorporating Local Information for Image Segmentation

IF 3.1 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-03-27 DOI: 10.1007/s11063-024-11450-1

Chengmao Wu, Xiao Qi

Kernelized fuzzy C-means clustering with weighted local information is a widely applied robust segmentation algorithm for noisy images. However, it struggles to segment images corrupted by strong noise. To address this issue, this paper proposes a reconstruction-aware kernelized fuzzy C-means clustering algorithm with rich local information. First, an optimization model of guided bilateral filtering is given for noisy images. Second, this filtering model is embedded into kernelized fuzzy C-means clustering with local information, yielding a novel fuzzy clustering model, driven by reconstruction-filtering information, for segmenting noise-corrupted images. Finally, a tri-level alternating iterative algorithm is derived from the optimization model using optimization theory, and its convergence is rigorously analyzed. Extensive experimental results on noisy synthetic and real images indicate that, compared with the latest fuzzy clustering algorithms, the proposed algorithm achieves better segmentation performance and stronger robustness to noise, with PSNR and ACC values increasing by about 0.16–3.28 and 0.01–0.08, respectively.
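For orientation, the baseline that the abstract builds on can be sketched as follows. This is plain kernelized fuzzy C-means with a Gaussian kernel on scalar intensities, using the standard alternating membership and center updates. It is not the paper's model: it has no local-information term, no guided bilateral filtering, and no reconstruction step. The function name `kfcm`, the parameter defaults, and the toy data are assumptions made for illustration.

```python
import math


def kfcm(pixels, centers, m=2.0, sigma=50.0, iters=50, eps=1e-9):
    """Plain kernelized fuzzy C-means on scalar intensities (Gaussian kernel)."""
    centers = list(centers)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to (1 - K(x_k, v_i))^(-1/(m-1)).
        memberships = []
        for x in pixels:
            # eps guards against division by zero when x equals a center.
            dist = [max(1.0 - math.exp(-((x - v) ** 2) / sigma ** 2), eps)
                    for v in centers]
            weights = [d ** (-1.0 / (m - 1.0)) for d in dist]
            total = sum(weights)
            memberships.append([w / total for w in weights])
        # Center update: average of x_k weighted by u_ik^m * K(x_k, v_i).
        new_centers = []
        for i, v in enumerate(centers):
            num = 0.0
            den = 0.0
            for x, u in zip(pixels, memberships):
                g = (u[i] ** m) * math.exp(-((x - v) ** 2) / sigma ** 2)
                num += g * x
                den += g
            new_centers.append(num / den if den > 0 else v)
        centers = new_centers
    return centers, memberships


# Toy "image": dark pixels near 10 and bright pixels near 200.
pixels = [10, 12, 11, 9, 200, 198, 202, 205, 13, 199]
centers, memberships = kfcm(pixels, centers=[0.0, 255.0])
# Hard segmentation: assign each pixel to its highest-membership cluster.
labels = [max(range(len(u)), key=u.__getitem__) for u in memberships]
print(centers, labels)
```

The Gaussian kernel down-weights pixels far from a center, which is what gives the kernelized variant some robustness to outliers. The local-information and filtering terms in the paper are aimed at stronger noise than this baseline handles.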



Citations: 0

Multipath Attention and Adaptive Gating Network for Video Action Recognition

IF 3.1 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Processing Letters

Pub Date : 2024-03-27 DOI: 10.1007/s11063-024-11591-3

Haiping Zhang, Zepeng Hu, Dongjin Yu, Liming Guan, Xu Liu, Conghao Ma

3D CNNs model the temporal structure of existing large action recognition datasets well and have made great progress in RGB-based video action recognition. However, previous 3D CNN models also face several problems. For video feature extraction, convolutional kernels are often designed and fixed in each layer of the network, which may not suit the diversity of data in action recognition tasks. In this paper, a new model called Multipath Attention and Adaptive Gating Network (MAAGN) is proposed. The core idea of MAAGN is to use the spatial difference module (SDM) and the multi-angle temporal attention module (MTAM) in parallel at each layer of the multipath network to obtain spatial and temporal features, respectively, and then to fuse the spatial-temporal features dynamically with the adaptive gating module (AGM). SDM explores the spatial domain of the action video using difference operators based on the attention mechanism, while MTAM explores the temporal domain in terms of both global timing and local timing. AGM is built on an adaptive gate unit whose value is determined by the input of each layer and is unique to each layer; it dynamically fuses the spatial and temporal features in the paths of each layer of the multipath network. We construct the temporal network MAAGN, which performs competitively with or better than state-of-the-art methods in video action recognition, and we provide exhaustive experiments on several large datasets to demonstrate the effectiveness of our approach.
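The abstract does not give the AGM equations, so the following is only a generic sketch of input-dependent gated fusion. It assumes the gate is a scalar sigmoid of a linear function of the layer input and that fusion is a convex combination of the two branches. The function names, the scalar gate, and the toy vectors are assumptions for illustration, not the authors' design.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adaptive_gate_fuse(layer_input, spatial, temporal, weights, bias):
    """Fuse two feature vectors with a gate computed from the layer input.

    The gate g lies in (0, 1): g weights the spatial branch and
    (1 - g) weights the temporal branch.
    """
    g = sigmoid(sum(w * x for w, x in zip(weights, layer_input)) + bias)
    fused = [g * s + (1.0 - g) * t for s, t in zip(spatial, temporal)]
    return g, fused


# Toy vectors standing in for one layer's input and its two branch outputs.
layer_input = [0.5, -1.0, 2.0]
spatial = [1.0, 0.0, 3.0]    # hypothetical SDM-style branch output
temporal = [0.0, 2.0, 1.0]   # hypothetical MTAM-style branch output
gate, fused = adaptive_gate_fuse(layer_input, spatial, temporal,
                                 weights=[0.2, 0.1, 0.4], bias=0.0)
print(gate, fused)
```

Because the gate depends on the input, each layer (with its own `weights` and `bias`) can weight the spatial and temporal branches differently for every sample, in contrast to a fixed fusion ratio.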



Citations: 0