
Neural Processing Letters: Latest Publications

Hierarchical Patch Aggregation Transformer for Motion Deblurring
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-04 · DOI: 10.1007/s11063-024-11594-0
Yujie Wu, Lei Liang, Siyao Ling, Zhisheng Gao

The encoder-decoder framework based on Transformer components has become a paradigm in image deblurring architecture design. In this paper, we critically revisit this approach and find that many current architectures focus excessively on limited local regions during the feature extraction stage. These designs compromise the feature richness and diversity of the encoder-decoder framework, leading to bottlenecks in performance improvement. To address these deficiencies, a novel Hierarchical Patch Aggregation Transformer (HPAT) architecture is proposed. In the initial feature extraction stage, HPAT combines Axis-Selective Transformer Blocks with linear complexity, supplemented by an adaptive hierarchical attention fusion mechanism. These mechanisms enable the model to effectively capture the spatial relationships between features and to integrate features from different hierarchical levels. We then redesign the feedforward network of the Transformer block in the encoder-decoder structure and propose the Fused Feedforward Network, whose aggregation enhances the ability to capture and retain local detail features. We evaluate HPAT through extensive experiments and compare its performance with baseline methods on public datasets. Experimental results show that the proposed HPAT model achieves state-of-the-art performance in image deblurring tasks.
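The axis-selective idea can be illustrated independently of the paper: restricting self-attention to one spatial axis at a time keeps the cost linear in the length of the other axis rather than quadratic in H × W. Below is a minimal PyTorch sketch of such a block; the class name, head count, and residual wiring are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AxisAttention(nn.Module):
    """Multi-head self-attention restricted to a single spatial axis (a sketch)."""

    def __init__(self, dim, heads=4, axis="h"):
        super().__init__()
        self.axis = axis  # "h": attend along image height; "w": along width
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        if self.axis == "h":
            # each column becomes an independent sequence of length H
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:
            # each row becomes an independent sequence of length W
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq, need_weights=False)
        if self.axis == "h":
            out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        else:
            out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return x + out  # residual connection

x = torch.randn(2, 32, 16, 16)
block = nn.Sequential(AxisAttention(32, axis="h"), AxisAttention(32, axis="w"))
print(block(x).shape)  # torch.Size([2, 32, 16, 16])
```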

Citations: 0
Efficient Bayesian CNN Model Compression using Bayes by Backprop and L1-Norm Regularization
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-04 · DOI: 10.1007/s11063-024-11593-1
Ali Muhammad Shaikh, Yun-bo Zhao, Aakash Kumar, Munawar Ali, Yu Kang

The swift advancement of convolutional neural networks (CNNs) in numerous real-world applications has driven up both computational cost and model size. In this context, many researchers have focused on eradicating these issues by compressing the original CNN models through pruning of weights or filters. Filter pruning has an advantage over weight pruning because it does not introduce irregular sparse connectivity patterns. In this work, we propose a Bayesian Convolutional Neural Network (BayesCNN) with Variational Inference, which places a probability distribution over the weights. For the pruning task of the Bayesian CNN, we utilize a combination of the L1-norm and the capped L1-norm to characterize the amount of information that can be extracted through each filter and to control regularization. With this formulation, we prune unimportant filters directly without any test accuracy loss and achieve a slimmer model with comparable accuracy. The whole pruning process is iterative, and to validate the performance of our proposed work, we utilized several different CNN architectures on standard classification datasets. We compared our results with non-Bayesian CNN models, in particular VGG-16 on CIFAR-10, and pruned 75.8% of the parameters with a 51.3% reduction in floating-point operations (FLOPs) without loss of accuracy, advancing the state of the art.
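The capped-L1 filter scoring mentioned in the abstract can be sketched as follows: each convolution filter is scored by min(‖w‖₁, cap) and the lowest-scoring filters are pruned. This is a minimal sketch of that recipe in plain PyTorch; the `cap` and `keep_ratio` values are hypothetical, and the Bayesian (variational) treatment of the weights is not reproduced here.

```python
import torch

def capped_l1_scores(conv_weight, cap):
    # conv_weight: (out_channels, in_channels, kH, kW)
    l1 = conv_weight.abs().sum(dim=(1, 2, 3))   # L1 norm per filter
    return torch.clamp(l1, max=cap)             # capped L1: min(L1, cap)

def filter_keep_mask(conv_weight, cap, keep_ratio):
    scores = capped_l1_scores(conv_weight, cap)
    k = max(1, int(keep_ratio * scores.numel()))
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[torch.topk(scores, k).indices] = True  # True = keep this filter
    return mask

w = torch.randn(64, 3, 3, 3)                    # one conv layer's weights
mask = filter_keep_mask(w, cap=25.0, keep_ratio=0.25)
print(int(mask.sum()), "of", w.shape[0], "filters kept")
```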

Citations: 0
A New Adaptive Robust Modularized Semi-Supervised Community Detection Method Based on Non-negative Matrix Factorization
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-02 · DOI: 10.1007/s11063-024-11588-y

Abstract

The most extensively used tools for categorizing complicated networks are community detection methods. One of the most common methods for unsupervised and semi-supervised clustering is community detection based on Non-negative Matrix Factorization (NMF). Nonetheless, this approach encounters multiple challenges, including a lack of specificity for the data type and decreased efficiency when errors occur in each cluster's knowledge priority. As modularity is the basic and thorough criterion for evaluating and validating the performance of community detection methods, this paper proposes a new approach for modularity-based community detection which is similar to symmetric NMF. The proposed approach is a semi-supervised adaptive robust community detection model referred to as modularized robust semi-supervised adaptive symmetric NMF (MRASNMF). In this model, the modularity criterion is combined with the NMF model via a novel multi-view clustering method, and the tuning parameter is adjusted iteratively via an adaptive method. MRASNMF makes use of knowledge priority, the modularity criterion, and reinforcement of non-negative matrix factorization, and admits an iterative solution. The MRASNMF model was evaluated and validated on five real-world networks in comparison to existing semi-supervised community detection approaches. According to the findings of this study, the proposed strategy is most effective for all types of networks.
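For context, a generic symmetric NMF community detector factors the adjacency matrix A ≈ HHᵀ with H ≥ 0 and assigns each node to the community with the largest entry in its row of H. The NumPy sketch below uses the standard multiplicative update for symmetric NMF; it is a baseline illustration, not the MRASNMF update, which additionally incorporates modularity, knowledge priority, and an adaptively tuned parameter.

```python
import numpy as np

def symmetric_nmf(A, k, iters=300, eps=1e-9):
    """Factor a symmetric adjacency matrix A ~ H @ H.T with H >= 0."""
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[0], k))
    for _ in range(iters):
        AH = A @ H
        HHtH = H @ (H.T @ H)
        H *= 0.5 + 0.5 * AH / (HHtH + eps)  # multiplicative update, beta = 0.5
    return H

# two obvious communities joined by a single bridge edge
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
H = symmetric_nmf(A, k=2)
print(H.argmax(axis=1))  # community label per node, e.g. [0 0 0 1 1 1]
```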

Citations: 0
Detection and Classification of Brain Tumor Using Convolution Extreme Gradient Boosting Model and an Enhanced Salp Swarm Optimization
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-02 · DOI: 10.1007/s11063-024-11590-4
J. Jebastine

Some types of tumors in people with brain cancer grow so rapidly that their average size doubles in twenty-five days. Precisely determining the type of tumor enables physicians to conduct clinical planning and estimate dosage. However, accurate classification remains a challenging task due to the variable shape, size, and location of the tumors. The major objective of this paper is to detect and classify brain tumors. This paper introduces an effective Convolution Extreme Gradient Boosting model based on enhanced Salp Swarm Optimization (CEXGB-ESSO) for detecting brain tumors and their types. Initially, the MRI image is passed through bilateral filtering for noise removal. Then, the de-noised image is fed to the CEXGB model, where Extreme Gradient Boosting (EXGB) replaces the fully connected layer of the CNN to detect and classify brain tumors. The model consists of numerous stacked convolutional neural networks (CNNs) for efficient automatic feature learning, which avoids overfitting and time-consuming processes. The tumor type is then predicted using the EXGB in the last layer, with no need to bring weight values from a fully connected layer. Enhanced Salp Swarm Optimization (ESSO) is utilized to find the optimal hyperparameters of EXGB, which enhances convergence speed and accuracy. Our proposed CEXGB-ESSO model gives high performance in terms of accuracy (99), sensitivity (97.52), precision (98.2), and specificity (97.7). Also, the convergence analysis reveals the efficient optimization process of ESSO, obtaining optimal hyperparameter values around iteration 25. Furthermore, the classification results showcase the CEXGB-ESSO model's capability to accurately detect and classify brain tumors.
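The overall pattern the abstract describes, a CNN backbone supplying features to a gradient-boosting classifier that replaces the fully connected head, can be sketched as below. The ResNet-18 backbone, the dummy tensors standing in for de-noised MRI slices, and the XGBoost hyperparameters (the kind of values ESSO would search over) are all illustrative assumptions.

```python
import torch
import torchvision.models as models
from xgboost import XGBClassifier

backbone = models.resnet18(weights=None)   # stand-in CNN feature extractor
backbone.fc = torch.nn.Identity()          # drop the fully connected head
backbone.eval()

@torch.no_grad()
def extract_features(images):              # images: (N, 3, 224, 224)
    return backbone(images).numpy()        # (N, 512) pooled features

# dummy tensors standing in for de-noised, preprocessed MRI slices
X_train = extract_features(torch.randn(32, 3, 224, 224))
y_train = torch.randint(0, 4, (32,)).numpy()   # 4 hypothetical tumor classes

# n_estimators / max_depth are the kind of hyperparameters ESSO would tune
clf = XGBClassifier(n_estimators=100, max_depth=4)
clf.fit(X_train, y_train)
print(clf.predict(extract_features(torch.randn(4, 3, 224, 224))))
```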

Citations: 0
Human Gait Recognition Based on Frontal-View Walking Sequences Using Multi-modal Feature Representations and Learning
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-02 · DOI: 10.1007/s11063-024-11554-8

Abstract

Although much progress has been reported in gait recognition, most existing works adopt lateral-view parameters as gait features, which requires a large data-collection area and limits the application of gait recognition in real-world practice. In this paper, we adopt frontal-view walking sequences rather than lateral-view sequences and propose a new gait recognition method based on multi-modal feature representations and learning. Specifically, we characterize walking sequences with two different kinds of frontal-view gait feature representations: holistic silhouette and dense optical flow. Pedestrian region extraction is achieved by an improved YOLOv7 algorithm, called Gait-YOLO, to eliminate the effects of background interference. A multi-modal fusion module (MFM) is proposed to explore the intrinsic connections between silhouette and dense optical flow features by using squeeze and excitation operations at the channel and spatial levels. A gait feature encoder is further used to extract global walking characteristics, enabling efficient multi-modal information fusion. To validate the efficacy of the proposed method, we conduct experiments on the CASIA-B and OUMVLP gait databases and compare the performance of our proposed method with other existing state-of-the-art gait recognition methods.
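The squeeze-and-excitation fusion at the channel level can be sketched generically: silhouette and optical-flow feature maps are concatenated, globally pooled to produce per-channel gates, and reweighted before mixing. The PyTorch sketch below is an illustration under these assumptions, not the paper's MFM; spatial-level gating is omitted for brevity.

```python
import torch
import torch.nn as nn

class ChannelSE(nn.Module):
    """Squeeze-and-excitation gate: reweight channels by learned importance."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # excitation weights
        )

    def forward(self, x):
        return x * self.gate(x)

class FusionBlock(nn.Module):
    """Concatenate silhouette and optical-flow features, then SE-gate and mix."""

    def __init__(self, c_sil, c_flow):
        super().__init__()
        self.se = ChannelSE(c_sil + c_flow)
        self.mix = nn.Conv2d(c_sil + c_flow, c_sil, 1)

    def forward(self, f_sil, f_flow):
        fused = torch.cat([f_sil, f_flow], dim=1)
        return self.mix(self.se(fused))

f_sil, f_flow = torch.randn(2, 64, 32, 32), torch.randn(2, 32, 32, 32)
print(FusionBlock(64, 32)(f_sil, f_flow).shape)  # torch.Size([2, 64, 32, 32])
```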

Citations: 0
Self-Enhanced Attention for Image Captioning
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-01 · DOI: 10.1007/s11063-024-11527-x

Abstract

Image captioning, which involves automatically generating textual descriptions based on the content of images, has garnered increasing attention from researchers. Recently, Transformers have emerged as the preferred choice for the language model in image captioning models. Transformers leverage self-attention mechanisms to address gradient accumulation issues and eliminate the risk of gradient explosion commonly associated with RNNs. However, a challenge arises when the input features of the self-attention mechanism belong to different categories, as this may result in ineffective highlighting of important features. To address this issue, our paper proposes a novel attention mechanism called Self-Enhanced Attention (SEA), which replaces the self-attention mechanism in the decoder part of the Transformer model. In our proposed SEA, after the attention weight matrix is generated, the matrix is further adjusted based on its own distribution to effectively highlight important features. To evaluate the effectiveness of SEA, we conducted experiments on the COCO dataset, comparing the results with different visual models and training strategies. The experimental results demonstrate that the CIDEr score is significantly higher with SEA than without it, indicating that our proposed mechanism successfully addresses the challenge of effectively highlighting important features.
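One way to picture the "adjust the weight matrix based on its own distribution" step: compute standard scaled dot-product attention weights, then suppress entries below a fraction of each row's mean and renormalize, so the surviving positions are emphasized. The adjustment rule used below is an illustrative choice, not the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def self_enhanced_attention(q, k, v, tau=0.5):
    d = q.size(-1)
    w = F.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)  # standard weights
    # second pass: zero out entries below a fraction of each row's mean weight,
    # then renormalize, emphasizing the surviving (important) positions
    w = torch.relu(w - tau * w.mean(dim=-1, keepdim=True))
    w = w / (w.sum(dim=-1, keepdim=True) + 1e-9)
    return w @ v

q = k = v = torch.randn(1, 5, 16)
print(self_enhanced_attention(q, k, v).shape)  # torch.Size([1, 5, 16])
```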

Citations: 0
Multi-Model UNet: An Adversarial Defense Mechanism for Robust Visual Tracking
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-01 · DOI: 10.1007/s11063-024-11592-2
Wattanapong Suttapak, Jianfu Zhang, Haohuo Zhao, Liqing Zhang

Currently, state-of-the-art object-tracking algorithms are facing a severe threat from adversarial attacks, which can significantly undermine their performance. In this research, we introduce MUNet, a novel defensive model designed for visual tracking. This model is capable of generating defensive images that can effectively counter attacks while maintaining a low computational overhead. To achieve this, we experiment with various configurations of MUNet models, finding that even a minimal three-layer setup significantly improves tracking robustness when the target tracker is under attack. Each model undergoes end-to-end training on randomly paired images, which include both clean and adversarial noise images; the training separately utilizes a pixel-wise denoiser and a feature-wise defender. Our proposed models significantly enhance tracking performance whether the target tracker is attacked or the target frame is clean. Additionally, MUNet can share its parameters across both the template and search regions. In our experiments, the proposed models successfully defend against top attackers on six benchmark datasets: OTB100, LaSOT, UAV123, VOT2018, VOT2019, and GOT-10k. Performance results on all datasets show a significant improvement over all attackers, with a decline of less than 4.6% in every benchmark metric compared to the original tracker. Notably, our model also enhances tracking robustness in other blackbox trackers.
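A three-layer pixel-wise denoiser of the kind the abstract calls a "minimal setup" can be sketched as follows: the network predicts the adversarial noise and subtracts it from the frame before the frame reaches the tracker. The layer widths and the residual formulation are illustrative assumptions, not MUNet's actual architecture.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Three conv layers predict the adversarial noise; the defended frame is
    the input minus that prediction (residual denoising)."""

    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        return (x - self.body(x)).clamp(0.0, 1.0)

denoiser = TinyDenoiser()
adv_search_region = torch.rand(1, 3, 255, 255)   # e.g. a Siamese tracker crop
clean_estimate = denoiser(adv_search_region)     # hand this to the tracker
print(clean_estimate.shape)
```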

Citations: 0
Deep Self-Supervised Attributed Graph Clustering for Social Network Analysis
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-04-01 · DOI: 10.1007/s11063-024-11596-y
Hu Lu, Haotian Hong, Xia Geng

Deep graph clustering is an unsupervised learning task that divides nodes in a graph into disjoint regions with the help of graph auto-encoders. Currently, such methods have several problems: (1) they do not effectively utilize the generated pseudo-labels, resulting in sub-optimal model training; (2) each cluster has a different confidence level, which affects the reliability of the pseudo-labels. To address these problems, we propose a Deep Self-supervised Attributed Graph Clustering model (DSAGC) to fully leverage the information of the data itself. We divide the proposed model into two parts: an upstream model and a downstream model. In the upstream model, we use the pseudo-label information generated by spectral clustering to form a new high-confidence distribution with which to optimize the model for higher performance. We also propose a new reliable-sample selection mechanism to obtain more reliable samples for downstream tasks. In the downstream model, we use only the reliable samples and the pseudo-labels for the semi-supervised classification task, without the true labels. We compare the proposed method with 17 related methods on four publicly available citation network datasets, and the proposed method generally outperforms most existing methods in three performance metrics. Through a large number of ablation experiments, we validate the effectiveness of the proposed method.
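The reliable-sample idea can be sketched with scikit-learn: spectral clustering supplies pseudo-labels, and only the nodes closest to their cluster centroid in the embedding space are kept as "reliable" supervision for the downstream classifier. The distance-to-centroid criterion and the `keep_frac` value below are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def reliable_pseudo_labels(Z, n_clusters, keep_frac=0.3):
    """Cluster embeddings Z, then keep only the nodes nearest each centroid."""
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="nearest_neighbors", random_state=0
    ).fit_predict(Z)
    keep = np.zeros(len(Z), dtype=bool)
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        dist = np.linalg.norm(Z[idx] - Z[idx].mean(axis=0), axis=1)
        k = max(1, int(keep_frac * len(idx)))
        keep[idx[np.argsort(dist)[:k]]] = True  # closest to centroid = reliable
    return labels, keep

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(3, 1, (50, 8)), rng.normal(-3, 1, (50, 8))])
labels, keep = reliable_pseudo_labels(Z, n_clusters=2)
print(int(keep.sum()), "reliable nodes out of", len(Z))
```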

Citations: 0
Global Asymptotic Stability of Anti-Periodic Solutions of Time-Delayed Fractional Bam Neural Networks
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-30 · DOI: 10.1007/s11063-024-11561-9
Münevver Tuz
Citations: 0
Central Attention with Multi-Graphs for Image Annotation
IF 3.1 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-30 · DOI: 10.1007/s11063-024-11525-z
Baodi Liu, Yan Liu, Qianqian Shao, Weifeng Liu
Citations: 0