Deepfake videos pose a significant threat to digital media credibility and public trust. Although existing multimodal detection methods have advanced, they struggle to generalize across diverse real-world scenarios: most current approaches focus exclusively on either synchronization detection or content consistency checking, which limits their effectiveness. To tackle these challenges, this study introduces a new dual-branch architecture that simultaneously learns synchronization features and content consistency representations. The model comprises a synchronization branch that captures temporal misalignments and a content branch that detects semantic anomalies, with a decoupling loss that enhances task specificity. In the content branch, a conditional generation task reconstructs the fused feature sequence conditioned on the content token, strengthening the feature representations through self-supervised learning. The proposed method also includes a hierarchical cross-modal interaction mechanism built on cross-attention and fine-grained embeddings: cross-attention fuses features from different modalities to enrich the representations, while fine-grained embeddings supply the model with detailed local information. Experimental results show that our approach attains an AUC of 98.30% on the FakeAVCeleb dataset, approaching the current state of the art (SOTA). In cross-dataset evaluation, it outperforms SOTA approaches by 0.08%, 13.46%, and 10.12% on the DeepfakeTIMIT, LAV-DF, and MAVOS-DD datasets, respectively, with AUC scores of 99.11%, 86.97%, and 67.23%. Our code is available at https://github.com/zhudedede5-droid/AVSCNet.
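To make the dual-branch design concrete, the following PyTorch sketch illustrates one plausible arrangement of the components named above: cross-attention fusing audio and visual token sequences, a synchronization branch over the fused sequence, a content branch pooled through a learned content token, and a decoupling loss that discourages the two branches from learning redundant features. All module choices, dimensions, and the exact form of the decoupling loss are illustrative assumptions on my part, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualBranchDetector(nn.Module):
    """Hypothetical sketch of a dual-branch audio-visual deepfake detector.

    Module names and dimensions are assumptions for illustration only.
    """

    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Cross-attention fuses the audio and visual token sequences.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Synchronization branch: a recurrent model over the fused sequence
        # to capture temporal (mis)alignment between modalities.
        self.sync_branch = nn.GRU(dim, dim, batch_first=True)
        # Content branch: a learned content token pools semantic evidence
        # from the fused sequence via a small Transformer encoder.
        self.content_token = nn.Parameter(torch.randn(1, 1, dim))
        self.content_branch = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True),
            num_layers=2,
        )
        self.sync_head = nn.Linear(dim, 2)
        self.content_head = nn.Linear(dim, 2)

    def forward(self, audio_feats, visual_feats):
        # audio_feats, visual_feats: (B, T, dim) per-frame embeddings.
        fused, _ = self.cross_attn(visual_feats, audio_feats, audio_feats)
        # Synchronization branch scores temporal misalignment.
        _, h = self.sync_branch(fused)
        sync_feat = h[-1]                                   # (B, dim)
        # Content branch reads out the prepended content token.
        tok = self.content_token.expand(fused.size(0), -1, -1)
        content_feat = self.content_branch(
            torch.cat([tok, fused], dim=1))[:, 0]           # (B, dim)
        return (self.sync_head(sync_feat), self.content_head(content_feat),
                sync_feat, content_feat)


def decoupling_loss(sync_feat, content_feat):
    # One plausible decoupling objective (an assumption): penalize the
    # correlation between branch features so each stays task-specific.
    s = F.normalize(sync_feat, dim=-1)
    c = F.normalize(content_feat, dim=-1)
    return (s * c).sum(dim=-1).abs().mean()
```

In this reading, the decoupling loss is minimized when the synchronization and content features are orthogonal, which is one common way to enforce the task specificity the abstract describes; the paper may realize it differently.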
