Transferring informative dark knowledge from other modalities has become a common approach to learning tasks that are difficult to accomplish independently due to limitations in data quality. However, the question of why the transferred knowledge works has not been extensively explored. To address this issue, in this paper we uncover a correlation between feature discriminability and dimensional structure (DS) by observing the features extracted from modalities of high and low data quality within the same learning task. On this basis, we characterize DS through the spatial distribution of intermediate features and the channel-wise correlation of output features. We empirically find that the DS of high-quality features is better than that of low-quality ones. This inspires us to propose a novel DS-based knowledge distillation method for better supervised cross-modal learning (CML) performance. Instead of merely mimicking the logits or features of the high-quality modality, the proposed method leverages its structural knowledge to guide the low-quality modality. Specifically, it enforces a uniform distribution of intermediate features and channel-wise independence of deep features in the low-quality modality, thereby enhancing semantic learning and improving performance. This is especially useful when the performance gap between the two modalities is relatively large. Furthermore, this paper introduces a new CML dataset for marine target recognition, named IIS-ISCAS, to promote community development. The dataset includes more than 10,000 paired samples of 8 distinct marine targets in optical and radar modalities and is continuously being updated. Experimental results on six transformed visual benchmark datasets and six CML datasets validate the effectiveness of the proposed method.
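The two structural objectives named above, uniformity of intermediate features and channel-wise independence of output features, can be sketched as simple loss functions. The sketch below is illustrative only and is not the paper's implementation: `uniformity_loss` follows the common Gaussian-kernel formulation on the unit hypersphere, and `channel_independence_loss` penalizes off-diagonal entries of the channel correlation matrix; both function names and the temperature `t` are hypothetical.

```python
import numpy as np

def uniformity_loss(feats, t=2.0):
    """Illustrative uniformity penalty: lower when (N, D) features
    spread evenly over the unit hypersphere."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances between normalized features.
    sq_dists = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
    mask = ~np.eye(f.shape[0], dtype=bool)  # exclude self-distances
    # Log of the mean Gaussian potential; clustered features give a
    # larger (less negative) value than well-spread ones.
    return float(np.log(np.exp(-t * sq_dists[mask]).mean()))

def channel_independence_loss(feats):
    """Illustrative decorrelation penalty: sum of squared off-diagonal
    entries of the (C, C) channel correlation matrix of (N, C) features."""
    z = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)
    corr = (z.T @ z) / feats.shape[0]
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum())
```

Under this reading, the student (low-quality modality) would minimize its task loss plus weighted sums of these two terms, rather than a plain logit-matching distillation loss.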
