Graph neural networks (GNNs) have become a popular approach for semi-supervised graph representation learning. GNN research has generally focused on improving methodological details, whereas less attention has been paid to exploring the importance of labeling the data. However, for semi-supervised learning, the quality of training data is vital. In this paper, we first introduce and elaborate on the problem of training data selection for GNNs. More specifically, focusing on node classification, we aim to select representative nodes from a graph used to train GNNs to achieve the best performance. To solve this problem, we are inspired by the popular lottery ticket hypothesis, typically used for sparse architectures, and we propose the following subset hypothesis for graph data: "There exists a core subset when selecting a fixed-size dataset from the dense training dataset, that can represent the properties of the dataset, and GNNs trained on this core subset can achieve a better graph representation". Equipped with this subset hypothesis, we present an efficient algorithm to identify the core data in the graph for GNNs. Extensive experiments demonstrate that the selected data (as a training set) can obtain performance improvements across various datasets and GNN architectures.
Title: Finding core labels for maximizing generalization of graph neural networks
Authors: Sichao Fu, Xueqi Ma, Yibing Zhan, Fanyu You, Qinmu Peng, Tongliang Liu, James Bailey, Danilo Mandic
Journal: Neural Networks
Pub Date: 2024-08-14; DOI: 10.1016/j.neunet.2024.106635
Pub Date: 2024-08-14; DOI: 10.1016/j.neunet.2024.106636
DeepFake detection is pivotal in personal privacy and public safety. With the iterative advancement of DeepFake techniques, high-quality forged videos and images are becoming increasingly deceptive. Prior research has seen numerous attempts by scholars to incorporate biometric features into the field of DeepFake detection. However, traditional biometric-based approaches tend to segregate biometric features from general ones and freeze the biometric feature extractor. These approaches result in the exclusion of valuable general features, potentially leading to a performance decline and, consequently, a failure to fully exploit the potential of biometric information in assisting DeepFake detection. Moreover, insufficient attention has been dedicated to scrutinizing gaze authenticity within the realm of DeepFake detection in recent years. In this paper, we introduce GazeForensics, an innovative DeepFake detection method that utilizes gaze representation obtained from a 3D gaze estimation model to regularize the corresponding representation within our DeepFake detection model, while concurrently integrating general features to further enhance the performance of our model. Experimental results demonstrate that the proposed GazeForensics method achieves strong detection performance and exhibits excellent interpretability.
Title: GazeForensics: DeepFake detection via gaze-guided spatial inconsistency learning
Journal: Neural Networks
Pub Date: 2024-08-13; DOI: 10.1016/j.neunet.2024.106631
Parameter-efficient transfer learning (PETL) methods provide an efficient alternative to fine-tuning. However, typical PETL methods inject the same structures into all Pre-trained Language Model (PLM) layers and only use the final hidden states for downstream tasks, regardless of the knowledge diversity across PLM layers. Additionally, the backpropagation path of existing PETL methods still passes through the frozen PLM during training, which is computationally and memory inefficient. In this paper, we propose FLAT, a generic PETL method that explicitly and individually combines knowledge across all PLM layers based on the tokens to achieve better transfer. FLAT treats the backbone PLM as a feature extractor and combines the features in a side-network; hence the backpropagation does not involve the PLM, which results in a much lower memory requirement than previous methods. The results on the GLUE benchmark show that FLAT outperforms other tuning techniques in low-resource scenarios and achieves on-par performance in high-resource scenarios with only 0.53% trainable parameters per task and 3.2× less GPU memory usage with BERT-base. Besides, a further ablation study reveals that the proposed fusion layer effectively combines knowledge from the PLM and helps the classifier to exploit the PLM knowledge for downstream tasks. We will release our code for better reproducibility.
Title: FLAT: Fusing layer representations for more efficient transfer learning in NLP
Journal: Neural Networks
Pub Date: 2024-08-13; DOI: 10.1016/j.neunet.2024.106624
Emotion recognition is an essential but challenging task in human–computer interaction systems due to the distinctive spatial structures and dynamic temporal dependencies associated with each emotion. However, current approaches fail to accurately capture the intricate effects of electroencephalogram (EEG) signals across different brain regions on emotion recognition. Therefore, this paper designs a transformer-based method, denoted by R2G-STLT, which relies on a spatial–temporal transformer encoder with regional to global hierarchical learning that learns the representative spatiotemporal features from the electrode level to the brain-region level. The regional spatial–temporal transformer (RST-Trans) encoder is designed to obtain spatial information and context dependence at the electrode level aiming to learn the regional spatiotemporal features. Then, the global spatial–temporal transformer (GST-Trans) encoder is utilized to extract reliable global spatiotemporal features, reflecting the impact of various brain regions on emotion recognition tasks. Moreover, the multi-head attention mechanism is placed into the GST-Trans encoder to empower it to capture the long-range spatial–temporal information among the brain regions. Finally, subject-independent experiments are conducted on each frequency band of the DEAP, SEED, and SEED-IV datasets to assess the performance of the proposed model. Results indicate that the R2G-STLT model surpasses several state-of-the-art approaches.
Title: Emotion recognition using hierarchical spatial–temporal learning transformer from regional to global brain
Journal: Neural Networks
Pub Date: 2024-08-13; DOI: 10.1016/j.neunet.2024.106632
The universal approximation theorem states that a neural network with one hidden layer can approximate continuous functions on compact sets with any desired precision. This theorem supports using neural networks for various applications, including regression and classification tasks. Furthermore, it is valid for real-valued neural networks and some hypercomplex-valued neural networks such as complex-, quaternion-, tessarine-, and Clifford-valued neural networks. However, hypercomplex-valued neural networks are a type of vector-valued neural network defined on an algebra with additional algebraic or geometric properties. This paper extends the universal approximation theorem for a wide range of vector-valued neural networks, including hypercomplex-valued models as particular instances. Precisely, we introduce the concept of non-degenerate algebra and state the universal approximation theorem for neural networks defined on such algebras.
Title: Universal approximation theorem for vector- and hypercomplex-valued neural networks
Journal: Neural Networks
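For orientation, the classical single-hidden-layer statement that this abstract starts from can be written as follows. This is the standard real-valued form with a continuous, non-polynomial activation, given here only as a reference point; it is not the vector-valued statement the paper itself proves, and the symbols $K$, $f$, $\sigma$, $N$, $\alpha_i$, $w_i$, $b_i$ are notation chosen for this sketch.

```latex
% Classical real-valued universal approximation (one hidden layer).
% Assumptions: K \subset \mathbb{R}^n compact, f : K \to \mathbb{R} continuous,
% \sigma : \mathbb{R} \to \mathbb{R} continuous and non-polynomial.
\forall \varepsilon > 0 \;\; \exists\, N \in \mathbb{N},\;
\alpha_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n \;:
\qquad
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \,
\sigma\!\left( w_i^{\top} x + b_i \right) \right| < \varepsilon .
```

In the vector-valued setting described above, the scalars and products would be replaced by elements and operations of the underlying algebra; the abstract attributes the validity of the result to the algebra being non-degenerate.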
Pub Date: 2024-08-13; DOI: 10.1016/j.neunet.2024.106619
This paper introduces a novel approach to learn multi-task regression models with constrained architecture complexity. The proposed model, named RFF-BLR, consists of a randomised feedforward neural network with two fundamental characteristics: a single hidden layer whose units implement the random Fourier features that approximate an RBF kernel, and a Bayesian formulation that optimises the weights connecting the hidden and output layers. The RFF-based hidden layer inherits the robustness of kernel methods. The Bayesian formulation promotes multi-output sparsity: all tasks interact during the optimisation to select a compact subset of the hidden layer units that serves as a common non-linear mapping for all tasks. The experimental results show that the RFF-BLR framework can lead to significant performance improvements compared to state-of-the-art methods in multi-task nonlinear regression, especially in scenarios with small training datasets.
Title: Bayesian learning of feature spaces for multitask regression
Journal: Neural Networks
Open access PDF: https://www.sciencedirect.com/science/article/pii/S0893608024005434/pdfft?md5=b5ec56e8de25c78b0d2793417101b953&pid=1-s2.0-S0893608024005434-main.pdf
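The hidden layer described in this abstract relies on random Fourier features (RFF) approximating an RBF kernel. The sketch below illustrates only that generic approximation, using the standard-library `random` and `math` modules; it is not the RFF-BLR model, and the names `rbf_kernel`, `make_rff_map`, `gamma`, and `n_features` are hypothetical choices for this illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_rff_map(dim, n_features, gamma, seed=0):
    """Build a random feature map z with E[z(x) . z(y)] = rbf_kernel(x, y, gamma)."""
    rng = random.Random(seed)
    # Frequencies come from N(0, 2 * gamma * I), phases from U[0, 2 * pi).
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        # Each hidden unit computes a scaled cosine of a random projection.
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return feature_map


if __name__ == "__main__":
    x, y = [0.1, -0.3, 0.5], [0.4, 0.2, -0.1]
    z = make_rff_map(dim=3, n_features=4000, gamma=0.5, seed=1)
    approx = sum(a * b for a, b in zip(z(x), z(y)))
    print("exact:", rbf_kernel(x, y, 0.5), "approx:", approx)
```

The inner product of the two feature vectors is a Monte Carlo estimate of the kernel value, so its error shrinks roughly as one over the square root of `n_features`. A linear model on top of `feature_map` then plays the role of the output-layer weights mentioned in the abstract.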
Pub Date: 2024-08-12; DOI: 10.1016/j.neunet.2024.106626
Recently, point cloud domain adaptation (DA) practices have been implemented to improve the generalization ability of deep learning models on point cloud data. However, variations across domains often result in decreased performance of models trained on differently distributed data sources. Previous studies have focused on output-level domain alignment to address this challenge. However, this approach may increase the number of errors introduced when aligning different domains, particularly for targets that would otherwise be predicted incorrectly. Therefore, in this study, we propose an input-level discretization-based matching to enhance the generalization ability of DA. First, an efficient geometric deformation depth decoupling network (3DeNet) is implemented to learn the knowledge from the source domain and embed it into an implicit feature space, which facilitates the effective constraint of unsupervised predictions for downstream tasks. Second, we demonstrate that the sparsity within the implicit feature space varies between domains, rendering domain differences difficult to support. Consequently, we match sets of neighboring points with different densities and biases by differentiating the adaptive densities. Finally, inter-domain differences are aligned by constraining the loss originating from and between the target domains. We conduct experiments on point cloud DA datasets PointDA-10 and PointSegDA, achieving advanced results (over 1.2% and 1% on average).
Title: Deformation depth decoupling network for point cloud domain adaptation
Journal: Neural Networks
Pub Date: 2024-08-12; DOI: 10.1016/j.neunet.2024.106629
Domain Generalization (DG) focuses on Out-Of-Distribution (OOD) generalization, aiming to learn a robust model that generalizes the knowledge acquired from the source domain to the unseen target domain. However, due to the existence of the domain shift, domain-invariant representation learning is challenging. Guided by fine-grained knowledge, we propose a novel paradigm, Mask-Shift-Inference (MSI), for DG based on the architecture of Convolutional Neural Networks (CNN). Rather than relying on a series of constraints and assumptions for model optimization, this paradigm shifts the focus to feature channels in the latent space for domain-invariant representation learning. We put forward a two-branch working mode of a main module and multiple domain-specific sub-modules. Each sub-module achieves good prediction performance only in its own specific domain and predicts poorly in the other source domains, which provides the main module with fine-grained knowledge guidance and contributes to the improvement of the cognitive ability of MSI. Firstly, during the forward propagation of the main module, the proposed MSI accurately discards unstable channels based on spurious classifications varying across domains, which have domain-specific prediction limitations and are not conducive to generalization. In this process, a progressive scheme is adopted to adaptively increase the masking ratio according to the training progress to further reduce the risk of overfitting. Subsequently, our paradigm enters the compatible shifting stage before the formal prediction. Based on maximizing semantic retention, we implement the domain style matching and shifting through a simple Fourier-transform-based transformation, which can explicitly and safely shift the target domain back to the source domain whose style is closest to it, requiring no additional model updates and reducing the domain gap. Eventually, the paradigm MSI enters the formal inference stage. 
The updated target domain is predicted in the main module trained in the previous stage with the benefit of familiar knowledge from the nearest source domain masking scheme. Our paradigm is logically progressive, which can intuitively exclude the confounding influence of domain-specific spurious information along with mitigating domain shifts and implicitly perform semantically invariant representation learning, achieving robust OOD generalization. Extensive experimental results on PACS, VLCS, Office-Home and DomainNet datasets verify the superiority and effectiveness of the proposed method.
Title: Mask-Shift-Inference: A novel paradigm for domain generalization
Journal: Neural Networks
Pub Date: 2024-08-12; DOI: 10.1016/j.neunet.2024.106625
In this paper, a smoothing approximation-based adaptive neurodynamic approach is proposed for a nonsmooth resource allocation problem (NRAP) with multiple constraints. The smoothing approximation method is combined with multi-agent systems to avoid the introduction of set-valued subgradient terms, thereby facilitating the practical implementation of the neurodynamic approach. In addition, using the adaptive penalty technique, private inequality constraints are processed, which eliminates the need for additional quantitative estimation of penalty parameters and significantly reduces the computational cost. Moreover, to reduce the impact of smoothing approximation on the convergence of the neurodynamic approach, time-varying control parameters are introduced. Due to the parallel computing characteristics of multi-agent systems, the neurodynamic approach proposed in this paper is completely distributed. Theoretical proof shows that the state solution of the neurodynamic approach converges to the optimal solution of NRAP. Finally, two application examples are used to validate the feasibility of the neurodynamic approach.
Title: A smoothing approximation-based adaptive neurodynamic approach for nonsmooth resource allocation problem
Journal: Neural Networks
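To make the idea of a smoothing approximation concrete, here is a minimal sketch for one textbook nonsmooth function, the absolute value. It assumes the common surrogate sqrt(x^2 + mu^2); the abstract does not say which smoothing function the paper uses, and the names `smooth_abs`, `smooth_abs_grad`, and `mu` are hypothetical.

```python
import math


def smooth_abs(x, mu):
    """Smooth surrogate for |x|; differentiable everywhere when mu > 0."""
    return math.sqrt(x * x + mu * mu)


def smooth_abs_grad(x, mu):
    """Gradient of smooth_abs; single-valued even at the kink x = 0."""
    return x / math.sqrt(x * x + mu * mu)


if __name__ == "__main__":
    # The gap to |x| is at most mu, so shrinking mu tightens the approximation.
    for mu in (1.0, 0.1, 0.01):
        gap = smooth_abs(0.3, mu) - abs(0.3)
        print(f"mu={mu}: gap={gap:.6f}, grad at 0 = {smooth_abs_grad(0.0, mu)}")
```

Because the surrogate has an ordinary gradient at every point, a gradient-based dynamical system can use it directly instead of a set-valued subgradient. Driving `mu` toward zero over time is one simple way to picture the role a time-varying parameter can play in limiting the approximation error.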
Pub Date: 2024-08-10 DOI: 10.1016/j.neunet.2024.106628
Dictionary learning is an important sparse representation algorithm that has been widely used in machine learning and artificial intelligence. However, for massive data in the big data era, classical dictionary learning algorithms are computationally expensive and can even be infeasible. To overcome this difficulty, we propose new dictionary learning methods based on randomized algorithms. The contributions of this work are as follows. First, we find that the dictionary matrix is often numerically low-rank. Based on this property, we apply randomized singular value decomposition (RSVD) to the dictionary matrix, and propose a randomized algorithm for linear dictionary learning. Compared with the classical K-SVD algorithm, an advantage is that one can update all the elements of the dictionary matrix simultaneously. Second, to the best of our knowledge, there are few theoretical results on why one can solve the involved matrix computation problems inexactly in dictionary learning. To fill this gap, we show the rationality of this randomized algorithm with inexact solving, from a matrix perturbation analysis point of view. Third, based on the numerically low-rank property and the Nyström approximation of the kernel matrix, we propose a randomized kernel dictionary learning algorithm, and bound the distance between the exact solution and the computed solution to show the effectiveness of the proposed algorithm. Fourth, we propose an efficient scheme for the testing stage in kernel dictionary learning. With this strategy, there is no need to form or store kernel matrices explicitly in either the training or the testing stage. Comprehensive numerical experiments are performed on some real-world data sets. Numerical results demonstrate the rationality of our strategies, and show that the proposed algorithms are much more efficient than some state-of-the-art dictionary learning algorithms. The MATLAB code for the proposed algorithms is publicly available at https://github.com/Jiali-yang/RALDL_RAKDL.
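To make the RSVD step mentioned in the abstract more concrete, here is a minimal sketch of the standard randomized SVD scheme (random range sketch, QR orthonormalization, SVD of the small projected matrix) applied to a numerically low-rank matrix. It is written in Python with NumPy rather than MATLAB, and it is not the authors' implementation: the function name `randomized_svd`, the `oversample` parameter, and the synthetic test matrix are assumptions made for illustration only.

```python
import numpy as np


def randomized_svd(A, rank, oversample=5, seed=0):
    """Approximate rank-`rank` SVD of A using a Gaussian range sketch.

    Generic textbook scheme, assumed here for illustration; the paper's
    dictionary-learning algorithm may differ in its details.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)         # sketch width with oversampling
    omega = rng.standard_normal((n, k))      # random test matrix
    Q, _ = np.linalg.qr(A @ omega)           # orthonormal basis for the sketched range of A
    B = Q.T @ A                              # small k-by-n projected matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                          # lift left singular vectors back to m dimensions
    return U[:, :rank], s[:rank], Vt[:rank, :]


# Synthetic exactly rank-5 matrix standing in for a numerically low-rank dictionary.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))

U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(rel_err)  # tiny for an exactly low-rank matrix
```

The only full-size operations are the two matrix products with `A`; the QR and SVD act on matrices with just `rank + oversample` columns or rows, which is where the savings over a full SVD come from when the target rank is small.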
{"title":"Randomized algorithms for large-scale dictionary learning","authors":"","doi":"10.1016/j.neunet.2024.106628","DOIUrl":"10.1016/j.neunet.2024.106628","url":null,"PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142012445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}