Pub Date: 2024-08-28 | DOI: 10.1016/j.neunet.2024.106665
In brain-computer interfaces (BCIs), accurate electroencephalogram (EEG) classifiers for specific mental tasks are critical to performance. These classifiers are built with machine learning (ML) and deep learning (DL) techniques, which require large training datasets to yield reliable and accurate models. However, collecting sufficiently large EEG datasets is difficult because of intra- and inter-subject variability and experimental costs. The resulting data scarcity causes overfitting to the training samples and reduces generalization performance. To address EEG data scarcity and improve the performance of EEG classifiers, we propose a novel EEG data augmentation (DA) framework using conditional generative adversarial networks (cGANs). An experimental study on two public motor imagery (MI) EEG datasets (BCI Competition IV IIa and III IVa) validates the effectiveness of the proposed EEG DA method. To evaluate the proposed cGAN-based DA method, we tested eight EEG classifiers, including traditional ML and state-of-the-art DL models, together with three existing EEG DA methods. Experimental results showed that most DA methods, given a proper DA proportion in the training dataset, achieved higher classification performance than training without DA. Moreover, the proposed DA method improved classification performance more than the other DA methods. This suggests that the proposed method is a promising EEG DA method for enhancing the performance of EEG classifiers in MI-based BCIs.
Title: Improving classification performance of motor imagery BCI through EEG data augmentation with conditional generative adversarial networks | Journal: Neural Networks
Pub Date: 2024-08-27 | DOI: 10.1016/j.neunet.2024.106664
Complex-valued convolutional neural networks (CVCNNs) have proven effective in classifying complex signals and synthetic aperture radar (SAR) images. However, because of their complex-valued parameters, CVCNNs tend to be redundant and require heavy floating-point computation. Model sparsity has emerged as an efficient way to remove this redundancy without much loss of performance, yet few studies address the sparsity problem of CVCNNs. Therefore, a complex-valued soft-log threshold reweighting (CV-SLTR) algorithm is proposed for designing sparse CVCNNs, reducing the number of weight parameters and simplifying the network structure. On the one hand, considering the difference between complex and real numbers, we redefine and derive the complex-valued log-sum threshold method. On the other hand, considering the distinct characteristics of the complex-valued convolutional (CConv) layers and complex-valued fully connected (CFC) layers of CVCNNs, complex-valued soft and log-sum threshold methods are developed, respectively, to prune the weights of the different layers during forward propagation, and the sparsity thresholds are optimized during backward propagation by introducing a sparsity budget. Furthermore, different optimizers can be integrated with CV-SLTR. When stochastic gradient descent (SGD) is used, the convergence of CV-SLTR is proved under a Lipschitz continuity condition. Experiments on the RadioML 2016.10A and S1SLC-CVDL datasets show that the proposed algorithm sparsifies CVCNNs efficiently. Notably, it reaches a sparse model quickly while maintaining high classification accuracy. These results demonstrate the feasibility and potential of the CV-SLTR algorithm.
Title: Complex-valued soft-log threshold reweighting for sparsity of complex-valued convolutional neural networks | Journal: Neural Networks
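As a rough illustration of the complex-valued soft threshold mentioned in the abstract above, the sketch below shrinks the magnitude of each complex weight by a threshold while leaving its phase unchanged, so small weights become exactly zero. This is the textbook soft-threshold operator extended to complex numbers, not necessarily the exact CV-SLTR rule; the function names and the fixed threshold `tau` are hypothetical (the paper learns its thresholds during backward propagation, and the log-sum variant is not shown).

```python
def complex_soft_threshold(w: complex, tau: float) -> complex:
    """Shrink |w| by tau and keep the phase of w (assumed operator form)."""
    magnitude = abs(w)
    if magnitude <= tau:
        return 0j  # weights at or below the threshold are pruned
    return w * (1.0 - tau / magnitude)


def prune(weights, tau):
    """Apply the soft threshold to every weight in a flat list."""
    return [complex_soft_threshold(w, tau) for w in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


# Hypothetical weights; tau is fixed here purely for illustration.
weights = [3 + 4j, 0.1 - 0.2j, -1j, 0.5 + 0j]
pruned = prune(weights, tau=1.0)
print(pruned, sparsity(pruned))
```

Only the first weight has a magnitude above the threshold, so it survives with its magnitude reduced from 5 to 4 and its direction in the complex plane unchanged.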
Pub Date: 2024-08-27 | DOI: 10.1016/j.neunet.2024.106659
Domain adaptation on time-series data is an important but challenging task in real-world scenarios. It is often encountered in industrial applications such as anomaly detection and sensor data forecasting, yet it has received limited attention in academia. Most existing methods for time-series data borrow the covariate shift assumption from non-time-series data to extract a domain-invariant representation, but this assumption is hard to satisfy in practice: the dependence among variables is complex, and a small change in the time lags may lead to a large change in future values. To address this challenge, we leverage the stability of causal structures across domains. To avoid the strong assumptions of causal discovery, such as the linear non-Gaussian assumption, we relax the problem and mine stable sparse associative structures instead of discovering the causal structures directly. Besides the domain-invariant structures, we also find that some domain-specific information, such as the strengths of the structures, is important for prediction. Based on this intuition, we extend the sparse associative structure alignment model of the conference version to the Sparse Associative Structure Alignment model with domain-specific information enhancement (SASA2 for short), which aligns the invariant unweighted sparse associative structures and takes the variant information into account for time-series unsupervised domain adaptation. Specifically, we first generate the segment set to remove the obstacle of offsets. Second, we extract the unweighted sparse associative structures via sparse attention mechanisms. Third, we extract the domain-specific information via an autoregressive module. Finally, we employ a unidirectional alignment restriction to guide the transformation from the source to the target. Moreover, we provide a generalization analysis to show the theoretical superiority of our method. Compared with existing methods, our method yields state-of-the-art performance, with a 5% relative improvement on three real-world datasets covering different applications: air quality, in-hospital healthcare, and anomaly detection. Furthermore, visualizations of the sparse associative structures illustrate what knowledge can be transferred, improving the transparency and interpretability of our method.
Title: Time-series domain adaptation via sparse associative structure alignment: Learning invariance and variance | Journal: Neural Networks
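The abstract above does not say which sparse attention mechanism is used. As one plausible illustration, the sketch below implements sparsemax, a common sparse alternative to softmax that can assign exactly zero weight to weakly related variables; treating the nonzero entries as edges then gives an unweighted structure. The choice of sparsemax, the function name, and the example scores are all assumptions made for illustration, not details taken from the paper.

```python
def sparsemax(scores):
    """Project scores onto the probability simplex; outputs can be exactly zero."""
    z_sorted = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumulative += z
        # The k largest scores stay in the support while this condition holds.
        if 1.0 + k * z > cumulative:
            tau = (cumulative - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical association scores between one variable and three others.
weights = sparsemax([0.5, 0.3, -1.0])

# An unweighted structure keeps only which associations are present.
edges = [w > 0 for w in weights]
print(weights, edges)
```

Unlike softmax, which would give the third variable a small positive weight, sparsemax removes it entirely, which is what makes the resulting structure sparse.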
Pub Date: 2024-08-26 | DOI: 10.1016/j.neunet.2024.106667
This paper addresses the tracking control problem of nonlinear discrete-time multi-agent systems (MASs). First, a local neighborhood error system (LNES) is constructed. Then, a novel tracking algorithm based on asynchronous iterative Q-learning (AIQL) is developed, which transforms the tracking problem into the optimal regulation of the LNES. The AIQL-based algorithm maintains two Q values, $Q_i^A$ and $Q_i^B$, for each agent $i$, where $Q_i^A$ is used to improve the control policy and $Q_i^B$ is used to evaluate the value of the control policy. Moreover, the convergence of the LNES is established: it is shown that the LNES converges to 0 and the tracking problem is solved. A neural network-based actor-critic framework is used to implement AIQL. The critic network of AIQL is composed of two neural networks, which approximate $Q_i^A$ and $Q_i^B$, respectively. Finally, simulation results are given to verify the performance of the developed algorithm. They show that the AIQL-based tracking algorithm achieves a lower cost value and faster convergence than the IQL-based tracking algorithm.
Title: Asynchronous iterative Q-learning based tracking control for nonlinear discrete-time multi-agent systems | Journal: Neural Networks
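The abstract above does not define the local neighborhood error. A form widely used in the multi-agent tracking literature sums each agent's disagreement with its neighbors and, for agents that can observe the leader, adds its deviation from the leader. The sketch below computes that quantity for scalar states; the formula, the function name, and the `adjacency` and `pinning` inputs are assumptions made for illustration and may differ from the paper's construction.

```python
def local_neighborhood_errors(states, adjacency, pinning, leader_state):
    """Assumed form: e_i = sum_j a_ij * (x_i - x_j) + b_i * (x_i - x_0)."""
    errors = []
    for i, x_i in enumerate(states):
        # Disagreement between agent i and each neighbor j, weighted by a_ij.
        disagreement = sum(
            a_ij * (x_i - x_j) for a_ij, x_j in zip(adjacency[i], states)
        )
        # Pinning term: nonzero only for agents that observe the leader.
        errors.append(disagreement + pinning[i] * (x_i - leader_state))
    return errors


# Hypothetical three-agent path graph; only agent 0 observes the leader.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pinning = [1, 0, 0]
errors = local_neighborhood_errors([1.0, 2.0, 4.0], adjacency, pinning, 1.0)
print(errors)
```

When every agent's state equals the leader's, all errors vanish, which is why driving these errors to zero corresponds to solving the tracking problem.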
Pub Date: 2024-08-26 | DOI: 10.1016/j.neunet.2024.106658
In this work, the exponential synchronization of stochastic complex networks with time delays and time-varying multi-links (SCNTM) is studied via a novel aperiodic intermittent dynamic event-triggered control (AIDE-TC). The AIDE-TC combines intermittent control with an exponential function and dynamic event-triggered control, aiming to minimize the number of required triggers. Based on the proposed control strategy, sufficient conditions for exponential synchronization in mean square of SCNTM are obtained using a graph-theoretic approach and the Lyapunov function method. It is also proven that Zeno behavior is excluded under the AIDE-TC, which ensures that the control mechanism can feasibly realize the synchronization of SCNTM. Finally, a numerical simulation on islanded microgrid systems validates the main results, and comparative simulations show that the AIDE-TC reduces the number of event triggers.
Title: Aperiodic intermittent dynamic event-triggered synchronization control for stochastic delayed multi-links complex networks | Journal: Neural Networks
Pub Date: 2024-08-24 | DOI: 10.1016/j.neunet.2024.106666
Grounding on a commonsense knowledge subgraph can help a model generate more informative and diverse dialogue responses. Prior Traverse-based works explicitly retrieve a subgraph from an external knowledge base (eKB), so the available knowledge is strictly limited by the eKB. To break this restriction, Generative Retrieval methods externalize knowledge from the language model. However, their one-pass externalization procedure tends to produce bland, uninformative knowledge. This work proposes a novel method, Traverse in Language Model (TiLM), which uses three 'Chain-of-Thought' sub-tasks, i.e., Query Entity Production, Topic Entity Prediction, and Knowledge Subgraph Completion, to build a high-quality knowledge subgraph that grounds the subsequent Response Generation without explicitly accessing the eKB at inference time. Experimental results on both Chinese and English datasets demonstrate TiLM's outstanding performance even with a small number of parameters.
Title: Generative commonsense knowledge subgraph retrieval for open-domain dialogue response generation | Journal: Neural Networks
Pub Date: 2024-08-23 | DOI: 10.1016/j.neunet.2024.106651
Graph neural networks (GNNs) have achieved state-of-the-art performance in graph representation learning. Message passing neural networks, which learn representations by recursively aggregating information from each node and its neighbors, are among the most commonly used GNNs. However, a wealth of structural information about individual nodes and full graphs is often ignored in this process, which restricts the expressive power of GNNs. Various graph data augmentation methods that enrich message passing with structure knowledge have been introduced as one main way to tackle this issue, but they often focus on individual structure features and are difficult to scale to more structure features. In this work we propose a novel approach, the collective structure knowledge-augmented graph neural network (CoS-GNN), in which a new message passing method allows GNNs to harness a diverse set of node- and graph-level structure features, together with the original node features/attributes, in augmented graphs. In doing so, our approach substantially improves the structural knowledge modeling of GNNs at both the node and graph levels, resulting in markedly better graph representations. This is supported by extensive empirical results in which CoS-GNN outperforms state-of-the-art models on various graph-level learning tasks, including graph classification, anomaly detection, and out-of-distribution generalization.
Title: Harnessing collective structure knowledge in data augmentation for graph neural networks | Journal: Neural Networks
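To make the idea in the abstract above concrete, the sketch below appends one simple structural feature (node degree) to each node's attributes and then runs a single round of mean-aggregation message passing over each node and its neighbors. It is a generic illustration of structure-augmented message passing, not the CoS-GNN method; the function names, the choice of degree as the structural feature, and the toy graph are all hypothetical.

```python
def augment_with_degree(node_features, neighbors):
    """Append each node's degree to its feature vector (one illustrative structure feature)."""
    return [feat + [float(len(nbrs))] for feat, nbrs in zip(node_features, neighbors)]


def mean_aggregate(features, neighbors):
    """One message passing round: average a node's features with its neighbors' features."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        dim = len(features[i])
        updated.append([sum(vec[k] for vec in group) / len(group) for k in range(dim)])
    return updated


# Hypothetical path graph 0 - 1 - 2, given as an adjacency list.
neighbors = [[1], [0, 2], [1]]
node_features = [[1.0], [0.0], [1.0]]

augmented = augment_with_degree(node_features, neighbors)
hidden = mean_aggregate(augmented, neighbors)
print(augmented, hidden)
```

After augmentation, the aggregated representation carries degree information alongside the original attribute, so the middle node and the end nodes differ in a structural dimension as well as in their attributes.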
Pub Date: 2024-08-23 | DOI: 10.1016/j.neunet.2024.106647
Industrial process optimization and control are crucial for increasing economic and ecological efficiency. However, data sovereignty, differing goals, and the expert knowledge required for implementation impede holistic deployment. Further, the increasing use of data-driven AI methods in process models and industrial sensing often requires regular fine-tuning to accommodate distribution drifts. We propose the Artificial Neural Twin, which combines concepts from model predictive control, deep learning, and sensor networks to address these issues. Our approach introduces decentralized, differentiable data fusion to estimate the state of distributed process steps and their dependence on input data. By treating the interconnected process steps as a quasi neural network, we can backpropagate loss gradients to process parameters for process optimization, or to AI models for fine-tuning. The concept is demonstrated on a virtual machine park simulated in Unity, consisting of bulk material processes in plastic recycling.
Title: The Artificial Neural Twin — Process optimization and continual learning in distributed process chains | Journal: Neural Networks | Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0893608024005719/pdfft?md5=b4aabfb0ff22ec0ef47c186de7bc9f80&pid=1-s2.0-S0893608024005719-main.pdf
Pub Date: 2024-08-23 | DOI: 10.1016/j.neunet.2024.106662
For scene matching, extracting metric features is challenging when the scenes are multi-source and multi-view. To meet the requirements of multi-source and multi-view scene matching, a Siamese network model for Spatial Relation Aware feature perception and fusion is proposed. The key contributions of this work are as follows: (1) To enhance the coherence of multi-view image features, we investigate relation-aware feature perception. With the help of spatial relation vector decomposition, the distribution consistency of image features is perceived in the horizontal $\vec{H}$ and vertical $\vec{W}$ directions. (2) To establish the metric consistency relationship, a large-scale local information perception strategy is studied to select a scale that balances the sizes of mainstream aerial and satellite images. (3) After the multi-scale metric features are obtained, a feature selection and fusion strategy is proposed to improve the metric confidence. The significance of the distinct feature levels in the backbone network is systematically assessed before fusion, which strengthens the representation of pivotal components within the metric features during fusion. Experimental results on the University-1652 dataset and on collected real scene data confirm that the proposed method improves the reliability of the metric model. This effectiveness suggests that the method is applicable to diverse scene matching tasks.
{"title":"Multi-view scene matching with relation aware feature perception","authors":"","doi":"10.1016/j.neunet.2024.106662","DOIUrl":"10.1016/j.neunet.2024.106662","url":null,"abstract":"<div><p>For scene matching, the extraction of metric features is a challenging task in the face of multi-source and multi-view scenes. Aiming at the requirements of multi-source and multi-view scene matching, a siamese network model for Spatial Relation Aware feature perception and fusion is proposed. The key contributions of this work are as follows: (1) Seeking to enhance the coherence of multi-view image features, we investigate the relation aware feature perception. With the help of spatial relation vector decomposition, the distribution consistency perception of image features in the horizontal <span><math><mover><mrow><mi>H</mi></mrow><mo>→</mo></mover></math></span> and vertical <span><math><mover><mrow><mi>W</mi></mrow><mo>→</mo></mover></math></span> directions is realized. (2) In order to establish the metric consistency relationship, the large-scale local information perception strategy is studied to realize the relative trade-off scale selection under the size of mainstream aerial images and satellite images. (3) After obtaining the multi-scale metric features, in order to improve the metric confidence, the feature selection and fusion strategy is proposed. The significance of distinct feature levels in the backbone network is systematically assessed prior to fusion, leading to an enhancement in the representation of pivotal components within the metric features during the fusion process. The experimental results obtained from the University-1652 dataset and the collected real scene data affirm the efficacy of the proposed method in enhancing the reliability of the metric model. 
The demonstrated effectiveness of this method suggests its applicability to diverse scene matching tasks.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142096663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
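The abstract above does not say how the spatial relation vector decomposition is computed. As a purely illustrative sketch, and assuming (this is not taken from the paper) that a 2-D feature map is summarized by its per-row and per-column means, the consistency of two views could be compared one direction at a time. All function names here are hypothetical.

```python
import math


def axis_means(feature_map):
    """Summarize a 2-D feature map (a list of equal-length rows) along each axis.

    Returns (row_means, col_means): one mean per row and one mean per column.
    This axis-wise pooling is an assumption made for illustration only.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    return row_means, col_means


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def directional_consistency(map_a, map_b):
    """Compare two same-sized feature maps separately along rows and columns."""
    rows_a, cols_a = axis_means(map_a)
    rows_b, cols_b = axis_means(map_b)
    return cosine(rows_a, rows_b), cosine(cols_a, cols_b)


if __name__ == "__main__":
    aerial = [[1.0, 2.0], [3.0, 4.0]]
    satellite = [[2.0, 4.0], [6.0, 8.0]]  # same layout, different scale
    print(directional_consistency(aerial, satellite))
```

Keeping the two directions separate means a mismatch along one axis stays visible instead of being averaged away in a single global descriptor. A real model would operate on learned multi-channel features rather than raw lists.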
PathMLP: Smooth path towards high-order homophily
Neural Networks (Journal Article)
Pub Date: 2024-08-23, DOI: 10.1016/j.neunet.2024.106650
Real-world graphs increasingly exhibit heterophily, in which nodes no longer tend to connect to nodes with the same label. This challenges the homophily assumption of classical graph neural networks (GNNs) and impedes their performance. Intriguingly, in heterophilous data we observe that certain high-order information exhibits higher homophily, which motivates us to incorporate high-order information into node representation learning. Common practice in GNNs is to acquire high-order information mainly by increasing model depth and altering the message-passing mechanism. Although effective to a certain extent, these approaches suffer from three shortcomings: (1) over-smoothing caused by excessive model depth and propagation steps; (2) incomplete use of high-order information; and (3) low computational efficiency. To address them, we design a similarity-based path sampling strategy that captures smooth paths containing high-order homophily. We then propose a lightweight model based on multi-layer perceptrons (MLPs), named PathMLP, which encodes the messages carried by paths through simple transformation and concatenation operations and learns node representations in heterophilous graphs effectively through adaptive path aggregation. Extensive experiments show that our method outperforms baselines on 16 of 20 datasets, underlining its effectiveness in alleviating the heterophily problem. In addition, our method is immune to over-smoothing and computationally efficient. The source code will be available at https://github.com/Graph4Sec-Team/PathMLP.
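The abstract above names two ideas that a small example can make concrete: how heterophilous a graph is, and what a similarity-based "smooth" path might look like. The sketch below is not the PathMLP implementation (see the repository linked above for that). It assumes an undirected graph stored as adjacency lists, measures edge homophily as the fraction of edges joining same-label nodes, and samples a path with a greedy rule chosen only for illustration: always step to the unvisited neighbour whose features are most similar to the current node. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def edge_homophily(adj, labels):
    """Fraction of edges whose endpoints share a label.

    Each undirected edge appears twice in the adjacency lists, which leaves
    the ratio unchanged.
    """
    same = total = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            total += 1
            same += labels[u] == labels[v]
    return same / total if total else 0.0


def sample_smooth_path(adj, feats, start, num_nodes):
    """Greedy walk of at most `num_nodes` nodes starting at `start`.

    Illustrative assumption: at each step move to the unvisited neighbour most
    similar (by cosine) to the current node; ties go to the smallest node id.
    The walk stops early at a dead end.
    """
    path = [start]
    visited = {start}
    while len(path) < num_nodes:
        current = path[-1]
        candidates = sorted(v for v in adj[current] if v not in visited)
        if not candidates:
            break
        best = max(candidates, key=lambda v: cosine(feats[current], feats[v]))
        path.append(best)
        visited.add(best)
    return path


def concat_path_features(feats, path):
    """Concatenate node features along the path into one flat input vector."""
    return [x for node in path for x in feats[node]]


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    feats = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [1.0, 0.2]}
    labels = {0: "a", 1: "a", 2: "b", 3: "a"}
    path = sample_smooth_path(adj, feats, start=0, num_nodes=3)
    print(edge_homophily(adj, labels), path, concat_path_features(feats, path))
```

The concatenated vector is the kind of fixed-length input an MLP could consume. The learned transformations and the adaptive aggregation over several paths described in the abstract are deliberately left out.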