Stability and passivity analysis of delayed neural networks via an improved matrix-valued polynomial inequality
Pub Date : 2024-08-21 | DOI: 10.1016/j.neunet.2024.106637
The stability and passivity of delayed neural networks are addressed in this paper. A novel Lyapunov–Krasovskii functional (LKF) without multiple integrals is constructed. By using an improved matrix-valued polynomial inequality (MVPI), the previous constraint involving skew-symmetric matrices within the MVPI is removed. Then, stability and passivity criteria for delayed neural networks that are less conservative than existing ones are proposed. Finally, three examples are employed to demonstrate the superiority and feasibility of the obtained results.
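To fix ideas, here is a hedged sketch of the canonical setting for such criteria: a delayed neural network together with the simplest quadratic-plus-integral LKF. The paper's augmented functional without multiple integrals is more elaborate than this baseline; the sketch only shows where an MVPI enters the analysis.

```latex
% Delayed neural network and the simplest LKF instance; stability follows
% when \dot{V} \le 0 along trajectories, and the MVPI is the tool used to
% bound the delay-dependent terms arising in \dot{V}.
\begin{align*}
  \dot{x}(t) &= -A x(t) + W_0 f(x(t)) + W_1 f\big(x(t - \tau(t))\big), \\
  V(x_t) &= x^{\top}(t) P x(t)
            + \int_{t-\tau(t)}^{t} x^{\top}(s) Q x(s)\,\mathrm{d}s,
  \qquad P, Q \succ 0.
\end{align*}
```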
{"title":"Stability and passivity analysis of delayed neural networks via an improved matrix-valued polynomial inequality","authors":"","doi":"10.1016/j.neunet.2024.106637","DOIUrl":"10.1016/j.neunet.2024.106637","url":null,"abstract":"<div><p>The stability and passivity of delayed neural networks are addressed in this paper. A novel Lyapunov–Krasovskii functional (LKF) without multiple integrals is constructed. By using an improved matrix-valued polynomial inequality (MVPI), the previous constraint involving skew-symmetric matrices within the MVPI is removed. Then, the stability and passivity criteria for delayed neural networks that are less conservative than the existing ones are proposed. Finally, three examples are employed to demonstrate the meliority and feasibility of the obtained results.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning the feature distribution similarities for online time series anomaly detection
Pub Date : 2024-08-21 | DOI: 10.1016/j.neunet.2024.106638
Identifying anomalies in multi-dimensional sequential data is crucial for ensuring optimal performance across various domains and in large-scale systems. Traditional contrastive methods use the similarity between different features extracted from multi-dimensional raw inputs as an indicator of anomaly severity. However, the complex objective functions and meticulously designed modules of these methods often lead to efficiency issues and a lack of interpretability. Our study introduces SimDetector, a structural framework built as a Local–Global Multi-Scale Similarity Contrast network. Specifically, the restructured and enhanced GRU module extracts more generalized local features, including long-term cyclical trends. The multi-scale sparse attention module efficiently extracts multi-scale global features with pattern information. Additionally, we modify the KL divergence to suit the characteristics of time series anomaly detection, proposing a symmetric absolute KL divergence that focuses more on overall distribution differences. The proposed method achieves results that surpass or approach the State-of-the-Art (SOTA) on multiple real-world and synthetic datasets, while also significantly reducing Multiply-Accumulate Operations (MACs) and memory usage.
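Since the abstract does not give the exact formula, the following Python sketch shows one plausible reading of a "symmetric absolute KL divergence" (element-wise absolute KL terms, symmetrized); the definition, function name, and toy inputs are assumptions, not SimDetector's exact formulation.

```python
import numpy as np

def symmetric_absolute_kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """One plausible form of a 'symmetric absolute KL divergence':
    element-wise KL terms are taken in absolute value (so positive and
    negative log-ratio contributions cannot cancel), then symmetrized.
    The exact definition used in the paper may differ."""
    p = p / (p.sum() + eps)                      # normalize to distributions
    q = q / (q.sum() + eps)
    kl_pq = np.abs(p * np.log((p + eps) / (q + eps))).sum()
    kl_qp = np.abs(q * np.log((q + eps) / (p + eps))).sum()
    return kl_pq + kl_qp

# Anomaly score as the distributional gap between local and global
# feature similarities (illustrative values).
local_sim = np.array([0.7, 0.2, 0.1])
global_sim = np.array([0.1, 0.3, 0.6])
score = symmetric_absolute_kl(local_sim, global_sim)
```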
{"title":"Learning the feature distribution similarities for online time series anomaly detection","authors":"","doi":"10.1016/j.neunet.2024.106638","DOIUrl":"10.1016/j.neunet.2024.106638","url":null,"abstract":"<div><p>Identifying anomalies in multi-dimensional sequential data is crucial for ensuring optimal performance across various domains and in large-scale systems. Traditional contrastive methods utilize feature similarity between different features extracted from multidimensional raw inputs as an indicator of anomaly severity. However, the complex objective functions and meticulously designed modules of these methods often lead to efficiency issues and a lack of interpretability. Our study introduces a structural framework called SimDetector, which is a Local–Global Multi-Scale Similarity Contrast network. Specifically, the restructured and enhanced GRU module extracts more generalized local features, including long-term cyclical trends. The multi-scale sparse attention module efficiently extracts multi-scale global features with pattern information. Additionally, we modified the KL divergence to suit the characteristics of time series anomaly detection, proposing a symmetric absolute KL divergence that focuses more on overall distribution differences. The proposed method achieves results that surpass or approach the State-of-the-Art (SOTA) on multiple real-world datasets and synthetic datasets, while also significantly reducing Multiply-Accumulate Operations (MACs) and memory usage.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142087785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SNN-BERT: Training-efficient Spiking Neural Networks for energy-efficient BERT
Pub Date : 2024-08-20 | DOI: 10.1016/j.neunet.2024.106630
Spiking Neural Networks (SNNs) are naturally suited to processing sequence tasks such as NLP with low power, due to their brain-inspired spatio-temporal dynamics and spike-driven nature. Current SNNs employ "repeat coding", which re-enters all input tokens at each timestep; this fails to fully exploit temporal relationships between the tokens and introduces memory overhead. In this work, we align the number of input tokens with the number of timesteps and refer to this input coding as "individual coding". To cope with the increase in training time for individually encoded SNNs due to the dramatic increase in timesteps, we design a Bidirectional Parallel Spiking Neuron (BPSN) with the following features: first, BPSN supports parallel spike computing and effectively avoids the issue of uninterrupted firing; second, BPSN excels at handling adaptive-sequence-length tasks, a capability that existing work lacks; third, the fusion of bidirectional information enhances the temporal modeling capabilities of SNNs. To validate the effectiveness of our BPSN, we present SNN-BERT, a deep directly trained SNN architecture based on the BERT model in NLP. Compared to the prior repeat 4-timestep coding baseline, our method achieves a 6.46× reduction in energy consumption and a significant 16.1% improvement, raising the performance upper bound of the SNN domain on the GLUE benchmark to 74.4%. Additionally, our method achieves 3.5× training acceleration and 3.8× training memory optimization. Compared with artificial neural networks of similar architecture, we obtain comparable performance at up to 22.5× energy efficiency. The code will be made available.
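The contrast between the two input codings can be illustrated with array shapes alone. This is a hedged sketch: the shapes and names are illustrative, and the BPSN internals are not reproduced.

```python
import numpy as np

def repeat_coding(tokens: np.ndarray, timesteps: int) -> np.ndarray:
    """Conventional SNN input coding: the full token sequence is re-entered
    at every timestep, so memory grows with timesteps x sequence length."""
    # tokens: (seq_len, dim) -> (timesteps, seq_len, dim)
    return np.repeat(tokens[None, :, :], timesteps, axis=0)

def individual_coding(tokens: np.ndarray) -> np.ndarray:
    """Coding described in the abstract: timestep t receives only token t,
    aligning the number of timesteps with the number of tokens."""
    # tokens: (seq_len, dim) -> (seq_len, 1, dim); timesteps == seq_len
    return tokens[:, None, :]

tokens = np.random.randn(128, 768)          # 128 tokens, 768-dim embeddings
dense = repeat_coding(tokens, timesteps=4)  # shape (4, 128, 768)
sparse = individual_coding(tokens)          # shape (128, 1, 768)
```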
{"title":"SNN-BERT: Training-efficient Spiking Neural Networks for energy-efficient BERT","authors":"","doi":"10.1016/j.neunet.2024.106630","DOIUrl":"10.1016/j.neunet.2024.106630","url":null,"abstract":"<div><p>Spiking Neural Networks (SNNs) are naturally suited to process sequence tasks such as NLP with low power, due to its brain-inspired spatio-temporal dynamics and spike-driven nature. Current SNNs employ ”repeat coding” that re-enter all input tokens at each timestep, which fails to fully exploit temporal relationships between the tokens and introduces memory overhead. In this work, we align the number of input tokens with the timestep and refer to this input coding as ”individual coding”. To cope with the increase in training time for individual encoded SNNs due to the dramatic increase in timesteps, we design a Bidirectional Parallel Spiking Neuron (BPSN) with following features: First, BPSN supports spike parallel computing and effectively avoids the issue of uninterrupted firing; Second, BPSN excels in handling adaptive sequence length tasks, which is a capability that existing work does not have; Third, the fusion of bidirectional information enhances the temporal information modeling capabilities of SNNs; To validate the effectiveness of our BPSN, we present the SNN-BERT, a deep direct training SNN architecture based on the BERT model in NLP. Compared to prior repeat 4-timestep coding baseline, our method achieves a 6.46<span><math><mo>×</mo></math></span> reduction in energy consumption and a significant 16.1% improvement, raising the performance upper bound of the SNN domain on the GLUE dataset to 74.4%. Additionally, our method achieves 3.5<span><math><mo>×</mo></math></span> training acceleration and 3.8<span><math><mo>×</mo></math></span> training memory optimization. Compared with artificial neural networks of similar architecture, we obtain comparable performance but up to 22.5<span><math><mo>×</mo></math></span> energy efficiency. We would provide the codes.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142087811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interplay between depth and width for interpolation in neural ODEs
Pub Date : 2024-08-19 | DOI: 10.1016/j.neunet.2024.106640
Neural ordinary differential equations have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of the role played by their architecture remains elusive. In this work, we examine the interplay between the width p and the number of transitions between layers L (corresponding to a depth of L+1). Specifically, we construct explicit controls interpolating either a finite dataset D, comprising N pairs of points in R^d, or two probability measures within a Wasserstein error margin ε > 0. Our findings reveal a balancing trade-off between p and L, with L scaling as 1 + O(N/p) for data interpolation, and as 1 + O(p^{-1} + (1+p)^{-1} ε^{-d}) for measures.
In the high-dimensional and wide setting where d, p > N, our result can be refined to achieve L = 0. This naturally raises the problem of data interpolation in the autonomous regime, characterized by L = 0. We adopt two alternative approaches: either controlling in a probabilistic sense, or relaxing the target condition. In the first case, when p = N, we develop an inductive control strategy based on a separability assumption whose probability increases with d. In the second, we establish an explicit error decay rate with respect to p, which results from applying a universal approximation theorem to a custom-built Lipschitz vector field interpolating D.
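For reference, the controlled neural ODE typically considered in this line of work takes the form below. This is a hedged sketch of the standard parameterization, and the authors' exact setting may differ: the width p is the number of neurons, and the controls are piecewise constant with L switches, hence depth L+1.

```latex
% Standard controlled neural ODE: piecewise-constant controls (W, A, b)
% switch L times on (0,1), so the flow composes L+1 "layers";
% L = 0 is the autonomous regime with time-independent controls.
\begin{equation*}
  \dot{x}(t) = W(t)\,\sigma\!\big(A(t)\,x(t) + b(t)\big), \qquad
  W(t) \in \mathbb{R}^{d \times p},\;
  A(t) \in \mathbb{R}^{p \times d},\;
  b(t) \in \mathbb{R}^{p}.
\end{equation*}
```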
{"title":"Interplay between depth and width for interpolation in neural ODEs","authors":"","doi":"10.1016/j.neunet.2024.106640","DOIUrl":"10.1016/j.neunet.2024.106640","url":null,"abstract":"<div><p>Neural ordinary differential equations have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of the role played by their architecture remains elusive. In this work, we examine the interplay between the width <span><math><mi>p</mi></math></span> and the number of transitions between layers <span><math><mi>L</mi></math></span> (corresponding to a depth of <span><math><mrow><mi>L</mi><mo>+</mo><mn>1</mn></mrow></math></span>). Specifically, we construct explicit controls interpolating either a finite dataset <span><math><mi>D</mi></math></span>, comprising <span><math><mi>N</mi></math></span> pairs of points in <span><math><msup><mrow><mi>R</mi></mrow><mrow><mi>d</mi></mrow></msup></math></span>, or two probability measures within a Wasserstein error margin <span><math><mrow><mi>ɛ</mi><mo>></mo><mn>0</mn></mrow></math></span>. Our findings reveal a balancing trade-off between <span><math><mi>p</mi></math></span> and <span><math><mi>L</mi></math></span>, with <span><math><mi>L</mi></math></span> scaling as <span><math><mrow><mn>1</mn><mo>+</mo><mi>O</mi><mrow><mo>(</mo><mi>N</mi><mo>/</mo><mi>p</mi><mo>)</mo></mrow></mrow></math></span> for data interpolation, and as <span><math><mrow><mn>1</mn><mo>+</mo><mi>O</mi><mfenced><mrow><msup><mrow><mi>p</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup><mo>+</mo><msup><mrow><mrow><mo>(</mo><mn>1</mn><mo>+</mo><mi>p</mi><mo>)</mo></mrow></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup><msup><mrow><mi>ɛ</mi></mrow><mrow><mo>−</mo><mi>d</mi></mrow></msup></mrow></mfenced></mrow></math></span> for measures.</p><p>In the high-dimensional and wide setting where <span><math><mrow><mi>d</mi><mo>,</mo><mi>p</mi><mo>></mo><mi>N</mi></mrow></math></span>, our result can be refined to achieve <span><math><mrow><mi>L</mi><mo>=</mo><mn>0</mn></mrow></math></span>. This naturally raises the problem of data interpolation in the autonomous regime, characterized by <span><math><mrow><mi>L</mi><mo>=</mo><mn>0</mn></mrow></math></span>. We adopt two alternative approaches: either controlling in a probabilistic sense, or by relaxing the target condition. In the first case, when <span><math><mrow><mi>p</mi><mo>=</mo><mi>N</mi></mrow></math></span> we develop an inductive control strategy based on a separability assumption whose probability increases with <span><math><mi>d</mi></math></span>. 
In the second one, we establish an explicit error decay rate with respect to <span><math><mi>p</mi></math></span> which results from applying a universal approximation theorem to a custom-built Lipschitz vector field interpolating <span><math><mi>D</mi></math></span>.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0893608024005641/pdfft?md5=480cae19d4a2c169ff78cc6025a33eae&pid=1-s2.0-S0893608024005641-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finite-time cluster synchronization of multi-weighted fractional-order coupled neural networks with and without impulsive effects
Pub Date : 2024-08-17 | DOI: 10.1016/j.neunet.2024.106646
In this paper, finite-time cluster synchronization (FTCS) of multi-weighted fractional-order coupled neural networks is studied. Firstly, an FTCS criterion for the considered neural networks is obtained by designing a new delayed state-feedback controller. Secondly, an FTCS criterion for the considered neural networks with mixed impulsive effects is given by constructing a new piecewise controller, where both synchronizing and desynchronizing impulses are taken into account. To the best of our knowledge, this is the first time that finite-time cluster synchronization of multi-weighted neural networks has been investigated. Finally, numerical simulations are given to show the validity of the theoretical results.
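As a point of reference, coupled networks of this type are commonly written in the following Caputo fractional-order form. This is a representative sketch only; the paper's precise model, controllers, and impulsive terms are not reproduced here.

```latex
% Representative multi-weighted fractional-order coupled network (Caputo
% derivative of order \alpha \in (0,1)): the m coupling matrices
% G^{(l)} = (g_{ij}^{(l)}) carry the multiple weights, and u_i(t) is the
% cluster-synchronizing feedback controller.
\begin{equation*}
  {}^{C}\!D^{\alpha} x_i(t) = -C x_i(t) + A f(x_i(t))
    + \sum_{l=1}^{m} c_l \sum_{j=1}^{N} g_{ij}^{(l)} \Gamma_l\, x_j(t)
    + u_i(t), \qquad i = 1, \dots, N.
\end{equation*}
```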
{"title":"Finite-time cluster synchronization of multi-weighted fractional-order coupled neural networks with and without impulsive effects","authors":"","doi":"10.1016/j.neunet.2024.106646","DOIUrl":"10.1016/j.neunet.2024.106646","url":null,"abstract":"<div><p>In this paper, finite-time cluster synchronization (FTCS) of multi-weighted fractional-order neural networks is studied. Firstly, a FTCS criterion of the considered neural networks is obtained by designing a new delayed state feedback controller. Secondly, a FTCS criterion for the considered neural networks with mixed impulsive effects is given by constructing a new piecewise controller, where both synchronizing and desynchronizing impulses are taken into account. It should be noted that it is the first time that finite-time cluster synchronization of multi-weighted neural networks has been investigated. Finally, numerical simulations are given to show the validity of the theoretical results.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face Omron Ring: Proactive defense against face forgery with identity awareness
Pub Date : 2024-08-17 | DOI: 10.1016/j.neunet.2024.106639
In the era of Artificial Intelligence Generated Content (AIGC), face forgery models pose significant security threats. These models have caused widespread negative impacts through the creation of forged products targeting public figures, national leaders, and other Persons-of-Interest (POI). To address this, we propose the Face Omron Ring (FOR) to proactively protect POI from face forgery. Specifically, by introducing FOR into a target face forgery model, the model proactively refuses to forge any face image of protected identities without compromising its forgery capability for unprotected ones. We conduct extensive experiments on four face forgery models (StarGAN, AGGAN, AttGAN, and HiSD) on the widely used large-scale face image datasets CelebA, CelebA-HQ, and PubFig83. Our results demonstrate that the proposed method can effectively protect 5000 different identities with a 100% protection success rate, requiring only about 100 face images per identity. Our method also shows great robustness against multiple image processing attacks, such as JPEG compression, cropping, noise addition, and blurring. Compared to existing proactive defense methods, our method offers identity-centric protection for any image of a protected identity without requiring any special preprocessing, resulting in improved scalability and security. We hope that this work can provide a solution for responsible AIGC companies in regulating the use of face forgery models.
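Purely as an illustration of the refusal behavior described above, the Python sketch below implements an external identity gate. Every name in it is hypothetical, and the actual FOR is trained into the forgery model itself rather than bolted on as a wrapper like this.

```python
import numpy as np

def guarded_forge(face_img, identity_encoder, forgery_model, protected_embs,
                  threshold: float = 0.6):
    """Hypothetical sketch of identity-aware refusal: if the input matches a
    protected identity, decline to forge; otherwise forge as usual. The real
    FOR mechanism is baked into the forgery model during training, not
    implemented as an external gate."""
    emb = identity_encoder(face_img)                  # (d,) identity embedding
    sims = protected_embs @ emb / (
        np.linalg.norm(protected_embs, axis=1) * np.linalg.norm(emb) + 1e-8)
    if sims.max() > threshold:                        # protected identity found
        return face_img                               # refuse: return input unchanged
    return forgery_model(face_img)                    # unprotected: forge normally
```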
{"title":"Face Omron Ring: Proactive defense against face forgery with identity awareness","authors":"","doi":"10.1016/j.neunet.2024.106639","DOIUrl":"10.1016/j.neunet.2024.106639","url":null,"abstract":"<div><p>In the era of Artificial Intelligence Generated Content (AIGC), face forgery models pose significant security threats. These models have caused widespread negative impacts through the creation of forged products targeting public figures, national leaders, and other Persons-of-interest (POI). To address this, we propose the <em>Face Omron Ring</em> (FOR) to proactively protect the POI from face forgery. Specifically, by introducing FOR into a target face forgery model, the model will proactively refuse to forge any face image of protected identities without compromising the forgery capability for unprotected ones. We conduct extensive experiments on 4 face forgery models, StarGAN, AGGAN, AttGAN, and HiSD on the widely used large-scale face image datasets CelebA, CelebA-HQ, and PubFig83. Our results demonstrate that the proposed method can effectively protect 5000 different identities with a 100% protection success rate, for each of which only about 100 face images are needed. Our method also shows great robustness against multiple image processing attacks, such as JPEG, cropping, noise addition, and blurring. Compared to existing proactive defense methods, our method offers identity-centric protection for any image of the protected identity without requiring any special preprocessing, resulting in improved scalability and security. We hope that this work can provide a solution for responsible AIGC companies in regulating the use of face forgery models.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive ambiguity-aware weighting for multi-label recognition with limited annotations
Pub Date : 2024-08-16 | DOI: 10.1016/j.neunet.2024.106642
In multi-label recognition, effectively addressing the challenge of partial labels is crucial for reducing annotation costs and enhancing model generalization. Existing methods exhibit limitations by relying on unrealistic simulations with uniformly dropped labels, overlooking how ambiguous instances and instance-level factors impact label ambiguity in real-world datasets. To address this deficiency, our paper introduces a realistic partial-label setting grounded in instance ambiguity, complemented by Reliable Ambiguity-Aware Instance Weighting (R-AAIW), a strategy that utilizes importance weighting to adapt dynamically to the inherent ambiguity of multi-label instances. The strategy leverages an ambiguity score to prioritize learning from clearer instances. As the model's proficiency improves, the weights are dynamically modulated to gradually shift focus toward more ambiguous instances. By employing an adaptive re-weighting method that adjusts to the complexity of each instance, our approach not only enhances the model's capability to detect subtle variations among labels but also ensures comprehensive learning without excluding difficult instances. Extensive experimentation across various benchmarks highlights our approach's superiority over existing methods, showcasing its ability to provide a more accurate and adaptable framework for multi-label recognition tasks.
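A minimal sketch of the weighting dynamic described above follows; the schedule, function names, and blend are illustrative assumptions, not the paper's exact R-AAIW formulation.

```python
import numpy as np

def instance_weights(ambiguity: np.ndarray, epoch: int, total_epochs: int) -> np.ndarray:
    """Illustrative re-weighting in the spirit of R-AAIW (the exact schedule
    is the paper's, not reproduced): early training favors clear instances
    (low ambiguity); as training progresses, weight shifts toward ambiguous
    ones. 'ambiguity' holds per-instance scores in [0, 1]."""
    progress = epoch / max(total_epochs - 1, 1)       # 0 -> 1 over training
    # Convex blend between clarity (1 - ambiguity) and ambiguity itself.
    weights = (1.0 - progress) * (1.0 - ambiguity) + progress * ambiguity
    return weights / (weights.sum() + 1e-8)           # normalize per batch

scores = np.array([0.1, 0.5, 0.9])                    # per-instance ambiguity
w_early = instance_weights(scores, epoch=0, total_epochs=10)  # favors clear (0.1)
w_late = instance_weights(scores, epoch=9, total_epochs=10)   # favors ambiguous (0.9)
```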
{"title":"Adaptive ambiguity-aware weighting for multi-label recognition with limited annotations","authors":"","doi":"10.1016/j.neunet.2024.106642","DOIUrl":"10.1016/j.neunet.2024.106642","url":null,"abstract":"<div><p>In multi-label recognition, effectively addressing the challenge of partial labels is crucial for reducing annotation costs and enhancing model generalization. Existing methods exhibit limitations by relying on unrealistic simulations with uniformly dropped labels, overlooking how ambiguous instances and instance-level factors impacts label ambiguity in real-world datasets. To address this deficiency, our paper introduces a realistic partial label setting grounded in instance ambiguity, complemented by Reliable Ambiguity-Aware Instance Weighting (R-AAIW)—a strategy that utilizes importance weighting to adapt dynamically to the inherent ambiguity of multi-label instances. The strategy leverages an ambiguity score to prioritize learning from clearer instances. As proficiency of the model improves, the weights are dynamically modulated to gradually shift focus towards more ambiguous instances. By employing an adaptive re-weighting method that adjusts to the complexity of each instance, our approach not only enhances the model’s capability to detect subtle variations among labels but also ensures comprehensive learning without excluding difficult instances. Extensive experimentation across various benchmarks highlights our approach’s superiority over existing methods, showcasing its ability to provide a more accurate and adaptable framework for multi-label recognition tasks.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The combined Lyapunov functionals method for stability analysis of neutral Cohen–Grossberg neural networks with multiple delays
Pub Date : 2024-08-16 | DOI: 10.1016/j.neunet.2024.106641
This research article employs the combined Lyapunov functionals method for the stability analysis of a more general class of Cohen–Grossberg neural networks that simultaneously involve constant discrete and neutral delay parameters. By utilizing combinations of various Lyapunov functionals, we determine novel criteria ensuring global stability of this class of neural systems with Lipschitz continuous activation functions. The proposed results are stated entirely independently of the delay terms and can be completely characterized by the constant parameters of the neural system. Through detailed analytical comparisons between the stability results derived in this article and the corresponding stability criteria in the past literature, we show that our results establish sets of stability conditions that serve as alternatives to the previously reported criteria. A numerical example is also presented to show the applicability of the proposed stability results.
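For concreteness, a representative neutral-type Cohen–Grossberg model of the kind such analyses address is sketched below; the paper's exact delay structure and assumptions on a_i, b_i, and f_j may differ.

```latex
% Neutral-type Cohen-Grossberg network with a constant neutral delay \zeta
% and discrete delays \tau_j; a_i are amplification functions, b_i
% self-signal functions, and f_j Lipschitz continuous activations.
\begin{equation*}
  \frac{d}{dt}\Big[x_i(t) - e_i\, x_i(t-\zeta)\Big]
    = -a_i(x_i(t))\Big[ b_i(x_i(t)) - \sum_{j=1}^{n} c_{ij} f_j(x_j(t))
      - \sum_{j=1}^{n} d_{ij} f_j\big(x_j(t-\tau_j)\big) \Big],
  \quad i = 1, \dots, n.
\end{equation*}
```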
{"title":"The combined Lyapunov functionals method for stability analysis of neutral Cohen–Grossberg neural networks with multiple delays","authors":"","doi":"10.1016/j.neunet.2024.106641","DOIUrl":"10.1016/j.neunet.2024.106641","url":null,"abstract":"<div><p>This research article will employ the combined Lyapunov functionals method to deal with stability analysis of a more general type of Cohen–Grossberg neural networks which simultaneously involve constant time and neutral delay parameters. By utilizing some combinations of various Lyapunov functionals, we determine novel criteria ensuring global stability of such a model of neural systems that employ Lipschitz continuous activation functions. These proposed results are totally stated independently of delay terms and they can be completely characterized by the constants parameters involved in the neural system. By making some detailed analytical comparisons between the stability results derived in this research article and the existing corresponding stability criteria obtained in the past literature, we prove that our proposed stability results lead to establishing some sets of stability conditions and these conditions may be evaluated as different alternative results to the previously reported corresponding stability criteria. A numerical example is also presented to show the applicability of the proposed stability results.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142037537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manifold-based Shapley explanations for high-dimensional correlated features
Pub Date : 2024-08-14 | DOI: 10.1016/j.neunet.2024.106634
Explainable artificial intelligence (XAI) holds significant importance in enhancing the reliability and transparency of network decision-making. SHapley Additive exPlanations (SHAP) is a game-theoretic approach for network interpretation, attributing confidence to input features to measure their importance. However, SHAP often relies on the flawed assumption that the model's features are independent, leading to incorrect results when dealing with correlated features. In this paper, we introduce a novel manifold-based Shapley explanation method, termed Latent SHAP. Latent SHAP transforms high-dimensional data into low-dimensional manifolds to capture correlations among features. We compute Shapley values on the data manifold and devise three distinct gradient-based mapping methods to transfer them back to the high-dimensional space. Our primary objectives include: (1) correcting misinterpretations by SHAP in certain samples; (2) addressing the challenge of feature correlations in high-dimensional data interpretation; and (3) reducing algorithmic complexity through Latent SHAP for application in complex network interpretations. Code is available at https://github.com/Teriri1999/Latent-SHAP.
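To make the pipeline concrete, here is a minimal sketch of the latent-space Shapley step: a permutation-sampling estimator evaluated through a decoder. The decoder, model, and estimator here are stand-ins, and the paper's three gradient-based mappings back to input space are not reproduced.

```python
import numpy as np

def latent_shapley(f, decoder, z, z_baseline, n_perm: int = 200, rng=None):
    """Illustrative sketch of the Latent SHAP idea (not the paper's exact
    algorithm): estimate Shapley values for the latent coordinates of z by
    permutation sampling, evaluating the model through the decoder,
    f(decoder(z)), so feature correlations live on the manifold."""
    rng = np.random.default_rng(rng)
    k = z.shape[0]
    phi = np.zeros(k)
    for _ in range(n_perm):
        order = rng.permutation(k)
        z_cur = z_baseline.copy()
        prev = f(decoder(z_cur))
        for i in order:
            z_cur[i] = z[i]                    # switch latent feature i on
            cur = f(decoder(z_cur))
            phi[i] += cur - prev               # marginal contribution of i
            prev = cur
    return phi / n_perm                        # averaged Shapley estimates

# Toy usage with stand-in decoder and model (both assumptions):
decoder = lambda z: np.tanh(z @ np.ones((3, 8)))   # 3-dim manifold -> 8-dim input
f = lambda x: float(x.sum())                        # scalar model output
phi = latent_shapley(f, decoder, z=np.ones(3), z_baseline=np.zeros(3))
```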
{"title":"Manifold-based shapley explanations for high dimensional correlated features","authors":"","doi":"10.1016/j.neunet.2024.106634","DOIUrl":"10.1016/j.neunet.2024.106634","url":null,"abstract":"<div><p>Explainable artificial intelligence (XAI) holds significant importance in enhancing the reliability and transparency of network decision-making. SHapley Additive exPlanations (SHAP) is a game-theoretic approach for network interpretation, attributing confidence to inputs features to measure their importance. However, SHAP often relies on a flawed assumption that the model’s features are independent, leading to incorrect results when dealing with correlated features. In this paper, we introduce a novel manifold-based Shapley explanation method, termed Latent SHAP. Latent SHAP transforms high-dimensional data into low-dimensional manifolds to capture correlations among features. We compute Shapley values on the data manifold and devise three distinct gradient-based mapping methods to transfer them back to the high-dimensional space. Our primary objectives include: (1) correcting misinterpretations by SHAP in certain samples; (2) addressing the challenge of feature correlations in high-dimensional data interpretation; and (3) reducing algorithmic complexity through Manifold SHAP for application in complex network interpretations. Code is available at <span><span>https://github.com/Teriri1999/Latent-SHAP</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142040844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized M-sparse algorithms for constructing fault tolerant RBF networks
Pub Date : 2024-08-14 | DOI: 10.1016/j.neunet.2024.106633
In the construction process of radial basis function (RBF) networks, two crucial issues arise: the selection of RBF centers and the effective utilization of the given data without encountering the overfitting problem. Another important issue is fault tolerance: when noise or faults exist in a trained network, it is crucial that the network's performance does not deteriorate significantly. However, without a fault tolerant procedure, a trained RBF network may exhibit significantly poor performance. Unfortunately, most existing algorithms are unable to address all of the aforementioned issues simultaneously. This paper proposes fault tolerant training algorithms that simultaneously select RBF nodes and train the RBF output weights. Additionally, our algorithms can directly and explicitly control the number of RBF nodes, eliminating the time-consuming procedure of tuning the regularization parameter to achieve the target RBF network size. Based on simulation results, our algorithms demonstrate improved test set performance when more RBF nodes are used, effectively utilizing the given data without encountering the overfitting problem. The paper first defines a fault tolerant objective function, which includes a term to suppress the effects of weight faults and weight noise. This term also prevents overfitting, resulting in better test set performance when more RBF nodes are utilized. With the defined objective function, the training process is designed to solve a generalized M-sparse problem by incorporating an ℓ0-norm constraint. The ℓ0-norm constraint allows us to directly and explicitly control the number of RBF nodes. To address the generalized M-sparse problem, we introduce the noise-resistant iterative hard thresholding (NR-IHT) algorithm, whose convergence properties are then discussed theoretically. To further enhance performance, we incorporate the momentum concept into the NR-IHT algorithm, referring to the modified version as "NR-IHT-Mom". Simulation results show that both NR-IHT and NR-IHT-Mom outperform several state-of-the-art comparison algorithms.
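For intuition, a minimal sketch of the M-sparse hard-thresholding mechanics with momentum follows, on a plain least-squares objective. The paper's fault tolerant objective adds a noise/fault-suppression term that is not reproduced here, so this is only the generic IHT-with-momentum skeleton.

```python
import numpy as np

def hard_threshold(w: np.ndarray, M: int) -> np.ndarray:
    """Keep the M largest-magnitude entries, zero the rest (ell_0 projection)."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-M:]
    out[idx] = w[idx]
    return out

def iht_momentum(Phi, y, M, lr=0.1, beta=0.9, iters=500):
    """Plain iterative hard thresholding with momentum on a least-squares
    objective: a sketch of the M-sparse mechanics behind NR-IHT-Mom, minus
    the paper's noise-resistant term."""
    w = np.zeros(Phi.shape[1])
    v = np.zeros_like(w)
    for _ in range(iters):
        grad = Phi.T @ (Phi @ w - y) / len(y)  # gradient of 0.5*||Phi w - y||^2 / n
        v = beta * v + grad                     # momentum accumulation
        w = hard_threshold(w - lr * v, M)       # gradient step + ell_0 projection
    return w

# Toy usage: select M = 5 active RBF nodes out of 50 candidate columns of Phi.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = rng.standard_normal(5)
y = Phi @ w_true + 0.01 * rng.standard_normal(200)
w_hat = iht_momentum(Phi, y, M=5)
```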
{"title":"Generalized M-sparse algorithms for constructing fault tolerant RBF networks","authors":"","doi":"10.1016/j.neunet.2024.106633","DOIUrl":"10.1016/j.neunet.2024.106633","url":null,"abstract":"<div><p>In the construction process of radial basis function (RBF) networks, two common crucial issues arise: the selection of RBF centers and the effective utilization of the given source without encountering the overfitting problem. Another important issue is the fault tolerant capability. That is, when noise or faults exist in a trained network, it is crucial that the network’s performance does not undergo significant deterioration or decrease. However, without employing a fault tolerant procedure, a trained RBF network may exhibit significantly poor performance. Unfortunately, most existing algorithms are unable to simultaneously address all of the aforementioned issues. This paper proposes fault tolerant training algorithms that can simultaneously select RBF nodes and train RBF output weights. Additionally, our algorithms can directly control the number of RBF nodes in an explicit manner, eliminating the need for a time-consuming procedure to tune the regularization parameter and achieve the target RBF network size. Based on simulation results, our algorithms demonstrate improved test set performance when more RBF nodes are used, effectively utilizing the given source without encountering the overfitting problem. This paper first defines a fault tolerant objective function, which includes a term to suppress the effects of weight faults and weight noise. This term also prevents the issue of overfitting, resulting in better test set performance when more RBF nodes are utilized. With the defined objective function, the training process is designed to solve a generalized <span><math><mi>M</mi></math></span>-sparse problem by incorporating an <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span>-norm constraint. The <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>0</mn></mrow></msub></math></span>-norm constraint allows us to directly and explicitly control the number of RBF nodes. To address the generalized <span><math><mi>M</mi></math></span>-sparse problem, we introduce the noise-resistant iterative hard thresholding (NR-IHT) algorithm. The convergence properties of the NR-IHT algorithm are subsequently discussed theoretically. To further enhance performance, we incorporate the momentum concept into the NR-IHT algorithm, referring to the modified version as “NR-IHT-Mom”. Simulation results show that both the NR-IHT algorithm and the NR-IHT-Mom algorithm outperform several state-of-the-art comparison algorithms.</p></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142087783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}