Recommending suitable exercises and explaining those recommendations is a highly valuable task, as it can significantly improve students' learning efficiency. Nevertheless, the vast range of exercise resources and the diverse learning capacities of students make exercise recommendation difficult. Collaborative filtering approaches frequently struggle to recommend suitable exercises, while deep learning methods lack explainability, which restricts their practical use. To address these issues, this paper proposes KG4EER, an explainable exercise recommendation method built on a knowledge graph. KG4EER matches different students with suitable exercises and offers explanations for its recommendations. More precisely, a feature extraction module is introduced to represent students' learning features, and a knowledge graph comprising three primary entity types - knowledge concepts, students, and exercises - and their interrelationships is constructed to drive the recommendations. Extensive experiments on three real-world datasets, coupled with expert interviews, establish the superiority of KG4EER over existing baseline methods and underscore its robust explainability.
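As a rough illustration of the kind of knowledge graph the abstract describes, a graph over students, exercises, and knowledge concepts can be stored as (head, relation, tail) triples. The entity and relation names below are invented for the example, not taken from KG4EER:

```python
# Minimal knowledge-graph sketch: three entity types linked by triples.
# All names here are illustrative placeholders, not KG4EER's schema.
triples = [
    ("student:alice", "mastered", "concept:fractions"),
    ("exercise:e1", "covers", "concept:fractions"),
    ("student:alice", "attempted", "exercise:e1"),
]

def neighbors(head, relation):
    """Return all tail entities reachable from `head` via `relation`."""
    return [t for h, r, t in triples if h == head and r == relation]

print(neighbors("student:alice", "mastered"))  # → ['concept:fractions']
```

Explanations then fall out of the graph structure: a recommended exercise can be justified by the path from the student through a shared knowledge concept.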
"Explainable exercise recommendation with knowledge graph." Quanlong Guan, Xinghe Cheng, Fang Xiao, Zhuzhou Li, Chaobo He, Liangda Fang, Guanliang Chen, Zhiguo Gong, Weiqi Luo. Neural Networks, vol. 183, 106954, 2025-03-01. DOI: 10.1016/j.neunet.2024.106954.
Pub Date: 2025-03-01 | Epub Date: 2024-12-07 | DOI: 10.1016/j.neunet.2024.106978
Tong Yu, Lei Cheng, Ruslan Khalitov, Erland B Olsson, Zhirong Yang
Self-supervised Learning (SSL) has been recognized as a method to enhance prediction accuracy in various downstream tasks. However, its efficacy for DNA sequences remains somewhat constrained. This limitation stems primarily from the fact that most existing SSL approaches in genomics focus on masked language modeling of individual sequences, neglecting the crucial aspect of encoding statistics across multiple sequences. To overcome this challenge, we introduce an innovative deep neural network model, which incorporates collaborative learning between a 'student' and a 'teacher' subnetwork. In this model, the student subnetwork employs masked learning on nucleotides and progressively adapts its parameters to the teacher subnetwork through an exponential moving average approach. Concurrently, both subnetworks engage in contrastive learning, deriving insights from two augmented representations of the input sequences. This self-distillation process enables our model to effectively assimilate both contextual information from individual sequences and distributional data across the sequence population. We validated our approach with preliminary pretraining using the human reference genome, followed by applying it to 20 downstream inference tasks. The empirical results from these experiments demonstrate that our novel method significantly boosts inference performance across the majority of these tasks. Our code is available at https://github.com/wiedersehne/FinDNA.
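The exponential-moving-average teacher update described above can be sketched as follows. The momentum value and function name are illustrative; the actual FinDNA training likely uses a momentum schedule rather than a fixed constant:

```python
import numpy as np

def ema_update(student_params, teacher_params, momentum=0.99):
    """Exponential-moving-average update of teacher parameters, as used
    in student/teacher self-distillation: the teacher slowly tracks the
    student instead of receiving gradients directly."""
    return [momentum * t + (1.0 - momentum) * s
            for s, t in zip(student_params, teacher_params)]

# Toy example with one scalar "parameter" per network.
student = [np.array(1.0)]
teacher = [np.array(0.0)]
for _ in range(3):
    teacher = ema_update(student, teacher, momentum=0.5)
print(float(teacher[0]))  # teacher drifts toward the student: 0.875
```

With a momentum near 1 the teacher changes slowly, which is what makes it a stable distillation target for the masked and contrastive objectives.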
"Self-distillation improves self-supervised learning for DNA sequence inference." Tong Yu, Lei Cheng, Ruslan Khalitov, Erland B Olsson, Zhirong Yang. Neural Networks, vol. 183, 106978. DOI: 10.1016/j.neunet.2024.106978.
Road traffic volumes are constantly increasing worldwide, posing significant challenges for maintaining asphalt pavements. Vehicular loads and environmental changes affect asphalt pavements, necessitating suitable predictive models. The International Roughness Index (IRI) is a key indicator of road smoothness, and IRI prediction models are required for performance analysis. A fractional accumulation operator and a sine term are used to improve the traditional grey model's low prediction accuracy, and the model is then optimized with a chaotic adaptive whale optimization algorithm and a Markov chain. In experiments on the different asphalt pavement structures from RIOHtrack, the average RMSE, MAE, and MAPE reached 0.025, 0.020, and 1.392%, respectively. Compared with other grey models, the proposed model performs better in multi-step IRI prediction. Notably, it achieves compelling predictions from small samples using only the changes in IRI itself, which helps evaluate road conditions and design maintenance plans.
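The fractional accumulation operator mentioned above can be sketched as below. This shows only the generic r-order accumulated generating operation (AGO) common to fractional grey models, not the paper's full FGM(1,1-sin) formulation:

```python
from math import gamma

def frac_ago(x, r):
    """r-order fractional accumulation (AGO) used in fractional grey
    models. For r = 1 this reduces to the ordinary cumulative sum."""
    def coeff(k_minus_i):
        # Generalized binomial coefficient C(k-i+r-1, k-i) via gamma.
        return gamma(k_minus_i + r) / (gamma(r) * gamma(k_minus_i + 1))
    return [sum(coeff(k - i) * x[i] for i in range(k + 1))
            for k in range(len(x))]

print(frac_ago([1.0, 2.0, 3.0], 1.0))  # → [1.0, 3.0, 6.0]
```

Letting r be a tunable non-integer order is what gives the fractional model its extra flexibility on short IRI series.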
"Roughness prediction of asphalt pavement using FGM(1,1-sin) model optimized by swarm intelligence and Markov chain." Zhuoxuan Li, Jinde Cao, Hairuo Shi, Xinli Shi, Tao Ma, Wei Huang. Neural Networks, vol. 183, 107000, 2025-03-01. DOI: 10.1016/j.neunet.2024.107000.
Pub Date: 2025-03-01 | Epub Date: 2024-12-04 | DOI: 10.1016/j.neunet.2024.106959
Xuelian Zou, Xiaojun Bi
Human pose estimation is one of the most critical and challenging problems in computer vision; it is applied in many vision tasks and has important research significance. However, balancing the model's parameter count and computational load against the accuracy of human pose estimation remains difficult. In this study, we propose a Lightweight Cross-scale Feature Fusion Network (LCFFNet) to strike this balance. LCFFNet consists of a Lightweight HRNet-Like (LHRNet) network, a Cross-Resolution-Aware Semantics Module (CRASM), and an Adapt Feature Fusion Module (AFFM). More precisely, we first propose the lightweight LHRNet network, whose three high-resolution subnetwork stages include Dynamic Multi-scale Convolution Basic (DMSC-Basic) block, Basic block, and DMSC-Basic block submodules. The dynamic multi-scale convolution in the DMSC-Basic block reduces the parameter count and complexity of the LHRNet network while extracting variable pose features, and the Basic block is introduced to maintain the model's feature expression ability. As a result, the LHRNet network is not only more lightweight but also has stronger feature expression capabilities. Second, we propose the CRASM module to enhance contextual semantic information while reducing the semantic gap between scales by fusing features from different scales. Finally, the spatial resolution of the augmented semantic feature map is restored from bottom to top by the proposed AFFM, and adaptive feature fusion is used to increase keypoint localization accuracy. Our method predicts keypoints with 74.2% AP, 89.9% PCKh@0.5 and 66.9% AP on the MSCOCO 2017, MPII and CrowdPose datasets, respectively. Our model reduces the number of parameters by 89.0% and the computational complexity by 87.5% compared with HRNet. The proposed network performs as well as current large-model human pose estimation networks while outperforming state-of-the-art lightweight networks.
"LCFFNet: A Lightweight Cross-scale Feature Fusion Network for human pose estimation." Xuelian Zou, Xiaojun Bi. Neural Networks, vol. 183, 106959. DOI: 10.1016/j.neunet.2024.106959.
Pub Date: 2025-03-01 | Epub Date: 2024-11-29 | DOI: 10.1016/j.neunet.2024.106958
Yi Zhang, Jichang Guo, Huihui Yue, Sida Zheng, Chonghao Liu
Due to limited photons, low-light environments pose significant challenges for computer vision tasks. Unsupervised domain adaptation offers a potential solution but struggles with domain misalignment caused by inadequate utilization of features at different stages. To address this, we propose an Illumination-Guided Progressive Unsupervised Domain Adaptation method, called IPULIS, for low-light instance segmentation, which progressively aligns features at the image, instance, and pixel levels between normal- and low-light conditions under illumination guidance. This is achieved through: (1) an Illumination-Guided Domain Discriminator (IGD) for image-level feature alignment using retinex-derived illumination maps; (2) a Foreground Focus Module (FFM) that incorporates global information with local center features to facilitate instance-level feature alignment; and (3) a Contour-aware Domain Discriminator (CAD) for pixel-level feature alignment by matching contour vertex features from a contour-based model. By progressively deploying these modules, IPULIS achieves precise feature alignment, leading to high-quality instance segmentation. Experimental results demonstrate that IPULIS achieves state-of-the-art performance on the real-world low-light dataset LIS.
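A retinex-style illumination map, of the kind the IGD consumes, is often initialized as the per-pixel maximum over color channels. This is a generic sketch under that common assumption; the paper's exact retinex decomposition may differ:

```python
import numpy as np

def illumination_map(rgb):
    """Coarse retinex-style illumination estimate: the per-pixel maximum
    over the RGB channels, a common initialization before refinement."""
    return rgb.max(axis=-1)

img = np.array([[[0.2, 0.5, 0.1],
                 [0.9, 0.3, 0.3]]])  # a 1x2 RGB image in [0, 1]
print(illumination_map(img))  # → [[0.5 0.9]]
```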
"Illumination-Guided progressive unsupervised domain adaptation for low-light instance segmentation." Yi Zhang, Jichang Guo, Huihui Yue, Sida Zheng, Chonghao Liu. Neural Networks, vol. 183, 106958. DOI: 10.1016/j.neunet.2024.106958.
Pub Date: 2025-03-01 | Epub Date: 2024-12-04 | DOI: 10.1016/j.neunet.2024.106983
Yihan Wang, Shuang Liu, Xiao-Shan Gao
Stability analysis is an essential aspect of studying the generalization ability of deep learning, as it involves deriving generalization bounds for stochastic gradient descent-based training algorithms. Adversarial training is the most widely used defense against adversarial attacks. However, previous generalization bounds for adversarial training have not included information regarding data distribution. In this paper, we fill this gap by providing generalization bounds for stochastic gradient descent-based adversarial training that incorporate data distribution information. We utilize the concepts of on-average stability and high-order approximate Lipschitz conditions to examine how changes in data distribution and adversarial budget can affect robust generalization gaps. Our derived generalization bounds for both convex and non-convex losses are at least as good as the uniform stability-based counterparts which do not include data distribution information. Furthermore, our findings demonstrate how distribution shifts from data poisoning attacks can impact robust generalization.
"Data-dependent stability analysis of adversarial training." Yihan Wang, Shuang Liu, Xiao-Shan Gao. Neural Networks, vol. 183, 106983. DOI: 10.1016/j.neunet.2024.106983.
Deep learning-based cell senescence detection is crucial for accurate quantitative analysis of senescence assessment. However, senescent cells are small and show little difference in appearance and shape across states, which leads to missed and false detections. In addition, complex intelligent models are not conducive to clinical application. To solve these problems, we propose a Faster Region Convolutional Neural Network (Faster R-CNN) detection model with a Swin Transformer (Swin-T) and group normalization (GN), called STGF R-CNN, for detecting senescent cells in different states and thereby quantifying the senescence of induced pluripotent stem cell-derived mesenchymal stem cells (iP-MSCs). Specifically, to enhance the representation learning ability of the network, Swin-T is built with a hierarchical structure and uses a local window attention mechanism to capture features at different scales and levels. In addition, the GN strategy is adopted to keep the model lightweight. To verify the effectiveness of the STGF R-CNN, a cell senescence dataset, the iP-MSCs dataset, was constructed and a series of experiments were conducted. The results show high senescent-cell detection accuracy - a mean Average Precision (mAP) of 0.835 with 46.06M parameters and 95.62G FLOPs - while reducing senescence assessment time from 12 h to less than 1 s. The STGF R-CNN has advantages over existing cell senescence detection methods, providing potential for anti-senescence drug screening. Our code is available at https://github.com/RY-97/STGF-R-CNN.
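Group normalization, one of the design choices named above, can be sketched in NumPy as follows. The group count is arbitrary here, and the learnable affine parameters are omitted for brevity:

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Group normalization over an (N, C, H, W) tensor: channels are
    split into groups and each group is standardized per sample. Unlike
    batch norm, the statistics do not depend on the batch size, which
    suits detection models trained with small batches."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

x = np.random.randn(2, 4, 3, 3)
y = group_norm(x, num_groups=2)
# Each (sample, group) slice now has ~zero mean and unit variance.
print(np.allclose(y.reshape(2, 2, -1).mean(axis=-1), 0.0, atol=1e-6))
```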
"An object detection-based model for automated screening of stem-cells senescence during drug screening." Yu Ren, Youyi Song, Mingzhu Li, Liangge He, Chunlun Xiao, Peng Yang, Yongtao Zhang, Cheng Zhao, Tianfu Wang, Guangqian Zhou, Baiying Lei. Neural Networks, vol. 183, 106940, 2025-03-01. DOI: 10.1016/j.neunet.2024.106940.
Pub Date: 2025-03-01 | Epub Date: 2024-11-29 | DOI: 10.1016/j.neunet.2024.106945
Wenyang Li, Mingliang Wang, Mingxia Liu, Qingshan Liu
Functional connectivity (FC), derived from resting-state functional magnetic resonance imaging (rs-fMRI), has been widely used to characterize brain abnormalities in disorders. FC is usually defined as a correlation matrix, a symmetric positive definite (SPD) matrix lying on a Riemannian manifold. Recently, a number of learning-based methods have been proposed for FC analysis, but the geometric properties of the Riemannian manifold have not yet been fully explored in previous studies. Also, most existing methods are designed for a single imaging site of fMRI data, which may leave limited training data for learning reliable and robust models. In this paper, we propose a novel Riemannian Manifold-based Disentangled Representation Learning (RM-DRL) framework capable of learning invariant representations from fMRI data across multiple sites for brain disorder diagnosis. In RM-DRL, we first employ an SPD-based encoder module to learn a latent unified representation of FC from different sites, which preserves the Riemannian geometry of the SPD matrices. In the latent space, a disentangled representation module then splits the learned features into domain-specific and domain-invariant parts. Finally, a decoder module is introduced to ensure that sufficient information is preserved during disentanglement learning. These designs allow us to introduce four types of training objectives to improve the disentanglement learning.
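The SPD geometry mentioned above can be illustrated with a minimal sketch: compute an FC correlation matrix from regional time series and map it to a flat tangent space with the matrix logarithm. The shrinkage value is our assumption for numerical safety; RM-DRL's SPD encoder is considerably more elaborate:

```python
import numpy as np

def fc_log_euclidean(ts, shrink=1e-3):
    """Functional connectivity as a correlation matrix, mapped to the
    tangent space via the matrix logarithm (log-Euclidean embedding).
    `shrink` pulls the matrix toward the identity so it is strictly SPD."""
    c = np.corrcoef(ts)                      # (regions x regions)
    c = (1 - shrink) * c + shrink * np.eye(len(c))
    w, v = np.linalg.eigh(c)                 # SPD -> eigendecomposition
    return v @ np.diag(np.log(w)) @ v.T      # matrix logarithm

ts = np.random.randn(5, 100)                 # 5 regions, 100 time points
log_c = fc_log_euclidean(ts)
print(np.allclose(log_c, log_c.T))           # tangent vector is symmetric
```

Working in the tangent space lets ordinary Euclidean machinery operate on SPD matrices without leaving the manifold's geometry behind.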
"Riemannian manifold-based disentangled representation learning for multi-site functional connectivity analysis." Wenyang Li, Mingliang Wang, Mingxia Liu, Qingshan Liu. Neural Networks, vol. 183, 106945. DOI: 10.1016/j.neunet.2024.106945.
Pub Date: 2025-03-01 | Epub Date: 2024-11-30 | DOI: 10.1016/j.neunet.2024.106964
Chunyu Zhang, Zhanshan Li
In generalized zero-shot learning (GZSL), a model must identify both seen and unseen samples even though only seen classes are available during training. Recent methods use disentanglement to make the information contained in visual features semantically related, and ensuring the semantic consistency and independence of the disentangled representations is key to better performance. However, we argue that several limitations remain. First, because only seen classes are available during training, recognition of unseen samples suffers. Second, the distribution relations of the representation space and the semantic space differ, and ignoring this discrepancy may hurt the generalization of the model. Moreover, instances are associated with one another, and modeling the interactions between them yields more discriminative information, which should not be ignored. Third, the synthesized visual features may not match their corresponding semantic descriptions well, which compromises the learning of semantic consistency. To overcome these challenges, we propose to learn discriminative and transferable disentangled representations (DTDR) for generalized zero-shot learning. First, we exploit estimated class similarities to supervise the relations between seen semantic-matched representations and unseen semantic descriptions, thereby gaining better insight into the unseen domain. Second, we use cosine similarities between semantic descriptions to constrain the similarities between semantic-matched representations, so that the distribution relation of the semantic-matched representation space approximates that of the semantic space; in the process, instance-level correlation is taken into account.
Third, we reconstruct the synthesized visual features into their corresponding semantic descriptions to better establish the associations between them. Experimental results on four datasets verify the effectiveness of our method.
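The distribution-alignment idea in the second step can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the toy dimensions, and the use of a mean-squared gap between the two pairwise cosine-similarity matrices are illustrative assumptions about how such a constraint could be scored.

```python
import numpy as np

def cosine_sim_matrix(x):
    """Pairwise cosine similarities between the rows of x."""
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    return normed @ normed.T

def distribution_alignment_loss(representations, semantics):
    """Mean squared gap between the two pairwise-similarity matrices,
    encouraging the representation space to mirror the relational
    structure of the semantic space (instance-level correlations
    enter through the off-diagonal entries)."""
    sim_repr = cosine_sim_matrix(representations)
    sim_sem = cosine_sim_matrix(semantics)
    return float(np.mean((sim_repr - sim_sem) ** 2))

# Toy batch: 4 instances, 8-dim representations, 5-dim class semantics.
rng = np.random.default_rng(0)
reprs = rng.normal(size=(4, 8))
sems = rng.normal(size=(4, 5))
loss = distribution_alignment_loss(reprs, sems)
```

Note that the two spaces may have different dimensionalities; only their pairwise-similarity structures are compared, which is what makes this kind of relational constraint applicable across spaces.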
{"title":"Generalized zero-shot learning via discriminative and transferable disentangled representations.","authors":"Chunyu Zhang, Zhanshan Li","doi":"10.1016/j.neunet.2024.106964","DOIUrl":"10.1016/j.neunet.2024.106964","url":null,"abstract":"<p><p>In generalized zero-shot learning (GZSL), it is required to identify seen and unseen samples under the condition that only seen classes can be obtained during training. Recent methods utilize disentanglement to make the information contained in visual features semantically related, and ensuring semantic consistency and independence of the disentangled representations is the key to achieving better performance. However, we think there are still some limitations. Firstly, due to the fact that only seen classes can be obtained during training, the recognition of unseen samples will be poor. Secondly, the distribution relations of the representation space and the semantic space are different, and ignoring the discrepancy between them may impact the generalization of the model. In addition, the instances are associated with each other, and considering the interactions between them can obtain more discriminative information, which should not be ignored. Thirdly, since the synthesized visual features may not match the corresponding semantic descriptions well, it will compromise the learning of semantic consistency. To overcome these challenges, we propose to learn discriminative and transferable disentangled representations (DTDR) for generalized zero-shot learning. Firstly, we exploit the estimated class similarities to supervise the relations between seen semantic-matched representations and unseen semantic descriptions, thereby gaining better insight into the unseen domain. 
Secondly, we use cosine similarities between semantic descriptions to constrain the similarities between semantic-matched representations, thereby facilitating the distribution relation of semantic-matched representation space to approximate the distribution relation of semantic space. And during the process, the instance-level correlation can be taken into account. Thirdly, we reconstruct the synthesized visual features into the corresponding semantic descriptions to better establish the associations between them. The experimental results on four datasets verify the effectiveness of our method.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"183 ","pages":"106964"},"PeriodicalIF":6.0,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142808456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-01Epub Date: 2024-11-28DOI: 10.1016/j.neunet.2024.106955
Wen Wen, Han Li, Rui Wu, Lingjuan Wu, Hong Chen
Adversarial pairwise learning has become the predominant approach to strengthening models' discrimination ability against adversarial attacks, achieving notable success across application fields. Despite strong empirical performance, the adversarial robustness and generalization of adversarial pairwise learning remain poorly understood from a theoretical perspective. This paper takes a step toward closing this gap by establishing high-probability generalization bounds. Our bounds apply generally to various models and pairwise learning tasks. We give application examples with explicit bounds for adversarial bipartite ranking and adversarial metric learning to illustrate how the theoretical results extend. Furthermore, by leveraging local Rademacher complexity, we develop an optimistic generalization bound of order O(n⁻¹) in the sample size n. Our analysis provides meaningful theoretical guidance for improving adversarial robustness through feature size and regularization. Experimental results validate the theoretical findings.
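To make the setting concrete, here is a minimal NumPy sketch of an adversarial pairwise (bipartite-ranking) loss: a linear scorer, a logistic pairwise loss, and a one-step sign-gradient attack on both inputs of the pair. This is an illustration of the general setup the bounds cover, not the paper's method; the linear scorer and closed-form input gradients are simplifying assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_logistic_loss(w, x_pos, x_neg):
    """Bipartite-ranking loss: penalize the pair when the positive
    instance is not scored above the negative one by f(x) = w @ x."""
    margin = w @ x_pos - w @ x_neg
    return np.log1p(np.exp(-margin))

def fgsm_pair(w, x_pos, x_neg, eps):
    """One-step sign-gradient attack on both inputs of the pair,
    using the closed-form input gradients of the linear scorer."""
    margin = w @ x_pos - w @ x_neg
    s = sigmoid(-margin)            # dloss/dmargin = -s
    g_pos, g_neg = -s * w, s * w    # input gradients of the loss
    return x_pos + eps * np.sign(g_pos), x_neg + eps * np.sign(g_neg)

rng = np.random.default_rng(1)
w = rng.normal(size=5)
xp, xn = rng.normal(size=5), rng.normal(size=5)
clean_loss = pairwise_logistic_loss(w, xp, xn)
adv_loss = pairwise_logistic_loss(w, *fgsm_pair(w, xp, xn, eps=0.1))
```

For this linear scorer the attack shrinks the margin by exactly 2·eps·Σ|wᵢ|, so the adversarial loss is never smaller than the clean loss; the generalization bounds above concern how such worst-case pairwise losses behave on unseen data.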
{"title":"Generalization analysis of adversarial pairwise learning.","authors":"Wen Wen, Han Li, Rui Wu, Lingjuan Wu, Hong Chen","doi":"10.1016/j.neunet.2024.106955","DOIUrl":"10.1016/j.neunet.2024.106955","url":null,"abstract":"<p><p>Adversarial pairwise learning has become the predominant method to enhance the discrimination ability of models against adversarial attacks, achieving tremendous success in various application fields. Despite excellent empirical performance, adversarial robustness and generalization of adversarial pairwise learning remain poorly understood from the theoretical perspective. This paper moves towards this by establishing the high-probability generalization bounds. Our bounds generally apply to various models and pairwise learning tasks. We give application examples involving explicit bounds of adversarial bipartite ranking and adversarial metric learning to illustrate how the theoretical results can be extended. Furthermore, we develop the optimistic generalization bound at order O(n<sup>-1</sup>) on the sample size n by leveraging local Rademacher complexity. Our analysis provides meaningful theoretical guidance for improving adversarial robustness through feature size and regularization. Experimental results validate theoretical findings.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"183 ","pages":"106955"},"PeriodicalIF":6.0,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}