Pub Date: 2025-03-01. Epub Date: 2024-12-06. DOI: 10.1016/j.neunet.2024.106930
Xin Liu, Xunbin Xiong, Mingyu Yan, Runzhen Xue, Shirui Pan, Songwen Pei, Lei Deng, Xiaochun Ye, Dongrui Fan
Large-scale graphs are prevalent in various real-world scenarios and can be effectively processed using Graph Neural Networks (GNNs) on GPUs to derive meaningful representations. However, the inherent irregularity of real-world graphs makes it hard to exploit the single-instruction multiple-data execution mode of GPUs, leading to inefficiencies in GNN training. In this paper, we alleviate this irregularity at its origin: the irregular graph data itself. To this end, we propose DropNaE, which alleviates the irregularity of large-scale graphs by conditionally dropping nodes and edges before GNN training. Specifically, we first present a metric to quantify the neighbor heterophily of all nodes in a graph. Based on this metric, we then propose two variants of DropNaE that transform the irregular degree distribution of a large-scale graph into a uniform one. Experiments show that DropNaE is highly compatible and can be integrated into popular GNNs to improve both their training efficiency and their accuracy. DropNaE is performed offline and requires no online computing resources, benefiting current and future state-of-the-art GNNs.
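The conditional dropping step can be pictured with a toy sketch. The snippet below (a hypothetical helper `cap_degrees`, not the authors' code) flattens a skewed degree distribution by capping each node's degree; the paper's actual criterion is a neighbor-heterophily metric, which the abstract does not specify and which is not reproduced here.

```python
import random

def cap_degrees(edges, max_degree, seed=0):
    """Illustrative edge dropping: keep an edge only while both of its
    endpoints are still below max_degree, visiting edges in a random
    order.  This flattens a skewed degree distribution before GNN
    training, in the spirit of (but not identical to) DropNaE."""
    rng = random.Random(seed)
    degree = {}
    kept = []
    for u, v in rng.sample(edges, len(edges)):
        if degree.get(u, 0) < max_degree and degree.get(v, 0) < max_degree:
            kept.append((u, v))
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
    return kept
```

On a star graph, for example, the hub's degree is reduced to the cap while leaf degrees are untouched, which is exactly the kind of regularization of the degree profile the paper targets.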
Title: DropNaE: Alleviating irregularity for large-scale graph representation learning (Neural Networks, vol. 183, article 106930).
Pub Date: 2025-03-01. Epub Date: 2024-12-02. DOI: 10.1016/j.neunet.2024.106969
Jaeill Kim, Wonseok Lee, Moonjung Eo, Wonjong Rhee
Class Incremental Learning (CIL) constitutes a pivotal subfield within continual learning, aimed at enabling models to progressively learn new classification tasks while retaining knowledge obtained from prior tasks. Although previous studies have predominantly focused on backward-compatible approaches to mitigate catastrophic forgetting, recent investigations have introduced forward-compatible methods that enhance performance on novel tasks and complement existing backward-compatible methods. In this study, we introduce the effective-Rank based Feature Richness enhancement (RFR) method, designed to improve forward compatibility. Specifically, RFR increases the effective rank of representations during the base session, thereby facilitating the incorporation of more informative features pertinent to unseen novel tasks. Consequently, RFR achieves dual objectives in backward and forward compatibility: minimizing feature-extractor modifications and enhancing novel-task performance, respectively. To validate the efficacy of our approach, we establish a theoretical connection between effective rank and the Shannon entropy of representations. We then conduct comprehensive experiments by integrating RFR into eleven well-known CIL methods. Our results demonstrate the effectiveness of our approach in enhancing novel-task performance while mitigating catastrophic forgetting. Furthermore, our method notably improves the average incremental accuracy across all eleven cases examined.
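Effective rank has a standard definition as the exponential of the Shannon entropy of the normalized singular-value spectrum, which matches the theoretical connection the abstract mentions. The sketch below computes that quantity; RFR's exact training-time regularizer is not given in the abstract, so only the metric itself is shown.

```python
import numpy as np

def effective_rank(X, eps=1e-12):
    """Effective rank of a matrix: exp of the Shannon entropy of the
    normalized singular values (Roy & Vetterli's definition)."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    p = s / (s.sum() + eps)          # singular values as a distribution
    p = p[p > eps]                   # drop numerical zeros
    return float(np.exp(-(p * np.log(p)).sum()))
```

An identity matrix of size n has effective rank n (all directions equally informative), while a rank-one matrix has effective rank 1 — richer representations score higher.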
Title: Improving forward compatibility in class incremental learning by increasing representation rank and feature richness (Neural Networks, vol. 183, article 106969).
Pub Date: 2025-03-01. Epub Date: 2024-11-26. DOI: 10.1016/j.neunet.2024.106943
Kai Ye, Haoteng Tang, Siyuan Dai, Igor Fortel, Paul M Thompson, R Scott Mackin, Alex Leow, Heng Huang, Liang Zhan
The application of deep learning techniques to analyze brain functional magnetic resonance imaging (fMRI) data has led to significant advancements in identifying prospective biomarkers associated with various clinical phenotypes and neurological conditions. Despite these achievements, the aspect of prediction uncertainty has been relatively underexplored in brain fMRI data analysis. Accurate uncertainty estimation is essential for trustworthy learning, given the challenges associated with brain fMRI data acquisition and the potential diagnostic implications for patients. To address this gap, we introduce a novel posterior evidential network, named the Brain Posterior Evidential Network (BPEN), designed to capture both aleatoric and epistemic uncertainty in the analysis of brain fMRI data. We conducted comprehensive experiments using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and ADNI-depression (ADNI-D) cohorts, focusing on predictions for mild cognitive impairment (MCI) and depression across various diagnostic groups. Our experiments not only unequivocally demonstrate the superior predictive performance of our BPEN model compared to existing state-of-the-art methods but also underscore the importance of uncertainty estimation in predictive models.
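Evidential networks commonly derive uncertainty from Dirichlet evidence. Assuming BPEN follows the standard subjective-logic formulation (alpha = evidence + 1; the abstract does not give the exact parameterization), a minimal sketch of belief and epistemic-uncertainty computation is:

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Belief masses and epistemic uncertainty from a 1-D vector of
    non-negative class evidence, following standard evidential deep
    learning: alpha = e + 1, S = sum(alpha), belief_k = e_k / S,
    uncertainty u = K / S."""
    e = np.asarray(evidence, dtype=float)
    K = e.shape[-1]
    alpha = e + 1.0
    S = alpha.sum()
    belief = e / S
    u = float(K / S)
    return belief, u
```

With zero evidence, uncertainty is maximal (u = 1); as evidence for any class accumulates, u shrinks, which is the behavior a trustworthy-diagnosis model wants to expose.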
Title: BPEN: Brain Posterior Evidential Network for trustworthy brain imaging analysis (Neural Networks, vol. 183, article 106943). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11750605/pdf/
Pub Date: 2025-01-20. DOI: 10.1016/j.neunet.2025.107169
Xiang Wang, Hao Dou, Dibo Dong, Zhenyu Meng
Anomaly detection on graph data has garnered significant interest from both academia and industry. In recent years, fueled by the rapid development of Graph Neural Networks (GNNs), various GNN-based anomaly detection methods have been proposed and have achieved good results. However, GNN-based methods assume that connected nodes have similar classes and features, leading to issues of class inconsistency and semantic inconsistency in graph anomaly detection. Existing methods have yet to adequately address these issues, which limits detection performance. We therefore propose an anomaly detection method consisting of a semantic fusion-based node representation module and an attention mechanism-based node representation module to resolve these two issues. The main highlights of this study are as follows. First, a novel framework is developed to better resolve class inconsistency and semantic inconsistency in graph anomaly detection. Second, we propose a semantic fusion-based node representation module, built on Chebyshev polynomial graph filtering, that effectively captures both the high-frequency and low-frequency components of graph signals. Third, to overcome semantic inconsistency in graph data, we devise an attention mechanism-based node representation module that adaptively learns the importance of graph nodes, yielding a significant improvement in model performance. Finally, experiments on five real-world anomaly detection datasets show that the proposed method outperforms state-of-the-art methods.
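Chebyshev polynomial graph filtering, which the semantic-fusion module builds on, has a well-known recurrent form. The sketch below applies a filter with given coefficients to a signal; it assumes the input operator is already the rescaled Laplacian with spectrum in [-1, 1], and the paper's learned coefficients are of course not reproduced.

```python
import numpy as np

def chebyshev_filter(L, x, coeffs):
    """Apply sum_k c_k T_k(L) x, where T_0(L) = I, T_1(L) = L and
    T_k(L) = 2 L T_{k-1}(L) - T_{k-2}(L).  L is assumed to be the
    rescaled graph Laplacian 2L/lambda_max - I."""
    x = np.asarray(x, dtype=float)
    t_prev, t_curr = x, L @ x
    out = coeffs[0] * t_prev
    if len(coeffs) > 1:
        out = out + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * (L @ t_curr) - t_prev
        out = out + c * t_curr
    return out
```

Because each term only needs one sparse matrix-vector product, a degree-K filter costs K propagations — the usual reason this family is preferred over explicit eigendecomposition.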
Title: Graph anomaly detection based on hybrid node representation learning (Neural Networks, vol. 185, article 107169).
Pub Date: 2025-01-18. DOI: 10.1016/j.neunet.2025.107176
Driss El Alaoui, Jamal Riffi, Abdelouahed Sabri, Badraddine Aghoutane, Ali Yahyaouy, Hamid Tairi
Session-based recommendation systems (SBRS) are essential for enhancing the customer experience, improving sales and loyalty, and enabling product discovery in dynamic, real-world scenarios without requiring user history. Despite their importance, traditional and even current SBRS algorithms face limitations, notably an inability to capture complex item transitions within each session and a disregard for general patterns that can be derived across multiple sessions. This paper proposes a novel SBRS model, Capsule GraphSAGE for Session-Based Recommendation (CapsGSR), that marries GraphSAGE's scalability and inductive learning capabilities with the abstraction levels of capsule networks by generating multiple representations of each node from different perspectives. Consequently, CapsGSR addresses challenges that may hinder optimal item representations and captures the complex nature of transitions, mitigating the loss of crucial information. Our system significantly outperforms baseline models on benchmark datasets, with improvements of 8.44% in HR@20 and 4.66% in MRR@20, indicating its effectiveness in delivering precise and relevant recommendations.
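The two evaluation metrics cited above have simple definitions worth spelling out: HR@k asks whether the held-out target item appears in the model's top-k ranking, and MRR@k rewards placing it near the top. A per-session sketch (averaged over sessions in practice):

```python
def hr_and_mrr_at_k(ranked_items, target, k=20):
    """Session-based recommendation metrics for one session:
    HR@k  = 1.0 if the target item is in the top-k list, else 0.0;
    MRR@k = 1/rank of the target within the top-k list, else 0.0."""
    topk = ranked_items[:k]
    if target not in topk:
        return 0.0, 0.0
    rank = topk.index(target) + 1
    return 1.0, 1.0 / rank
```

So a target ranked second yields HR@20 = 1 but MRR@20 = 0.5, which is why MRR is the stricter of the two numbers reported in the abstract.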
Title: A novel session-based recommendation system using capsule graph neural network (Neural Networks, vol. 185, article 107176).
Pub Date: 2025-01-18. DOI: 10.1016/j.neunet.2025.107143
Jiayong Bao, Jiangshe Zhang, Chunxia Zhang, Lili Bao
The discrete cosine transform (DCT) has been widely used in computer vision tasks owing to its high compression ratio and high-quality visual presentation. However, conventional DCT is affected by the size of the transform region and introduces blocking effects. Eliminating these blocking effects so that the transform can efficiently serve vision tasks is therefore significant and challenging. In this paper, we introduce the All Phase Sequency DCT (APSeDCT) into convolutional networks to extract multi-frequency information from deep features. Because APSeDCT is equivalent to a convolution operation, we construct a corresponding convolution module, APSeDCT Convolution (APSeDCTConv), that has transferability similar to vanilla convolution. We then propose an augmented convolutional operator, MultiConv, built on APSeDCTConv. By replacing the last three bottleneck blocks of ResNet with MultiConv, our approach not only reduces computational cost and parameter count but also performs well on classification, object detection, and instance segmentation tasks. Extensive experiments show that APSeDCTConv augmentation yields consistent improvements in image classification on ImageNet across models and scales, including ResNet, Res2Net, and ResNeXt, and achieves AP improvements of 0.5%-1.1% for object detection and 0.4%-0.7% for instance segmentation on the COCO benchmark compared to the baseline.
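The claim that the DCT "is equivalent to a convolution operation" rests on the transform being a fixed linear map: its basis can be baked into frozen convolution weights. The sketch below constructs the orthonormal DCT-II basis matrix (the plain DCT, not the all-phase sequency variant the paper defines) to make that concrete.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis matrix B with y = B @ x the transform
    of x.  Because this is a fixed linear operator, it can be loaded
    into a convolution layer with frozen weights."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    B = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    B[0] *= np.sqrt(1.0 / n)          # DC row scaling
    B[1:] *= np.sqrt(2.0 / n)         # AC row scaling
    return B
```

Orthonormality (B Bᵀ = I) means the transform is invertible by a transpose, so no information is lost when features are split into frequency components.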
Title: DCTCNet: Sequency discrete cosine transform convolution network for visual recognition (Neural Networks, vol. 185, article 107143).
Pub Date: 2025-01-17. DOI: 10.1016/j.neunet.2025.107141
Gu-Bon Jeong, Dong-Wan Choi
Among various out-of-distribution (OOD) detection methods for neural networks, outlier exposure (OE) using auxiliary data has been shown to achieve practical performance. However, existing OE methods typically run in a centralized manner and are thus infeasible in a standard federated learning (FL) setting, where each client has low computing power and cannot collect a variety of auxiliary samples. To address this issue, we propose a practical and realistic OE scenario in FL in which only the central server holds a large amount of outlier data while each client is given a relatively small amount of in-distribution (ID) data. For this scenario, we introduce an effective OE-based OOD detection method, called internal separation & backstage collaboration, which makes the best use of the many auxiliary outlier samples without sacrificing the ultimate goals of FL: privacy preservation and collaborative training performance. The most challenging part is achieving the same effect in our scenario as in joint centralized training on outliers and ID samples. Our main strategy (internal separation) is to jointly train the feature vectors of an internal layer with outliers in the back layers of the global model, while ensuring privacy preservation. We also suggest a collaborative approach (backstage collaboration) in which multiple back layers are trained together to detect OOD samples. Extensive experiments demonstrate that our method achieves remarkable detection performance compared to baseline approaches in the proposed OE scenario.
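For readers unfamiliar with outlier exposure, the centralized objective this work adapts is the standard one: ordinary cross-entropy on ID samples plus a term pushing outlier predictions toward the uniform distribution. The sketch below shows that baseline loss only; the paper's federated mechanics (internal separation, backstage collaboration) are not modeled here.

```python
import numpy as np

def outlier_exposure_loss(id_logits, id_labels, ood_logits, lam=0.5):
    """Standard outlier-exposure objective: cross-entropy on
    in-distribution samples plus lam * cross-entropy of outlier
    predictions against the uniform distribution over classes."""
    def log_softmax(z):
        z = np.asarray(z, dtype=float)
        z = z - z.max(axis=1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    ls_id = log_softmax(id_logits)
    ce = -ls_id[np.arange(len(id_labels)), id_labels].mean()
    # CE vs. uniform target = -(1/K) * sum_k log p_k, averaged over samples:
    uniform_ce = -log_softmax(ood_logits).mean()
    return float(ce + lam * uniform_ce)
```

The uniform term is minimized exactly when the model is maximally unsure on outliers, which is what makes low-confidence a usable OOD signal at test time.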
Title: Out-of-Distribution Detection via outlier exposure in federated learning (Neural Networks, vol. 185, article 107141).
Pub Date: 2025-01-17. DOI: 10.1016/j.neunet.2024.107105
Tengyu Zhang, Deyu Zeng, Wei Liu, Zongze Wu, Chris Ding, Xiaopin Zhong
Multi-view classification integrates features from different views to optimize classification performance. Most existing works utilize semantic information to achieve view fusion but neglect the spatial information of the data itself, which endows data representations with correlation information and has proven to be an essential aspect. Thus, a robust independent subspace analysis network, optimized via sparse and soft orthogonal optimization, is first proposed to extract the latent spatial information of multi-view data using subspace bases. Building on this, a novel contrastive independent subspace analysis framework for multi-view classification is developed to further optimize from the spatial perspective. Specifically, contrastive subspace optimization separates the subspaces, thereby enhancing their representational capacity, while contrastive fusion optimization builds cross-view subspace correlations and forms a non-overlapping data representation. In k-fold validation experiments, MvCISA achieved state-of-the-art accuracies of 76.95%, 98.50%, 93.33%, and 88.24% on four benchmark multi-view datasets, significantly outperforming the second-best method by 8.57%, 0.25%, 1.66%, and 5.96% in accuracy. Visualization experiments demonstrate the effectiveness of the subspace and feature-space optimization, also indicating promising potential for other downstream tasks. Our code is available at https://github.com/raRn0y/MvCISA.
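"Soft orthogonal optimization" of subspace bases is usually realized as a Frobenius-norm penalty on the Gram matrix's deviation from identity. The sketch below shows that common regularizer; the paper's exact combination with the sparsity term is not given in the abstract.

```python
import numpy as np

def soft_orthogonality(W):
    """Soft orthogonality regularizer ||W^T W - I||_F^2, a standard
    penalty encouraging the columns of W (subspace basis vectors) to
    be orthonormal without enforcing it as a hard constraint."""
    W = np.asarray(W, dtype=float)
    G = W.T @ W
    return float(((G - np.eye(G.shape[0])) ** 2).sum())
```

The penalty is zero exactly for orthonormal bases and grows smoothly otherwise, so it can be added to the training loss and optimized by gradient descent alongside the classification objective.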
Title: Contrastive independent subspace analysis network for multi-view spatial information extraction (Neural Networks, vol. 185, article 107105).
Pub Date: 2025-01-16. DOI: 10.1016/j.neunet.2025.107168
Along He, Yanlin Wu, Zhihong Wang, Tao Li, Huazhu Fu
Pre-training and fine-tuning have become popular because of the rich representations embedded in large pre-trained models, which can be leveraged for downstream medical tasks. However, existing methods typically fine-tune either all parameters or only task-specific layers of pre-trained models, overlooking the variability of input medical images; as a result, they may lack efficiency or effectiveness. In this study, our goal is to explore parameter-efficient fine-tuning (PEFT) for medical image analysis. To address this challenge, we introduce a novel method called Dynamic Visual Prompt Tuning (DVPT), which can extract knowledge beneficial to downstream tasks from large models with only a few trainable parameters. First, the frozen features are transformed by a lightweight bottleneck layer to learn the domain-specific distribution of downstream medical tasks. Then, a few learnable visual prompts are employed as dynamic queries that conduct cross-attention with the transformed features to acquire sample-specific features. This DVPT module can be shared across Transformer layers, further reducing the number of trainable parameters. We conduct extensive experiments with various pre-trained models on medical classification and segmentation tasks, finding that this PEFT method not only efficiently adapts pre-trained models to the medical domain but also improves data efficiency under limited labeled data. For example, with only 0.5% additional trainable parameters, our method not only outperforms state-of-the-art PEFT methods but also surpasses full fine-tuning by more than 2.20% in Kappa score on the medical classification task, while saving up to 60% of the labeled data and 99% of the storage cost of ViT-B/16.
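The "prompts as dynamic queries" step is plain cross-attention with the prompt vectors on the query side. The sketch below is a single-head version with projection matrices omitted for brevity, so it is an illustration of the mechanism rather than the paper's exact DVPT module.

```python
import numpy as np

def prompt_cross_attention(prompts, features):
    """Single-head cross-attention: learnable prompts act as queries
    over frozen backbone features (used as both keys and values).

    prompts:  (p, d) learnable query vectors
    features: (n, d) frozen feature tokens for one sample
    returns:  (p, d) sample-specific prompt outputs
    """
    d = prompts.shape[-1]
    scores = prompts @ features.T / np.sqrt(d)    # (p, n)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ features                        # (p, d)
```

Because the features differ per input image, the same frozen prompt-attention machinery yields different prompt outputs per sample — the "dynamic" part of the method — while only the prompts (and, in the paper, a small bottleneck) are trainable.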
{"title":"DVPT: Dynamic Visual Prompt Tuning of large pre-trained models for medical image analysis.","authors":"Along He, Yanlin Wu, Zhihong Wang, Tao Li, Huazhu Fu","doi":"10.1016/j.neunet.2025.107168","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107168","url":null,"abstract":"<p><p>Pre-training and fine-tuning have become popular due to the rich representations embedded in large pre-trained models, which can be leveraged for downstream medical tasks. However, existing methods typically either fine-tune all parameters or only task-specific layers of pre-trained models, overlooking the variability in input medical images. As a result, these approaches may lack efficiency or effectiveness. In this study, our goal is to explore parameter-efficient fine-tuning (PEFT) for medical image analysis. To address this challenge, we introduce a novel method called Dynamic Visual Prompt Tuning (DVPT). It can extract knowledge beneficial to downstream tasks from large models with only a few trainable parameters. First, the frozen features are transformed by a lightweight bottleneck layer to learn the domain-specific distribution of downstream medical tasks. Then, a few learnable visual prompts are employed as dynamic queries to conduct cross-attention with the transformed features, aiming to acquire sample-specific features. This DVPT module can be shared across different Transformer layers, further reducing the number of trainable parameters. We conduct extensive experiments with various pre-trained models on medical classification and segmentation tasks. We find that this PEFT method not only efficiently adapts pre-trained models to the medical domain but also enhances data efficiency with limited labeled data. For example, with only 0.5% additional trainable parameters, our method not only outperforms state-of-the-art PEFT methods but also surpasses full fine-tuning by more than 2.20% in Kappa score on the medical classification task. 
It can save up to 60% of labeled data and 99% of storage cost of ViT-B/16.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107168"},"PeriodicalIF":6.0,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
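The DVPT abstract above describes two concrete steps: a lightweight bottleneck that transforms frozen backbone features, and a few learnable visual prompts that act as dynamic queries in cross-attention over those features. The following is a minimal PyTorch sketch of that idea; all layer sizes and the class name are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DVPTSketch(nn.Module):
    """Illustrative sketch of Dynamic Visual Prompt Tuning.

    Frozen ViT features pass through a lightweight bottleneck
    (down-project, nonlinearity, up-project), then a small set of
    learnable prompts attends to the transformed features via
    cross-attention to produce sample-specific features.
    Hyper-parameters here are assumed, not taken from the paper.
    """

    def __init__(self, dim=768, bottleneck_dim=64, n_prompts=8, n_heads=4):
        super().__init__()
        # Lightweight bottleneck: only these small projections are trained.
        self.bottleneck = nn.Sequential(
            nn.Linear(dim, bottleneck_dim),
            nn.GELU(),
            nn.Linear(bottleneck_dim, dim),
        )
        # A few learnable visual prompts used as dynamic queries.
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, frozen_feats):
        # frozen_feats: (B, N, dim) token features from a frozen backbone.
        x = self.bottleneck(frozen_feats)  # learn domain-specific distribution
        q = self.prompts.unsqueeze(0).expand(frozen_feats.size(0), -1, -1)
        # Prompts query the transformed features -> sample-specific features.
        out, _ = self.cross_attn(q, x, x)
        return out  # (B, n_prompts, dim)
```

Because the module's parameters are independent of the backbone's depth, one instance can be shared across Transformer layers, which is how a method like this keeps the trainable-parameter budget to a fraction of a percent of the frozen model.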
Accurate 3D point cloud object detection is crucial for autonomous driving vehicles. The sparsity of point clouds in 3D scenes, especially for smaller targets such as pedestrians and bicycles that contain fewer points, makes detection particularly challenging. To solve this problem, we propose a single-stage voxel-based 3D object detection method, named PFENet. First, we design a robust voxel feature encoding network that incorporates a stacked triple attention mechanism to enhance the extraction of key features and suppress noise. Moreover, a 3D sparse convolution layer dynamically adjusts feature processing based on output location importance, improving small object recognition. Additionally, the attentional feature fusion module in the region proposal network merges low-level spatial features with high-level semantic features, and broadens the receptive field through atrous spatial pyramid pooling to capture multi-scale features. Finally, we develop multiple detection heads for more refined feature extraction and object classification, as well as more accurate bounding box regression. Experimental results on the KITTI dataset demonstrate the effectiveness of the proposed method.
{"title":"PFENet: Towards precise feature extraction from sparse point cloud for 3D object detection.","authors":"Yaochen Li, Qiao Li, Cong Gao, Shengjing Gao, Hao Wu, Rui Liu","doi":"10.1016/j.neunet.2025.107144","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107144","url":null,"abstract":"<p><p>Accurate 3D point cloud object detection is crucially important for autonomous driving vehicles. The sparsity of point clouds in 3D scenes, especially for smaller targets like pedestrians and bicycles that contain fewer points, makes detection particularly challenging. To solve this problem, we propose a single-stage voxel-based 3D object detection method, namely PFENet. Firstly, we design a robust voxel feature encoding network that incorporates a stacked triple attention mechanism to enhance the extraction of key features and suppress noise. Moreover, a 3D sparse convolution layer dynamically adjusts feature processing based on output location importance, improving small object recognition. Additionally, the attentional feature fusion module in the region proposal network merges low-level spatial features with high-level semantic features, and broadens the receptive field through atrous spatial pyramid pooling to capture multi-scale features. Finally, we develop multiple detection heads for more refined feature extraction and object classification, as well as more accurate bounding box regression. 
Experimental results on the KITTI dataset demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107144"},"PeriodicalIF":6.0,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
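Two ingredients of the PFENet abstract above can be sketched concretely: attentional fusion of low-level spatial and high-level semantic feature maps, followed by atrous spatial pyramid pooling (parallel dilated convolutions) to widen the receptive field. The PyTorch sketch below illustrates those two generic building blocks under assumed channel sizes and dilation rates; it is not the authors' network.

```python
import torch
import torch.nn as nn

class AttentionalFusionASPP(nn.Module):
    """Illustrative sketch: attention-weighted fusion of two feature maps,
    then atrous spatial pyramid pooling over the fused result.

    Channel count and dilation rates are assumptions for illustration.
    """

    def __init__(self, channels=64, dilations=(1, 2, 4)):
        super().__init__()
        # Channel attention derived from the semantic branch decides
        # how much of each input contributes to the fused map.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # ASPP: parallel 3x3 convs with increasing dilation keep the
        # spatial size (padding == dilation) while growing the receptive field.
        self.aspp = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.project = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, low, high):
        # low, high: (B, C, H, W) spatial and semantic feature maps.
        a = self.gate(high)                    # (B, C, 1, 1) attention weights
        fused = a * low + (1.0 - a) * high     # attentional feature fusion
        multi = torch.cat([branch(fused) for branch in self.aspp], dim=1)
        return self.project(multi)             # multi-scale fused features
```

The convex-combination gate `a * low + (1 - a) * high` is one common way to realize attentional fusion; concatenation followed by a 1x1 convolution would be an equally plausible variant.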