
Neural Networks: Latest Publications

Supporting vision-language model few-shot inference with confounder-pruned knowledge prompt.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-18 | DOI: 10.1016/j.neunet.2025.107173
Jiangmeng Li, Wenyi Mo, Fei Song, Chuxiong Sun, Wenwen Qiang, Bing Su, Changwen Zheng

Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts. Recent works adopt fixed or learnable prompts, i.e., classification weights synthesized from natural language descriptions of task-relevant categories, to reduce the gap between the pre-training and inference phases. However, which prompts improve inference performance, and how, remains unclear. In this paper, we explicitly clarify the importance of incorporating semantic information into prompts, whereas existing prompting methods generate prompts without sufficiently exploring the semantic information of textual labels. Manually constructing prompts with rich semantics requires domain expertise and is extremely time-consuming. To cope with this issue, we propose a knowledge-aware prompt learning method, namely Confounder-pruned Knowledge Prompt (CPKP), which retrieves an ontology knowledge graph by treating the textual label as a query to extract task-relevant semantic information. CPKP further introduces a double-tier confounder-pruning procedure to refine the derived semantic information. Adhering to the individual causal effect principle, the graph-tier confounders are gradually identified and phased out. The feature-tier confounders are eliminated by following the maximum entropy principle in information theory. Empirically, the evaluations demonstrate the effectiveness of CPKP in few-shot inference, e.g., with only two shots, CPKP outperforms the manual-prompt method by 4.64% and the learnable-prompt method by 1.09% on average.
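As a rough illustration of the knowledge-prompt idea (not the authors' CPKP implementation), the sketch below treats each textual label as a query into a toy, hand-written ontology, appends the retrieved concepts to a prompt template, and scores a mock image feature against the prompt features. The ontology, the hashed stand-in encoder, and the class names are all hypothetical placeholders for a pre-trained vision-language model.

```python
import numpy as np

ontology = {  # toy knowledge graph: label -> related concepts (hypothetical)
    "sparrow": ["bird", "small body", "short beak", "brown feathers"],
    "airliner": ["aircraft", "jet engines", "fixed wings", "fuselage"],
}

def build_prompt(label):
    # Treat the textual label as a query and append the retrieved concepts.
    concepts = ", ".join(ontology.get(label, []))
    return f"a photo of a {label}, which has {concepts}"

def encode_text(text, dim=64):
    # Stand-in text encoder: deterministic hashed bag-of-words, a placeholder
    # for the text tower of a pre-trained vision-language model.
    vec = np.zeros(dim)
    for tok in text.lower().replace(",", " ").split():
        vec[hash(tok) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

class_names = ["sparrow", "airliner"]
prompt_feats = np.stack([encode_text(build_prompt(c)) for c in class_names])

image_feat = encode_text("small brown feathers short beak")  # mock image feature
scores = prompt_feats @ image_feat            # cosine similarities
print(class_names[int(np.argmax(scores))])    # predicted class
```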

{"title":"Supporting vision-language model few-shot inference with confounder-pruned knowledge prompt.","authors":"Jiangmeng Li, Wenyi Mo, Fei Song, Chuxiong Sun, Wenwen Qiang, Bing Su, Changwen Zheng","doi":"10.1016/j.neunet.2025.107173","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107173","url":null,"abstract":"<p><p>Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts. Recent works adopt fixed or learnable prompts, i.e., classification weights are synthesized from natural language descriptions of task-relevant categories, to reduce the gap between tasks during the pre-training and inference phases. However, how and what prompts can improve inference performance remains unclear. In this paper, we explicitly clarify the importance of incorporating semantic information into prompts, while existing prompting methods generate prompts without sufficiently exploring the semantic information of textual labels. Manually constructing prompts with rich semantics requires domain expertise and is extremely time-consuming. To cope with this issue, we propose a knowledge-aware prompt learning method, namely Confounder-pruned Knowledge Prompt (CPKP), which retrieves an ontology knowledge graph by treating the textual label as a query to extract task-relevant semantic information. CPKP further introduces a double-tier confounder-pruning procedure to refine the derived semantic information. Adhering to the individual causal effect principle, the graph-tier confounders are gradually identified and phased out. The feature-tier confounders are eliminated by following the maximum entropy principle in information theory. Empirically, the evaluations demonstrate the effectiveness of CPKP in few-shot inference, e.g., with only two shots, CPKP outperforms the manual-prompt method by 4.64% and the learnable-prompt method by 1.09% on average.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107173"},"PeriodicalIF":6.0,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LGS-KT: Integrating logical and grammatical skills for effective programming knowledge tracing.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-18 | DOI: 10.1016/j.neunet.2025.107164
Xinjie Sun, Qi Liu, Kai Zhang, Shuanghong Shen, Yan Zhuang, Yuxiang Guo

Knowledge tracing (KT) estimates students' mastery of knowledge concepts or skills by analyzing their historical interactions. Although general KT methods have effectively assessed students' knowledge states, specific measurements of students' programming skills remain insufficient. Existing studies mainly rely on exercise outcomes and do not fully utilize behavioral data from the programming process. Therefore, we propose a Logical and Grammar Skills Knowledge Tracing (LGS-KT) model to enhance programming education. This model integrates static analysis and dynamic monitoring (such as CPU and memory consumption) to evaluate code elements, providing a thorough assessment of code quality. By analyzing students' multiple iterations on the same programming problem, we construct a reweighted logical skill evolution graph to assess the development of students' logical skills. Additionally, to enhance the interactions among representations with similar grammatical skills, we develop a grammatical skills interaction graph based on the similarity of knowledge concepts. This approach significantly improves the accuracy of inferring students' programming grammatical skill states. The LGS-KT model demonstrates superior performance in predicting student outcomes. Our research highlights the potential application of a KT model that integrates logical and grammatical skills in programming exercises. To support reproducible research, we have published the data and code at https://github.com/xinjiesun-ustc/LGS-KT, encouraging further innovation in this field.
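The grammatical skills interaction graph described above can be pictured with a small sketch: connect knowledge concepts whose embeddings are similar and row-normalize the resulting adjacency for use in a graph network. This is an assumption-laden illustration (random placeholder embeddings, an arbitrary threshold), not the released LGS-KT code at the link above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_concepts, dim = 6, 16
concept_emb = rng.normal(size=(num_concepts, dim))   # hypothetical concept embeddings

# Cosine similarity between every pair of knowledge concepts.
norm = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
sim = norm @ norm.T

threshold = 0.2
adj = (sim > threshold).astype(float)     # edges between similar concepts
np.fill_diagonal(adj, 0.0)

# Row-normalized adjacency, usable by a graph neural network layer.
deg = adj.sum(axis=1, keepdims=True) + 1e-8
adj_norm = adj / deg
print(adj_norm.round(2))
```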

{"title":"LGS-KT: Integrating logical and grammatical skills for effective programming knowledge tracing.","authors":"Xinjie Sun, Qi Liu, Kai Zhang, Shuanghong Shen, Yan Zhuang, Yuxiang Guo","doi":"10.1016/j.neunet.2025.107164","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107164","url":null,"abstract":"<p><p>Knowledge tracing (KT) estimates students' mastery of knowledge concepts or skills by analyzing their historical interactions. Although general KT methods have effectively assessed students' knowledge states, specific measurements of students' programming skills remain insufficient. Existing studies mainly rely on exercise outcomes and do not fully utilize behavioral data during the programming process. Therefore, we integrate a Logical and Grammar Skills Knowledge Tracing (LGS-KT) model to enhance programming education. This model integrates static analysis and dynamic monitoring (such as CPU and memory consumption) to evaluate code elements, providing a thorough assessment of code quality. By analyzing students' multiple iterations on the same programming problem, we constructed a reweighted logical skill evolution graph to assess the development of students' logical skills. Additionally, to enhance the interactions among representations with similar grammatical skills, we developed a grammatical skills interaction graph based on the similarity of knowledge concepts. This approach significantly improves the accuracy of inferring students' programming grammatical skill states. The LGS-KT model has demonstrated superior performance in predicting student outcomes. Our research highlights the potential application of a KT model that integrates logical and grammatical skills in programming exercises. To support reproducible research, we have published the data and code at https://github.com/xinjiesun-ustc/LGS-KT, encouraging further innovation in this field.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107164"},"PeriodicalIF":6.0,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DCTCNet: Sequency discrete cosine transform convolution network for visual recognition.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-18 | DOI: 10.1016/j.neunet.2025.107143
Jiayong Bao, Jiangshe Zhang, Chunxia Zhang, Lili Bao

The discrete cosine transform (DCT) has been widely used in computer vision tasks due to its high compression ratio and high-quality visual presentation. However, the conventional DCT is affected by the size of the transform region, which results in blocking effects. Eliminating these blocking effects so that the transform can efficiently serve vision tasks is therefore significant and challenging. In this paper, we introduce the All Phase Sequency DCT (APSeDCT) into convolutional networks to extract multi-frequency information from deep features. Because APSeDCT is equivalent to a convolutional operation, we construct a corresponding convolution module, called APSeDCT Convolution (APSeDCTConv), whose transferability is similar to that of vanilla convolution. We then propose an augmented convolutional operator, called MultiConv, built with APSeDCTConv. By replacing the last three bottleneck blocks of ResNet with MultiConv, our approach not only reduces the computational cost and the number of parameters but also performs strongly on classification, object detection, and instance segmentation tasks. Extensive experiments show that APSeDCTConv augmentation leads to consistent performance improvements in image classification on ImageNet across various models and scales, including ResNet, Res2Net, and ResNeXt, and yields 0.5%-1.1% and 0.4%-0.7% AP improvements for object detection and instance segmentation, respectively, on the COCO benchmark compared to the baseline.
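The equivalence between a block DCT and a fixed-kernel strided convolution, which the module above builds on, can be sketched as follows. This is a plain blockwise 2-D DCT written as an explicit projection onto DCT basis kernels, not the paper's APSeDCT or MultiConv implementation.

```python
import numpy as np

def dct_basis(n=4):
    # n x n orthonormal DCT-II basis vectors; kernel (u, v) is their outer product.
    x = np.arange(n)
    basis = np.array([np.cos(np.pi * (2 * x + 1) * u / (2 * n)) for u in range(n)])
    basis[0] *= 1.0 / np.sqrt(2.0)
    return basis * np.sqrt(2.0 / n)

def blockwise_dct(image, n=4):
    # Equivalent to a convolution with n*n fixed DCT kernels and stride n:
    # each n x n block is projected onto every 2-D DCT basis function.
    b = dct_basis(n)
    h, w = image.shape
    out = np.zeros((n * n, h // n, w // n))
    for u in range(n):
        for v in range(n):
            kernel = np.outer(b[u], b[v])
            for i in range(h // n):
                for j in range(w // n):
                    block = image[i * n:(i + 1) * n, j * n:(j + 1) * n]
                    out[u * n + v, i, j] = np.sum(block * kernel)
    return out   # (frequency channels, H/n, W/n)

image = np.random.default_rng(0).random((8, 8))
coeffs = blockwise_dct(image, n=4)
print(coeffs.shape)   # (16, 2, 2)
```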

{"title":"DCTCNet: Sequency discrete cosine transform convolution network for visual recognition.","authors":"Jiayong Bao, Jiangshe Zhang, Chunxia Zhang, Lili Bao","doi":"10.1016/j.neunet.2025.107143","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107143","url":null,"abstract":"<p><p>The discrete cosine transform (DCT) has been widely used in computer vision tasks due to its ability of high compression ratio and high-quality visual presentation. However, conventional DCT is usually affected by the size of transform region and results in blocking effect. Therefore, eliminating the blocking effects to efficiently serve for vision tasks is significant and challenging. In this paper, we introduce All Phase Sequency DCT (APSeDCT) into convolutional networks to extract multi-frequency information of deep features. Due to the fact that APSeDCT can be equivalent to convolutional operation, we construct corresponding convolution module called APSeDCT Convolution (APSeDCTConv) that has great transferability similar to vanilla convolution. Then we propose an augmented convolutional operator called MultiConv with APSeDCTConv. By replacing the last three bottleneck blocks of ResNet with MultiConv, our approach not only reduces the computational costs and the number of parameters, but also exhibits great performance in classification, object detection and instance segmentation tasks. Extensive experiments show that APSeDCTConv augmentation leads to consistent performance improvements in image classification on ImageNet across various different models and scales, including ResNet, Res2Net and ResNext, and achieving 0.5%-1.1% and 0.4%-0.7% AP performance improvements for object detection and instance segmentation, respectively, on the COCO benchmark compared to the baseline.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107143"},"PeriodicalIF":6.0,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143030236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DGMSCL: A dynamic graph mixed supervised contrastive learning approach for class imbalanced multivariate time series classification.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.neunet.2025.107131
Lipeng Qian, Qiong Zuo, Dahu Li, Hong Zhu

In the Imbalanced Multivariate Time Series Classification (ImMTSC) task, minority-class instances typically correspond to critical events, such as system faults in power grids or abnormal health occurrences in medical monitoring. Despite being rare and random, these events are highly significant. The dynamic spatial-temporal relationships between minority-class instances and other instances make them more prone to interference from neighboring instances during classification. Increasing the number of minority-class samples during training often results in overfitting to a single pattern of the minority class. Contrastive learning ensures that majority-class instances learn similar features in the representation space. However, it does not effectively aggregate features from neighboring minority-class instances, hindering its ability to properly represent these instances in the ImMTS dataset. Therefore, we propose a dynamic graph-based mixed supervised contrastive learning method (DGMSCL) that effectively fits minority-class features without increasing their number, while also separating them from other instances in the representation space. First, it reconstructs the input sequence into dynamic graphs and employs a hierarchical attention graph neural network (HAGNN) to generate a discriminative embedding representation between instances. Based on this, we introduce a novel mixed contrast loss, which includes weight-augmented inter-graph supervised contrast (WAIGC) and context-based minority class-aware contrast (MCAC). It adjusts the sample weights based on their quantity and intrinsic characteristics, placing greater emphasis on the minority-class loss to produce more effective gradient gains during training. Additionally, it separates minority-class instances from adjacent transitional instances in the representation space, enhancing their representational capacity. Extensive experiments across various scenarios and datasets with differing degrees of imbalance demonstrate that DGMSCL consistently outperforms existing baseline models. Specifically, DGMSCL achieves higher overall classification accuracy, as evidenced by significantly improved average F1-score, G-mean, and kappa coefficient across multiple datasets. Moreover, classification results on real-world power data show that DGMSCL generalizes well to real-world applications.
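As a hedged illustration of the reweighting idea behind WAIGC (not the paper's exact loss), the sketch below computes a supervised contrastive loss in which each anchor is weighted inversely to its class frequency, so minority-class anchors contribute larger gradients.

```python
import numpy as np

def weighted_supcon_loss(z, labels, temperature=0.1):
    # Supervised contrastive loss with per-anchor weights that up-weight
    # minority classes (weights inversely proportional to class frequency).
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    counts = np.bincount(labels)
    weights = 1.0 / counts[labels]
    weights = weights / weights.sum() * n          # normalize to mean 1

    loss = 0.0
    for i in range(n):
        mask = (labels == labels[i])
        mask[i] = False                            # positives exclude the anchor
        if not mask.any():
            continue
        logits = sim[i] - sim[i].max()             # numerical stability
        exp = np.exp(logits)
        exp[i] = 0.0                               # denominator excludes the anchor
        log_prob = logits - np.log(exp.sum())
        loss += -weights[i] * log_prob[mask].mean()
    return loss / n

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])        # imbalanced: class 1 is the minority
print(weighted_supcon_loss(z, labels))
```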

{"title":"DGMSCL: A dynamic graph mixed supervised contrastive learning approach for class imbalanced multivariate time series classification.","authors":"Lipeng Qian, Qiong Zuo, Dahu Li, Hong Zhu","doi":"10.1016/j.neunet.2025.107131","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107131","url":null,"abstract":"<p><p>In the Imbalanced Multivariate Time Series Classification (ImMTSC) task, minority-class instances typically correspond to critical events, such as system faults in power grids or abnormal health occurrences in medical monitoring. Despite being rare and random, these events are highly significant. The dynamic spatial-temporal relationships between minority-class instances and other instances make them more prone to interference from neighboring instances during classification. Increasing the number of minority-class samples during training often results in overfitting to a single pattern of the minority class. Contrastive learning ensures that majority-class instances learn similar features in the representation space. However, it does not effectively aggregate features from neighboring minority-class instances, hindering its ability to properly represent these instances in the ImMTS dataset. Therefor, we propose a dynamic graph-based mixed supervised contrastive learning method (DGMSCL) that effectively fits minority-class features without increasing their number, while also separating them from other instances in the representation space. First, it reconstructs the input sequence into dynamic graphs and employs a hierarchical attention graph neural network (HAGNN) to generate a discriminative embedding representation between instances. Based on this, we introduce a novel mixed contrast loss, which includes weight-augmented inter-graph supervised contrast (WAIGC) and context-based minority class-aware contrast (MCAC). It adjusts the sample weights based on their quantity and intrinsic characteristics, placing greater emphasis on minority-class loss to produce more effective gradient gains during training. Additionally, it separates minority-class instances from adjacent transitional instances in the representation space, enhancing their representational capacity. Extensive experiments across various scenarios and datasets with differing degrees of imbalance demonstrate that DGMSCL consistently outperforms existing baseline models. Specifically, DGMSCL achieves higher overall classification accuracy, as evidenced by significantly improved average F1-score, G-mean, and kappa coefficient across multiple datasets. Moreover, classification results on a real-world power data show that DGMSCL generalizes well to real-world application.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107131"},"PeriodicalIF":6.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
When low-light meets flares: Towards Synchronous Flare Removal and Brightness Enhancement.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.neunet.2025.107149
Jiahuan Ren, Zhao Zhang, Suiyi Zhao, Jicong Fan, Zhongqiu Zhao, Yang Zhao, Richang Hong, Meng Wang

Low-light image enhancement (LLIE) aims to improve the visibility and illumination of low-light images. However, real-world low-light images are usually accompanied by flares caused by light sources, which make it difficult to discern the content of dark images. Current LLIE and nighttime flare removal methods face challenges in handling these flared low-light images effectively: (1) flares in dark images disturb the content of images and cause uneven lighting, potentially resulting in overexposure or chromatic aberration; (2) the slight noise in low-light images may be amplified during enhancement, leading to speckle noise and blur in the enhanced images; (3) nighttime flare removal methods usually ignore the detailed information in dark regions, which may cause inaccurate representation. To tackle this challenging yet meaningful problem, we propose a novel image enhancement task called Flared Low-Light Image Enhancement (FLLIE). We first synthesize several flared low-light datasets as the training/inference data, based on which we develop a novel Fourier transform-based deep FLLIE network termed Synchronous Flare Removal and Brightness Enhancement (SFRBE). Specifically, a Residual Directional Fourier Block (RDFB) is introduced that learns in the frequency domain to extract accurate global information and capture detailed features from multiple directions. Extensive experiments on three flared low-light datasets and some real flared low-light images demonstrate the effectiveness of SFRBE for FLLIE.
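A toy example of why operating in the Fourier domain captures global information: scaling the Fourier amplitude of an image changes its global brightness and contrast while the preserved phase keeps the spatial structure. The snippet below is a generic frequency-domain manipulation, not the RDFB block or the SFRBE network.

```python
import numpy as np

def fourier_enhance(image, gain=2.0):
    # Amplify the Fourier amplitude (global brightness/contrast) while
    # preserving the phase, which carries most of the spatial structure.
    freq = np.fft.fft2(image)
    amplitude, phase = np.abs(freq), np.angle(freq)
    enhanced = np.fft.ifft2(gain * amplitude * np.exp(1j * phase)).real
    return np.clip(enhanced, 0.0, 1.0)

dark = np.random.default_rng(0).random((32, 32)) * 0.2   # mock low-light image
bright = fourier_enhance(dark, gain=2.5)
print(dark.mean(), bright.mean())
```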

{"title":"When low-light meets flares: Towards Synchronous Flare Removal and Brightness Enhancement.","authors":"Jiahuan Ren, Zhao Zhang, Suiyi Zhao, Jicong Fan, Zhongqiu Zhao, Yang Zhao, Richang Hong, Meng Wang","doi":"10.1016/j.neunet.2025.107149","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107149","url":null,"abstract":"<p><p>Low-light image enhancement (LLIE) aims to improve the visibility and illumination of low-light images. However, real-world low-light images are usually accompanied with flares caused by light sources, which make it difficult to discern the content of dark images. In this case, current LLIE and nighttime flare removal methods face challenges in handling these flared low-light images effectively: (1) Flares in dark images will disturb the content of images and cause uneven lighting, potentially resulting in overexposure or chromatic aberration; (2) the slight noise in low-light images may be amplified during the process of enhancement, leading to speckle noise and blur in the enhanced images; (3) the nighttime flare removal methods usually ignore the detailed information in dark regions, which may cause inaccurate representation. To tackle the above challenges yet meaningful problems well, we propose a novel image enhancement task called Flared Low-Light Image Enhancement (FLLIE). We first synthesize several flared low-light datasets as the training/inference data, based on which we develop a novel Fourier transform-based deep FLLIE network termed Synchronous Flare Removal and Brightness Enhancement (SFRBE). Specifically, a Residual Directional Fourier Block (RDFB) is introduced that learns in the frequency domain to extract accurate global information and capture detailed features from multiple directions. Extensive experiments on three flared low-light datasets and some real flared low-light images demonstrate the effectiveness of SFRBE for FLLIE.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107149"},"PeriodicalIF":6.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143042963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Out-of-Distribution Detection via outlier exposure in federated learning.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.neunet.2025.107141
Gu-Bon Jeong, Dong-Wan Choi

Among various out-of-distribution (OOD) detection methods in neural networks, outlier exposure (OE) using auxiliary data has been shown to achieve practical performance. However, existing OE methods are typically assumed to run in a centralized manner, and thus are not feasible for a standard federated learning (FL) setting where each client has low computing power and cannot collect a variety of auxiliary samples. To address this issue, we propose a practical yet realistic OE scenario in FL where only the central server has a large amount of outlier data and a relatively small amount of in-distribution (ID) data is given to each client. For this scenario, we introduce an effective OE-based OOD detection method, called internal separation & backstage collaboration, which makes the best use of many auxiliary outlier samples without sacrificing the ultimate goal of FL, that is, privacy preservation as well as collaborative training performance. The most challenging part is how to achieve, in our scenario, the same effect as joint centralized training with outliers and ID samples. Our main strategy (internal separation) is to jointly train the feature vectors of an internal layer with outliers in the back layers of the global model, while ensuring privacy preservation. We also suggest a collaborative approach (backstage collaboration) where multiple back layers are trained together to detect OOD samples. Our extensive experiments demonstrate that our method shows remarkable detection performance compared to baseline approaches in the proposed OE scenario.
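For reference, the commonly used centralized outlier-exposure objective that this work adapts to FL combines cross-entropy on ID samples with a term pushing outlier predictions towards the uniform distribution. The sketch below uses made-up logits and is not the paper's internal-separation training scheme.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def oe_objective(id_logits, id_labels, ood_logits, lam=0.5):
    # Cross-entropy on in-distribution samples plus a cross-entropy-to-uniform
    # term on exposed outliers (the standard OE formulation).
    p_id = softmax(id_logits)
    ce = -np.log(p_id[np.arange(len(id_labels)), id_labels] + 1e-12).mean()
    p_ood = softmax(ood_logits)
    uniform_ce = -np.log(p_ood + 1e-12).mean()     # mean over classes = CE to uniform
    return ce + lam * uniform_ce

rng = np.random.default_rng(0)
print(oe_objective(rng.normal(size=(4, 10)), np.array([1, 3, 5, 7]),
                   rng.normal(size=(6, 10))))
```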

Citations: 0
Reducing bias in source-free unsupervised domain adaptation for regression.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.neunet.2025.107161
Qianshan Zhan, Xiao-Jun Zeng, Qian Wang

Due to data privacy and storage concerns, Source-Free Unsupervised Domain Adaptation (SFUDA) focuses on improving an unlabelled target domain by leveraging a pre-trained source model without access to source data. While existing studies attempt to train target models by mitigating biases induced by noisy pseudo labels, they often lack theoretical guarantees for fully reducing biases and have predominantly addressed classification tasks rather than regression ones. To address these gaps, our analysis delves into the generalisation error bound of the target model, aiming to understand the intrinsic limitations of pseudo-label-based SFUDA methods. Theoretical results reveal that biases influencing generalisation error extend beyond the commonly highlighted label inconsistency bias, which denotes the mismatch between pseudo labels and ground truths, and the feature-label mapping bias, which represents the difference between the proxy target regressor and the real target regressor. Equally significant is the feature misalignment bias, indicating the misalignment between the estimated and real target feature distributions. This factor is frequently neglected or not explicitly addressed in current studies. Additionally, the label inconsistency bias can be unbounded in regression due to the continuous label space, further complicating SFUDA for regression tasks. Guided by these theoretical insights, we propose a Bias-Reduced Regression (BRR) method for SFUDA in regression. This method incorporates Feature Distribution Alignment (FDA) to reduce the feature misalignment bias, Hybrid Reliability Evaluation (HRE) to reduce the feature-label mapping bias and pseudo label updating to mitigate the label inconsistency bias. Experiments demonstrate the superior performance of the proposed BRR, and the effectiveness of FDA and HRE in reducing biases for regression tasks in SFUDA.
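One common way to reduce feature misalignment of the kind FDA targets is to match first- and second-order statistics of two feature sets (a CORAL-style penalty). The sketch below illustrates that generic idea with random placeholder features; it is not the paper's FDA component, which is more elaborate.

```python
import numpy as np

def stat_alignment_loss(feats_a, feats_b):
    # Penalize differences between the means and covariances of two feature
    # distributions; minimizing this pulls the distributions together.
    mean_diff = np.sum((feats_a.mean(axis=0) - feats_b.mean(axis=0)) ** 2)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    cov_diff = np.sum((cov_a - cov_b) ** 2) / (4.0 * feats_a.shape[1] ** 2)
    return mean_diff + cov_diff

rng = np.random.default_rng(0)
estimated = rng.normal(size=(64, 8))            # e.g., features of pseudo-labeled samples
observed = rng.normal(loc=0.5, size=(64, 8))    # e.g., features of all target samples
print(stat_alignment_loss(estimated, observed))
```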

{"title":"Reducing bias in source-free unsupervised domain adaptation for regression.","authors":"Qianshan Zhan, Xiao-Jun Zeng, Qian Wang","doi":"10.1016/j.neunet.2025.107161","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107161","url":null,"abstract":"<p><p>Due to data privacy and storage concerns, Source-Free Unsupervised Domain Adaptation (SFUDA) focuses on improving an unlabelled target domain by leveraging a pre-trained source model without access to source data. While existing studies attempt to train target models by mitigating biases induced by noisy pseudo labels, they often lack theoretical guarantees for fully reducing biases and have predominantly addressed classification tasks rather than regression ones. To address these gaps, our analysis delves into the generalisation error bound of the target model, aiming to understand the intrinsic limitations of pseudo-label-based SFUDA methods. Theoretical results reveal that biases influencing generalisation error extend beyond the commonly highlighted label inconsistency bias, which denotes the mismatch between pseudo labels and ground truths, and the feature-label mapping bias, which represents the difference between the proxy target regressor and the real target regressor. Equally significant is the feature misalignment bias, indicating the misalignment between the estimated and real target feature distributions. This factor is frequently neglected or not explicitly addressed in current studies. Additionally, the label inconsistency bias can be unbounded in regression due to the continuous label space, further complicating SFUDA for regression tasks. Guided by these theoretical insights, we propose a Bias-Reduced Regression (BRR) method for SFUDA in regression. This method incorporates Feature Distribution Alignment (FDA) to reduce the feature misalignment bias, Hybrid Reliability Evaluation (HRE) to reduce the feature-label mapping bias and pseudo label updating to mitigate the label inconsistency bias. Experiments demonstrate the superior performance of the proposed BRR, and the effectiveness of FDA and HRE in reducing biases for regression tasks in SFUDA.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107161"},"PeriodicalIF":6.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CPJN: News recommendation with a content and popularity joint network.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.neunet.2025.107177
Zixuan Chen, Songqiao Han, Hailiang Huang, You Wu

Users may click on a news item because they are interested in its content or because the news contains important information and is very popular. Modeling these two aspects is crucial for accurate news recommendation. Most existing studies focus on capturing users' preferences towards news content; they are therefore limited in investigating users' preferences towards news popularity in depth and in capturing content and popularity preferences independently. In this article, we further improve recommendation performance by proposing a news recommendation model with a content and popularity joint network (CPJN). The CPJN contains a content-based network, a popularity-based network, and an adaptive combination network. The content-based network generates a user's preference feature towards news content by eliminating popularity bias from the important information extracted from user side information (such as city and age) and uses this debiased information to enhance the user's preference representation towards news content. The popularity-based network generates a user's preference feature towards news popularity by eliminating content bias, aided by news side information (such as category and author). Furthermore, since users exhibit differing degrees of sensitivity towards news popularity, we propose an adaptive combination network to integrate these two preferences for recommendation. Extensive experiments on two real-world datasets demonstrate the effectiveness of CPJN. Compared to the state-of-the-art baseline, CPJN achieves average improvements of 1.493% in accuracy and 1.502% in recall.
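Because users differ in how sensitive they are to popularity, an adaptive combination can be as simple as a learned per-user gate that blends the two preference scores. The sketch below is a hypothetical gating function with invented parameters, not the CPJN architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_combine(content_score, popularity_score, user_feat, w, b):
    # A per-user scalar gate decides how sensitive this user is to popularity,
    # then blends the content-based and popularity-based preference scores.
    gate = sigmoid(user_feat @ w + b)          # in (0, 1)
    return gate * content_score + (1.0 - gate) * popularity_score

rng = np.random.default_rng(0)
user_feat = rng.normal(size=8)                 # hypothetical user representation
w, b = rng.normal(size=8), 0.0                 # hypothetical gate parameters
print(adaptive_combine(0.9, 0.3, user_feat, w, b))
```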

{"title":"CPJN: News recommendation with a content and popularity joint network.","authors":"Zixuan Chen, Songqiao Han, Hailiang Huang, You Wu","doi":"10.1016/j.neunet.2025.107177","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107177","url":null,"abstract":"<p><p>Users may click on a news because they are interested in its content or because the news contains important information and is very popular. Modeling these two aspects is crucial for accurate news recommendation. Most existing studies focused on capturing users' preferences towards news content, and thus they are limited in investigating in depth users' preferences towards news popularity and independently capturing user content and popularity preferences. In this article, we further improve recommendation performance by proposing a news recommendation with content and popularity joint network (CPJN) model. The CPJN contains a content-based network, a popularity-based network, and an adaptive combination network. The content-based network generates a users' preference feature towards news content by eliminating popularity bias in important information extracted from user side information (such as city and age) and uses the information with the eliminated popularity bias to enhance users' preference representation towards news content. The popularity-based network generates a user preference feature towards news popularity by eliminating content bias that is enhanced through news side information (such as category and author). Furthermore, since users exhibit differing degrees of sensitivity towards news popularity, we propose an adaptive combination network to integrate these two preferences for recommendation. Extensive experiments on two real-world datasets demonstrate the effectiveness of CPJN. Compared to the state-of-the-art baseline, CPJN achieved average improvements of 1.493 % in accuracy rate and 1.502 % in recall rate.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107177"},"PeriodicalIF":6.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On latent dynamics learning in nonlinear reduced order modeling.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.neunet.2025.107146
Nicola Farenga, Stefania Fresca, Simone Brivio, Andrea Manzoni

In this work, we present the novel mathematical framework of latent dynamics models (LDMs) for reduced order modeling of parameterized nonlinear time-dependent PDEs. Our framework casts this latter task as a nonlinear dimensionality reduction problem, while constraining the latent state to evolve according to an unknown dynamical system. A time-continuous setting is employed to derive error and stability estimates for the LDM approximation of the full order model (FOM) solution. We analyze the impact of using an explicit Runge-Kutta scheme in the time-discrete setting, resulting in the ΔLDM formulation, and further explore the learnable setting, ΔLDMθ, where deep neural networks approximate the discrete LDM components while providing a bounded approximation error with respect to the FOM. Moreover, we extend the concept of a parameterized Neural ODE - a possible way to build data-driven dynamical systems with varying input parameters - to a convolutional architecture, in which the input-parameter information is injected by means of an affine modulation mechanism, and we design a convolutional autoencoder neural network able to retain spatial coherence, thus enhancing interpretability at the latent level. Numerical experiments, including the Burgers' and the advection-diffusion-reaction equations, demonstrate the framework's ability to obtain a time-continuous approximation of the FOM solution, making it possible to query the LDM approximation at any given time instance while retaining a prescribed level of accuracy. Our findings highlight the remarkable potential of the proposed LDMs, representing a mathematically rigorous framework to enhance the accuracy and approximation capabilities of reduced order modeling for time-dependent parameterized PDEs.
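A minimal sketch of the time-discrete setting: integrate a parameterized latent ODE dz/dt = f(z; μ) with an explicit Runge-Kutta scheme (RK4 here). The linear right-hand side is a stand-in for the neural network used in the learnable ΔLDMθ setting, and the encoder/decoder mapping to the FOM solution is omitted.

```python
import numpy as np

def latent_rhs(z, mu):
    # Hypothetical parameterized latent dynamics dz/dt = f(z; mu); in the
    # learnable setting this right-hand side would be a neural network.
    A = np.array([[0.0, 1.0], [-mu, -0.1]])
    return A @ z

def rk4_step(f, z, mu, dt):
    # One explicit fourth-order Runge-Kutta step in the latent space.
    k1 = f(z, mu)
    k2 = f(z + 0.5 * dt * k1, mu)
    k3 = f(z + 0.5 * dt * k2, mu)
    k4 = f(z + dt * k3, mu)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

z, mu, dt = np.array([1.0, 0.0]), 2.0, 0.01
trajectory = [z]
for _ in range(500):                      # roll the latent state forward in time
    z = rk4_step(latent_rhs, z, mu, dt)
    trajectory.append(z)
print(np.stack(trajectory).shape)         # (501, 2) latent trajectory
```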

{"title":"On latent dynamics learning in nonlinear reduced order modeling.","authors":"Nicola Farenga, Stefania Fresca, Simone Brivio, Andrea Manzoni","doi":"10.1016/j.neunet.2025.107146","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107146","url":null,"abstract":"<p><p>In this work, we present the novel mathematical framework of latent dynamics models (LDMs) for reduced order modeling of parameterized nonlinear time-dependent PDEs. Our framework casts this latter task as a nonlinear dimensionality reduction problem, while constraining the latent state to evolve accordingly to an unknown dynamical system. A time-continuous setting is employed to derive error and stability estimates for the LDM approximation of the full order model (FOM) solution. We analyze the impact of using an explicit Runge-Kutta scheme in the time-discrete setting, resulting in the ΔLDM formulation, and further explore the learnable setting, ΔLDM<sub>θ</sub>, where deep neural networks approximate the discrete LDM components, while providing a bounded approximation error with respect to the FOM. Moreover, we extend the concept of parameterized Neural ODE - a possible way to build data-driven dynamical systems with varying input parameters - to be a convolutional architecture, where the input parameters information is injected by means of an affine modulation mechanism, while designing a convolutional autoencoder neural network able to retain spatial-coherence, thus enhancing interpretability at the latent level. Numerical experiments, including the Burgers' and the advection-diffusion-reaction equations, demonstrate the framework's ability to obtain a time-continuous approximation of the FOM solution, thus being able to query the LDM approximation at any given time instance while retaining a prescribed level of accuracy. Our findings highlight the remarkable potential of the proposed LDMs, representing a mathematically rigorous framework to enhance the accuracy and approximation capabilities of reduced order modeling for time-dependent parameterized PDEs.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107146"},"PeriodicalIF":6.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143061436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A new pipeline with ultimate search efficiency for neural architecture search.
IF 6.0 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.neunet.2025.107163
Wenbo Liu, Xiaoyun Qiao, Chunyu Zhao, Tao Deng, Fei Yan

We present a novel neural architecture search pipeline designed to enhance search efficiency through optimized data and algorithms. Leveraging dataset distillation techniques, our pipeline condenses large-scale target datasets into more streamlined proxy datasets, effectively reducing the computational overhead associated with identifying optimal neural architectures. To accommodate diverse approaches to synthetic dataset utilization, our pipeline comprises two distinct schemes. Scheme 1 involves constructing rich data from various Bases |B|, while Scheme 2 focuses on establishing high-quality relationship mappings within the data. Models generated through Scheme 1 exhibit outstanding scalability, demonstrating superior performance when transferred to larger, more complex tasks. Despite utilizing less data, Scheme 2 maintains performance levels without degradation on the source dataset. Furthermore, our research extends to the inherent challenges present in DARTS-derived algorithms, particularly in the selection of candidate operations based on architectural parameters. We identify architectural-parameter disparities across different edges, highlighting the occurrence of "Selection Errors" during the model generation process, and propose an enhanced search algorithm. Our proposed algorithm comprises three components (attention, regularization, and normalization) that aid in the rapid identification of high-quality models using data generated from proxy datasets. Experimental results demonstrate a significant reduction in search time, with high-quality models generated in as little as two minutes using our proposed pipeline. Through comprehensive experimentation, we meticulously validate the efficacy of both schemes and algorithms.
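To see the kind of architectural-parameter disparity behind the reported "Selection Errors", compare softmax confidences of candidate operations on two edges whose parameters live at different scales; a per-edge standardization, loosely in the spirit of the proposed normalization component, puts them on a comparable footing. The parameter values and operation names below are invented for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

ops = ["skip_connect", "sep_conv_3x3", "dil_conv_3x3", "max_pool_3x3"]

# Hypothetical architectural parameters for two edges of a DARTS-style cell.
# Their magnitudes differ a lot, so the derived confidences are not directly
# comparable across edges (the kind of disparity behind "Selection Errors").
alpha = {
    "edge_0": np.array([0.10, 0.30, 0.20, 0.05]),
    "edge_1": np.array([2.00, 6.00, 4.00, 1.00]),
}

for name, a in alpha.items():
    raw = softmax(a)                                   # confidences at the raw scale
    std = softmax((a - a.mean()) / (a.std() + 1e-8))   # per-edge standardization
    print(name, "raw:", raw.round(2), "normalized:", std.round(2),
          "selected:", ops[int(np.argmax(std))])
```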

{"title":"A new pipeline with ultimate search efficiency for neural architecture search.","authors":"Wenbo Liu, Xiaoyun Qiao, Chunyu Zhao, Tao Deng, Fei Yan","doi":"10.1016/j.neunet.2025.107163","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107163","url":null,"abstract":"<p><p>We present a novel neural architecture search pipeline designed to enhance search efficiency through optimized data and algorithms. Leveraging dataset distillation techniques, our pipeline condenses large-scale target datasets into more streamlined proxy datasets, effectively reducing the computational overhead associated with identifying optimal neural architectures. To accommodate diverse approaches to synthetic dataset utilization, our pipeline comprises two distinct schemes. Scheme 1 involves constructing rich data from various Bases |B|, while Scheme 2 focuses on establishing high-quality relationship mappings within the data. Models generated through Scheme 1 exhibit outstanding scalability, demonstrating superior performance when transferred to larger, more complex tasks. Despite utilizing fewer data, Scheme 2 maintains performance levels without degradation on the source dataset. Furthermore, our research extends to the inherent challenges present in DARTS-derived algorithms, particularly in the selection of candidate operations based on architectural parameters. We identify architectural parameter disparities across different edges, highlighting the occurrence of \"Selection Errors\" during the model generation process, and propose an enhanced search algorithm. Our proposed algorithm comprises three components-attention, regularization, and normalization-aiding in the rapid identification of high-quality models using data generated from proxy datasets. Experimental results demonstrate a significant reduction in search time, with high-quality models generated in as little as two minutes using our proposed pipeline. Through comprehensive experimentation, we meticulously validate the efficacy of both schemes and algorithms.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107163"},"PeriodicalIF":6.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143076079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0