Nonlinear feature selection for support vector quantile regression
Pub Date: 2025-01-13 | DOI: 10.1016/j.neunet.2025.107136 | Neural Networks 185: 107136
Ya-Fen Ye, Jie Wang, Wei-Jie Chen
This paper addresses nonlinear feature selection in heterogeneous systems. To tackle this challenge, we present a sparsity-driven method, nonlinear feature selection for support vector quantile regression (NFS-SVQR). The method incorporates a binary diagonal matrix, with 0 and 1 entries, to perform feature selection within intricate nonlinear systems, and integrates a quantile parameter to address the intrinsic heterogeneity encountered in nonlinear feature selection. Consequently, NFS-SVQR not only precisely identifies representative features but also comprehensively captures heterogeneous information within high-dimensional datasets. Feature selection experiments demonstrate the enhanced performance of NFS-SVQR in capturing heterogeneous information and selecting representative features.
{"title":"Nonlinear feature selection for support vector quantile regression.","authors":"Ya-Fen Ye, Jie Wang, Wei-Jie Chen","doi":"10.1016/j.neunet.2025.107136","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107136","url":null,"abstract":"<p><p>This paper discusses the nuanced domain of nonlinear feature selection in heterogeneous systems. To address this challenge, we present a sparsity-driven methodology, namely nonlinear feature selection for support vector quantile regression (NFS-SVQR). This method includes a binary-diagonal matrix, featuring 0 and 1 elements, to address the complexities of feature selection within intricate nonlinear systems. Moreover, NFS-SVQR integrates a quantile parameter to effectively address the intrinsic challenges of heterogeneity within nonlinear feature selection processes. Consequently, NFS-SVQR excels not only in precisely identifying representative features but also in comprehensively capturing heterogeneous information within high-dimensional datasets. Through feature selection experiments the enhanced performance of NFS-SVQR in capturing heterogeneous information and selecting representative features is demonstrated.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107136"},"PeriodicalIF":6.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An adversarial transformer for anomalous Lamb wave pattern detection
Pub Date: 2025-01-12 | DOI: 10.1016/j.neunet.2025.107153 | Neural Networks 185: 107153
Jiawei Guo, Sen Zhang, Nikta Amiri, Lingyu Yu, Yi Wang
Lamb waves are widely used for defect detection in structural health monitoring, and various methods have been developed for Lamb wave data analysis. This paper presents an unsupervised Adversarial Transformer model for anomalous Lamb wave pattern detection by analyzing the spatiotemporal images generated by a hybrid PZT-scanning laser Doppler vibrometer (SLDV). The model includes global and local attention mechanisms, both trained adversarially. Because normal and anomalous wave patterns differ in nature, global attention reconstructs normal wave data accurately but reproduces anomalous data poorly, and can therefore be used for anomalous wave pattern detection. Local attention serves as a sparring partner in the proposed adversarial training process to boost the quality of global attention. In addition, a new segment replacement strategy is proposed to make global attention consistently extract the textural content found in normal data, which differs noticeably from anomalies, leading to superior model performance. The Adversarial Transformer is compared with several benchmark models and achieves an overall accuracy of 97.1% for anomalous wave pattern detection. It is also confirmed that global attention and local attention in adversarial training are responsible for the model's superior performance over the benchmarks, including the native Transformer model.
{"title":"An adversarial transformer for anomalous lamb wave pattern detection.","authors":"Jiawei Guo, Sen Zhang, Nikta Amiri, Lingyu Yu, Yi Wang","doi":"10.1016/j.neunet.2025.107153","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107153","url":null,"abstract":"<p><p>Lamb waves are widely used for defect detection in structural health monitoring, and various methods are developed for Lamb wave data analysis. This paper presents an unsupervised Adversarial Transformer model for anomalous Lamb wave pattern detection by analyzing the spatiotemporal images generated by a hybrid PZT-scanning laser Doppler vibrometer (SLDV). The model includes the global attention and the local attention mechanisms, and both are trained adversarially. Given the different natures between the normal and anomalous wave patterns, global attention allows accurate reconstruction of normal wave data but is less capable of reproducing anomalous data and, hence, can be used for anomalous wave pattern detection. Local attention, however, serves as a sparring partner in the proposed adversarial training process to boost the quality of global attention. In addition, a new segment replacement strategy is also proposed to make global attention consistently extract textural contents found in normal data, which, however, are noticeably different from anomalies, leading to superior model performance. Our Adversarial Transformer model is also compared with several benchmark models and demonstrates an overall accuracy of 97.1 % for anomalous wave pattern detection. It is also confirmed that global attention and local attention in adversarial training are responsible for the superior performance of our model over the benchmark models (including the native Transformer model).</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107153"},"PeriodicalIF":6.0,"publicationDate":"2025-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiking-PhysFormer: Camera-based remote photoplethysmography with parallel spike-driven transformer
Pub Date: 2025-01-10 | DOI: 10.1016/j.neunet.2025.107128 | Neural Networks 185: 107128
Mingxuan Liu, Jiankai Tang, Yongli Chen, Haoxiang Li, Jiahao Qi, Siwei Li, Kegang Wang, Jie Gan, Yuntao Wang, Hong Chen
Artificial neural networks (ANNs) help camera-based remote photoplethysmography (rPPG) measure cardiac activity and physiological signals from facial videos, such as pulse wave, heart rate, and respiration rate, with better accuracy. However, most existing ANN-based methods require substantial computing resources, which poses challenges for deployment on mobile devices. Spiking neural networks (SNNs), on the other hand, hold immense potential for energy-efficient deep learning owing to their binary and event-driven architecture. To the best of our knowledge, we are the first to introduce SNNs into the realm of rPPG, proposing a hybrid neural network (HNN) model, the Spiking-PhysFormer, aimed at reducing power consumption. Specifically, the proposed Spiking-PhysFormer consists of an ANN-based patch embedding block, SNN-based transformer blocks, and an ANN-based predictor head. First, to simplify the transformer block while preserving its capacity to aggregate local and global spatio-temporal features, we design a parallel spike transformer block to replace sequential sub-blocks. Additionally, we propose a simplified spiking self-attention mechanism that omits the value parameter without compromising the model's performance. Experiments conducted on four datasets (PURE, UBFC-rPPG, UBFC-Phys, and MMPD) demonstrate that the proposed model achieves a 10.1% reduction in power consumption compared to PhysFormer. Additionally, the power consumption of the transformer block is reduced by a factor of 12.2, while maintaining performance comparable to PhysFormer and other ANN-based models.
{"title":"Spiking-PhysFormer: Camera-based remote photoplethysmography with parallel spike-driven transformer.","authors":"Mingxuan Liu, Jiankai Tang, Yongli Chen, Haoxiang Li, Jiahao Qi, Siwei Li, Kegang Wang, Jie Gan, Yuntao Wang, Hong Chen","doi":"10.1016/j.neunet.2025.107128","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107128","url":null,"abstract":"<p><p>Artificial neural networks (ANNs) can help camera-based remote photoplethysmography (rPPG) in measuring cardiac activity and physiological signals from facial videos, such as pulse wave, heart rate and respiration rate with better accuracy. However, most existing ANN-based methods require substantial computing resources, which poses challenges for effective deployment on mobile devices. Spiking neural networks (SNNs), on the other hand, hold immense potential for energy-efficient deep learning owing to their binary and event-driven architecture. To the best of our knowledge, we are the first to introduce SNNs into the realm of rPPG, proposing a hybrid neural network (HNN) model, the Spiking-PhysFormer, aimed at reducing power consumption. Specifically, the proposed Spiking-PhyFormer consists of an ANN-based patch embedding block, SNN-based transformer blocks, and an ANN-based predictor head. First, to simplify the transformer block while preserving its capacity to aggregate local and global spatio-temporal features, we design a parallel spike transformer block to replace sequential sub-blocks. Additionally, we propose a simplified spiking self-attention mechanism that omits the value parameter without compromising the model's performance. Experiments conducted on four datasets-PURE, UBFC-rPPG, UBFC-Phys, and MMPD demonstrate that the proposed model achieves a 10.1% reduction in power consumption compared to PhysFormer. Additionally, the power consumption of the transformer block is reduced by a factor of 12.2, while maintaining decent performance as PhysFormer and other ANN-based models.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107128"},"PeriodicalIF":6.0,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive Graph Representation Learning with Adversarial Cross-View Reconstruction and Information Bottleneck
Pub Date: 2025-01-09 | DOI: 10.1016/j.neunet.2024.107094 | Neural Networks 184: 107094
Yuntao Shou, Haozhi Lan, Xiangyong Cao
Graph Neural Networks (GNNs) have received extensive research attention due to their powerful information aggregation capabilities. Despite their success, most GNNs suffer from a popularity bias caused by a small number of popular categories in a graph. Additionally, real graph datasets often contain incorrect node labels, which hinders GNNs from learning effective node representations. Graph contrastive learning (GCL) has been shown to be effective at mitigating these problems for node classification. Most existing GCL methods randomly remove edges and nodes to create multiple contrasting views, and then maximize the mutual information (MI) between these views to improve the node feature representation. However, maximizing the mutual information between multiple contrasting views may lead the model to learn redundant information irrelevant to the node classification task. To tackle this issue, we propose Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck (CGRL) for node classification, which adaptively learns to mask nodes and edges in the graph to obtain the optimal graph structure representation. Furthermore, we introduce the information bottleneck principle into GCL to remove redundant information across the contrasting views while retaining as much information as possible about node classification. Moreover, we add noise perturbations to the original views and reconstruct the augmented views by constructing adversarial views to improve the robustness of the node feature representation. Theoretical analysis verifies the effectiveness of this cross-view reconstruction mechanism and the information bottleneck principle in capturing graph structure information and improving model generalization. Extensive experiments on real-world public datasets demonstrate that our method significantly outperforms existing state-of-the-art algorithms.
{"title":"Contrastive Graph Representation Learning with Adversarial Cross-View Reconstruction and Information Bottleneck.","authors":"Yuntao Shou, Haozhi Lan, Xiangyong Cao","doi":"10.1016/j.neunet.2024.107094","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107094","url":null,"abstract":"<p><p>Graph Neural Networks (GNNs) have received extensive research attention due to their powerful information aggregation capabilities. Despite the success of GNNs, most of them suffer from the popularity bias issue in a graph caused by a small number of popular categories. Additionally, real graph datasets always contain incorrect node labels, which hinders GNNs from learning effective node representations. Graph contrastive learning (GCL) has been shown to be effective in solving the above problems for node classification tasks. Most existing GCL methods are implemented by randomly removing edges and nodes to create multiple contrasting views, and then maximizing the mutual information (MI) between these contrasting views to improve the node feature representation. However, maximizing the mutual information between multiple contrasting views may lead the model to learn some redundant information irrelevant to the node classification task. To tackle this issue, we propose an effective Contrastive Graph Representation Learning with Adversarial Cross-view Reconstruction and Information Bottleneck (CGRL) for node classification, which can adaptively learn to mask the nodes and edges in the graph to obtain the optimal graph structure representation. Furthermore, we innovatively introduce the information bottleneck theory into GCLs to remove redundant information in multiple contrasting views while retaining as much information as possible about node classification. Moreover, we add noise perturbations to the original views and reconstruct the augmented views by constructing adversarial views to improve the robustness of node feature representation. We also verified through theoretical analysis the effectiveness of this cross-attempt reconstruction mechanism and information bottleneck theory in capturing graph structure information and improving model generalization performance. Extensive experiments on real-world public datasets demonstrate that our method significantly outperforms existing state-of-the-art algorithms.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107094"},"PeriodicalIF":6.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142972994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sequential recommendation via agent-based irrelevancy skipping
Pub Date: 2025-01-09 | DOI: 10.1016/j.neunet.2025.107134 | Neural Networks 185: 107134
Yu Cheng, Jiawei Zheng, Binquan Wu, Qianli Ma
Sequential Recommendation models sequential dependencies in user interactions to produce subsequent recommendations. However, due to the diversity of users' interests and the uncertainty of their behaviours, not all historical interactions in a user's sequence are relevant to the next-interaction intent, which hinders accurate sequential recommendation. To this end, a novel method, Dynamic-Skip for Sequential Recommendation (DyS4Rec), is proposed in this study. Specifically, a Long Short-Term Memory (LSTM) network with dynamic skip connections allows DyS4Rec to skip irrelevant interactions and more accurately capture the long-term dependencies related to users' next-interaction intents. Furthermore, a Personalized Module (PM) is designed to guide the skipping process and add more personalization to the recommendation results. In this way, DyS4Rec adaptively learns to exclude the impact of irrelevant historical interactions, precisely models users' personalized interaction intents, and generates more accurate sequential recommendations. Extensive experiments on five public real-world datasets (containing from a few thousand to hundreds of thousands of items) show that DyS4Rec outperforms state-of-the-art counterparts by 1% to 12%. Moreover, visualization analyses demonstrate that DyS4Rec indeed performs meaningful jumps when modelling user interactions, excluding the influence of irrelevant historical interactions and generating more accurate sequential recommendations.
{"title":"Sequential recommendation via agent-based irrelevancy skipping.","authors":"Yu Cheng, Jiawei Zheng, Binquan Wu, Qianli Ma","doi":"10.1016/j.neunet.2025.107134","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107134","url":null,"abstract":"<p><p>Sequential Recommendation is based on modelling sequential dependencies in user interactions to produce subsequent recommendation results. However, due to the diversity of users' interests and the uncertainty of their behaviours, not all historical interactions in users' interaction sequences are relevant to their next-interaction intents, which hinders generating accurate sequential recommendations. To this end, a novel Sequential Recommendation method, Dynamic-Skip for Sequential Recommendation (DyS4Rec), is proposed in this study. Specifically, by a Long-Short Term Memory (LSTM) with dynamic skip connections, allows DyS4Rec to skip irrelevant interactions to more accurately capture long-term dependencies, which are related to users' next-interaction intents. Furthermore, a Personalized Module (PM) is designed to guide the skipping process and add more personalization to the recommendation results. In this way, DyS4Rec can adaptively learn to exclude the impact of irrelevant historical interactions to precisely model users' personalized interaction intents and generate more accurate sequential recommendations. Extensive experiments on five public real-world datasets (containing items ranging from a few thousand to hundreds of thousands) showcase that DyS4Rec outperforms other state-of-the-art counterparts (by 1% to 12%). Moreover, visualization analyses demonstrate that DyS4Rec can indeed perform meaningful jumps in modelling user interactions to exclude the influence of irrelevant historical interactions and generate more accurate sequential recommendations.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107134"},"PeriodicalIF":6.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Disentangled Active Learning on Graphs
Pub Date: 2025-01-09 | DOI: 10.1016/j.neunet.2025.107130 | Neural Networks 185: 107130
Haoran Yang, Junli Wang, Rui Duan, Changwei Wang, Chungang Yan
Active learning on graphs (ALG) has emerged as a compelling research field due to its capacity to address the challenge of label scarcity. Existing ALG methods incorporate diversity into their query strategies to maximize the gains from node sampling, improving robustness and reducing redundancy in graph learning. However, they often overlook the complex entanglement of latent factors inherent in graph-structured data. This oversight can produce a sampling process that fails to ensure diversity at a finer-grained level, missing the opportunity to sample more valuable nodes. To this end, we propose a novel approach, Disentangled Active Learning on Graphs (DALG). We first design the Disenconv-AL layer to learn disentangled feature embeddings, then construct an influence graph for each node and maintain a dedicated "memory list" that stores the resulting influence weights. On this basis, our approach prevents the model from focusing excessively on a few latent factors during the sampling phase. Specifically, we prioritize the latent factors that most strongly influenced the nodes sampled in the previous round, ensuring that the current round can better attend to other latent factors. Compared with existing methodologies, our approach is the first to pursue diversity at the finer-grained level of the latent factors that drive the formation of graph data, enabling further gains under a limited labeling budget. Extensive experiments across eight public datasets show that DALG surpasses state-of-the-art graph active learning methods, with improvements of up to approximately 15% in both Micro-F1 and Macro-F1.
{"title":"Disentangled Active Learning on Graphs.","authors":"Haoran Yang, Junli Wang, Rui Duan, Changwei Wang, Chungang Yan","doi":"10.1016/j.neunet.2025.107130","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107130","url":null,"abstract":"<p><p>Active learning on graphs (ALG) has emerged as a compelling research field due to its capacity to address the challenge of label scarcity. Existing ALG methods incorporate diversity into their query strategies to maximize the gains from node sampling, improving robustness and reducing redundancy in graph learning. However, they often overlook the complex entanglement of latent factors inherent in graph-structured data. This oversight can lead to a sampling process that fails to ensure diversity at a finer-grained level, thereby missing the opportunity to sample more valuable nodes. To this end, we propose a novel approach, Disentangled Active Learning on Graphs (DALG). In this work, we first design the Disenconv-AL layer to learn disentangled feature embedding, then construct the influence graph for each node and create a dedicated \"memory list\" to store the resultant influence weights. On this basis, our approach aims to make the model not excessively focus on a few latent factors during the sampling phase. Specifically, we prioritize addressing latent factors with the most significant impact on the sampled node in the previous round, thereby ensuring that current sampling can better focus on other latent factors. Compared with existing methodologies, our approach pioneers reach diversity from the latent factor that drives the formation of graph data at a finer-grained level, thereby enabling further improvements in the benefits delivered with a limited labeling budget. Extensive experiments across eight public datasets show that DALG surpasses state-of-the-art graph active learning methods, achieving an improvement of up to approximately 15% in both Micro-F1 and Macro-F1.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107130"},"PeriodicalIF":6.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VPT: Video portraits transformer for realistic talking face generation
Pub Date: 2025-01-09 | DOI: 10.1016/j.neunet.2025.107122 | Neural Networks 184: 107122
Zhijun Zhang, Jian Zhang, Weijian Mai
Talking face generation is a promising approach in various domains, such as digital assistants, video editing, and virtual video conferencing. Previous work on audio-driven talking faces focused primarily on the synchronization between audio and video. However, existing methods still have limitations in synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details such as blink movements. To solve these problems, a novel talking face generation framework, termed video portraits transformer (VPT), with controllable blink movements is proposed and applied. It separates video generation into two stages: an audio-to-landmark stage and a landmark-to-face stage. In the audio-to-landmark stage, a transformer encoder serves as the generator that predicts whole-face landmarks from the given audio and a continuous eye aspect ratio (EAR). In the landmark-to-face stage, a video-to-video (vid-to-vid) network translates landmarks into realistic talking face videos. Moreover, to imitate real blink movements during inference, a transformer-based spontaneous blink generation module is devised to generate the EAR sequence. Extensive experiments demonstrate that VPT produces photo-realistic videos of talking faces with natural blink movements, and that the spontaneous blink generation module generates blinks whose duration distribution and frequency are close to those of real blinks.
{"title":"VPT: Video portraits transformer for realistic talking face generation.","authors":"Zhijun Zhang, Jian Zhang, Weijian Mai","doi":"10.1016/j.neunet.2025.107122","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107122","url":null,"abstract":"<p><p>Talking face generation is a promising approach within various domains, such as digital assistants, video editing, and virtual video conferences. Previous works with audio-driven talking faces focused primarily on the synchronization between audio and video. However, existing methods still have certain limitations in synthesizing photo-realistic video with high identity preservation, audiovisual synchronization, and facial details like blink movements. To solve these problems, a novel talking face generation framework, termed video portraits transformer (VPT) with controllable blink movements is proposed and applied. It separates the process of video generation into two stages, i.e., audio-to-landmark and landmark-to-face stages. In the audio-to-landmark stage, the transformer encoder serves as the generator used for predicting whole facial landmarks from given audio and continuous eye aspect ratio (EAR). During the landmark-to-face stage, the video-to-video (vid-to-vid) network is employed to transfer landmarks into realistic talking face videos. Moreover, to imitate real blink movements during inference, a transformer-based spontaneous blink generation module is devised to generate the EAR sequence. Extensive experiments demonstrate that the VPT method can produce photo-realistic videos of talking faces with natural blink movements, and the spontaneous blink generation module can generate blink movements close to the real blink duration distribution and frequency.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107122"},"PeriodicalIF":6.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142972999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dual-view global and local category-attentive domain alignment for unsupervised conditional adversarial domain adaptation
Pub Date: 2025-01-08 | DOI: 10.1016/j.neunet.2025.107129 | Neural Networks 185: 107129
Jiahua Wu, Yuchun Fang
Conditional adversarial domain adaptation (CADA) is one of the most widely used unsupervised domain adaptation (UDA) approaches. CADA introduces multimodal information into the adversarial learning process to align the distributions of the labeled source domain and the unlabeled target domain through mode matching. However, because it uses classifier predictions as the multimodal information, CADA supplies incorrect multimodal information for challenging target features, leading to distribution mismatch and less robust domain-invariant features. Compared with recent state-of-the-art UDA methods, CADA also shows poor discriminability on the target domain. To tackle these challenges, we propose a novel unsupervised CADA framework named dual-view global and local category-attentive domain alignment (DV-GLCA). Specifically, to mitigate distribution mismatch and acquire more robust domain-invariant features, we integrate dual-view information into conditional adversarial domain adaptation and then exploit the substantial feature disparity between the two perspectives to better align the multimodal structures of the source and target distributions. Moreover, to learn more discriminative target-domain features on top of dual-view conditional adversarial domain adaptation (DV-CADA), we propose global category-attentive domain alignment (GCA), combining coding-rate reduction and dual-view centroid alignment to amplify inter-category domain discrepancies while reducing intra-category domain differences globally. Additionally, to handle ambiguous samples during training, we propose local category-attentive domain alignment (LCA), which uses contrastive domain discrepancy in a new way to move ambiguous samples closer to the correct category. Our method demonstrates leading performance on five UDA benchmarks, with extensive experiments showcasing its effectiveness.
{"title":"Dual-view global and local category-attentive domain alignment for unsupervised conditional adversarial domain adaptation.","authors":"Jiahua Wu, Yuchun Fang","doi":"10.1016/j.neunet.2025.107129","DOIUrl":"https://doi.org/10.1016/j.neunet.2025.107129","url":null,"abstract":"<p><p>Conditional adversarial domain adaptation (CADA) is one of the most commonly used unsupervised domain adaptation (UDA) methods. CADA introduces multimodal information to the adversarial learning process to align the distributions of the labeled source domain and unlabeled target domain with mode match. However, CADA provides wrong multimodal information for challenging target features due to utilizing classifier predictions as the multimodal information, leading to distribution mismatch and less robust domain-invariant features. Compared to the recent state-of-the-art UDA methods, CADA also faces poor discriminability on the target domain. To tackle these challenges, we propose a novel unsupervised CADA framework named dual-view global and local category-attentive domain alignment (DV-GLCA). Specifically, to mitigate distribution mismatch and acquire more robust domain-invariant features, we integrate dual-view information into conditional adversarial domain adaptation and then utilize the substantial feature disparity between the two perspectives to better align the multimodal structures of the source and target distributions. Moreover, to learn more discriminative features of the target domain based on dual-view conditional adversarial domain adaptation (DV-CADA), we further propose global category-attentive domain alignment (GCA). We combine coding rate reduction and dual-view centroid alignment in GCA to amplify inter-category domain discrepancies while reducing intra-category domain differences globally. Additionally, to address challenging ambiguous samples during the training phase, we propose local category-attentive domain alignment (LCA). We introduce a new way of using contrastive domain discrepancy in LCA to move ambiguous samples closer to the correct category. Our method demonstrates leading performance on five UDA benchmarks, with extensive experiments showcasing its effectiveness.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107129"},"PeriodicalIF":6.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic planning in hierarchical active inference
Pub Date: 2025-01-08 | DOI: 10.1016/j.neunet.2024.107075 | Neural Networks 185: 107075
Matteo Priorelli, Ivilin Peev Stoianov
By dynamic planning, we refer to the ability of the human brain to infer and impose motor trajectories related to cognitive decisions. A recent paradigm, active inference, brings fundamental insights into the adaptation of biological organisms, constantly striving to minimize prediction errors to restrict themselves to life-compatible states. Over the past years, many studies have shown how human and animal behaviors could be explained in terms of active inference - either as discrete decision-making or continuous motor control - inspiring innovative solutions in robotics and artificial intelligence. Still, the literature lacks a comprehensive outlook on effectively planning realistic actions in changing environments. Setting ourselves the goal of modeling complex tasks such as tool use, we delve into the topic of dynamic planning in active inference, keeping in mind two crucial aspects of biological behavior: the capacity to understand and exploit affordances for object manipulation, and to learn the hierarchical interactions between the self and the environment, including other agents. We start from a simple unit and gradually describe more advanced structures, comparing recently proposed design choices and providing basic examples. This study distances itself from traditional views centered on neural networks and reinforcement learning, and points toward a yet unexplored direction in active inference: hybrid representations in hierarchical models.
{"title":"Dynamic planning in hierarchical active inference.","authors":"Matteo Priorelli, Ivilin Peev Stoianov","doi":"10.1016/j.neunet.2024.107075","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107075","url":null,"abstract":"<p><p>By dynamic planning, we refer to the ability of the human brain to infer and impose motor trajectories related to cognitive decisions. A recent paradigm, active inference, brings fundamental insights into the adaptation of biological organisms, constantly striving to minimize prediction errors to restrict themselves to life-compatible states. Over the past years, many studies have shown how human and animal behaviors could be explained in terms of active inference - either as discrete decision-making or continuous motor control - inspiring innovative solutions in robotics and artificial intelligence. Still, the literature lacks a comprehensive outlook on effectively planning realistic actions in changing environments. Setting ourselves the goal of modeling complex tasks such as tool use, we delve into the topic of dynamic planning in active inference, keeping in mind two crucial aspects of biological behavior: the capacity to understand and exploit affordances for object manipulation, and to learn the hierarchical interactions between the self and the environment, including other agents. We start from a simple unit and gradually describe more advanced structures, comparing recently proposed design choices and providing basic examples. This study distances itself from traditional views centered on neural networks and reinforcement learning, and points toward a yet unexplored direction in active inference: hybrid representations in hierarchical models.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107075"},"PeriodicalIF":6.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved analysis of supervised learning in the RKHS with random features: Beyond least squares
Pub Date: 2025-01-08 | DOI: 10.1016/j.neunet.2024.107091 | Neural Networks 184: 107091
Jiamin Liu, Lei Wang, Heng Lian
We consider kernel-based supervised learning using random Fourier features, focusing on its statistical error bounds and generalization properties with general loss functions. Beyond the least squares loss, existing results only provide a worst-case analysis with rate $n^{-1/2}$ and a number of features at least comparable to $n$, and a refined analysis achieving an almost $n^{-1}$ rate when the kernel's eigenvalues decay exponentially and the number of features is again at least comparable to $n$. For the least squares loss, the results are much richer and the optimal rates can be achieved under source and capacity assumptions, with fewer than $n$ features. In this paper, for both losses with Lipschitz derivative and Lipschitz losses, we establish faster rates with a number of features much smaller than $n$, matching the rates and feature counts known for the least squares loss. More specifically, in the attainable case (the true function lies in the RKHS), we obtain the rate $n^{-2\xi/(2\xi+\gamma)}$, the same as the standard method without approximation, using $o(n)$ features, where $\xi$ characterizes the smoothness of the true function and $\gamma$ the decay rate of the eigenvalues of the integral operator. Our results thus answer an important open question regarding random features.
{"title":"Improved analysis of supervised learning in the RKHS with random features: Beyond least squares.","authors":"Jiamin Liu, Lei Wang, Heng Lian","doi":"10.1016/j.neunet.2024.107091","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107091","url":null,"abstract":"<p><p>We consider kernel-based supervised learning using random Fourier features, focusing on its statistical error bounds and generalization properties with general loss functions. Beyond the least squares loss, existing results only demonstrate worst-case analysis with rate n<sup>-1/2</sup> and the number of features at least comparable to n, and refined-case analysis where it can achieve almost n<sup>-1</sup> rate when the kernel's eigenvalue decay is exponential and the number of features is again at least comparable to n. For the least squares loss, the results are much richer and the optimal rates can be achieved under the source and capacity assumptions, with the number of features smaller than n. In this paper, for both losses with Lipschitz derivative and Lipschitz losses, we successfully establish faster rates with number of features much smaller than n, which are the same as the rates and number of features for the least squares loss. More specifically, in the attainable case (the true function is in the RKHS), we obtain the rate n<sup>-2ξ2ξ+γ</sup> which is the same as the standard method without using approximation, using o(n) features, where ξ characterizes the smoothness of the true function and γ characterizes the decay rate of the eigenvalues of the integral operator. Thus our results answer an important open question regarding random features.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107091"},"PeriodicalIF":6.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142984111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}