Affective brain-computer interfaces are an important part of realizing emotional human-computer interaction. However, objective individual differences among subjects significantly hinder the application of electroencephalography (EEG) emotion recognition. Existing methods still cannot fully extract subject-invariant representations of EEG, nor can they fuse valuable information from multiple subjects to facilitate emotion recognition for the target subject. To address these challenges, we propose a Multi-source Selective Graph Domain Adaptation Network (MSGDAN), which can better utilize data from different source subjects and perform more robust emotion recognition on the target subject. The proposed network extracts and selects both the individual information specific to each subject and the public information, where public information refers to the subject-invariant components shared across the source subjects. Moreover, the graph domain adaptation network captures both functional connectivity and regional states of the brain via a dynamic graph network and then integrates graph domain adaptation to ensure the invariance of both functional connectivity and regional states. To evaluate our method, we conduct cross-subject emotion recognition experiments on the SEED, SEED-IV, and DEAP datasets. The results demonstrate that MSGDAN achieves superior classification performance.
"Multi-source Selective Graph Domain Adaptation Network for cross-subject EEG emotion recognition." Jing Wang, Xiaojun Ning, Wei Xu, Yunze Li, Ziyu Jia, Youfang Lin. Neural Networks, vol. 180, 106742. Pub Date: 2024-12-01, DOI: 10.1016/j.neunet.2024.106742
Pub Date: 2024-12-01, Epub Date: 2024-09-22, DOI: 10.1016/j.neunet.2024.106756
Lin Qiu, Fajie Wang, Wenzhen Qu, Yan Gu, Qing-Hua Qin
This study introduces a neural network framework named spectral integrated neural networks (SINNs) to address both forward and inverse dynamic problems in three-dimensional space. In SINNs, the spectral integration technique is used for temporal discretization, and a fully connected neural network is then applied to solve the resulting partial differential equations in the spatial domain. Furthermore, polynomial basis functions are employed to expand the unknown function, with the goal of improving the performance of SINNs on inverse problems. The performance of the framework is evaluated on several dynamic benchmark examples, including linear and nonlinear heat conduction problems, linear and nonlinear wave propagation problems, an inverse heat conduction problem, and a long-time heat conduction problem. The numerical results demonstrate that SINNs can effectively and accurately solve forward and inverse problems involving heat conduction and wave propagation. Additionally, SINNs provide precise and stable solutions for dynamic problems with extended time durations. Compared to commonly used physics-informed neural networks, SINNs exhibit faster convergence, higher computational accuracy, and better efficiency.
"Spectral integrated neural networks (SINNs) for solving forward and inverse dynamic problems." Neural Networks, vol. 180, 106756.
Signed graphs have been widely applied to model real-world complex networks with positive and negative links, and signed graph embedding has become a popular topic in signed graph analysis. Although various signed graph embedding methods have been proposed, most of them still suffer from a generality problem: they cannot simultaneously achieve satisfactory performance on multiple downstream tasks. In view of this, we propose a signed graph embedding method named MOSGCN, which has two significant characteristics. First, MOSGCN uses a multi-order neighborhood feature fusion strategy based on structural balance theory, enabling it to adaptively capture local and global structural features for more informative node representations. Second, MOSGCN is trained within a signed graph contrastive learning framework, which further helps it learn more discriminative and robust node representations, leading to better generality. We select link sign prediction and community detection as the downstream tasks and conduct extensive experiments to test the effectiveness of MOSGCN on four benchmark datasets. The results illustrate the good generality of MOSGCN and its superiority over state-of-the-art methods.
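To make the role of structural balance theory in multi-order neighborhoods concrete, the following is a minimal sketch and not the MOSGCN implementation. It assumes an undirected signed graph and uses the standard balance rule that the sign of a two-step path is the product of its edge signs ("the enemy of my enemy is my friend"). The helper names `build_adjacency` and `second_order_neighbors` are hypothetical.

```python
def build_adjacency(signed_edges):
    """Build {node: {neighbor: sign}} from (u, v, sign) triples, sign in {+1, -1}."""
    adj = {}
    for u, v, sign in signed_edges:
        adj.setdefault(u, {})[v] = sign
        adj.setdefault(v, {})[u] = sign
    return adj


def second_order_neighbors(adj, node):
    """Split the 2-hop neighbors of `node` by the sign implied by balance theory.

    The implied sign of a path node -> mid -> far is sign(node, mid) * sign(mid, far).
    A far node reachable by paths of both signs appears in both sets.
    """
    positive, negative = set(), set()
    for mid, s1 in adj[node].items():
        for far, s2 in adj[mid].items():
            if far == node or far in adj[node]:
                continue  # skip the node itself and its direct neighbors
            (positive if s1 * s2 > 0 else negative).add(far)
    return positive, negative


# Toy graph: a is a friend of b and an enemy of d.
edges = [("a", "b", +1), ("b", "c", -1), ("a", "d", -1), ("d", "e", -1)]
adj = build_adjacency(edges)
pos2, neg2 = second_order_neighbors(adj, "a")
print("balanced 2-hop neighbors of a:", pos2)
print("unbalanced 2-hop neighbors of a:", neg2)
```

In this toy graph, `c` is the enemy of a friend and lands in the negative set, while `e` is the enemy of an enemy and lands in the positive set. A multi-order fusion strategy could aggregate features separately over such sets at each order; how MOSGCN weights them is not specified in the abstract.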
"Signed graph embedding via multi-order neighborhood feature fusion and contrastive learning." Chaobo He, Hao Cheng, Jiaqi Yang, Yong Tang, Quanlong Guan. Neural Networks, vol. 182, 106897. Pub Date: 2024-11-17, DOI: 10.1016/j.neunet.2024.106897
Pub Date: 2024-11-17, DOI: 10.1016/j.neunet.2024.106898
Yinqian Sun, Feifei Zhao, Zhuoya Zhao, Yi Zeng
Inspired by the brain's information processing using binary spikes, spiking neural networks (SNNs) offer significant reductions in energy consumption and are more adept at incorporating multi-scale biological characteristics. In SNNs, spiking neurons serve as the fundamental information processing units. However, in most models, these neurons are typically simplified, focusing primarily on the leaky integrate-and-fire (LIF) point neuron model while neglecting the structural properties of biological neurons. This simplification hampers the computational and learning capabilities of SNNs. In this paper, we propose a brain-inspired deep distributional reinforcement learning algorithm based on SNNs, which integrates a bio-inspired multi-compartment neuron (MCN) model with a population coding approach. The proposed MCN model simulates the structure and function of apical dendritic, basal dendritic, and somatic compartments, achieving computational power comparable to that of biological neurons. Additionally, we introduce an implicit fractional embedding method based on population coding of spiking neurons. We evaluated our model on Atari games, and the experimental results demonstrate that it surpasses the vanilla FQF model, which utilizes traditional artificial neural networks (ANNs), as well as the Spiking-FQF models that are based on ANN-to-SNN conversion methods. Ablation studies further reveal that the proposed multi-compartment neuron model and the quantile fraction implicit population spike representation significantly enhance the performance of MCS-FQF while also reducing power consumption.
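For readers unfamiliar with the LIF point neuron that the abstract contrasts with the multi-compartment model, here is a minimal discrete-time sketch. It illustrates only the simplified baseline, not the proposed MCN; the function name `simulate_lif` and all parameter values are assumptions chosen for illustration.

```python
def simulate_lif(input_current, tau=10.0, v_threshold=1.0, v_reset=0.0, dt=1.0):
    """Simulate a leaky integrate-and-fire neuron and return its binary spike train.

    The membrane potential v leaks toward 0 with time constant tau while
    integrating the input; when it reaches v_threshold the neuron emits a
    spike (1) and v is reset to v_reset.
    """
    v = v_reset
    spikes = []
    for current in input_current:
        v += (dt / tau) * (-v + current)  # Euler step of dv/dt = (-v + I) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes


weak = simulate_lif([0.5] * 70)    # input too weak to ever reach threshold
strong = simulate_lif([2.0] * 70)  # supra-threshold input fires periodically
print("spikes (weak input):", sum(weak))
print("spikes (strong input):", sum(strong))
```

The single state variable `v` is what makes this a point neuron: there is no separate dendritic or somatic compartment, which is the structural simplification the abstract argues against.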
"Multi-compartment neuron and population encoding powered spiking neural network for deep distributional reinforcement learning." Neural Networks, vol. 182, 106898.
Pub Date: 2024-11-17, DOI: 10.1016/j.neunet.2024.106904
Fei Yu, Yue Lin, Wei Yao, Shuo Cai, Hairong Lin, Yi Li
In Industrial Internet of Things (IIoT) production and operation processes, a substantial amount of video data is generated, often containing sensitive personal and commercial information. This paper proposes three new multiscroll Hopfield neural network (MHNN) systems by utilizing an improved segmented nonlinear non-ideal magnetic-controlled memristor model for electromagnetic radiation. First, the multidimensional multiscroll attractors and initial offset boosting behavior of the constructed neural networks are analyzed through dynamical methods. The observed initial offset boosting behavior demonstrates that the system has extreme multistability. Second, a video encryption application based on the MHNN system is implemented on the Raspberry Pi platform. This approach encrypts the video frame by frame using a novel encryption algorithm, achieving strong encryption results with a calculated information entropy of 7.9973. This provides strong protection for video data generated in IIoT. Finally, the proposed MHNN system is implemented on a Field-Programmable Gate Array (FPGA) digital hardware platform.
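The information entropy figure quoted above can be read against the following sketch of how Shannon entropy is typically computed for image data. It assumes 8-bit pixel values (so the maximum is 8 bits per pixel), which the abstract does not state, and it uses synthetic frames rather than the paper's video; `shannon_entropy` is a hypothetical helper name.

```python
import math
import random
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits per symbol, of a sequence of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    # H = sum over symbols of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A constant frame is perfectly predictable.
flat_frame = [128] * 65536

# Stand-in for a well-encrypted frame: pseudo-random bytes (illustration only).
rng = random.Random(0)
noise_frame = [rng.randrange(256) for _ in range(65536)]

print("flat frame entropy:", shannon_entropy(flat_frame))
print("noise frame entropy:", shannon_entropy(noise_frame))
```

Under the 8-bit assumption, an entropy close to 8 indicates that the pixel histogram of the encrypted frame is nearly uniform, which is why values such as 7.9973 are reported as evidence of encryption quality.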
"Multiscroll Hopfield neural network with extreme multistability and its application in video encryption for IIoT." Neural Networks, vol. 182, 106904.
Pub Date: 2024-11-16, DOI: 10.1016/j.neunet.2024.106856
Zengfa Dou, Nian Peng, Weiming Hou, Xianghua Xie, Xiaoke Ma
Clustering of multi-view data divides objects into groups by preserving the structure of clusters in all views, which requires simultaneously taking into consideration the diversity and consistency of the views, corresponding to their specific and shared components, respectively. Current algorithms fail to fully characterize and balance the diversity and consistency of the views, resulting in undesirable performance. Here, a novel Multi-View Clustering with Deep non-negative matrix factorization and Multi-Level Representation (MVC-DMLR) learning algorithm is proposed, which integrates feature learning, multi-level topology representation, and clustering of multi-view data. Specifically, MVC-DMLR first learns multi-level representations (also called deep features) of objects with deep non-negative matrix factorization (DNMF), facilitating the exploitation of the hierarchical structure of multi-view data. Then, it learns multi-level graphs for each view from the multi-level representations, where relations between diversity and consistency are addressed at various resolutions. MVC-DMLR integrates multi-level representation learning, multi-level topology representation learning, and clustering, which is formulated as a single optimization problem. Experimental results show the superiority of MVC-DMLR over baselines in terms of accuracy, F1-score, normalized mutual information, and adjusted Rand index.
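The idea of obtaining multi-level representations from deep non-negative matrix factorization can be sketched for a single view as follows. This is an illustrative assumption, not the MVC-DMLR objective: it uses greedy layer-wise factorization with Lee-Seung multiplicative updates, omits the multi-level graphs and the joint clustering term, and all function names and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def reconstruction_error(x, w, h):
    """Squared Frobenius norm of X - W H."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative X (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


def deep_nmf(x, ranks, iters=200):
    """Greedy layer-wise deep NMF: X ~ W1 H1, then H1 ~ W2 H2, and so on.

    Returns [H1, H2, ...]; each H is one level of representation, with one
    column per object.
    """
    levels, current = [], x
    for rank in ranks:
        _, h = nmf(current, rank, iters=iters)
        levels.append(h)
        current = h
    return levels


# Toy view: 4 non-negative features (rows) for 6 objects (columns).
X = [[1.0, 0.9, 0.1, 0.0, 0.2, 0.1],
     [0.8, 1.0, 0.0, 0.1, 0.1, 0.0],
     [0.1, 0.0, 1.0, 0.9, 0.8, 1.0],
     [0.0, 0.2, 0.9, 1.0, 1.0, 0.9]]
levels = deep_nmf(X, ranks=[3, 2])
for depth, h in enumerate(levels, start=1):
    print("level", depth, "shape:", len(h), "x", len(h[0]))
```

Each level gives a coarser description of the same six objects, so a graph could be built per level from the columns of the corresponding `H`. The abstract indicates that MVC-DMLR learns such graphs jointly with the factorization rather than in a separate step as done here.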
"Learning multi-level topology representation for multi-view clustering with deep non-negative matrix factorization." Neural Networks, vol. 182, 106856.
Sparse Bayesian learning has promoted many effective frameworks of brain activity decoding for the brain-computer interface, including the direct reconstruction of muscle activity from brain recordings. However, existing sparse Bayesian learning algorithms mainly assume Gaussian-distributed errors in the reconstruction task, an assumption that does not necessarily hold in real-world applications. Moreover, brain recordings are known to be highly noisy and to contain substantial non-Gaussian noise, which could lead to large performance degradation for sparse Bayesian learning algorithms. The goal of this paper is to propose a novel robust implementation of sparse Bayesian learning so that robustness and sparseness can be realized simultaneously. Motivated by the exceptional robustness of the maximum correntropy criterion (MCC), we propose integrating MCC into the sparse Bayesian learning regime. Specifically, we derive the explicit error assumption inherent in MCC and then use it for the likelihood function. Meanwhile, we use the automatic relevance determination technique as the sparse prior distribution. To fully evaluate the proposed method, we use a synthetic example and a real-world muscle activity reconstruction task with two different brain modalities. Experimental results show that the proposed sparse Bayesian correntropy learning framework significantly improves robustness in noisy regression tasks. The proposed algorithm achieves higher correlation coefficients and lower root mean squared errors in the real-world muscle activity reconstruction scenario. Sparse Bayesian correntropy learning provides a powerful approach for brain activity decoding, which will promote the development of brain-computer interface technology.
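The robustness argument behind MCC can be seen in a small numerical comparison with mean squared error. The sketch below uses the common empirical correntropy estimator with a Gaussian kernel; the bandwidth `sigma` and the error values are assumptions for illustration, and the sparse Bayesian machinery (likelihood derivation, automatic relevance determination) is not reproduced.

```python
import math


def mse_loss(errors):
    """Mean squared error: grows quadratically, so large outliers dominate."""
    return sum(e * e for e in errors) / len(errors)


def correntropy(errors, sigma=1.0):
    """Empirical correntropy with a Gaussian kernel of bandwidth sigma.

    Each error contributes exp(-e^2 / (2 sigma^2)), a value in [0, 1], so one
    arbitrarily large error lowers the mean by at most 1 / len(errors).
    """
    return sum(math.exp(-(e * e) / (2.0 * sigma ** 2)) for e in errors) / len(errors)


def mcc_loss(errors, sigma=1.0):
    """Loss form of MCC: maximizing correntropy equals minimizing 1 - correntropy."""
    return 1.0 - correntropy(errors, sigma)


clean = [0.1, -0.2, 0.05, 0.0]
noisy = clean + [50.0]  # one impulsive, non-Gaussian outlier

print("MSE  clean vs noisy:", mse_loss(clean), mse_loss(noisy))
print("MCC  clean vs noisy:", mcc_loss(clean), mcc_loss(noisy))
```

A single outlier inflates the MSE by several orders of magnitude, whereas the MCC loss stays bounded below 1. This bounded influence is the property that motivates replacing the Gaussian error assumption.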
"Sparse Bayesian correntropy learning for robust muscle activity reconstruction from noisy brain recordings." Yuanhao Li, Badong Chen, Natsue Yoshimura, Yasuharu Koike, Okito Yamashita. Neural Networks, vol. 182, 106899. Pub Date: 2024-11-16, DOI: 10.1016/j.neunet.2024.106899
Pub Date: 2024-11-16, DOI: 10.1016/j.neunet.2024.106908
Zhanxuan Hu, Yu Duan, Yaming Zhang, Rong Wang, Feiping Nie
Generalized Category Discovery (GCD) addresses a more realistic and challenging setting in semi-supervised visual recognition, where unlabeled data contains samples from both known and novel categories. Recently, prototypical classifiers have shown prominent performance on this problem, with the Softmax-based Cross-Entropy loss (SCE) commonly employed to optimize the distance between a sample and the prototypes. However, the inherent non-bijectiveness of SCE prevents it from resolving intraclass relations among samples, resulting in semantic ambiguity. To mitigate this issue, we propose Distribution Consistency Regularization (DCR) for the prototypical classifier. By leveraging a simple intraclass consistency loss, we force the classifier to yield consistent distributions for samples belonging to the same class. In doing so, we equip the classifier to better capture local structures and alleviate semantic ambiguity. Additionally, we propose using partial labels, rather than hard pseudo labels, to explore potential positive pairs in unlabeled data, thereby reducing the risk of introducing noisy supervisory signals. DCR requires no sophisticated external module, which keeps the enhanced model concise and efficient. Extensive experiments on six benchmarks show that DCR yields consistent performance gains and achieves competitive or better results. Hence, our method can serve as a strong baseline for GCD. Our code is available at: https://github.com/yichenwang231/DCR.
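The notion of an intraclass consistency loss on a prototypical classifier can be illustrated with the sketch below. It is not the DCR loss from the paper or its repository: the distance-based logits, the temperature, and the use of symmetric KL divergence as the consistency measure are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_distribution(feature, prototypes, temperature=0.1):
    """Class distribution from negative squared distances to each prototype."""
    logits = [-sum((f - p) ** 2 for f, p in zip(feature, proto)) / temperature
              for proto in prototypes]
    return softmax(logits)


def consistency_loss(p, q, eps=1e-12):
    """Symmetric KL divergence between two predicted class distributions."""
    kl_pq = sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))
    kl_qp = sum(b * math.log((b + eps) / (a + eps)) for a, b in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


prototypes = [[0.0, 0.0], [1.0, 1.0]]
x1, x2 = [0.1, 0.0], [0.0, 0.1]  # two samples near the first prototype
x3 = [0.9, 1.0]                  # a sample near the second prototype

p1, p2, p3 = (prototype_distribution(x, prototypes) for x in (x1, x2, x3))
print("same-class pair loss:", consistency_loss(p1, p2))
print("cross-class pair loss:", consistency_loss(p1, p3))
```

Minimizing such a term over pairs believed to share a class pushes their predicted distributions together. Selecting those pairs from partial labels rather than hard pseudo labels, as the abstract describes, would limit how often a mismatched pair like `(x1, x3)` is penalized by mistake.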
"Prototypical classifier with distribution consistency regularization for generalized category discovery: A strong baseline." Neural Networks, vol. 182, 106908.