
Latest Publications: IEEE Transactions on Cognitive and Developmental Systems

Electroencephalogram-Based Unified Approach for Multiple Neurodevelopmental Disorders Detection in Children Using Successive Multivariate Variational Mode Decomposition
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-04-01 | DOI: 10.1109/TCDS.2025.3556888
Ujjawal Chandela;Kazi Newaj Faisal;Rishi Raj Sharma
Early age identification and prompt intervention play a crucial role in mitigating the severity of neurodevelopmental disorders in children. Traditional diagnostic approaches can be lengthy, but there is growing research potential in using electroencephalogram (EEG) signals to detect attention deficit hyperactivity disorder (ADHD) and intellectual developmental disorder (IDD). By recording the electrical activity of the brain, EEG has emerged as a promising technique for the early identification of these disorders. This research proposes a novel integrated method for identifying multiple neurodevelopmental disorders from the EEG signals of children. The approach combines successive multivariate variational mode decomposition (SMVMD) for analyzing multicomponent nonstationary signals and a machine learning (ML)-based classifier, addressing the issue of inconsistent numbers of extracted features by introducing an energy-based feature integration approach. By integrating enhanced features from SMVMD with a K-nearest neighbor (KNN) classifier, the unified approach successfully detects two separate neurodevelopmental disorders from normal subjects. The proposed method demonstrates perfect classification scores in detecting IDD under three different scenarios and achieves 99.17% accuracy in classifying ADHD subjects from normal subjects. Evaluation against different ML-based classifiers confirms the effectiveness of the proposed feature extraction algorithm and highlights its superior performance compared to recent methods published on similar datasets.
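The abstract describes integrating energy-based features from SMVMD modes with a KNN classifier. Below is a minimal sketch of that general pipeline, not the authors' implementation: it assumes decomposed modes are already available as arrays (here simulated), computes per-mode energies pooled to a fixed length, and feeds them to KNN. The function name `energy_features` and all parameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): energy-based features from
# decomposed EEG modes, pooled to a fixed length, then classified with KNN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def energy_features(modes, n_keep=5):
    """modes: array of shape (n_modes, n_samples) from a decomposition such
    as SMVMD. Returns the energies of the n_keep most energetic modes,
    zero-padded so every trial yields the same feature length."""
    energies = np.sum(modes ** 2, axis=1) / modes.shape[1]
    energies = np.sort(energies)[::-1]          # strongest modes first
    feat = np.zeros(n_keep)
    k = min(n_keep, energies.size)
    feat[:k] = energies[:k]
    return feat

# Hypothetical data: 40 trials, each decomposed into a variable number of modes.
rng = np.random.default_rng(0)
trials = [rng.standard_normal((int(rng.integers(3, 8)), 512)) for _ in range(40)]
X = np.stack([energy_features(m) for m in trials])
y = rng.integers(0, 2, size=40)                 # 0 = control, 1 = disorder

knn = KNeighborsClassifier(n_neighbors=3)
print(cross_val_score(knn, X, y, cv=5).mean())
```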
Citations: 0
CLARE: Cognitive Load Assessment in Real-Time With Multimodal Data
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-27 | DOI: 10.1109/TCDS.2025.3555517
Anubhav Bhatti;Prithila Angkan;Behnam Behinaein;Zunayed Mahmud;Dirk Rodenburg;Heather Braund;P. James Mclellan;Aaron Ruberto;Geoffery Harrison;Daryl Wilson;Adam Szulewski;Dan Howes;Ali Etemad;Paul Hungler
We present a novel multimodal dataset for cognitive load assessment in real-time (CLARE). The dataset contains physiological and gaze data from 24 participants with self-reported cognitive load scores as ground-truth labels. The dataset consists of four modalities, namely, electrocardiography (ECG), electrodermal activity (EDA), electroencephalogram (EEG), and gaze tracking. To map diverse levels of mental load on participants during experiments, each participant completed four 9-min sessions on a computer-based operator performance and mental workload task (the MATB-II software) with varying levels of complexity in 1 min segments. During the experiment, participants reported their cognitive load every 10 s. For the dataset, we also provide benchmark binary classification results with machine learning and deep learning models on two different evaluation schemes, namely, 10-fold and leave-one-subject-out (LOSO) cross-validation. Benchmark results show that for 10-fold evaluation, the convolutional neural network (CNN) based deep learning model achieves the best classification performance with ECG, EDA, and gaze. In contrast, for LOSO, the best performance is achieved by the deep learning model with ECG, EDA, and EEG.
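The benchmark uses 10-fold and leave-one-subject-out (LOSO) evaluation. The following is a minimal LOSO sketch with scikit-learn on simulated per-subject feature windows; the data layout and classifier are illustrative assumptions, not the CLARE release format.

```python
# Minimal sketch (hypothetical data, not the CLARE release format):
# leave-one-subject-out (LOSO) evaluation with scikit-learn.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n_subjects, windows_per_subject, n_features = 24, 50, 32
X = rng.standard_normal((n_subjects * windows_per_subject, n_features))
y = rng.integers(0, 2, size=len(X))                 # binary load label
groups = np.repeat(np.arange(n_subjects), windows_per_subject)

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"LOSO accuracy: {np.mean(scores):.3f} over {len(scores)} held-out subjects")
```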
Citations: 0
GoMIC: Enhancing Efficient Collaboration in Multiagent Reinforcement Learning Through Group-Specific Mutual Information
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-27 | DOI: 10.1109/TCDS.2025.3574031
Jichao Wang;Yi Li;Yichun Li;Shuai Mao;Zhaoyang Dong;Yang Tang
In cooperative multiagent reinforcement learning (MARL), previous research has predominantly concentrated on augmenting cooperation through the optimization of global behavioral correlations between agents, with mutual information (MI) typically serving as a crucial metric for correlation quantification. The existing approaches aim to enhance the behavioral correlation among agents to foster better cooperation and goal alignment by leveraging MI. However, it has been demonstrated that the cooperative capabilities among agents cannot be enhanced merely by directly increasing their overall behavioral correlations, particularly in environments with multiple subtasks or scenarios requiring dynamic team structures. To tackle this challenge, a MARL algorithm named group-oriented MI collaboration (GoMIC) is designed, which dynamically partitions agents and employs MI within each partition as an enhanced reward. GoMIC mitigates excessive reliance of individual policies on team-related information and fosters agents to acquire policies across varying team compositions. Experimental evaluations across various tasks in multiagent particle environment (MPE), level-based foraging (LBF), and StarCraft II (SC2) demonstrate the superior performance of GoMIC over some existing approaches, indicating its potential to improve collaboration in multiagent systems.
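The abstract's central idea is using mutual information within agent groups as an enhanced reward. The sketch below illustrates only that general idea, not the GoMIC algorithm: a fixed, hypothetical grouping and a pairwise MI bonus over discrete action histories, with `beta` as an assumed weighting coefficient.

```python
# Minimal sketch (not the GoMIC algorithm): augment each group's reward with
# a mutual-information bonus computed over the discrete actions its members
# took during an episode. The grouping here is a fixed, hypothetical partition.
import numpy as np
from sklearn.metrics import mutual_info_score

def group_mi_bonus(action_log, groups, beta=0.1):
    """action_log: array (T, n_agents) of discrete actions over an episode.
    groups: list of agent-index lists. Returns a per-group intrinsic bonus
    equal to beta * average pairwise MI of actions inside the group."""
    bonuses = []
    for members in groups:
        pair_mi = [mutual_info_score(action_log[:, i], action_log[:, j])
                   for i in members for j in members if i < j]
        bonuses.append(beta * float(np.mean(pair_mi)) if pair_mi else 0.0)
    return bonuses

rng = np.random.default_rng(2)
actions = rng.integers(0, 5, size=(200, 6))     # 200 steps, 6 agents, 5 actions
print(group_mi_bonus(actions, groups=[[0, 1, 2], [3, 4, 5]]))
```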
Citations: 0
AVTENet: A Human-Cognition-Inspired Audio-Visual Transformer-Based Ensemble Network for Video Deepfake Detection
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-24 | DOI: 10.1109/TCDS.2025.3554477
Ammarah Hashmi;Sahibzada Adil Shahzad;Chia Wen Lin;Yu Tsao;Hsin-Min Wang
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries. Most previous studies on detecting artificial intelligence-generated fake videos utilize only the visual modality or the audio modality. While some methods exploit audio and visual modalities to detect forged videos, they have not been comprehensively evaluated on multimodal datasets of deepfake videos involving acoustic and visual manipulations, and are mostly based on convolutional neural networks with low detection accuracy. Considering that human cognition instinctively integrates multisensory information, including audio and visual cues, to perceive and interpret content, and given the success of transformers in various fields, this study introduces the audio-visual transformer-based ensemble network (AVTENet). This innovative framework tackles the complexities of deepfake technology by integrating both acoustic and visual manipulations to enhance the accuracy of video forgery detection. Specifically, the proposed model integrates several purely transformer-based variants that capture video, audio, and audio-visual salient cues to reach a consensus in prediction. For evaluation, we use the recently released benchmark multimodal audio-video FakeAVCeleb dataset. For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset. Experimental results show that the proposed model outperforms all existing methods and achieves state-of-the-art performance on Testset-I and Testset-II of the FakeAVCeleb dataset. We also compare AVTENet against humans in detecting video forgery. The results show that AVTENet significantly outperforms humans.
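The ensemble "reaches a consensus in prediction" across branch models. A minimal late-fusion sketch of that consensus step is shown below; the per-branch logits are simulated placeholders and the weighting scheme is an assumption, not the AVTENet architecture.

```python
# Minimal sketch (not the AVTENet architecture): late-fusion consensus over
# per-branch logits (audio, visual, audio-visual). Branch outputs are simulated.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(branch_logits, weights=None):
    """branch_logits: list of (n_videos, 2) arrays, one per branch.
    Averages branch probabilities (optionally weighted) and returns the
    predicted class per video (0 = real, 1 = fake) plus fused probabilities."""
    probs = np.stack([softmax(l) for l in branch_logits])    # (B, N, 2)
    if weights is None:
        weights = np.ones(len(branch_logits)) / len(branch_logits)
    fused = np.tensordot(weights, probs, axes=1)              # (N, 2)
    return fused.argmax(axis=1), fused

rng = np.random.default_rng(3)
audio, visual, av = (rng.standard_normal((8, 2)) for _ in range(3))
pred, prob = ensemble_predict([audio, visual, av])
print(pred)
```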
Citations: 0
Self-Supervised Object Pose Estimation With Multitask Learning
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-21 | DOI: 10.1109/TCDS.2025.3571813
Dinh-Cuong Hoang;Phan Xuan Tan;Ta Huu Anh Duong;Tuan-Minh Huynh;Duc-Manh Nguyen;Anh-Nhat Nguyen;Duc-Long Pham;Van-Duc Vu;Thu-Uyen Nguyen;Ngoc-Anh Hoang;Khanh-Toan Phan;Duc-Thanh Tran;Van-Thiep Nguyen;Ngoc-Trung Ho;Cong-Trinh Tran;Van-Hiep Duong
Object pose estimation using learning-based methods often necessitates vast amounts of meticulously labeled training data. The process of capturing real-world object images under diverse conditions and annotating these images with 6 degrees of freedom (6DOF) object poses is both time-consuming and resource-intensive. In this study, we propose an innovative approach to monocular 6-D pose estimation through self-supervised learning, eliminating the need for labor-intensive manual annotations. Our method initiates by training a multitask neural network in a fully supervised manner, leveraging synthetic RGBD data. We leverage semantic segmentation, instance-level depth estimation, and vector-field prediction as auxiliary tasks to enhance the primary task of pose estimation. Subsequently, we harness advancements in multitask learning to further self-supervise the model using unlabeled real-world RGB data. A pivotal element of our self-supervised object pose estimation is a geometry-guided pseudolabel filtering module that relies on estimated depth from instance-level depth estimation. Our extensive experiments conducted on benchmark datasets demonstrate the effectiveness and potential of our approach in achieving accurate monocular 6-D pose estimation. Importantly, our method showcases a promising avenue for overcoming the challenges associated with the labor-intensive annotation process, offering a more efficient and scalable solution for real-world object pose estimation.
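The abstract highlights a geometry-guided pseudolabel filtering module that relies on instance-level depth estimates. Below is a minimal sketch of that filtering idea under simplified assumptions, not the paper's module: a pose pseudolabel is kept only if the depth implied by that pose agrees with the predicted instance depth inside the object mask. All inputs and the `max_mae` threshold are simulated placeholders.

```python
# Minimal sketch (not the paper's module): geometry-guided pseudolabel filtering
# by depth consistency inside the object mask. All inputs are simulated.
import numpy as np

def keep_pseudolabel(depth_from_pose, predicted_depth, mask, max_mae=0.05):
    """All arrays are (H, W); mask is boolean; depths in meters.
    Returns True if the mean absolute depth error within the mask is small."""
    if mask.sum() == 0:
        return False
    mae = np.abs(depth_from_pose[mask] - predicted_depth[mask]).mean()
    return bool(mae < max_mae)

rng = np.random.default_rng(4)
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True
depth_pred = 0.8 + 0.01 * rng.standard_normal((64, 64))
depth_pose_good = depth_pred + 0.005 * rng.standard_normal((64, 64))
depth_pose_bad = depth_pred + 0.3                 # grossly inconsistent pose

print(keep_pseudolabel(depth_pose_good, depth_pred, mask))   # True
print(keep_pseudolabel(depth_pose_bad, depth_pred, mask))    # False
```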
Citations: 0
Multimodal Discriminative Network for Emotion Recognition Across Individuals
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-18 | DOI: 10.1109/TCDS.2025.3552124
Minxu Liu;Donghai Guan;Chuhang Zheng;Qi Zhu
Multimodal emotion recognition is gaining significant attention for its ability to fuse complementary information from diverse physiological and behavioral signals, which benefits the understanding of emotional disorders. However, challenges arise in multimodal fusion due to uncertainties inherent in different modalities, such as complex signal coupling and modality heterogeneity. Furthermore, the feature distribution drift in intersubject emotion recognition hinders the generalization ability of the method and significantly degrades performance on new individuals. To address the above issues, we propose a robust cross-subject multimodal emotion recognition framework that effectively extracts subject-independent intrinsic emotional identification information from heterogeneous multimodal emotion data. First, we develop a multichannel network with self-attention and cross-attention mechanisms to capture modality-specific and complementary features among different modalities, respectively. Second, we incorporate contrastive loss into the multichannel attention network to enhance feature extraction across different channels, thereby facilitating the disentanglement of emotion-specific information. Moreover, a self-expression learning-based network layer is devised to enhance feature discriminability and subject alignment. It aligns samples in a discriminative space using block diagonal matrices and maps multiple individuals to a shared subspace using a block off-diagonal matrix. Finally, attention is used to merge multichannel features, and a multilayer perceptron is employed for classification. Experimental results on multimodal emotion datasets confirm that our proposed approach surpasses the current state-of-the-art in terms of emotion recognition accuracy, with particularly significant gains observed in the challenging cross-subject multimodal recognition scenarios.
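The framework combines self-attention and cross-attention across modalities before classification. A minimal PyTorch sketch of that cross-modal attention pattern follows; the modality names (EEG and eye-movement features), dimensions, and pooling are illustrative assumptions, not the paper's network.

```python
# Minimal sketch (PyTorch, not the paper's network): cross-attention between an
# EEG feature sequence and an eye-movement feature sequence, then a classifier.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=64, heads=4, n_classes=3):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, eeg, eye):
        # eeg, eye: (batch, seq_len, dim). EEG queries attend to eye features.
        fused, _ = self.cross_attn(query=eeg, key=eye, value=eye)
        fused, _ = self.self_attn(fused, fused, fused)
        return self.head(fused.mean(dim=1))         # pooled logits

model = CrossModalFusion()
logits = model(torch.randn(8, 10, 64), torch.randn(8, 20, 64))
print(logits.shape)   # torch.Size([8, 3])
```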
Citations: 0
Hybrid Actor–Critic for Physically Heterogeneous Multiagent Reinforcement Learning
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-16 | DOI: 10.1109/TCDS.2025.3570497
Tianyi Hu;Zhiqiang Pu;Xiaolin Ai;Tenghai Qiu;Yanyan Liang;Jianqiang Yi
This article focuses on cooperative policy learning for physically heterogeneous multiagent system (PHet-MAS), where agents have different observation spaces, action spaces, and local state transitions. Due to the various input–output structures of agents’ policies in PHet-MAS, it is difficult to employ parameter sharing techniques for sample efficiency. Moreover, a totally heterogeneous policy design impedes agents from utilizing the training experience of their companions and increases the risk of environmental nonstationarity. To address the above issues, we propose hybrid heterogeneous actor–critic (HHAC), a method for the policy learning of PHet-MAS. The framework of HHAC consists of a hybrid actor and a hybrid critic, both containing globally shared and locally shared modules. The locally shared modules can be customized according to the actual physical properties of agents, while the globally shared modules can help extract and utilize the common information among agents. In the hybrid critic, a behavioral intention module is designed to alleviate the environmental nonstationary issue caused by evolving heterogeneous policies. Finally, a hybrid network training method is developed to address challenges in sample construction and training stability of hybrid networks. As evidenced by experimental results, HHAC exhibits superior performance enhancements over baseline approaches and can facilitate PHet-MAS in learning sophisticated and instructive policies.
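The hybrid actor combines locally shared modules (customized to each agent's observation and action spaces) with globally shared modules. The sketch below illustrates that parameter-sharing split in PyTorch under assumed dimensions; it is not the HHAC implementation.

```python
# Minimal sketch (PyTorch, not the HHAC implementation): a "hybrid" actor where
# each physically distinct agent keeps its own encoder and action head (sized to
# its own spaces) while the middle trunk is shared by every agent.
import torch
import torch.nn as nn

class HybridActor(nn.Module):
    def __init__(self, obs_dims, act_dims, hidden=64):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(o, hidden) for o in obs_dims)
        self.shared_trunk = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                          nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, a) for a in act_dims)

    def forward(self, agent_id, obs):
        h = torch.relu(self.encoders[agent_id](obs))   # locally shared part
        h = self.shared_trunk(h)                        # globally shared part
        return self.heads[agent_id](h)                  # agent-specific logits

actor = HybridActor(obs_dims=[12, 20], act_dims=[4, 6])
print(actor(0, torch.randn(5, 12)).shape)   # torch.Size([5, 4])
print(actor(1, torch.randn(5, 20)).shape)   # torch.Size([5, 6])
```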
Citations: 0
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event Based Recognition
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-12 | DOI: 10.1109/TCDS.2025.3568833
Xiao Wang;Yao Rong;Zongzhen Wu;Lin Zhu;Bo Jiang;Jin Tang;Yonghong Tian
Event camera-based pattern recognition is a newly arising research topic in recent years. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, their results may still be limited due to the following two issues. First, they adopt spatially sparse event streams for recognition only, which may fail to capture the color and detailed texture information well. Second, they adopt either spiking neural networks (SNN) for energy-efficient recognition with suboptimal results, or artificial neural networks (ANN) for energy-intensive, high-performance recognition. However, few of them consider achieving a balance between these two aspects. In this article, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., memory support Transformer network for RGB frame encoding, spiking neural network for raw event stream encoding, multimodal bottleneck fusion module for RGB-Event feature aggregation, and prediction head. Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset, which contains 114 classes and 27,102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-event based classification datasets fully validate the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and source code of this work will be released at https://github.com/Event-AHU/SSTFormer.
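The framework pairs a spiking encoder for raw event streams with an RGB frame branch and a fusion module. The sketch below shows only the general idea, not SSTFormer: a leaky integrate-and-fire (LIF) layer turns a sequence of flattened event frames into mean firing rates, which are concatenated with an RGB feature and classified. Shapes, the concatenation-based fusion, and the 114-class head are illustrative assumptions, and the hard spiking threshold here has no surrogate gradient, so this forward pass is not directly trainable as written.

```python
# Minimal sketch (PyTorch, not SSTFormer): LIF spiking encoding of event frames
# fused with an RGB feature by simple concatenation, then classified.
import torch
import torch.nn as nn

class LIFEncoder(nn.Module):
    def __init__(self, in_dim, out_dim, tau=0.5, threshold=1.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.tau, self.threshold = tau, threshold

    def forward(self, x):
        # x: (batch, time_steps, in_dim) flattened event frames.
        mem = torch.zeros(x.size(0), self.fc.out_features, device=x.device)
        spikes = []
        for t in range(x.size(1)):
            mem = self.tau * mem + self.fc(x[:, t])
            spike = (mem >= self.threshold).float()   # hard threshold, no surrogate
            mem = mem * (1.0 - spike)                 # hard reset after a spike
            spikes.append(spike)
        return torch.stack(spikes, dim=1).mean(dim=1)  # mean firing rate

event_branch = LIFEncoder(in_dim=128, out_dim=64)
rgb_branch = nn.Linear(256, 64)
classifier = nn.Linear(128, 114)                      # 114 PokerEvent classes

events, rgb = torch.randn(4, 8, 128), torch.randn(4, 256)
fused = torch.cat([event_branch(events), torch.relu(rgb_branch(rgb))], dim=1)
print(classifier(fused).shape)                        # torch.Size([4, 114])
```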
Citations: 0
H-GRAIL: A Robotic Motivational Architecture to Tackle Open-Ended Learning Challenges
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-12 | DOI: 10.1109/TCDS.2025.3569352
Alejandro Romero;Gianluca Baldassarre;Richard J. Duro;Vieri Giuliano Santucci
This article addresses the challenge of developing artificial agents capable of autonomously discovering interesting environmental states, setting them as goals, and learning the necessary skills and curricula to achieve these goals—an essential requirement for deploying robotic systems in real-world scenarios. In such environments, robots must adapt to unforeseen situations, learn new skills, and manage unexpected changes autonomously, which is central to open-ended learning (OEL). We present the hierarchical goal-discovery robotic architecture for intrinsically-motivated learning (H-GRAIL), an architecture designed to foster autonomous OEL in robotic agents. The novelty of H-GRAIL compared to existing approaches, which often address isolated challenges in OEL, is that it integrates multiple mechanisms that enable robots to autonomously discover new goals, acquire skills, and manage learning processes in dynamic, nonstationary environments. We present tests that demonstrate the advantages of this approach in enabling robots to achieve different goals in nonstationary environments and simultaneously address many of the challenges inherent to OEL.
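Intrinsically motivated architectures of this kind typically decide which discovered goal to practice next from a learning-progress signal. The sketch below illustrates a generic competence-progress goal selector under that assumption; it is not the H-GRAIL mechanism, and the goal names and window size are hypothetical.

```python
# Minimal sketch (not the H-GRAIL architecture): competence-progress-based goal
# selection. Goals whose recent success rate is changing fastest are sampled
# more often for further practice.
import random
from collections import deque

class GoalSelector:
    def __init__(self, goals, window=20):
        self.history = {g: deque(maxlen=window) for g in goals}

    def record(self, goal, success):
        self.history[goal].append(1.0 if success else 0.0)

    def progress(self, goal):
        h = list(self.history[goal])
        if len(h) < 4:
            return 1.0                  # explore barely-practised goals
        half = len(h) // 2
        return abs(sum(h[half:]) / (len(h) - half) - sum(h[:half]) / half)

    def select(self):
        goals = list(self.history)
        weights = [self.progress(g) + 1e-3 for g in goals]
        return random.choices(goals, weights=weights, k=1)[0]

selector = GoalSelector(["open_drawer", "press_button", "move_cube"])
for _ in range(30):
    g = selector.select()
    selector.record(g, success=random.random() < 0.6)   # simulated outcomes
print(selector.select())
```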
Citations: 0
Feature and Semantic Matching Multisource Domain Adaptation for Diagnostic Classification of Neuropsychiatric Disorders
IF 4.9 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-03-06 | DOI: 10.1109/TCDS.2025.3567521
Minghao Dai;Jianpo Su;Zhipeng Fan;Chenyu Wang;Limin Peng;Dewen Hu;Ling-Li Zeng
Multisite resting-state functional magnetic resonance imaging (rs-fMRI) data have been increasingly utilized for diagnostic classification of neuropsychiatric disorders such as schizophrenia (SCZ) and major depressive disorder (MDD). However, the cross-site generalization ability of deep networks is limited due to the significant intersite data heterogeneity caused by different MRI scanners or scanning protocols. To address this issue, we propose a feature and semantic matching multisource domain adaptation method (FSM-MSDA) to learn site-invariant disorder-related feature representations. In FSM-MSDA, we adopt separate feature extractors for multiple source domains, and propose an accurate feature matching module to align the category-level feature distributions across multiple source domains and target domain. In addition, we also propose a semantic feature alignment module to eliminate the distribution discrepancy in high-level semantic features extracted from target samples by different source classifiers. Extensive experiments based on multisite fMRI data of SCZ and MDD show the superiority and robustness of FSM-MSDA compared with state-of-the-art methods. Besides, FSM-MSDA achieves the average accuracy of 80.8% in the classification of SCZ, meeting the clinically diagnostic accuracy threshold of 80%. Shared discriminative brain regions including the middle temporal gyrus and the cerebellum regions are identified in the diagnostic classification of SCZ and MDD.
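The abstract describes aligning category-level feature distributions across source and target sites. A minimal sketch of one common way to express that idea, class-prototype alignment with pseudolabeled target samples, is given below; it is an assumption-laden illustration, not the paper's feature matching module, and the function name and dimensions are hypothetical.

```python
# Minimal sketch (PyTorch, not the paper's module): category-level feature
# matching via class-prototype alignment. Per class, the mean source feature is
# pulled toward the mean feature of target samples pseudolabeled with that class.
import torch

def prototype_alignment_loss(src_feat, src_y, tgt_feat, tgt_pseudo_y, n_classes):
    loss, used = torch.zeros(()), 0
    for c in range(n_classes):
        s_mask, t_mask = (src_y == c), (tgt_pseudo_y == c)
        if s_mask.any() and t_mask.any():
            loss = loss + torch.norm(src_feat[s_mask].mean(0) -
                                     tgt_feat[t_mask].mean(0)) ** 2
            used += 1
    return loss / max(used, 1)

torch.manual_seed(0)
src_feat, tgt_feat = torch.randn(32, 16), torch.randn(20, 16)
src_y = torch.randint(0, 2, (32,))           # site labels for 2 diagnostic classes
tgt_pseudo_y = torch.randint(0, 2, (20,))    # pseudolabels from a source classifier
print(prototype_alignment_loss(src_feat, src_y, tgt_feat, tgt_pseudo_y, 2))
```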
Citations: 0