IEEE Transactions on Artificial Intelligence: Latest Articles

IEEE Transactions on Artificial Intelligence Publication Information
Pub Date : 2025-07-31 DOI: 10.1109/TAI.2025.3590995
Volume 6, Issue 8, Pages C2-C2
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11106308
Citations: 0
Collective Performance Induced by Social and Individual Learning in Any Population Structure: An Evolutionary Game Approach
Pub Date : 2025-07-25 DOI: 10.1109/TAI.2025.3592636
Zhifang Li;Jingwei Zhang;Xiaojie Chen;Attila Szolnoki
Collective decision-making is vital and widespread in human and artificial societies. Individuals often choose an option through individual learning, by assessing the intrinsic value of each option. But they are also influenced by peer pressure and may select an option through conformity-based social learning. A central question is whether the population can settle on the most beneficial option when social learning is involved. Previous studies of social learning focused on well-mixed populations, where individuals are equally likely to interact with each other. But real social interactions are often more subtle and are better modeled by a graph. Analyzing the effect of social learning on collective decision-making in structured populations is therefore theoretically challenging. To address this issue, we use evolutionary game theory to propose an evolutionary model of binary options that integrates individual and social learning in any population structure. We first derive the average fraction of the option with higher merit by means of coalescing random walks and find that introducing conformity-based social learning is detrimental to the collective performance of decision-making. Interestingly, however, our theoretical analysis reveals that the majority of the population always favors the option with higher merit regardless of the preference of social learning. Importantly, these theoretical predictions are valid for any population structure, and they are verified by intensive numerical simulations on three representative static interaction structures. We further show via computer simulations that they hold in dynamic networks. We also demonstrate the robustness of our findings to different conformity-based social learning procedures.
Volume 7, Issue 2, Pages 1143-1157
Citations: 0
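The model in the entry above mixes individual learning with social learning on a graph. A minimal Python sketch of that kind of update rule follows, under loudly stated assumptions: the ring graph, the logistic choice rule, the copy-a-random-neighbour rule standing in for conformity, and every name and parameter (`simulate`, `w`, `beta`, `merit_a`, `merit_b`) are illustrative choices, not taken from the paper.

```python
import math
import random


def simulate(n=100, w=0.3, merit_a=1.0, merit_b=0.5, beta=2.0,
             steps=20000, seed=0):
    """Return the time-averaged fraction of individuals holding option A.

    Hypothetical toy model: n individuals on a ring, binary options A and B.
    """
    rng = random.Random(seed)
    # Assumed individual-learning rule: logistic in the merit difference.
    p_a = 1.0 / (1.0 + math.exp(-beta * (merit_a - merit_b)))
    state = [rng.random() < 0.5 for _ in range(n)]  # True means option A
    count_a = sum(state)
    total, samples = 0.0, 0
    for t in range(steps):
        i = rng.randrange(n)
        if rng.random() < w:
            # Assumed social-learning rule: copy one random ring neighbour.
            j = (i + rng.choice((-1, 1))) % n
            new = state[j]
        else:
            # Individual learning: choose A with probability p_a.
            new = rng.random() < p_a
        count_a += int(new) - int(state[i])
        state[i] = new
        if t >= steps // 2:  # average over the second half of the run only
            total += count_a / n
            samples += 1
    return total / samples


if __name__ == "__main__":
    print(simulate())
```

With these defaults the higher-merit option A is held by most individuals on average. The sketch does not attempt to reproduce the paper's analytical result on how conformity changes that fraction.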
SleepLog: Local-Global Deep Fusion Learning for Sleep Staging Transformer
Pub Date : 2025-07-25 DOI: 10.1109/TAI.2025.3591587
Jingpeng Sun;Chen Chen;Weiping Ding;Xiyuan Hu
Sleep disorders affect a significant portion of the global population and contribute to increased overall mortality. Automatic sleep staging through the analysis of physiological signals is pivotal in expanding sleep assessment and diagnostic capabilities. However, because physiological signals have complex nonstationary characteristics and differ between subjects, extracting effective features from them remains challenging. To this end, we propose a novel Transformer-based sleep staging method, SleepLog, which combines local and global information for feature extraction. First, a convolutional neural network (CNN)-based module extracts local information to capture the features of sleep characteristic wave events. Then, a self-attention-based patch encoder module extracts global information that reflects the transformation between different characteristic waves. Next, the local and global information is fed to the Transformer encoder module so that the class (CLS) token of each branch can extract supplementary information from the associated features. Finally, we propose a simple yet effective cross-attention-based feature fusion module, which uses the single class token of each branch as a query to exchange information with the other branches. The proposed cross-attention has only linear computational and memory complexity. To validate the proposed method, we evaluate SleepLog on the publicly available Sleep-EDF dataset. The experimental results show that the proposed model maintains superior performance, indicating its potential for developing and applying a home-environment automatic sleep staging system.
Volume 7, Issue 2, Pages 1084-1096
Citations: 0
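The linear-cost claim in the SleepLog entry above follows from using a single class token as the only query: one query against n tokens needs n dot products rather than n squared. A minimal sketch of single-query attention is given below. It assumes a single head and no learned projection matrices, and the function name `cls_cross_attention` is hypothetical; the paper's actual module is not described in enough detail here to reproduce.

```python
import math


def cls_cross_attention(cls_query, tokens):
    """Let one CLS vector attend over the tokens of another branch.

    cls_query: list of d floats (the query).
    tokens:    list of n lists of d floats (used as both keys and values,
               an assumption made to keep the sketch short).
    Cost is O(n * d) because there is exactly one query.
    """
    d = len(cls_query)
    # Scaled dot-product score of the single query against every token.
    scores = [sum(q * k for q, k in zip(cls_query, tok)) / math.sqrt(d)
              for tok in tokens]
    # Numerically stable softmax over the n scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Weighted sum of the token vectors gives the fused CLS representation.
    fused = [sum(w * tok[j] for w, tok in zip(weights, tokens))
             for j in range(d)]
    return fused, weights
```

In a two-branch setup, the function would be called once per branch, each time passing that branch's CLS token as `cls_query` and the other branch's tokens as `tokens`.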
Learn Once Plan Arbitrarily (LOPA): Dynamic Observation-Based Deep Reinforcement Learning Method for Global Path Planning in Mountainous Terrain Environment
Pub Date : 2025-07-25 DOI: 10.1109/TAI.2025.3592648
Shuqiao Huang;Mingxin Hou;Xiaofang Yuan;Xiru Wu;Yaonan Wang;Guoming Huang
Deep reinforcement learning (DRL) methods have recently shown promise in path planning tasks. However, when dealing with global planning tasks in mountainous terrain (2.5D) environments, these methods face serious challenges such as poor convergence and generalization. To this end, we propose learn once plan arbitrarily (LOPA), an enhanced DRL method that learns on a single map yet generalizes to topographically similar terrains. Consequently, it enables path planning across multiple mountainous terrain maps while balancing path distance and energy consumption. First, we analyze the causes of the convergence and generalization problems from the perspective of DRL’s observation, revealing that the conventional design exposes DRL to interference from irrelevant map information. Second, we develop LOPA, which uses a novel dynamic observation mechanism to focus better on the key information in the observation. The mechanism is realized in two steps: 1) a dynamic observation model is built to transform the DRL’s observation into two dynamic views, local and global, guiding LOPA to focus on the key information of the given maps; and 2) a dual-channel network is constructed to process these two views and integrate them for improved reasoning capability. Meanwhile, through Rademacher complexity analysis, we provide theoretical justification for LOPA’s improved generalization capability by demonstrating a lower upper bound on the generalization error. LOPA is validated through multiobjective global path planning experiments conducted on both simulated and real maps. The results suggest that LOPA improves convergence and generalization performance and achieves high planning efficiency.
Volume 7, Issue 2, Pages 1168-1184
Citations: 0
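The LOPA entry above describes splitting the observation into a local and a global view. One plausible way to build two such views from a height map is sketched below. This is an assumption for illustration only: the abstract does not say how the views are constructed, and the function names, the zero padding, and the average pooling are all hypothetical choices.

```python
def local_view(height_map, row, col, radius):
    """Crop a (2*radius+1) square window centred on the agent.

    Cells that fall outside the map are padded with 0.0 (an assumption).
    """
    rows, cols = len(height_map), len(height_map[0])
    return [[height_map[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0
             for c in range(col - radius, col + radius + 1)]
            for r in range(row - radius, row + radius + 1)]


def global_view(height_map, block):
    """Coarsen the whole map by averaging non-overlapping block x block tiles."""
    rows, cols = len(height_map), len(height_map[0])
    coarse = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            cells = [height_map[r][c]
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            coarse_row.append(sum(cells) / len(cells))
        coarse.append(coarse_row)
    return coarse
```

A dual-channel network of the kind the abstract mentions would then take the two returned grids as its two inputs; the local view changes with every agent step, while the coarse view can be cached per map.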
AR2FL: Anomaly-Resistant Robust Framework for Federated Learning
Pub Date : 2025-07-25 DOI: 10.1109/TAI.2025.3592635
Mayank Kumar Kundalwal;Deepak Mishra
Federated learning (FL) enables collaborative model training across decentralized data sources while preserving privacy. However, FL systems are vulnerable to attacks from malicious clients that can degrade model performance and compromise integrity. In this work, we propose the anomaly-resistant robust framework for federated learning (AR2FL), which enhances FL aggregation by leveraging mean latent representations of client updates. This data-driven approach enables the server to estimate interclient similarity and dynamically scale clients' contributions, reducing the influence of anomalous or adversarial updates. Unlike methods based on fixed distance metrics such as cosine similarity or Euclidean distance, AR2FL captures deeper statistical patterns in the latent space, enabling more accurate and secure model updates. Experiments on several datasets show that AR2FL maintains strong accuracy, fast convergence, and high robustness, making it suitable for secure large-scale FL.
Volume 7, Issue 2, Pages 1131-1142
Citations: 0
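The general idea in the AR2FL entry above, scaling each client's contribution by how similar it looks to the others, can be sketched as a weighted average. Note the limits of this sketch: it scores clients by plain Euclidean distance to the coordinate-wise median latent vector, which is exactly the kind of fixed metric the abstract says AR2FL moves beyond. The latent vectors are taken as given, and `aggregate` and `temperature` are hypothetical names.

```python
import math
from statistics import median


def aggregate(updates, latents, temperature=1.0):
    """Similarity-weighted average of client updates.

    updates: one flat parameter-update vector per client.
    latents: one latent vector per client (how these are produced is
             outside this sketch).
    Clients whose latent lies far from the coordinate-wise median get an
    exponentially smaller weight (stand-in scoring rule, an assumption).
    """
    dim = len(latents[0])
    center = [median(lat[j] for lat in latents) for j in range(dim)]
    dists = [math.sqrt(sum((lat[j] - center[j]) ** 2 for j in range(dim)))
             for lat in latents]
    raw = [math.exp(-d / temperature) for d in dists]
    norm = sum(raw)
    weights = [r / norm for r in raw]
    # Weighted average of the update vectors, coordinate by coordinate.
    agg = [sum(w * upd[k] for w, upd in zip(weights, updates))
           for k in range(len(updates[0]))]
    return agg, weights
```

With three similar clients and one client whose latent is far from the rest, the outlier's weight becomes negligible and the aggregate stays close to the honest updates.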
Balanced Sampling and Reusing Imaginary Data for World Models in Reinforcement Learning
Pub Date : 2025-07-24 DOI: 10.1109/TAI.2025.3592174
Qianyu Wang;Xuekai Wei;Jielu Yan;Leong Hou U;Huayan Pu;Jun Luo;Weijia Jia;Mingliang Zhou
Deep reinforcement learning (DRL) has shown significant success in domains such as computer vision and robot control. However, DRL agents often suffer from low sample efficiency, which limits their practical applicability in industrial settings. Recent advances in DRL, particularly model-based approaches, have sought to address this issue by leveraging imaginary data to improve decision-making and sampling efficiency. Despite their promise, these methods face challenges such as overreliance on early experiences in the replay buffer and underutilization of imaginary data, which can lead to overfitting and suboptimal policy optimization. To overcome these limitations, we propose a novel reinforcement learning framework, balanced sampling and reusing imaginary data (BSRID), which introduces two key innovations: 1) a balanced sampling (BS) mechanism that ensures uniform sampling rates to mitigate bias toward early experiences; and 2) a reusing imaginary data (RID) strategy that enhances policy optimization by increasing update frequency and maximizing the utility of imaginary data. The experimental results on the Atari 100k benchmark demonstrate that BSRID significantly improves sample efficiency and achieves state-of-the-art (SOTA) performance. This work provides a robust and efficient solution for DRL applications in scenarios requiring high sample efficiency and reliable decision-making.
Volume 7, Issue 2, Pages 1118-1130
Citations: 0
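The bias toward early experiences mentioned in the BSRID entry above arises because, with plain uniform sampling from a growing buffer, older transitions have had more chances to be drawn. One simple way to even out per-item sampling counts is to draw the least-sampled items first. The sketch below shows that rule; it is an assumed stand-in, since the abstract does not specify how the BS mechanism works, and the class name `BalancedReplayBuffer` is hypothetical.

```python
import random


class BalancedReplayBuffer:
    """Replay buffer that always samples the least-used transitions first."""

    def __init__(self, seed=0):
        self.items = []
        self.counts = []  # how many times each item has been sampled
        self.rng = random.Random(seed)

    def add(self, transition):
        self.items.append(transition)
        self.counts.append(0)

    def sample(self, batch_size):
        k = min(batch_size, len(self.items))
        order = list(range(len(self.items)))
        self.rng.shuffle(order)                   # random tie-breaking
        order.sort(key=lambda i: self.counts[i])  # stable: ties stay shuffled
        chosen = order[:k]
        for i in chosen:
            self.counts[i] += 1
        return [self.items[i] for i in chosen]
```

For a buffer of fixed size, this rule keeps the largest and smallest sampling counts within one of each other. In a buffer that keeps growing, newly added items start at zero and are drawn preferentially until they catch up.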
Optimizing ADHD Detection: An Autoencoder Approach for Multimodal Classification
Pub Date : 2025-07-24 DOI: 10.1109/TAI.2025.3592157
Christian Nash;Rajesh Nair;Syed Mohsen Naqvi
Attention deficit hyperactivity disorder (ADHD) is commonly found in children, and its prevalence in adults is said to be under-reported. In this article, we aim to detect adult ADHD symptoms using two autoencoder architectures. We train and test on the novel multimodal ADHD dataset recorded under the Intelligent Sensing ADHD Trial in collaboration with the Cumbria, Northumberland, Tyne and Wear NHS Foundation Trust, U.K. The autoencoder architectures perform an image reconstruction task to optimize the latent bottleneck feature space, which is then used for downstream classification of ADHD subjects versus control participants. The RGB video data is specifically exploited to inform the autoencoders about hyperactivity symptoms. The audio data is used to further support the detection of hyperactivity symptoms and, we hope, to give insight into inattentive symptoms. The self-report questionnaire is a subjective measure in which individuals can provide details of the ADHD symptoms they experience. It is a vital data source for the proposed work because it provides the autoencoders with symptoms that would otherwise be unidentifiable. An ablation study is undertaken to demonstrate the effectiveness of each individual data modality and to distinguish its discriminatory power. Using rigorous validation techniques, we achieve a state-of-the-art classification accuracy, sensitivity, and specificity of 98.9%, 99.2%, and 98.5%, respectively. With ADHD classification currently being a preliminary subjective decision, the proposed work demonstrates that an objective system can provide robust support to ADHD clinicians in the future.
Volume 7, Issue 2, Pages 1107-1117
Citations: 0
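Accuracy, sensitivity, and specificity, the three figures reported in the entry above, are standard ratios over a binary confusion matrix. A small sketch of how they are computed follows. The function name `binary_metrics` is hypothetical, and any counts passed to it here are made up for illustration; they are not data from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    tp / fn: positive (e.g. ADHD) cases classified correctly / incorrectly.
    tn / fp: negative (e.g. control) cases classified correctly / incorrectly.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity
```

Sensitivity answers how many true cases are caught, while specificity answers how many controls are correctly cleared, so the two can differ even when accuracy is high.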
A Hybrid Clinical Knowledge-Driven Transformer for Breast Ultrasound Video Classification
Pub Date : 2025-07-22 DOI: 10.1109/TAI.2025.3591580
Min Liu;Zhao Yao;Mutian Li;Chenqian Zhao;Jiale Xu;Jinhua Yu
Early diagnosis of breast cancer is critical for reducing mortality rates. Dynamic ultrasound videos contain rich tumor-specific features, offering valuable information for clinical diagnosis. In standard clinical practice, sonographers typically first identify keyframes and then scan the surrounding area to gather more information. Previous research on ultrasound videos has been devoted to temporal modeling while neglecting the contribution of keyframes to tumor diagnosis. In this article, we propose a two-stage hybrid network, the hybrid keyframe-guided video transformer (HKVT), to model both static keyframe and dynamic video information in breast ultrasound videos. In the first stage, the model uses a multiinstance learning paradigm to construct an efficient video classification model that automatically identifies keyframes using self-attention scores. In the second stage, the embedding tokens of the keyframe are extracted, and a keyframe-guided transformer block is constructed for ultrasound video classification. Specifically, we design a keyframe-guided temporal attention module and a keyframe-guided spatial coattention module to incorporate static keyframe features alongside dynamic video features. We evaluate the proposed model on an internal dataset of 342 patients and an external test dataset of 119 patients. The HKVT model achieves an area under the curve (AUC) of 0.921 on the internal dataset and 0.901 on the external test dataset, outperforming other state-of-the-art models. Furthermore, our model demonstrates robust performance on 242 multicenter test cases, outperforming other models by at least 2.1% in AUC. These results demonstrate the superiority of our approach for breast ultrasound video classification.
Volume 7, Issue 2, Pages 1062-1072
Citations: 0
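The first stage described in the HKVT abstract above (multi-instance learning with keyframes chosen by attention scores) can be illustrated with a generic attention-pooling sketch. This is not the authors' implementation: the function names, the toy feature vectors, and the idea of taking the highest-weight frame as the keyframe are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, frame_scores):
    """Attention-weighted pooling over the frames (instances) of one video (bag).

    frame_features: one feature vector per frame (hypothetical inputs).
    frame_scores:   one unnormalized attention score per frame (hypothetical).
    Returns (video_embedding, keyframe_index, attention_weights).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    # Video-level embedding: weighted sum of the per-frame features.
    video_embedding = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    # Assumption: the frame with the largest attention weight is the keyframe.
    keyframe_index = max(range(len(weights)), key=weights.__getitem__)
    return video_embedding, keyframe_index, weights


# Toy video with three frames and two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 2.0, 0.3]
embedding, keyframe, weights = attention_pool(features, scores)
print("keyframe index:", keyframe)
```

In a trained model the scores would come from a learned attention layer rather than being supplied by hand; the sketch only shows how scores turn into a keyframe choice and a video-level embedding.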
Knowledge Distillation for an Ensemble of Students From a Pyramid of Teachers With Diverse Perspective
Pub Date : 2025-07-22 DOI: 10.1109/TAI.2025.3591588
Shilajit Banerjee;Angshuman Paul
Knowledge distillation (KD) can enhance the performance of lightweight student models with the help of knowledge from heavier teacher models. Most KD methods for classification use a one-teacher, one-student architecture in which a single teacher is responsible for transferring knowledge about all the classes to a student. However, as the number of classes increases, it may become difficult for a single teacher to learn the salient characteristics of every class, which may in turn adversely affect the performance of the student. In this article, we present a novel KD method in which an ensemble of lightweight students is trained by a pyramid of teachers. At the top level of the pyramid, one teacher learns all the class labels under consideration. Going down the pyramid, the number of teachers increases at each level; however, except at the top level, each teacher learns a smaller subset of classes than the teachers at the levels above it. Hence, different teachers learn different perspectives of the classification problem. Moving down the pyramid, the teachers become increasingly specialized; conversely, moving upward, they learn an increasingly broad perspective on the classification problem. We design a novel distillation loss to distill knowledge from the pyramid of teachers to the students. Experimental results on publicly available datasets show the effectiveness of the proposed method. The code can be found at https://github.com/Shilajit77/Pyramid-Distill/tree/main.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 2, pp. 1097-1106
Citations: 0
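The abstract above does not specify the proposed distillation loss, so the following is only a sketch of one plausible ingredient: the standard temperature-softened KD loss, restricted to the classes a given teacher covers and averaged over teachers. The function names, the averaging rule, and the toy logits are assumptions, not the method from the paper or its repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def subset_kd_loss(student_logits, teacher_logits, class_ids, temperature=2.0):
    """Classic softened-logit KD loss, restricted to one teacher's classes.

    class_ids lists the classes this teacher covers; teacher_logits holds one
    logit per listed class, in the same order.
    """
    student_subset = [student_logits[c] for c in class_ids]
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_subset, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * kl_divergence(p_teacher, p_student)


def pyramid_kd_loss(student_logits, teachers, temperature=2.0):
    """Assumed aggregation: plain average of the per-teacher losses."""
    losses = [
        subset_kd_loss(student_logits, t_logits, ids, temperature)
        for ids, t_logits in teachers
    ]
    return sum(losses) / len(losses)


# Toy pyramid: one generalist over four classes, two specialists below it.
student = [2.0, 0.5, -1.0, 0.0]
teachers = [
    ([0, 1, 2, 3], [2.5, 0.0, -1.0, 0.5]),  # top level: all classes
    ([0, 1], [3.0, -3.0]),                  # specialist for classes 0 and 1
    ([2, 3], [0.0, 1.0]),                   # specialist for classes 2 and 3
]
loss = pyramid_kd_loss(student, teachers)
agree = pyramid_kd_loss(student, [([0, 1, 2, 3], student)])
print("pyramid loss:", loss, "loss when teacher equals student:", agree)
```

The loss is zero when a teacher's softened distribution matches the student's on that teacher's classes and grows as they diverge; how the actual method weights levels of the pyramid is not stated in the abstract.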
CLARITY: A Lightweight Multimodal Transformer for Harmful Content Detection
Pub Date : 2025-07-22 DOI: 10.1109/TAI.2025.3591585
Gautam Siddharth Kashyap;Niharika Jain;Ebad Shabbir;Harsh Joshi;Usman Naseem;Jiechao Gao
Social media platforms are vital to modern communication, but they also enable the spread of harmful content, such as hate speech and misinformation. Current detection models, while accurate, are often resource-intensive and unsuitable for real-time or resource-constrained environments. Moreover, even models that incorporate multilingual capabilities often fail to generalize effectively across different languages. To address this challenge, we propose CLARITY, a novel lightweight cross-modal transformer architecture designed for efficient and scalable harmful content detection. Unlike traditional models, CLARITY achieves faster processing while maintaining accuracy, making it accessible to a wider range of platforms and devices. CLARITY integrates text, image, and audio modalities to capture complex, multimodal interactions that enhance detection across diverse content types. By employing contrastive learning, CLARITY accurately distinguishes between reclaimed language and genuinely harmful content, significantly reducing false positives and promoting inclusivity, particularly for marginalized communities. Additionally, CLARITY incorporates a domain adaptation module with cross-lingual and multilingual capabilities, enabling it to generalize effectively across various platforms and ensuring robust performance even in dynamic online environments. We evaluate CLARITY across multiple benchmark datasets and GPUs, including Kaggle’s Tesla P100, Colab Pro’s NVIDIA T4, and NVIDIA A100. The results demonstrate a significant reduction in inference time: the A100 achieves an average inference time of 0.85 s per instance, over 30% faster than traditional models, while maintaining competitive accuracy.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 2, pp. 1073-1083
Citations: 0
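The CLARITY abstract above mentions contrastive learning without naming a specific objective. As a rough illustration of the general idea, the sketch below computes an InfoNCE-style loss for one anchor embedding; the choice of InfoNCE, the cosine similarity, the temperature value, and the toy vectors are all assumptions rather than details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Toy embeddings: the positive is close to the anchor, the negatives are not.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(anchor, positive, negatives)
# Swap roles so that a dissimilar vector is (wrongly) treated as the positive.
loss_misaligned = info_nce(anchor, negatives[0], [positive, negatives[1]])
print("aligned:", loss_aligned, "misaligned:", loss_misaligned)
```

The loss is small when the anchor is much closer to its positive than to the negatives and large otherwise, which is the property a contrastive objective relies on to separate similar-looking but differently labeled content.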