Brain Network Reorganization in Response to Multilevel Mental Workload in Simulated Flight Tasks
Pub Date: 2024-12-03 | DOI: 10.1109/TCDS.2024.3511394 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 698-709
Kuijun Wu;Jingjia Yuan;Xianliang Ge;Ioannis Kakkos;Linze Qian;Sujie Wang;Yamei Yu;Chuantao Li;Yu Sun
In various real-world situations, inappropriate mental workload (MWL) can impair task performance and may pose operational safety risks. Growing efforts have been made to reveal the underlying neural mechanisms of MWL. However, most studies have been limited to well-controlled cognitive tasks, overlooking the neural mechanisms at work in close-to-real human–machine interaction tasks. Here, we investigated brain network reorganization in response to MWL in a close-to-real simulated flight task. Specifically, a dual-task flight simulation paradigm (a primary flight simulation plus a secondary auditory choice reaction-time task) was introduced to mimic real-flight cognitive challenges and induce varying levels of MWL. Perceived subjective task difficulty and secondary-task performance validated the effectiveness of our experimental design. Moreover, multilevel MWL classification was performed to probe the changes in functional connectivity (FC) under different MWL levels and achieved satisfactory performance (three levels, accuracy = 71.85%). Further inspection of the discriminative FCs highlighted the importance of frontal and parietal-occipital brain regions in MWL modulation. Additional graph-theoretical analysis revealed increased information transfer efficiency across distributed brain regions as MWL increased. Overall, our research offers valuable insights into the neural mechanisms underlying MWL, with potential implications for improving safety in aviation contexts.
{"title":"Brain Network Reorganization in Response to Multilevel Mental Workload in Simulated Flight Tasks","authors":"Kuijun Wu;Jingjia Yuan;Xianliang Ge;Ioannis Kakkos;Linze Qian;Sujie Wang;Yamei Yu;Chuantao Li;Yu Sun","doi":"10.1109/TCDS.2024.3511394","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3511394","url":null,"abstract":"In various real-world situations, inappropriate mental workload (MWL) can impair task performance and may cause operational safety risks. Growing efforts have been made to reveal the underlying neural mechanisms of MWL. However, most studies have been limited to well-controlled cognitive tasks, overlooking the exploration of the underlying neural mechanisms in close-to-real human–machine interaction tasks. Here, we investigated the brain network reorganization in response to MWL in a close-to-real simulated flight task. Specifically, a dual-task (primary flight simulation + secondary auditory choice reaction time task) design flight simulation paradigm to mimic real-flight cognitive challenges was introduced to induce varying levels of MWL. The perceived subjective task difficulty and secondary task performance validated the effectiveness of our experimental design. Moreover, multilevel MWL classification was performed to delve into the changes of functional connectivity (FC) in response to different MWL and achieved satisfactory performance (three levels, accuracy <inline-formula><tex-math>$=$</tex-math></inline-formula> 71.85%). Further inspection of the discriminative FCs highlighted the importance of frontal and parietal-occipital brain regions in MWL modulation. Additional graph theoretical analysis revealed increased information transfer efficiency across distributed brain regions with the increase of MWL. Overall, our research offers valuable insights into the neural mechanisms underlying MWL, with potential implications for improving safety in aviation contexts.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"698-709"},"PeriodicalIF":5.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IEEE Transactions on Cognitive and Developmental Systems Publication Information
Pub Date: 2024-12-03 | DOI: 10.1109/TCDS.2024.3482591 | vol. 16, no. 6, pp. C2-C2
{"title":"IEEE Transactions on Cognitive and Developmental Systems Publication Information","authors":"","doi":"10.1109/TCDS.2024.3482591","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3482591","url":null,"abstract":"","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 6","pages":"C2-C2"},"PeriodicalIF":5.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10774066","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142761427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMART: Sequential Multiagent Reinforcement Learning With Role Assignment Using Transformer
Pub Date: 2024-11-29 | DOI: 10.1109/TCDS.2024.3504256 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 615-630
Yixing Lan;Hao Gao;Xin Xu;Qiang Fang;Yujun Zeng
Multiagent reinforcement learning (MARL) has received increasing attention and has been used to solve cooperative multiagent decision-making and learning control tasks. However, the high complexity of the joint action space and the nonstationary learning process are two major problems that negatively impact the sample efficiency and solution quality of MARL. To this end, this article proposes a novel approach named sequential MARL with role assignment using transformer (SMART). By learning the effects of different actions on state transitions and rewards, SMART realizes action abstraction over the original action space and adaptive cognitive modeling of agent roles, which reduces the complexity of multiagent exploration and learning. Meanwhile, SMART uses causal transformer networks to update the role assignment policy and the action selection policy sequentially, alleviating the influence of nonstationary multiagent policy learning. The convergence characteristics of SMART are theoretically analyzed. Extensive experiments on the challenging Google Research Football and StarCraft multiagent challenge benchmarks demonstrate that, compared with mainstream MARL algorithms such as MAT and HAPPO, SMART achieves new state-of-the-art performance. Moreover, the policies learned by SMART generalize well when the number of agents changes.
{"title":"SMART: Sequential Multiagent Reinforcement Learning With Role Assignment Using Transformer","authors":"Yixing Lan;Hao Gao;Xin Xu;Qiang Fang;Yujun Zeng","doi":"10.1109/TCDS.2024.3504256","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3504256","url":null,"abstract":"Multiagent reinforcement learning (MARL) has received increasing attention and been used to solve cooperative multiagent decision-making and learning control tasks. However, the high complexity of the joint action space and the nonstationary learning process are two major problems that negatively impact on the sample efficiency and solution quality of MARL. To this end, this article proposes a novel approach named sequential MARL with role assignment using transformer (SMART). By learning the effects of different actions on state transitions and rewards, SMART realizes the action abstraction of the original action space and the adaptive role cognitive modeling of multiagent, which reduces the complexity of the multiagent exploration and learning process. Meanwhile, SMART uses causal transformer networks to update role assignment policy and action selection policy sequentially, alleviating the influence of nonstationary multiagent policy learning. The convergence characteristic of SMART is theoretically analyzed. Extensive experiments on the challenging Google football and StarCraft multiagent challenge are conducted, demonstrating that compared with mainstream MARL algorithms such as MAT and HAPPO, SMART achieves a new state-of-the-art performance. Meanwhile, the learned policies through SMART have good generalization ability when the number of agents changes.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"615-630"},"PeriodicalIF":5.0,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Effect of Audio Trigger’s Frequency on Autonomous Sensory Meridian Response
Lili Li;Zhiqing Wu;Zhongliang Yu;Zhibin He;Zhizhong Wang;Liyu Lin;Shaolong Kuang
Pub Date: 2024-11-27 | DOI: 10.1109/TCDS.2024.3506039 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 672-681
Autonomous sensory meridian response (ASMR) is an experience-dependent sensation in response to audio and audio–visual triggers. The acoustic characteristics of audio triggers have been speculated to be connected with ASMR. To explore the effect of an audio trigger’s frequency on ASMR, and thereby probe ASMR’s mechanism, the ASMR phenomenon under random-frequency audio, high-frequency audio, low-frequency audio, original audio, white noise, and rest was analyzed with EEG. Differential entropy and power spectral density were used for quantitative analysis. The results suggest that the audio’s frequency can modulate brain activity in the θ, α, β, γ, and high-γ bands. Moreover, both ASMR responders and nonresponders may be more sensitive to low-frequency audio and white noise, suppressing brain activity over central areas in the γ and high-γ bands. Further, for ASMR responders, ASMR evoked by a low-frequency audio trigger may involve more attentional selection or semantic processing and may not alter brain function in information processing and execution.
{"title":"The Effect of Audio Trigger’s Frequency on Autonomous Sensory Meridian Response","authors":"Lili Li;Zhiqing Wu;Zhongliang Yu;Zhibin He;Zhizhong Wang;Liyu Lin;Shaolong Kuang","doi":"10.1109/TCDS.2024.3506039","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3506039","url":null,"abstract":"Autonomous sensory meridian response (ASMR) is an experience-dependent sensation in response to audio and audio–visual triggers. The acoustical characteristics of audio trigger have been speculated to be in connection with ASMR. To explore the effect of audio trigger’s frequency on ASMR and then to discover ASMR’s mechanism, the ASMR phenomenon under random-frequency audio, high-frequency audio, low-frequency audio, original audio, white-noise and rest were analyzed by EEG. The differential entropy and power spectral density were applied to quantitative analysis. The results suggest the audio’s frequency can modulate the brain activities on <italic>θ</i>, <italic>α</i>, <italic>β</i>, <italic>γ</i>, and high <italic>γ</i> frequencies. Moreover, ASMR responder and nonresponder may be more sensitive to low-frequency audio and white-noise by suppressing brain activities of central areas in <italic>γ</i> and high <italic>γ</i> frequencies. Further, for ASMR responders, ASMR evoked by low-frequency audio trigger may involve more attentional selection or semantic processing and may not alter the brain functions in information processing and execution.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"672-681"},"PeriodicalIF":5.0,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Location-Guided Head Pose Estimation for Fisheye Image
Pub Date: 2024-11-27 | DOI: 10.1109/TCDS.2024.3506060 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 682-697
Bing Li;Dong Zhang;Cheng Huang;Yun Xian;Ming Li;Dah-Jye Lee
A camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by perspective projection. Severe fisheye lens distortion in the peripheral region of the image degrades the performance of existing head pose estimation models trained on undistorted images. This article presents a new approach to head pose estimation that uses knowledge of the head's location in the image to reduce the negative effect of fisheye distortion. We develop an end-to-end convolutional neural network that estimates head pose through multitask learning of head pose and head location. The proposed network estimates the head pose directly from the fisheye image without rectification or calibration. We also created fisheye-distorted versions of three popular head pose estimation datasets, BIWI, 300W-LP, and AFLW2000, for our experiments. Experimental results show that our network markedly improves the accuracy of head pose estimation compared with other state-of-the-art one-stage and two-stage methods.
{"title":"Location-Guided Head Pose Estimation for Fisheye Image","authors":"Bing Li;Dong Zhang;Cheng Huang;Yun Xian;Ming Li;Dah-Jye Lee","doi":"10.1109/TCDS.2024.3506060","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3506060","url":null,"abstract":"Camera with a fisheye or ultra-wide lens covers a wide field of view that cannot be modeled by the perspective projection. Serious fisheye lens distortion in the peripheral region of the image leads to degraded performance of the existing head pose estimation models trained on undistorted images. This article presents a new approach for head pose estimation that uses the knowledge of head location in the image to reduce the negative effect of fisheye distortion. We develop an end-to-end convolutional neural network to estimate the head pose with the multitask learning of head pose and head location. Our proposed network estimates the head pose directly from the fisheye image without the operation of rectification or calibration. We also created a fisheye-distorted version of the three popular head pose estimation datasets, BIWI, 300W-LP, and AFLW2000 for our experiments. Experimental results show that our network remarkably improves the accuracy of head pose estimation compared with other state-of-the-art one-stage and two-stage methods.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"682-697"},"PeriodicalIF":5.0,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Biomathematical Model for Classifying Sleep Stages Using Deep Learning Techniques
Ruijie He;Wei Tong;Miaomiao Zhang;Guangyu Zhu;Edmond Q. Wu
Pub Date: 2024-11-21 | DOI: 10.1109/TCDS.2024.3503767 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 659-671
A biomathematical model is a framework that calculates corresponding indices from biological and physiological parameters and can be used to study the fatigue states of submarine crew members during long-duration operations. Submarine personnel are prone to fatigue and decreased vigilance, leading to unnecessary risks. Sleep quality plays a crucial role in assessing human vigilance; however, traditional biomathematical models generally categorize human sleep into only two pressure stages based on circadian rhythms. To classify sleep stages accurately from physiological signals, this article proposes a novel deep learning architecture using single-channel EEG signals. The architecture comprises four modules: a preliminary feature extraction module employing a multiscale convolutional neural network (MSCNN); a feature aggregation module combining a reparameterizable large-kernel network (RepLKNet) with temporal convolutions; a multivariate weighted recurrent network (MWRN) serving as the tensor encoder; and a dynamic graph convolutional neural network (DGCNN) for decoding, followed by a final classifier. We assessed the effectiveness of the proposed model on two publicly available datasets. The results demonstrate that our model surpasses current leading benchmarks.
{"title":"A Biomathematical Model for Classifying Sleep Stages Using Deep Learning Techniques","authors":"Ruijie He;Wei Tong;Miaomiao Zhang;Guangyu Zhu;Edmond Q. Wu","doi":"10.1109/TCDS.2024.3503767","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3503767","url":null,"abstract":"A biomathematical model is a framework that calculates corresponding indices based on biological and physiological parameters, and can be used to study the fatigue states of submarine crew members during long-duration operations. Submarine personnel are prone to fatigue and decreased vigilance, leading to unnecessary risks. Sleep quality plays a crucial role in assessing human vigilance; however, traditional biomathematical models generally categorize human sleep into two different pressure stages based on circadian rhythms. To accurately classify sleep stages based on physiological signals, this article proposes a novel deep learning architecture using single-channel EEG signals. This architecture comprises four modules: beginning with a feature preliminary extraction module employing a multiscale convolutional neural network (MSCNN), followed by a feature aggregation module combining reparameterizable large kernel network with temporal convolutions network (RepLKnet), then utilizing a multivariate weighted recurrent network as the tensor encoder (MWRN), and finally, decoding with a dynamic graph convolutional neural network (DGCNN). The output is provided by a final classifier. We assessed the effectiveness of the proposed model using two publicly available datasets. The results demonstrate that our model surpasses current leading benchmarks.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"659-671"},"PeriodicalIF":5.0,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial–Temporal Spiking Feature Pruning in Spiking Transformer
Zhaokun Zhou;Kaiwei Che;Jun Niu;Man Yao;Guoqi Li;Li Yuan;Guibo Luo;Yuesheng Zhu
Pub Date: 2024-11-19 | DOI: 10.1109/TCDS.2024.3500018 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 644-658
Spiking neural networks (SNNs) are known for their brain-inspired architecture and low power consumption. Leveraging biocompatibility and the self-attention mechanism, Spiking Transformers have become the most promising SNN architecture in terms of accuracy. However, Spiking Transformers still face the challenge of high training costs; for example, a 51M-parameter network requires 181 training hours on ImageNet. In this work, we explore feature pruning to reduce training costs and address two challenges: achieving a high pruning ratio and keeping the pruning method lightweight. We first analyze the spiking features and find the potential for a high pruning ratio: most of the information is concentrated in a subset of the spiking features in the Spiking Transformer, which suggests that we can keep this subset of tokens and prune the others. To stay lightweight, we propose a parameter-free spatial–temporal spiking feature pruning method that uses only a simple addition-and-sorting operation. The spiking features/tokens with high spike accumulation values are kept for training; the others are pruned and merged through a compensation module called Softmatch. Experimental results demonstrate that our method reduces training costs without compromising image classification accuracy. On ImageNet, our approach reduces the training time from 181 to 128 h while achieving comparable accuracy (83.13% versus 83.07%).
{"title":"Spatial–Temporal Spiking Feature Pruning in Spiking Transformer","authors":"Zhaokun Zhou;Kaiwei Che;Jun Niu;Man Yao;Guoqi Li;Li Yuan;Guibo Luo;Yuesheng Zhu","doi":"10.1109/TCDS.2024.3500018","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3500018","url":null,"abstract":"Spiking neural networks (SNNs) are known for brain-inspired architecture and low power consumption. Leveraging biocompatibility and self-attention mechanism, Spiking Transformers become the most promising SNN architecture with high accuracy. However, Spiking Transformers still faces the challenge of high training costs, such as a 51<inline-formula><tex-math>$M$</tex-math></inline-formula> network requiring 181 training hours on ImageNet. In this work, we explore feature pruning to reduce training costs and overcome two challenges: high pruning ratio and lightweight pruning methods. We first analyze the spiking features and find the potential for a high pruning ratio. The majority of information is concentrated on a part of the spiking features in spiking transformer, which suggests that we can keep this part of the tokens and prune the others. To achieve lightweight, a parameter-free spatial–temporal spiking feature pruning method is proposed, which uses only a simple addition-sorting operation. The spiking features/tokens with high spike accumulation values are selected for training. The others are pruned and merged through a compensation module called Softmatch. Experimental results demonstrate that our method reduces training costs without compromising image classification accuracy. On ImageNet, our approach reduces the training time from 181 to 128 h while achieving comparable accuracy (83.13% versus 83.07%).","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"644-658"},"PeriodicalIF":5.0,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interaction Is Worth More Explanations: Improving Human–Object Interaction Representation With Propositional Knowledge
Pub Date: 2024-11-11 | DOI: 10.1109/TCDS.2024.3496566 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 631-643
Feng Yang;Yichao Cao;Xuanpeng Li;Weigong Zhang
Detecting human–object interactions (HOI) presents a formidable challenge, necessitating the discernment of intricate, high-level relationships between humans and objects. Recent studies have explored HOI vision-and-language modeling (HOI-VLM), which leverages linguistic information inspired by cross-modal technology. Despite its promise, current methodologies are constrained by limited annotation vocabularies and suboptimal word embeddings, which hinder effective alignment with visual features and, consequently, the efficient transfer of linguistic knowledge. In this work, we propose a novel cross-modal framework that leverages external propositional knowledge to harmonize annotation text with a broader spectrum of world knowledge, enabling a more explicit and unambiguous representation of complex semantic relationships. Additional complexity arises from the symbiotic or distinctive relationships inherent in a single human–object (HO) pair, and from identical interactions occurring with diverse HO pairs (e.g., “human ride bicycle” versus “human ride horse”); the challenge lies in understanding the subtle differences and similarities between interactions involving different objects or occurring in varied contexts. To this end, we propose the Jaccard contrast strategy to simultaneously optimize cross-modal representation consistency across HO pairs (especially when multiple interactions occur), encompassing both vision-to-vision and vision-to-knowledge alignment objectives. The effectiveness of our proposed method is comprehensively validated through extensive experiments, showcasing its superiority in the field of HOI analysis.
{"title":"Interaction Is Worth More Explanations: Improving Human–Object Interaction Representation With Propositional Knowledge","authors":"Feng Yang;Yichao Cao;Xuanpeng Li;Weigong Zhang","doi":"10.1109/TCDS.2024.3496566","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3496566","url":null,"abstract":"Detecting human–object interactions (HOI) presents a formidable challenge, necessitating the discernment of intricate, high-level relationships between humans and objects. Recent studies have explored HOI vision-and-language modeling (HOI-VLM), which leverages linguistic information inspired by cross-modal technology. Despite its promise, current methodologies face challenges due to the constraints of limited annotation vocabularies and suboptimal word embeddings, which hinder effective alignment with visual features and consequently, the efficient transfer of linguistic knowledge. In this work, we propose a novel cross-modal framework that leverages external propositional knowledge which harmonize annotation text with a broader spectrum of world knowledge, enabling a more explicit and unambiguous representation of complex semantic relationships. Additionally, considering the prevalence of multiple complexities due to the symbiotic or distinctive relationships inherent in one HO pair, along with the identical interactions occurring with diverse HO pairs (e.g., “human ride bicycle” versus “human ride horse”). The challenge lies in understanding the subtle differences and similarities between interactions involving different objects or occurring in varied contexts. To this end, we propose the Jaccard contrast strategy to simultaneously optimize cross-modal representation consistency across HO pairs (especially for cases where multiple interactions occur), which encompasses both vision-to-vision and vision-to-knowledge alignment objectives. The effectiveness of our proposed method is comprehensively validated through extensive experiments, showcasing its superiority in the field of HOI analysis.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"631-643"},"PeriodicalIF":5.0,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling Task Engagement to Regulate Reinforcement Learning-Based Decoding for Online Brain Control
Pub Date: 2024-11-05 | DOI: 10.1109/TCDS.2024.3492199 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 606-614
Xiang Zhang;Xiang Shen;Yiwen Wang
Brain–machine interfaces (BMIs) offer significant promise for enabling paralyzed individuals to control external devices using their brain signals. One challenge is that during online brain control (BC), subjects may not be completely immersed in the task, particularly when multiple steps are needed to achieve a goal. A decoder that indiscriminately takes less engaged trials as training data may suffer decreased decoding accuracy. In this article, we propose an alternative kernel RL-based decoder that trains online with continuous parameter updates. We model neural activity from the medial prefrontal cortex (mPFC), a reward-related brain region, to represent task engagement. This information is incorporated into a stochastic learning rate through an exponential model that measures the relevancy of the neural data. The proposed algorithm was evaluated in an experiment in which rats performed a cursor-reaching BC task. We found that mPFC neural activity contained engagement information that was negatively correlated with trial response time. Moreover, compared with an RL method without task engagement modeling, our proposed method improved training efficiency: it used half the training data to achieve the same reconstruction accuracy of the cursor trajectory. The results demonstrate the potential of our RL framework for improving online BC tasks.
{"title":"Modeling Task Engagement to Regulate Reinforcement Learning-Based Decoding for Online Brain Control","authors":"Xiang Zhang;Xiang Shen;Yiwen Wang","doi":"10.1109/TCDS.2024.3492199","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3492199","url":null,"abstract":"Brain–machine interfaces (BMIs) offer significant promise for enabling paralyzed individuals to control external devices using their brain signals. One challenge is that during the online brain control (BC) process, subjects may not be completely immersed in the task, particularly when multiple steps are needed to achieve a goal. The decoder indiscriminately takes the less engaged trials as training data, which might decrease the decoding accuracy. In this article, we propose an alternative kernel RL-based decoder that trains online with continuous parameter update. We model neural activity from the medial prefrontal cortex (mPFC), a reward-related brain region, to represent task engagement. This information is incorporated into a stochastic learning rate using an exponential model, which measures the relevancy of neural data. The proposed algorithm was evaluated in the experiment where rats performed a cursor-reaching BC task. We found the neural activities from mPFC contained the engagement information which was negatively correlated with trial response time. Moreover, compared to the RL method without task engagement modeling, our proposed method enhanced the training efficiency. It used half of the training data to achieve the same reconstruction accuracy of the cursor trajectory. The results demonstrate the potential of our RL framework for improving online BC tasks.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"606-614"},"PeriodicalIF":5.0,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developmental Networks With Foveation
Pub Date: 2024-11-05 | DOI: 10.1109/TCDS.2024.3492181 | IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 592-605
Xiang Wu;Juyang Weng
The foveated nature of the human vision system (HVS) means that acuity on the retina peaks at the center of the fovea and gradually declines toward the periphery with increasing eccentricity. Foveation is general-purpose, meaning the fovea is used more often than the periphery. Self-generated saccades dynamically project the fovea onto different parts of the visual world so that the high-acuity fovea can process parts of interest at different times. It is still unclear why biological vision uses foveation. To our knowledge, this work presents the first foveated neural network, though with a limited scope. We study two subjects. 1) We design a biological density of cones (BDOC) foveation method for image warping to simulate a biologically plausible foveated retina using a commonly available uniform-pixel camera. 2) This article is not specific to any task, but we choose a challenging one, visual navigation, as an example of quantitative and spatiotemporal tasks, and compare with deep learning. Our experimental results show that 1) the BDOC foveation is logically and visually correct; and 2) the developmental network (DN) performs better than deep learning in a surprising way, and foveation helps both network types.
{"title":"Developmental Networks With Foveation","authors":"Xiang Wu;Juyang Weng","doi":"10.1109/TCDS.2024.3492181","DOIUrl":"https://doi.org/10.1109/TCDS.2024.3492181","url":null,"abstract":"The foveated nature of the human vision system (HVS) means the acuity on the retina peaks at the center of the fovea and gradually descends to the periphery with increasing eccentricity. Foveation is general-purpose, meaning the fovea is more often used than the periphery. Self-generated saccades dynamically project the fovea to different parts of the visual world so that the high-acuity fovea can process interested parts at different times. It is still unclear why biological vision uses foveation. This work is the first foveated neural network as far as we are aware, but it has a limited scope. We study two subjects here as follows. 1) We design a biological density of cones (BDOCs) foveation method for image warping to simulate a biologically plausible foveated retina using a commonly available uniform-pixel camera. 2) The subject of this article is not specific to tasks, but we choose a challenging task, visual navigation, as an example of quantitative and spatiotemporal tasks, and compare it with deep learning. Our experimental results showed that 1) the BDOC foveation is logically and visually correct; and 2) the developmental network (DN) performs better than deep learning in a surprising way and foveation helps both network types.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"592-605"},"PeriodicalIF":5.0,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144213530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}