Pub Date : 2025-03-05 | DOI: 10.1109/TCDS.2025.3566649
Wei Xu;Tianfei Zhou;Taoyuan Zhang;Jie Li;Peiyin Chen;Jia Pan;Xiaofeng Liu
Vision language models (VLMs) have demonstrated strong general capabilities and achieved great success in areas such as image understanding and reasoning. Visual prompts enhance the focus of VLMs on designated areas, but their fine-grained grounding has not been fully developed. Recent research has used the set-of-mark (SoM) approach to unleash the grounding capabilities of generative pretrained transformer-4 with vision (GPT-4V), achieving significant benchmark performance. However, SoM still suffers from label offset and VLM hallucination, and the grounding ability of VLMs remains limited, making it challenging to handle complex scenarios in human–robot interaction. To address these limitations and provide more accurate and less hallucinatory results, we propose contextual set-of-mark (ConSoM), a new SoM-based prompting mechanism that leverages dual-image inputs and contextual semantic information of images. Experiments demonstrate that ConSoM has distinct advantages in visual grounding, improving by 11% over the baseline on the RefCOCOg dataset. Furthermore, we evaluated ConSoM's grounding abilities in five indoor scenarios, where it exhibited strong robustness in complex environments and under occlusion conditions. We also introduce a scalable annotation method for pixel-level question-answering datasets. The accuracy, scalability, and depth of world knowledge make ConSoM a highly effective approach for future human–robot interactions.
Title : Exploring Grounding Abilities in Vision-Language Models Through Contextual Perception
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 6, pp. 1461-1473
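The abstract does not give implementation details of ConSoM's marking step, but the core SoM idea it builds on — assigning each segmented region a numeric mark the VLM can reference — can be sketched as follows. This is a minimal illustration, not the authors' code; `som_label_positions` and the centroid-based placement are assumptions for the sketch.

```python
import numpy as np

def som_label_positions(masks):
    """Compute one label anchor (centroid) per region mask, in the spirit of
    set-of-mark (SoM) prompting: each region gets a numeric mark that the
    VLM can reference when grounding language to image regions.

    masks: list of HxW boolean arrays, one per segmented region.
    Returns: list of (label_id, row, col) tuples.
    """
    anchors = []
    for label_id, mask in enumerate(masks, start=1):
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            continue  # skip empty masks rather than emit a bogus anchor
        anchors.append((label_id, int(rows.mean()), int(cols.mean())))
    return anchors

# Two toy 4x4 region masks standing in for real segmentation output.
m1 = np.zeros((4, 4), dtype=bool); m1[0:2, 0:2] = True   # top-left block
m2 = np.zeros((4, 4), dtype=bool); m2[2:4, 2:4] = True   # bottom-right block
print(som_label_positions([m1, m2]))  # [(1, 0, 0), (2, 2, 2)]
```

In a full pipeline the numbered anchors would be drawn onto the image before prompting; ConSoM's contribution per the abstract is to pair the marked image with an unmarked copy and contextual semantics, which this sketch does not attempt to reproduce.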
Pub Date : 2025-03-02 | DOI: 10.1109/TCDS.2025.3566229
Zhipeng Cai;Hongxiang Gao;Min Wu;Jianqing Li;Chengyu Liu
Emotion recognition remains a challenging yet essential task in affective computing, spanning fields from psychology to human-computer interaction. This study introduces a novel approach to improve emotion recognition by integrating multimodal physiological signal interaction networks with graph neural networks. We explored five undirected functional connectivity methods for constructing physiologic networks: Pearson correlation coefficient, maximal information coefficient, phase-locking value, phase lag index, and time-delay stability (TDS). These methods capture the relationships between the featured waveforms from electroencephalography and peripheral signals (electrocardiography, respiration, and skin conductance). The resulting physiologic networks, combined with extracted waveform features, were fed into graph attention networks (GATs) and graph isomorphism networks (GINs) for emotion classification. Our model was trained on the DEAP dataset and tested on the MAHNOB-HCI dataset to evaluate its generalizability. The TDS-based GAT and GIN models demonstrated superior performance in recognizing arousal and valence states compared with traditional classifiers such as support vector machines, convolutional neural networks, and standard graph convolutional neural networks. Specifically, the proposed method achieved outstanding F1 scores of 83.38% for arousal and 82.52% for valence on cross-dataset emotion recognition. These results underscore the importance of incorporating dynamic signal coupling and multimodal physiological data to improve emotion recognition accuracy and robustness across different datasets, highlighting the potential of this approach for practical applications.
Title : A Unified Physiological Signal Interaction Network for Cross-Dataset Emotion Recognition
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 6, pp. 1447-1460
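Of the five connectivity measures the abstract lists, the Pearson correlation coefficient is the simplest to illustrate. The sketch below builds an undirected functional-connectivity adjacency matrix from multichannel signals by thresholding pairwise correlations; the function name, the threshold value, and the toy signals are assumptions for illustration, not the study's pipeline.

```python
import numpy as np

def pearson_adjacency(signals, threshold=0.5):
    """Build an undirected functional-connectivity graph from multichannel
    physiological signals using the Pearson correlation coefficient — one
    of the five connectivity measures compared in the study (alongside
    MIC, PLV, PLI, and TDS). Edges with |r| below `threshold` are pruned;
    the result could serve as the adjacency matrix of a GAT/GIN input graph.

    signals: (n_channels, n_samples) array.
    Returns: (n_channels, n_channels) 0/1 adjacency with an empty diagonal.
    """
    r = np.corrcoef(signals)                   # pairwise Pearson r
    adj = (np.abs(r) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)                   # undirected graph, no self-loops
    return adj

# Three toy channels: ch0 and ch1 are perfectly correlated, ch2 is noise.
t = np.linspace(0, 1, 100)
sig = np.stack([np.sin(2 * np.pi * t),
                2 * np.sin(2 * np.pi * t),     # scaled copy of ch0 -> r = 1
                np.random.default_rng(0).normal(size=100)])
adj = pearson_adjacency(sig, threshold=0.9)
print(adj[0, 1])  # 1: strongly coupled channels share an edge
```

In the study's setting the "channels" would be featured waveforms from EEG and peripheral signals (ECG, respiration, skin conductance), and each node would additionally carry the extracted waveform features as its input vector to the GNN.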
Pub Date : 2025-02-06 | DOI: 10.1109/TCDS.2025.3533704
Huajin Tang
Title : Editorial: 2025 New Year Message From the Editor-in-Chief
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 1, p. 2
Open-Access PDF : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10877686
Pub Date : 2025-02-06 | DOI: 10.1109/TCDS.2024.3518202
Title : IEEE Transactions on Cognitive and Developmental Systems Information for Authors
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 1, p. C4
Open-Access PDF : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10877685
Pub Date : 2025-02-06 | DOI: 10.1109/TCDS.2024.3518198
Title : IEEE Transactions on Cognitive and Developmental Systems Publication Information
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 1, p. C2
Open-Access PDF : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10877687
Pub Date : 2025-02-06 | DOI: 10.1109/TCDS.2024.3518200
Title : IEEE Computational Intelligence Society Information
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 1, p. C3
Open-Access PDF : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10877688
Pub Date : 2024-12-30 | DOI: 10.1109/TCDS.2024.3521617
Title : 2024 Index IEEE Transactions on Cognitive and Developmental Systems Vol. 16
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 6, pp. 1-35
Open-Access PDF : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10817819
Pub Date : 2024-12-23 | DOI: 10.1109/TCDS.2024.3520976
Yokhesh K. Tamilselvam;Jacky Ganguly;Mandar S. Jog;Rajni V. Patel
Sensorimotor integration (SMI) is a complex process that allows humans to perceive and interact with their environment. Any impairment in SMI may impact day-to-day functioning, as is particularly evident in Parkinson's disease (PD). SMI is critical to accurate perception and modulation of motor outputs, so understanding the associated neural pathways and mathematical underpinnings is crucial. In this article, a systematic review of the proposed neural and computational models associated with SMI is performed. While the neural models discuss the neural architecture and regions, the computational models explore the mathematical or computational mechanisms involved in SMI. The article then explores how PD may impair SMI, reviewing studies that discuss deficits in the perception of various modalities that point to an SMI impairment. This helps in understanding the nature of SMI deficits in PD. Overall, the review offers comprehensive insights into the basis of SMI and the effect of PD on SMI, enabling clinicians to better understand SMI mechanisms and facilitating the development of targeted therapies to mitigate SMI deficits in PD.
Title : Sensorimotor Integration: A Review of Neural and Computational Models and the Impact of Parkinson's Disease
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 1, pp. 3-21
Pub Date : 2024-12-03 | DOI: 10.1109/TCDS.2024.3482595
Title : IEEE Transactions on Cognitive and Developmental Systems Information for Authors
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 6, p. C4
Open-Access PDF : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10774065
Pub Date : 2024-12-03 | DOI: 10.1109/TCDS.2024.3482593
Title : IEEE Computational Intelligence Society Information
Journal : IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 6, p. C3
Open-Access PDF : https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10774067