Zhibin Li;Yuxuan Zhang;Weirong Dong;Jing Yang;Jiansong Feng;Chengbi Zhang;Xiaolong Chen;Taihong Wang
IEEE Transactions on Instrumentation and Measurement, vol. 74, pp. 1-11, published 2025-02-10. DOI: 10.1109/TIM.2025.3534219. URL: https://ieeexplore.ieee.org/document/10879100/. Impact factor: 5.9; JCR: Q1 (Engineering, Electrical & Electronic).
Associated Learning Architecture Equipped With Parallel Impedance Sensing Strategy to Enhance Cross-Modal Object Perception
Humans can infer the shape and material of a grasped object from multisensory information, a capability that remains a technical challenge for modern robots. Cross-modal object perception promises to help robots carry out manipulation and interaction tasks in complex applications, particularly in harsh visual scenes. Here, we present an associated learning architecture equipped with a parallel impedance sensing strategy. It enhances the perception of grasped objects by integrating visual data with somatosensory data: parallel impedance measurements obtained by frequency division multiplexing (FDM) and the finger bending angles of the robotic hand. Within this architecture, we design a cross-modal generative adversarial network (CGAN) to achieve cross-modal feature learning between the two types of sensory data, mimicking the psychological cognition of human senses. In addition, a dynamic attention fusion mechanism is employed for feature transfer and fusion learning, enabling the network to adaptively adjust its weights according to the input cross-modal features. The architecture was trained and tested on ten categories of objects and achieved both cross-modal feature learning and fusion recognition of the two sensory modalities. Under low-quality image conditions, the recognition accuracy of attention fusion reaches up to 94.0%, significantly surpassing that of vision alone. This highlights the potential of our architecture to help robots perceive the outside world accurately by integrating visual and somatosensory data, especially in challenging visual environments.
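The idea of input-dependent fusion weights described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: the function names, the scoring vectors `w_visual` and `w_tactile`, and the use of a two-way softmax are all assumptions made for illustration, and in a trained model the scoring parameters would be learned rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(visual, tactile, w_visual, w_tactile):
    """Fuse two equal-length feature vectors with input-dependent weights.

    Hypothetical scheme: each modality gets a scalar score (dot product of
    its features with a scoring vector), the two scores are normalised with
    a softmax, and the fused feature is the weighted sum of the inputs.
    """
    if len(visual) != len(tactile):
        raise ValueError("feature vectors must have the same length")
    score_v = sum(w * x for w, x in zip(w_visual, visual))
    score_t = sum(w * x for w, x in zip(w_tactile, tactile))
    alpha_v, alpha_t = softmax([score_v, score_t])
    fused = [alpha_v * v + alpha_t * t for v, t in zip(visual, tactile)]
    return fused, (alpha_v, alpha_t)


# Degraded visual features (all zeros) versus informative tactile features:
# the tactile branch receives most of the weight.
fused, (a_v, a_t) = attention_fuse([0.0, 0.0], [2.0, 2.0], [1.0, 1.0], [1.0, 1.0])
```

In this toy setting the weights always sum to one, so a weak modality is down-weighted rather than discarded, which is the qualitative behaviour the abstract attributes to dynamic feature fusion.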
About the journal:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.