Shuzhan Hu;Yiping Duan;Xiaoming Tao;Jian Chu;Jianhua Lu
{"title":"基于脑电图的大脑启发式视觉注意力建模,用于智能机器人技术","authors":"Shuzhan Hu;Yiping Duan;Xiaoming Tao;Jian Chu;Jianhua Lu","doi":"10.1109/JSTSP.2024.3408100","DOIUrl":null,"url":null,"abstract":"Vision, as the primary perceptual mode for intelligent robots, plays a crucial role in various human-robot interaction (HRI) scenarios. In certain situations, it is essential to utilize the visual sensors to capture videos for humans, assisting them in tasks like exploration missions. However, the increasing amount of video information brings great challenges for data transmission and storage. Therefore, there is an urgent need to develop more efficient video compression strategies to address this challenge. When perceiving a video, humans tend to pay more attention to some specific clips, which may occupy a small part of the whole video content, but largely affect the perceptual quality. This human visual attention (VA) mechanism provides valuable inspiration for optimizing video compression methods for HRI scenarios. Therefore, we combine psychophysiological paradigms and machine learning methods to model human VA and introduce it into the bitrate allocation to fully utilize the limited resources. Specifically, we collect electroencephalographic (EEG) data when humans watch videos, constructing an EEG dataset reflecting VA. Based on the dataset, we propose a VA measurement model to determine the VA states of humans in their underlying brain responses. Then, a brain-inspired VA prediction model is established to obtain VA metrics directly from the videos. Finally, based on the VA metric, more bitrates are allocated to the clips that humans pay more attention to. The experimental results show that our proposed methods can accurately determine the humans' VA states and predict the VA metrics evoked by different video clips. Furthermore, the bitrate allocation method based on the VA metric can achieve better perceptual quality at low bitrates.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7000,"publicationDate":"2024-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Brain-Inspired Visual Attention Modeling Based on EEG for Intelligent Robotics\",\"authors\":\"Shuzhan Hu;Yiping Duan;Xiaoming Tao;Jian Chu;Jianhua Lu\",\"doi\":\"10.1109/JSTSP.2024.3408100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Vision, as the primary perceptual mode for intelligent robots, plays a crucial role in various human-robot interaction (HRI) scenarios. In certain situations, it is essential to utilize the visual sensors to capture videos for humans, assisting them in tasks like exploration missions. However, the increasing amount of video information brings great challenges for data transmission and storage. Therefore, there is an urgent need to develop more efficient video compression strategies to address this challenge. When perceiving a video, humans tend to pay more attention to some specific clips, which may occupy a small part of the whole video content, but largely affect the perceptual quality. This human visual attention (VA) mechanism provides valuable inspiration for optimizing video compression methods for HRI scenarios. Therefore, we combine psychophysiological paradigms and machine learning methods to model human VA and introduce it into the bitrate allocation to fully utilize the limited resources. 
Specifically, we collect electroencephalographic (EEG) data when humans watch videos, constructing an EEG dataset reflecting VA. Based on the dataset, we propose a VA measurement model to determine the VA states of humans in their underlying brain responses. Then, a brain-inspired VA prediction model is established to obtain VA metrics directly from the videos. Finally, based on the VA metric, more bitrates are allocated to the clips that humans pay more attention to. The experimental results show that our proposed methods can accurately determine the humans' VA states and predict the VA metrics evoked by different video clips. Furthermore, the bitrate allocation method based on the VA metric can achieve better perceptual quality at low bitrates.\",\"PeriodicalId\":13038,\"journal\":{\"name\":\"IEEE Journal of Selected Topics in Signal Processing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":8.7000,\"publicationDate\":\"2024-03-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Journal of Selected Topics in Signal Processing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10543027/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Selected Topics in Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10543027/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Brain-Inspired Visual Attention Modeling Based on EEG for Intelligent Robotics
Vision, as the primary perceptual mode for intelligent robots, plays a crucial role in various human-robot interaction (HRI) scenarios. In certain situations, it is essential to use visual sensors to capture videos for humans, assisting them in tasks such as exploration missions. However, the growing volume of video information poses great challenges for data transmission and storage, so there is an urgent need for more efficient video compression strategies. When perceiving a video, humans tend to pay more attention to certain clips, which may occupy only a small part of the whole video yet largely determine its perceptual quality. This human visual attention (VA) mechanism offers valuable inspiration for optimizing video compression in HRI scenarios. We therefore combine psychophysiological paradigms with machine learning methods to model human VA and introduce it into bitrate allocation to make full use of limited resources. Specifically, we collect electroencephalographic (EEG) data while humans watch videos, constructing an EEG dataset that reflects VA. Based on this dataset, we propose a VA measurement model that determines humans' VA states from their underlying brain responses. A brain-inspired VA prediction model is then established to obtain VA metrics directly from the videos. Finally, based on the VA metric, more bits are allocated to the clips that humans pay more attention to. Experimental results show that the proposed methods can accurately determine humans' VA states and predict the VA metrics evoked by different video clips, and that the VA-metric-based bitrate allocation achieves better perceptual quality at low bitrates.
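The abstract does not specify the allocation rule itself; as a minimal sketch, assume each clip has a predicted VA score in [0, 1] and the total bitrate budget is split in proportion to those scores, with a per-clip floor so that low-attention clips remain watchable. All names and numbers below (allocate_bitrate, va_scores, the 2000 kbps budget) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Hypothetical per-clip VA scores in [0, 1], e.g. the output of a
# VA prediction model; higher means the clip attracts more attention.
va_scores = np.array([0.15, 0.80, 0.35, 0.95, 0.40])

def allocate_bitrate(va_scores, total_kbps, floor_kbps=100.0):
    """Split a total bitrate budget across clips in proportion to
    their VA scores, guaranteeing each clip a minimum (floor) rate."""
    va = np.asarray(va_scores, dtype=float)
    n = len(va)
    budget = total_kbps - n * floor_kbps   # rate left after paying the floor
    if budget < 0:
        raise ValueError("budget cannot cover the per-clip floor")
    weights = va / va.sum()                # normalize scores into weights
    return floor_kbps + weights * budget   # floor plus VA-weighted share

rates = allocate_bitrate(va_scores, total_kbps=2000.0)
for i, (score, rate) in enumerate(zip(va_scores, rates)):
    print(f"clip {i}: VA={score:.2f} -> {rate:.1f} kbps")
```

Under these assumptions, the clip with VA = 0.95 receives the largest share of the variable budget while every clip keeps at least the floor rate, mirroring the abstract's claim that, at a fixed total rate, high-attention clips should receive more bits.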
About the journal:
The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of various signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and others.
The journal format allows for in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. This includes interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.