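Human Visual Pathways for Action Recognition versus Deep Convolutional Neural Networks: Representation Correspondence in Late but Not Early Layers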

IF 3.0, Zone 3 (Medicine), JCR Q2 (Neurosciences). Journal of Cognitive Neuroscience. Pub Date: 2024-10-29. DOI: 10.1162/jocn_a_02233

Yujia Peng;Xizi Gong;Hongjing Lu;Fang Fang

Journal of Cognitive Neuroscience, vol. 36, no. 11, pp. 2458-2480. Journal Article. Listing: https://ieeexplore.ieee.org/document/10738325/ (platform: Semanticscholar)

Citations: 0

Abstract



Deep convolutional neural networks (DCNNs) have attained human-level performance for object categorization and exhibited representation alignment between network layers and brain regions. Does such representation alignment naturally extend to other visual tasks beyond recognizing objects in static images? In this study, we expanded the exploration to the recognition of human actions from videos and assessed the representation capabilities and alignment of two-stream DCNNs in comparison with brain regions situated along ventral and dorsal pathways. Using decoding analysis and representational similarity analysis, we show that DCNN models do not show hierarchical representation alignment to human brain across visual regions when processing action videos. Instead, later layers of DCNN models demonstrate greater representation similarities to the human visual cortex. These findings were revealed for two display formats: photorealistic avatars with full-body information and simplified stimuli in the point-light display. The discrepancies in representation alignment suggest fundamental differences in how DCNNs and the human brain represent dynamic visual information related to actions.
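The representational similarity analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the distance measure (1 minus Pearson correlation), the Spearman comparison of upper triangles, the function names, and the toy data are all assumptions chosen for illustration. One toy "layer" is built to share structure with the toy "ROI" by design, so the printed scores say nothing about the paper's results.

```python
import numpy as np


def rdm(patterns: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - Pearson r between condition rows."""
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix: np.ndarray) -> np.ndarray:
    """Off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def ranks(values: np.ndarray) -> np.ndarray:
    """Ordinal ranks (ties are not averaged; acceptable for continuous toy data)."""
    return np.argsort(np.argsort(values)).astype(float)


def rsa_score(patterns_a: np.ndarray, patterns_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of the two RDMs."""
    a = ranks(upper_triangle(rdm(patterns_a)))
    b = ranks(upper_triangle(rdm(patterns_b)))
    return float(np.corrcoef(a, b)[0, 1])


# Toy data: 8 hypothetical action conditions (not the paper's stimuli or results).
rng = np.random.default_rng(0)
n_conditions = 8
latent = rng.normal(size=(n_conditions, 5))  # structure shared by construction

# Hypothetical voxel patterns and layer activations, one row per condition.
brain_roi = latent @ rng.normal(size=(5, 200)) + 0.1 * rng.normal(size=(n_conditions, 200))
layer_shared = latent @ rng.normal(size=(5, 300)) + 0.1 * rng.normal(size=(n_conditions, 300))
layer_unrelated = rng.normal(size=(n_conditions, 300))  # no shared structure

print("layer_shared vs ROI:", rsa_score(layer_shared, brain_roi))
print("layer_unrelated vs ROI:", rsa_score(layer_unrelated, brain_roi))
```

In a layer-by-region comparison, rsa_score would be evaluated for each pairing of a network layer with a brain region, and the resulting grid inspected for a hierarchical pattern or, as the abstract reports, for its absence.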


Source journal