Andrea Portscher, Sebastian Stabinger, A. Rodríguez-Sánchez
{"title":"基于卷积神经网络的混合图像注意力评价","authors":"Andrea Portscher, Sebastian Stabinger, A. Rodríguez-Sánchez","doi":"10.1109/IPAS55744.2022.10052853","DOIUrl":null,"url":null,"abstract":"In neuroscientific experiments, blended images are used to examine how attention mechanisms in the human brain work. They are particularly suited for this research area, as a subject needs to focus on particular features in an image to be able to classify superimposed objects. As Convolutional Neural Networks (CNNs) take some inspiration from the mammalian visual system – such as the hierarchical structure where different levels of abstraction are processed on different network layers – we examine how CNNs perform on this task. More specifically, we evaluate the performance of four popular CNN architectures (ResNet18, ResNet50, CORnet-Z, and Inception V3) on the classification of objects in blended images. Since humans can rather easily solve this task by applying object-based attention, we also augment all architectures with a multi-headed self-attention mechanism to examine its effect on performance. Lastly, we analyse if there is a correlation between the similarity of a network architecture's structure to the human visual system and its ability to correctly classify objects in blended images. Our findings showed that adding a self-attention mechanism reliably increases the similarity to the V4 area of the human ventral stream, an area where attention has a large influence on the processing of visual stimuli.","PeriodicalId":322228,"journal":{"name":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","volume":"Five 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating Attention in Convolutional Neural Networks for Blended Images\",\"authors\":\"Andrea Portscher, Sebastian Stabinger, A. Rodríguez-Sánchez\",\"doi\":\"10.1109/IPAS55744.2022.10052853\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In neuroscientific experiments, blended images are used to examine how attention mechanisms in the human brain work. They are particularly suited for this research area, as a subject needs to focus on particular features in an image to be able to classify superimposed objects. As Convolutional Neural Networks (CNNs) take some inspiration from the mammalian visual system – such as the hierarchical structure where different levels of abstraction are processed on different network layers – we examine how CNNs perform on this task. More specifically, we evaluate the performance of four popular CNN architectures (ResNet18, ResNet50, CORnet-Z, and Inception V3) on the classification of objects in blended images. Since humans can rather easily solve this task by applying object-based attention, we also augment all architectures with a multi-headed self-attention mechanism to examine its effect on performance. Lastly, we analyse if there is a correlation between the similarity of a network architecture's structure to the human visual system and its ability to correctly classify objects in blended images. Our findings showed that adding a self-attention mechanism reliably increases the similarity to the V4 area of the human ventral stream, an area where attention has a large influence on the processing of visual stimuli.\",\"PeriodicalId\":322228,\"journal\":{\"name\":\"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)\",\"volume\":\"Five 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPAS55744.2022.10052853\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPAS55744.2022.10052853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluating Attention in Convolutional Neural Networks for Blended Images
In neuroscientific experiments, blended images are used to examine how attention mechanisms in the human brain work. They are particularly suited for this research area, as a subject needs to focus on particular features in an image to be able to classify superimposed objects. As Convolutional Neural Networks (CNNs) take some inspiration from the mammalian visual system – such as the hierarchical structure where different levels of abstraction are processed on different network layers – we examine how CNNs perform on this task. More specifically, we evaluate the performance of four popular CNN architectures (ResNet18, ResNet50, CORnet-Z, and Inception V3) on the classification of objects in blended images. Since humans can rather easily solve this task by applying object-based attention, we also augment all architectures with a multi-headed self-attention mechanism to examine its effect on performance. Lastly, we analyse if there is a correlation between the similarity of a network architecture's structure to the human visual system and its ability to correctly classify objects in blended images. Our findings showed that adding a self-attention mechanism reliably increases the similarity to the V4 area of the human ventral stream, an area where attention has a large influence on the processing of visual stimuli.