Ziya Yu, Chi Zhang, Linyuan Wang, Li Tong, Bin Yan
DOI: 10.1145/3354031.3354045
Proceedings of the 4th International Conference on Biomedical Signal and Image Processing, published 2019-08-13
Different Goal-driven CNNs Affect Performance of Visual Encoding Models based on Deep Learning
A convolutional neural network (CNN) with outstanding performance in computer vision can be used to construct an encoding model that simulates human visual information processing. However, the training goal of the network may affect the performance of the encoding model. Most neural networks previously used to build encoding models were trained on a single task, image classification, whereas human visual perception performs multiple tasks simultaneously. Existing encoding models therefore do not capture the diversity and complexity of the human visual mechanism well. In this paper, we first established feature extraction models based on a Fully Convolutional Network (FCN) and a Visual Geometry Group (VGG) network, which share a similar architecture but were trained for different goals, and then employed Regularized Orthogonal Matching Pursuit (ROMP) to build the response model, which predicts stimulus-evoked responses measured by functional magnetic resonance imaging (fMRI). The results revealed that CNNs trained on different visual tasks differed significantly in visual encoding performance despite nearly identical network structure. The VGG-based encoding model achieved higher performance in most voxels of the regions of interest (ROIs). We conclude that the classification task in computer vision fits the human visual mechanism better than the segmentation task.
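The two-stage pipeline the abstract describes (CNN features in, sparse regression out to per-voxel fMRI responses) can be sketched as follows. This is a minimal illustration, not the authors' implementation: scikit-learn's plain `OrthogonalMatchingPursuit` stands in for the Regularized OMP variant the paper uses, and random Gaussian features stand in for VGG/FCN activations and measured voxel responses.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# Stand-in for CNN feature maps: one feature vector per stimulus image.
n_stimuli, n_features = 200, 500
X = rng.standard_normal((n_stimuli, n_features))

# Simulate one voxel whose response depends sparsely on the features,
# mimicking the assumption behind OMP-style encoding models.
true_w = np.zeros(n_features)
support = rng.choice(n_features, size=10, replace=False)
true_w[support] = rng.standard_normal(10)
y = X @ true_w + 0.1 * rng.standard_normal(n_stimuli)

# Fit the response model on a training split; in practice this is done
# independently for every voxel in each ROI.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10)
omp.fit(X[:150], y[:150])

# Encoding performance is typically scored as the correlation between
# predicted and measured responses on held-out stimuli.
pred = omp.predict(X[150:])
r = np.corrcoef(pred, y[150:])[0, 1]
```

Comparing the correlation `r` obtained from features of differently trained networks (classification vs. segmentation) over many voxels is the kind of evaluation the paper reports.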