What Makes a Face Look Like a Hat: Decoupling Low-Level and High-Level Visual Properties with Image Triplets
Maytus Piriyajitakonkij, Sirawaj Itthipuripat, Ian Ballard, Ioannis Pappas
arXiv:2409.02241 · 2024-09-03
In visual decision making, high-level features, such as object categories, have a strong influence on choice. However, the impact of low-level features on behavior is less understood, partly because high- and low-level features are highly correlated in the stimuli typically presented (e.g., objects of the same category are more likely to share low-level features). To disentangle these effects, we propose a method that de-correlates low- and high-level visual properties in a novel set of stimuli. Our method uses two convolutional neural networks (CNNs) as candidate models of the ventral visual stream: CORnet-S, which has high neural predictivity for high-level, IT-like responses, and VGG-16, which has high neural predictivity for low-level responses. Stimulus triplets (root, image1, image2) are parametrized by the low- and high-level similarity of the two images to the root, computed from features extracted at different layers of the networks. These stimuli are then used in a decision-making task in which participants choose the image most similar to the root.
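To make the triplet parametrization concrete, the sketch below shows how low- and high-level similarity between images can be computed from a CNN's shallow and deep layers. For brevity it uses torchvision's pretrained VGG-16 for both levels, whereas the paper pairs VGG-16 (low-level) with CORnet-S (high-level); the layer depths and file names are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the triplet-scoring idea, not the authors' code.
# Uses torchvision's pretrained VGG-16 for both similarity levels;
# the paper instead pairs VGG-16 (low-level) with CORnet-S (high-level).
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def features(img_path: str, n_layers: int) -> torch.Tensor:
    """Activations after the first `n_layers` modules of vgg.features."""
    x = preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        for layer in vgg.features[:n_layers]:
            x = layer(x)
    return x.flatten()

def similarity(a: str, b: str, n_layers: int) -> float:
    """Cosine similarity between two images at a given network depth."""
    return F.cosine_similarity(features(a, n_layers),
                               features(b, n_layers), dim=0).item()

# Score a candidate triplet at a shallow ("low-level") and a deep
# ("high-level") layer; file names and depths are hypothetical.
for depth, label in [(4, "low-level"), (30, "high-level")]:
    s1 = similarity("root.png", "image1.png", depth)
    s2 = similarity("root.png", "image2.png", depth)
    print(f"{label}: sim(root, image1)={s1:.3f}, sim(root, image2)={s2:.3f}")
```

Triplets could then be retained only when the two similarity levels disagree (e.g., image1 closer to the root at the shallow layer but image2 closer at the deep layer), which is one way to de-correlate the two properties across a stimulus set.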
We found that the networks differ in their ability to predict the effects of low- versus high-level similarity: CORnet-S outperforms VGG-16 in explaining human choices based on high-level similarity, whereas VGG-16 outperforms CORnet-S in explaining human choices based on low-level similarity.
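One way to compare how well each network explains behavior, sketched below under assumed details, is to turn a network's triplet similarities into choice probabilities via a softmax and score them by the log-likelihood of participants' observed choices; the temperature and all data here are made up for illustration.

```python
# Hypothetical sketch of scoring a network against human choice data,
# not the paper's analysis: similarities become choice probabilities
# via a softmax, scored by log-likelihood on observed choices.
import numpy as np

def choice_log_likelihood(sim1, sim2, choices, temperature=0.1):
    """sim1/sim2: per-trial model similarity of image1/image2 to the
    root; choices: 0 if image1 was chosen, 1 if image2 was chosen."""
    logits = np.stack([sim1, sim2], axis=1) / temperature
    logp = logits - np.logaddexp(logits[:, 0], logits[:, 1])[:, None]
    return logp[np.arange(len(choices)), choices].sum()

# Toy data: three trials with invented similarities and choices.
sim1 = np.array([0.9, 0.4, 0.7])   # sim(root, image1) per trial
sim2 = np.array([0.5, 0.8, 0.6])   # sim(root, image2) per trial
choices = np.array([0, 1, 0])      # image chosen on each trial

print(choice_log_likelihood(sim1, sim2, choices))
```

The network with the higher log-likelihood on a given trial type (low- or high-level-driven) would be the better account of behavior for that trial type.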
Using Brain-Score, we observed that the behavioral prediction abilities of different layers of these networks qualitatively corresponded to their ability to explain neural activity at different levels of the visual hierarchy. In summary, our algorithm for stimulus-set generation enables the study of how different representations in the visual stream affect high-level cognitive behaviors.
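The layer-wise correspondence reported here is qualitative; if one wanted to quantify it, a rank correlation between per-layer behavioral prediction and per-layer neural predictivity would be a natural first pass. All numbers below are invented placeholders, not results from the paper.

```python
# Illustrative only: correlate each layer's behavioral-prediction
# score with its Brain-Score-style neural predictivity.
from scipy.stats import spearmanr

behavioral_score = [0.31, 0.42, 0.55, 0.61]      # per layer, hypothetical
neural_predictivity = [0.28, 0.40, 0.52, 0.63]   # per layer, hypothetical

rho, p = spearmanr(behavioral_score, neural_predictivity)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")
```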