Universal dimensions of visual representation
Zirui Chen, Michael F. Bonner
arXiv - QuanBio - Neurons and Cognition, 2024-08-23
arXiv:2408.12804 (https://doi.org/arxiv-2408.12804)
Do neural network models of vision learn brain-aligned representations
because they share architectural constraints and task objectives with
biological vision or because they learn universal features of natural image
processing? We characterized the universality of hundreds of thousands of
representational dimensions from visual neural networks with varied
construction. We found that networks with varied architectures and task
objectives learn to represent natural images using a shared set of latent
dimensions, despite appearing highly distinct at a surface level. Next, by
comparing these networks with human brain representations measured with fMRI,
we found that the most brain-aligned representations in neural networks are
those that are universal and independent of a network's specific
characteristics. Remarkably, each network can be reduced to fewer than ten of
its most universal dimensions with little impact on its representational
similarity to the human brain. These results suggest that the underlying
similarities between artificial and biological vision are primarily governed by
a core set of universal image representations that are convergently learned by
diverse systems.
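The core analyses sketched in the abstract — comparing representational similarity and reducing a network to a handful of dominant dimensions — can be illustrated with a minimal toy example. This is not the authors' actual pipeline; the data below are random placeholders with an artificial low-rank structure, and PCA stands in for whatever procedure the paper uses to identify its universal dimensions. The sketch reduces a stimulus-by-unit activation matrix to its top few components and checks how well the reduced representation preserves the full representational similarity matrix (RSM).

```python
import numpy as np

def rsm(features):
    """Representational similarity matrix: pairwise Pearson
    correlations between stimulus activation patterns (rows)."""
    return np.corrcoef(features)

def reduce_to_top_dims(features, k):
    """Project stimulus-by-unit activations onto the top-k principal
    components (a stand-in for the paper's universal dimensions)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def rsm_similarity(a, b):
    """Correlate the upper triangles of two RSMs — the standard
    representational-similarity comparison score."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]

rng = np.random.default_rng(0)
# Toy activations: 100 stimuli x 512 units, built from 5 latent
# dimensions plus noise, so a few dimensions dominate (mirroring
# the abstract's claim that fewer than ten suffice).
latent = rng.normal(size=(100, 5))
mixing = rng.normal(size=(5, 512))
acts = latent @ mixing + 0.1 * rng.normal(size=(100, 512))

full_rsm = rsm(acts)
reduced_rsm = rsm(reduce_to_top_dims(acts, k=5))
score = rsm_similarity(full_rsm, reduced_rsm)
print(f"RSM similarity (full vs. 5-dim reduction): {score:.3f}")
```

Because the toy activations are genuinely low-rank, the 5-dimensional reduction reproduces the full RSM almost exactly; the paper's result is the empirical analogue, with real networks' brain similarity surviving reduction to their most universal dimensions.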