Automated construction of cognitive maps with visual predictive coding

Nature Machine Intelligence (IF 18.8; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2024-07-18 DOI: 10.1038/s42256-024-00863-1

James Gornet, Matt Thomson

Nature Machine Intelligence, volume 6, issue 7, pages 820-833. Journal Article. Full text: https://www.nature.com/articles/s42256-024-00863-1

Abstract

Humans construct internal cognitive maps of their environment directly from sensory inputs without access to a system of explicit coordinates or distance measurements. Although machine learning algorithms like simultaneous localization and mapping utilize specialized inference procedures to identify visual features and construct spatial maps from visual and odometry data, the general nature of cognitive maps in the brain suggests a unified mapping algorithmic strategy that can generalize to auditory, tactile and linguistic inputs. Here we demonstrate that predictive coding provides a natural and versatile neural network algorithm for constructing spatial maps using sensory data. We introduce a framework in which an agent navigates a virtual environment while engaging in visual predictive coding using a self-attention-equipped convolutional neural network. While learning a next-image prediction task, the agent automatically constructs an internal representation of the environment that quantitatively reflects spatial distances. The internal map enables the agent to pinpoint its location relative to landmarks using only visual information. The predictive coding network generates a vectorized encoding of the environment that supports vector navigation, where individual latent space units delineate localized, overlapping neighbourhoods in the environment. Broadly, our work introduces predictive coding as a unified algorithmic framework for constructing cognitive maps that can naturally extend to the mapping of auditory, sensorimotor and linguistic inputs.

Editor's summary: Constructing spatial maps from sensory inputs is challenging in both neuroscience and artificial intelligence. Gornet and Thomson show that as an agent navigates an environment, a self-attention neural network using predictive coding can recover the environment’s map in its latent space.
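The claim that the internal representation "quantitatively reflects spatial distances" can be illustrated with a simple check: compare pairwise distances between latent vectors with pairwise distances between the positions at which they were recorded. The sketch below is a minimal illustration under stated assumptions, not the authors' analysis. The function names, the 16-dimensional latent size, and the synthetic data are all hypothetical stand-ins; no values here come from the paper.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_correlation(latents, positions):
    """Pearson correlation between latent and physical pairwise distances."""
    iu = np.triu_indices(len(latents), k=1)  # count each unordered pair once
    d_latent = pairwise_distances(latents)[iu]
    d_physical = pairwise_distances(positions)[iu]
    return float(np.corrcoef(d_latent, d_physical)[0, 1])


# Synthetic stand-in data (an assumption for illustration, NOT paper results).
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(200, 2))  # hypothetical agent positions

# "Map-like" latents: positions embedded isometrically in 16 dimensions, plus noise.
basis, _ = np.linalg.qr(rng.normal(size=(16, 2)))  # 16x2, orthonormal columns
map_like = positions @ basis.T + 0.1 * rng.normal(size=(200, 16))

# Control: latents that carry no positional information.
unrelated = rng.normal(size=(200, 16))

r_map = distance_correlation(map_like, positions)
r_unrelated = distance_correlation(unrelated, positions)
print(f"map-like latents: r = {r_map:.2f}; unrelated latents: r = {r_unrelated:.2f}")
```

In this toy setting, latents that encode position give a correlation close to 1, while the control stays near 0. With a trained network, `map_like` would be replaced by latent vectors recorded at known positions.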

Source journal

Nature Machine Intelligence Multiple-

CiteScore: 36.90

Self-citation rate: 2.10%

Articles published: 127

Journal description: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.

Latest articles from this journal

Large language models that replace human participants can harmfully misportray and flatten identity groups

Rethinking machine unlearning for large language models

Image-based generation for molecule design with SketchMol

Towards a more inductive world for drug repurposing approaches

Benchmarking AI-powered docking methods from the perspective of virtual screening