{"title":"OVL-MAP: An Online Visual Language Map Approach for Vision-and-Language Navigation in Continuous Environments","authors":"Shuhuan Wen;Ziyuan Zhang;Yuxiang Sun;Zhiwen Wang","doi":"10.1109/LRA.2025.3540577","DOIUrl":null,"url":null,"abstract":"Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing approaches, focused on topological and semantic maps, often face limitations in accurately understanding and adapting to complex or previously unseen environments, particularly due to static and offline map constructions. To address these challenges, this letter proposes OVL-MAP, an innovative algorithm comprising three key modules: an online vision-and-language map construction module, a waypoint prediction module, and an action decision module. The online map construction module leverages robust open-vocabulary semantic segmentation to dynamically enhance the agent's scene understanding. The waypoint prediction module processes natural language instructions to identify task-relevant regions, predict sub-goal locations, and guide trajectory planning. The action decision module utilizes the DD-PPO strategy for effective navigation. Evaluations on the Robo-VLN and R2R-CE datasets demonstrate that OVL-MAP significantly improves navigation performance and exhibits stronger generalization in unknown environments.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3294-3301"},"PeriodicalIF":4.6000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10879413/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
引用次数: 0
Abstract
Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate 3D environments based on visual observations and natural language instructions. Existing approaches, focused on topological and semantic maps, often face limitations in accurately understanding and adapting to complex or previously unseen environments, particularly due to static and offline map constructions. To address these challenges, this letter proposes OVL-MAP, an innovative algorithm comprising three key modules: an online vision-and-language map construction module, a waypoint prediction module, and an action decision module. The online map construction module leverages robust open-vocabulary semantic segmentation to dynamically enhance the agent's scene understanding. The waypoint prediction module processes natural language instructions to identify task-relevant regions, predict sub-goal locations, and guide trajectory planning. The action decision module utilizes the DD-PPO strategy for effective navigation. Evaluations on the Robo-VLN and R2R-CE datasets demonstrate that OVL-MAP significantly improves navigation performance and exhibits stronger generalization in unknown environments.
期刊介绍:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.