RSTNet: Recurrent Spatial-Temporal Networks for Estimating Depth and Ego-Motion
Authors: Tuo Feng; Dongbing Gu
DOI: 10.1109/TETCI.2024.3360329
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence (IF 5.3, JCR Q1: Computer Science, Artificial Intelligence)
Publication date: 2024-02-15 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10438027/
Estimating depth maps and ego-motion from consecutive monocular images is challenging for unsupervised-learning Visual Odometry (VO) approaches. This paper proposes a novel VO architecture, the Recurrent Spatial-Temporal Network (RSTNet), which estimates the depth map and ego-motion from consecutive monocular images. The main contributions are a novel RST-encoder layer and its corresponding RST-decoder layer, which preserve and recover spatial and temporal features from the inputs. RSTNet extracts appearance features from the input images and extracts structure and temporal features from intermediate results for ego-motion estimation. It also includes a pre-trained network that detects dynamic objects from the difference between the full and rigid optical flows. A novel auto-mask scheme in the loss function handles challenging scenes. Evaluation on the KITTI odometry benchmark shows that RSTNet outperforms several existing unsupervised-learning approaches.
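The abstract mentions detecting dynamic objects from the difference between the full (observed) optical flow and the rigid flow induced by camera ego-motion alone. The paper's exact mechanism is not given here, but a common realization of this idea thresholds the per-pixel flow residual; the function name, array shapes, and threshold below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dynamic_object_mask(full_flow, rigid_flow, threshold=1.0):
    """Flag pixels whose observed (full) optical flow disagrees with the
    rigid flow predicted from camera ego-motion alone.

    full_flow, rigid_flow: (H, W, 2) arrays of per-pixel (dx, dy) flow.
    Returns a boolean (H, W) mask, True where motion likely comes from a
    dynamic object rather than camera motion.
    """
    # Per-pixel magnitude of the flow residual (full minus rigid).
    residual = np.linalg.norm(full_flow - rigid_flow, axis=-1)
    return residual > threshold

# Toy example: a stationary camera (zero rigid flow) watching a scene
# in which a single pixel moves 3 px to the right.
H, W = 4, 4
rigid = np.zeros((H, W, 2))
full = np.zeros((H, W, 2))
full[1, 2] = [3.0, 0.0]
mask = dynamic_object_mask(full, rigid, threshold=1.0)
```

Pixels with near-zero residual are consistent with a static world plus camera motion; large residuals indicate independently moving objects, which can then be masked out of the photometric loss.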
Journal Introduction:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.