Mennatullah Siam, Sara Elkerdawy, M. Gamal, Moemen Abdel-Razek, Martin Jägersand, Hong Zhang
{"title":"实时分割与外观,运动和几何","authors":"Mennatullah Siam, Sara Elkerdawy, M. Gamal, Moemen Abdel-Razek, Martin Jägersand, Hong Zhang","doi":"10.1109/IROS.2018.8594088","DOIUrl":null,"url":null,"abstract":"Real-time Segmentation is of crucial importance to robotics related applications such as autonomous driving, driving assisted systems, and traffic monitoring from unmanned aerial vehicles imagery. We propose a novel two-stream convolutional network for motion segmentation, which exploits flow and geometric cues to balance the accuracy and computational efficiency trade-offs. The geometric cues take advantage of the domain knowledge of the application. In case of mostly planar scenes from high altitude unmanned aerial vehicles (UAVs), homography compensated flow is used. While in the case of urban scenes in autonomous driving, with GPS/IMU sensory data available, sparse projected depth estimates and odometry information are used. The network provides 4.7⨯ speedup over the state of the art networks in motion segmentation from 153ms to 36ms, at the expense of a reduction in the segmentation accuracy in terms of pixel boundaries. This enables the network to perform real-time on a Jetson T⨯2. In order to recuperate some of the accuracy loss, geometric priors is used while still achieving a much improved computational efficiency with respect to the state-of-the-art. The usage of geometric priors improved the segmentation in UAV imagery by 5.2 % using the metric of IoU over the baseline network. While on KITTI-MoSeg the sparse depth estimates improved the segmentation by 12.5 % over the baseline. Our proposed motion segmentation solution is verified on the popular KITTI and VIVID datasets, with additional labels we have produced. The code for our work is publicly available at11https://github.com/MSiam/RTMotSeg_Geom","PeriodicalId":6640,"journal":{"name":"2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","volume":"2021 1","pages":"5793-5800"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Real-Time Segmentation with Appearance, Motion and Geometry\",\"authors\":\"Mennatullah Siam, Sara Elkerdawy, M. Gamal, Moemen Abdel-Razek, Martin Jägersand, Hong Zhang\",\"doi\":\"10.1109/IROS.2018.8594088\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Real-time Segmentation is of crucial importance to robotics related applications such as autonomous driving, driving assisted systems, and traffic monitoring from unmanned aerial vehicles imagery. We propose a novel two-stream convolutional network for motion segmentation, which exploits flow and geometric cues to balance the accuracy and computational efficiency trade-offs. The geometric cues take advantage of the domain knowledge of the application. In case of mostly planar scenes from high altitude unmanned aerial vehicles (UAVs), homography compensated flow is used. While in the case of urban scenes in autonomous driving, with GPS/IMU sensory data available, sparse projected depth estimates and odometry information are used. The network provides 4.7⨯ speedup over the state of the art networks in motion segmentation from 153ms to 36ms, at the expense of a reduction in the segmentation accuracy in terms of pixel boundaries. This enables the network to perform real-time on a Jetson T⨯2. In order to recuperate some of the accuracy loss, geometric priors is used while still achieving a much improved computational efficiency with respect to the state-of-the-art. The usage of geometric priors improved the segmentation in UAV imagery by 5.2 % using the metric of IoU over the baseline network. While on KITTI-MoSeg the sparse depth estimates improved the segmentation by 12.5 % over the baseline. Our proposed motion segmentation solution is verified on the popular KITTI and VIVID datasets, with additional labels we have produced. The code for our work is publicly available at11https://github.com/MSiam/RTMotSeg_Geom\",\"PeriodicalId\":6640,\"journal\":{\"name\":\"2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)\",\"volume\":\"2021 1\",\"pages\":\"5793-5800\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IROS.2018.8594088\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IROS.2018.8594088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Real-Time Segmentation with Appearance, Motion and Geometry
Real-time Segmentation is of crucial importance to robotics related applications such as autonomous driving, driving assisted systems, and traffic monitoring from unmanned aerial vehicles imagery. We propose a novel two-stream convolutional network for motion segmentation, which exploits flow and geometric cues to balance the accuracy and computational efficiency trade-offs. The geometric cues take advantage of the domain knowledge of the application. In case of mostly planar scenes from high altitude unmanned aerial vehicles (UAVs), homography compensated flow is used. While in the case of urban scenes in autonomous driving, with GPS/IMU sensory data available, sparse projected depth estimates and odometry information are used. The network provides 4.7⨯ speedup over the state of the art networks in motion segmentation from 153ms to 36ms, at the expense of a reduction in the segmentation accuracy in terms of pixel boundaries. This enables the network to perform real-time on a Jetson T⨯2. In order to recuperate some of the accuracy loss, geometric priors is used while still achieving a much improved computational efficiency with respect to the state-of-the-art. The usage of geometric priors improved the segmentation in UAV imagery by 5.2 % using the metric of IoU over the baseline network. While on KITTI-MoSeg the sparse depth estimates improved the segmentation by 12.5 % over the baseline. Our proposed motion segmentation solution is verified on the popular KITTI and VIVID datasets, with additional labels we have produced. The code for our work is publicly available at11https://github.com/MSiam/RTMotSeg_Geom