Scene flow estimation from point cloud based on grouped relative self-attention
Xuezhi Xiang, Xiankun Zhou, Yingxin Wei, Xi Wang, Yulong Qiao
Image and Vision Computing, Volume 154, Article 105368, February 2025
DOI: 10.1016/j.imavis.2024.105368
Citations: 0
Abstract
3D scene flow estimation is a fundamental task in computer vision that aims to estimate the 3D motion of point clouds. Point clouds are unordered, and point density is non-uniform across local regions of the same object, so the features extracted by previous methods are not discriminative enough to yield accurate scene flow. In addition, scene flow may be misestimated when two adjacent point cloud frames exhibit large motion. We observe that the quality of point cloud feature extraction and the correlation between the two frames directly affect the accuracy of scene flow estimation. We therefore propose an improved self-attention structure, Grouped Relative Self-Attention (GRSA), which processes point clouds by combining a grouping operation with an offset-subtraction operation refined by normalization. Specifically, we embed GRSA into feature extraction and into each stage of flow refinement, obtaining lightweight yet efficient self-attention that extracts discriminative point features and strengthens point correlations, making the method more robust to long-distance motion. Furthermore, we use a comprehensive loss function to suppress outliers and obtain robust results. We evaluate our method on the FlyingThings3D and KITTI datasets and achieve superior performance: on FlyingThings3D it outperforms all compared methods, improving the Outliers metric by 16.9%; on KITTI it improves the Outliers metric by 6.7%.
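The abstract does not give implementation details, but its description (grouping, query-minus-key offset subtraction, and normalization refinement) suggests a vector-attention block in the spirit of point transformers. The sketch below is a hypothetical PyTorch rendering under those assumptions; the class name, the k-NN grouping, the per-offset weight MLP, and the residual LayerNorm are all illustrative choices, not the authors' actual architecture.

```python
# Minimal sketch of a grouped relative self-attention block for point clouds.
# Assumptions (not from the paper): k-NN spatial grouping, a small MLP that
# maps (q - k) offsets to per-channel attention weights, softmax over the
# neighborhood, and a residual LayerNorm as the "normalization refinement".
import torch
import torch.nn as nn


class GroupedRelativeSelfAttention(nn.Module):
    def __init__(self, dim: int, k: int = 16):
        super().__init__()
        self.k = k                      # neighborhood (group) size
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # MLP turning per-neighbor offsets (q - k) into attention weights
        self.weight_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # xyz:   (B, N, 3)  point coordinates
        # feats: (B, N, C)  per-point features
        B, N, C = feats.shape
        # group each point with its k nearest spatial neighbors
        dists = torch.cdist(xyz, xyz)                         # (B, N, N)
        knn_idx = dists.topk(self.k, largest=False).indices   # (B, N, k)
        q = self.to_q(feats)                                  # (B, N, C)
        k = self.to_k(feats)
        v = self.to_v(feats)
        batch = torch.arange(B, device=feats.device).view(B, 1, 1)
        k_nb = k[batch, knn_idx]                              # (B, N, k, C)
        v_nb = v[batch, knn_idx]                              # (B, N, k, C)
        # relative (offset-subtraction) attention: q_i - k_j per neighbor
        rel = q.unsqueeze(2) - k_nb                           # (B, N, k, C)
        attn = torch.softmax(self.weight_mlp(rel), dim=2)     # normalize over group
        out = (attn * v_nb).sum(dim=2)                        # (B, N, C)
        return self.norm(out + feats)                         # residual refinement


# Example: 2 clouds of 1024 points with 64-d features
block = GroupedRelativeSelfAttention(dim=64, k=16)
out = block(torch.randn(2, 1024, 3), torch.randn(2, 1024, 64))  # (2, 1024, 64)
```

Subtracting keys from queries inside each group makes the attention weights depend on relative feature offsets rather than absolute dot products, which is one plausible reading of how the method stays robust to large inter-frame displacements.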
About the journal:
Image and Vision Computing aims primarily to provide an effective medium for exchanging the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster a deeper understanding in the discipline by encouraging quantitative comparison and performance evaluation of proposed methodologies. Coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.