Bridging the Domain Gap in Scene Flow Estimation via Hierarchical Smoothness Refinement

ACM Transactions on Multimedia Computing Communications and Applications (IF 5.2; Zone 3, Computer Science; JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS). Pub Date: 2024-04-27. DOI: 10.1145/3661823

Dejun Zhang, Mian Zhang, Xuefeng Tan, Jun Liu


Citations: 0

Abstract



This paper introduces SmoothFlowNet3D, an innovative encoder-decoder architecture specifically designed for bridging the domain gap in scene flow estimation. To achieve this goal, SmoothFlowNet3D divides the scene flow estimation task into two stages: initial scene flow estimation and smoothness refinement. Specifically, SmoothFlowNet3D comprises a hierarchical encoder that extracts multi-scale point cloud features from two consecutive frames, along with a hierarchical decoder responsible for predicting the initial scene flow and further refining it to achieve smoother estimation. To generate the initial scene flow, a cross-frame nearest neighbor search operation is performed between the features extracted from two consecutive frames, resulting in forward and backward flow embeddings. These embeddings are then combined to form the bidirectional flow embedding, serving as input for predicting the initial scene flow. Additionally, a flow smoothing module based on the self-attention mechanism is proposed to predict the smoothing error and facilitate the refinement of the initial scene flow for more accurate and smoother estimation results. Extensive experiments demonstrate that the proposed SmoothFlowNet3D approach achieves state-of-the-art performance on both synthetic datasets and real LiDAR point clouds, confirming its effectiveness in enhancing scene flow smoothness.
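The two stages named in the abstract can be illustrated with a toy, parameter-free sketch. This is not the authors' SmoothFlowNet3D implementation: the real model uses learned hierarchical features and a learned self-attention module, neither of which is specified here. Everything below is an assumption made for illustration, including the function names (`cross_frame_knn`, `flow_embedding`, `bidirectional_embedding`, `smoothing_residual`, `refine`), the use of mean neighbour offsets as "embeddings", the concatenation rule for the bidirectional embedding, the distance-based attention weights, and the choice to add the predicted residual to the initial flow.

```python
import math

def cross_frame_knn(src, dst, k):
    """Indices of the k nearest dst points for every src point (brute force)."""
    return [
        sorted(range(len(dst)), key=lambda j: math.dist(p, dst[j]))[:k]
        for p in src
    ]

def flow_embedding(src, dst, k):
    """Toy embedding (assumption): mean offset from each src point to its
    k cross-frame neighbours in dst."""
    embedding = []
    for p, idx in zip(src, cross_frame_knn(src, dst, k)):
        embedding.append(tuple(
            sum(dst[j][d] - p[d] for j in idx) / len(idx) for d in range(3)
        ))
    return embedding

def bidirectional_embedding(frame1, frame2, k=1):
    """Concatenate the forward embedding of each frame-1 point with the
    backward embedding of its nearest frame-2 point (hypothetical rule)."""
    forward = flow_embedding(frame1, frame2, k)
    backward = flow_embedding(frame2, frame1, k)
    nearest = cross_frame_knn(frame1, frame2, 1)
    return [f + backward[n[0]] for f, n in zip(forward, nearest)]

def smoothing_residual(points, flow, temperature=1.0):
    """Stand-in for the learned smoothing error: attention weights are a
    softmax over negative squared distances, and the residual is the
    attended flow minus the current flow."""
    residual = []
    for p, f in zip(points, flow):
        scores = [-math.dist(p, q) ** 2 / temperature for q in points]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended = tuple(
            sum(w * g[d] for w, g in zip(weights, flow)) / total
            for d in range(3)
        )
        residual.append(tuple(a - c for a, c in zip(attended, f)))
    return residual

def refine(points, flow, temperature=1.0):
    """Refined flow = initial flow + predicted residual (assumed sign)."""
    res = smoothing_residual(points, flow, temperature)
    return [tuple(c + r for c, r in zip(f, e)) for f, e in zip(flow, res)]

# Three points shifted rigidly along x between two frames.
frame1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame2 = [(x + 0.1, y, z) for x, y, z in frame1]
embedding = bidirectional_embedding(frame1, frame2)

# An initial flow with one outlier, pulled toward its neighbours by refinement.
noisy_flow = [(0.1, 0.0, 0.0), (0.1, 0.0, 0.0), (0.5, 0.0, 0.0)]
refined = refine(frame1, noisy_flow)
```

In this sketch a spatially constant flow is left unchanged by `refine`, because the attention weights sum to one, while an outlier is moved toward the flow of nearby points. That mirrors the role the abstract gives to smoothness refinement, without claiming anything about the paper's actual layers or losses.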


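Source journal

ACM Transactions on Multimedia Computing Communications and Applications (Engineering and Technology: Computer Science, Theory and Methods)

CiteScore: 8.50

Self-citation rate: 5.90%

Articles published: 285

Review time: 7.5 months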

Journal description: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The Transactions consist primarily of research papers. As an archival journal, it is intended that its papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.