{"title":"MVF-GNN: Multi-View Fusion With GNN for 3D Semantic Segmentation","authors":"Zhenxiang Du;Minglun Ren;Wei Chu;Nengying Chen","doi":"10.1109/LRA.2025.3534693","DOIUrl":null,"url":null,"abstract":"Due to the high cost of obtaining 3D annotations and the accumulation of many 2D datasets with 2D semantic labels, deploying multi-view 2D images for 3D semantic segmentation has attracted widespread attention. Fusion of multi-view information requires establishing local-to-local as well as local-to-global dependencies among multiple views. However, previous methods that are based on 2D annotations supervision cannot model local-to-local and local-to-global dependencies simultaneously. In this letter, we propose a novel multi-view fusion framework with graph neural networks (MVF-GNN) for multi-view interaction and integration. First, a multi-view graph based on the associated pixels in multiple views is constructed. Then, a multi-scale multi-view graph attention network (MSMVGAT) module is introduced to perform graph reasoning on multi-view graphs at different scales. Finally, an attention multi-view graph aggregation (AMVGA) module is introduced to learn the importance of different views and integrate multi-view features. Experiments on the ScanNetv2 benchmark dataset show that our method outperforms state-of-the-art 2D/3D semantic segmentation methods based on 2D annotations supervision.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 4","pages":"3262-3269"},"PeriodicalIF":4.6000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10855616/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Abstract
Due to the high cost of obtaining 3D annotations and the accumulation of many 2D datasets with 2D semantic labels, deploying multi-view 2D images for 3D semantic segmentation has attracted widespread attention. Fusing multi-view information requires establishing local-to-local as well as local-to-global dependencies among multiple views. However, previous methods based on 2D annotation supervision cannot model local-to-local and local-to-global dependencies simultaneously. In this letter, we propose a novel multi-view fusion framework with graph neural networks (MVF-GNN) for multi-view interaction and integration. First, a multi-view graph is constructed from the associated pixels in multiple views. Then, a multi-scale multi-view graph attention network (MSMVGAT) module performs graph reasoning on multi-view graphs at different scales. Finally, an attention multi-view graph aggregation (AMVGA) module learns the importance of different views and integrates multi-view features. Experiments on the ScanNetv2 benchmark dataset show that our method outperforms state-of-the-art 2D/3D semantic segmentation methods based on 2D annotation supervision.
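To make the two fusion stages concrete, below is a minimal PyTorch sketch of the idea the abstract describes: graph attention among the views that observe the same 3D point, followed by learned per-view weighting to fuse a single feature per point. Everything here is an illustrative assumption reconstructed from the abstract alone: the class names (ViewGraphAttention, AttentionViewPooling), the fully connected per-point view graph, the single scale (the paper's MSMVGAT operates at multiple scales), and all tensor shapes are ours, not the paper's actual architecture.

```python
# Hedged sketch of multi-view graph fusion; not the paper's implementation.
import torch
import torch.nn as nn


class ViewGraphAttention(nn.Module):
    """Graph-attention layer over the views observing one 3D point.

    Nodes are per-view pixel features associated with the same 3D point;
    the graph is fully connected across views, so attention passes
    messages between views (local-to-local dependencies).
    """
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (P, V, C) = points, views per point, feature channels
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return x + attn @ v  # residual message passing among views


class AttentionViewPooling(nn.Module):
    """Learned per-view importance, then weighted fusion to one feature
    per 3D point (a stand-in for the AMVGA idea)."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.score(x), dim=1)  # (P, V, 1) view weights
        return (w * x).sum(dim=1)                # (P, C) fused point features


# Toy usage: 1024 3D points, each observed in 4 views, 64-d pixel features.
points, views, dim = 1024, 4, 64
feats = torch.randn(points, views, dim)
fused = AttentionViewPooling(dim)(ViewGraphAttention(dim)(feats))
print(fused.shape)  # torch.Size([1024, 64])
```

The fused per-point features would then feed a 3D segmentation head; under 2D annotation supervision, losses would instead be applied after projecting features back to the labeled views.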
Journal Description:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.