Image-point cloud embedding network for simultaneous image-based farmland instance extraction and point cloud-based semantic segmentation
Authors: Jinpeng Li, Yuan Li, Shuhang Zhang, Yiping Chen
Journal: International Journal of Applied Earth Observation and Geoinformation (JCR Q1, Earth and Planetary Sciences; impact factor 7.5)
Publication date: 2025-01-16
DOI: 10.1016/j.jag.2025.104361 (https://doi.org/10.1016/j.jag.2025.104361)
Citations: 0
Abstract
Farmland extraction has been a pivotal research challenge in remote sensing for decades, and advanced deep learning techniques have enabled breakthrough progress. However, existing methods still pay little attention to simultaneous instance-level farmland extraction and semantic 3D attribute analysis, both of which are essential for enabling a wider range of agricultural applications. Additionally, most bimodal methods apply a simple projection to convert high-dimensional features into a low-dimensional space for feature interaction, which inevitably underutilizes the advantages of bimodal learning and leads to information loss. To address these issues, we propose a novel end-to-end bimodal network, the Image-Point Cloud Embedding Network (IPCE-Net), which employs a dual-stream architecture to concurrently perform image-based farmland instance segmentation and point cloud-based semantic segmentation. Furthermore, by leveraging the Heterogeneous Conversion Module (HCM), IPCE-Net reconciles the modality disparities between images and point clouds and achieves stage-by-stage feature interaction during bimodal learning, thus achieving higher performance than unimodal learning. Experiments on two datasets show that IPCE-Net achieves superior performance in both farmland instance extraction and point cloud semantic segmentation. For farmland instance extraction, the instance-level mAP and pixel-level IoU reach 74.9% and 79.6%, respectively, considerably higher than classical image-based instance segmentation methods. For point cloud semantic segmentation, the OA and mIoU are 93.8% and 66.1%, improvements of at least 1.3% and 8.2%, respectively, over state-of-the-art semantic segmentation approaches. Moreover, intelligent analysis based on connecting IPCE-Net with GPT-4 transforms abstract categorical information into easy-to-understand, measurable information, demonstrating great potential for practical applications in precision and smart agriculture.
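For readers unfamiliar with the segmentation metrics quoted in the abstract, the following is a minimal sketch of how overall accuracy (OA), mean IoU (mIoU), and pixel-level IoU are commonly computed. It uses the standard textbook definitions; the abstract does not describe the authors' evaluation protocol, so the function names, the toy labels, and the choice to skip classes absent from both reference and prediction are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index the reference class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm: List[List[int]]) -> float:
    """OA: correctly labelled points divided by all points."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_iou(cm: List[List[int]]) -> List[Optional[float]]:
    """IoU per class: TP / (TP + FP + FN); None if the class never occurs."""
    n = len(cm)
    ious: List[Optional[float]] = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # reference c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, reference other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """mIoU: average of per-class IoU over classes that occur (assumption)."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


def mask_iou(mask_a: Sequence[Sequence[int]],
             mask_b: Sequence[Sequence[int]]) -> float:
    """Pixel-level IoU between two binary masks of equal shape."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


if __name__ == "__main__":
    # Toy labels for six points and three classes (illustrative only).
    reference = [0, 0, 1, 1, 2, 2]
    predicted = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(reference, predicted, num_classes=3)
    print("OA:", overall_accuracy(cm))
    print("mIoU:", mean_iou(cm))
    print("mask IoU:", mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

OA can stay high while mIoU is much lower when a few dominant classes account for most points, which is consistent with the gap between the 93.8% OA and 66.1% mIoU reported above. The instance-level mAP additionally requires matching predicted and reference instances across IoU thresholds and is not covered by this sketch.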
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.