MoMFormer: Mixture of modality transformer model for vegetation extraction under shadow conditions
Yingxuan He, Wei Chen, Zhou Huang, Qingpeng Wang
Ecological Informatics (JCR Q1, Ecology; IF 5.8), published 2024-09-12. DOI: 10.1016/j.ecoinf.2024.102818
Code and datasets: https://github.com/hhhxiaohe/MoMFormer
Citations: 0
Abstract
Accurate estimation of fractional vegetation coverage (FVC) is essential for assessing the ecological environment and acquiring ecological information. However, under natural lighting conditions, shadows in vegetation scenes can easily cause shadowed vegetation and shadowed soil to be confused, producing misclassification and omission errors. This issue limits the precision of both vegetation extraction and FVC estimation. To address this challenge, this study introduces a novel deep learning model, the Mixture of Modality Transformer (MoMFormer), which is specifically designed to mitigate shadow interference in vegetation extraction. Our model uses the Swin Transformer V2 as a feature extractor, effectively capturing vegetation features from a dual-modality (regular-exposure RGB and high-dynamic-range HDR) dataset. A dynamic aggregation module (DAM) is integrated to adaptively blend the most relevant vegetation features. We conducted extensive experiments on a self-annotated dataset featuring diverse vegetation–soil scenes and compared our model with several state-of-the-art (SOTA) methods. The results demonstrate that MoMFormer achieves an accuracy of 89.43% on the HDR-RGB dual-modality dataset, with an FVC accuracy of 87.57%, outperforming the other algorithms and demonstrating high vegetation extraction accuracy and adaptability under natural lighting conditions. This research offers new insights into accurate vegetation information extraction in naturally lit environments with shadows, providing robust technical support for high-precision validation of vegetation coverage products and algorithms based on multimodal data. The code and datasets used in this study are publicly available at https://github.com/hhhxiaohe/MoMFormer.
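The two quantities the abstract leans on can be made concrete: FVC is simply the fraction of pixels classified as vegetation, and the DAM-style fusion blends the two modality feature maps with per-pixel weights favoring the more informative modality. The sketch below is illustrative only and is not the paper's implementation: the softmax-weighted blend and the `score_rgb`/`score_hdr` relevance maps are assumptions standing in for whatever the actual dynamic aggregation module learns.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blend_modalities(feat_rgb, feat_hdr, score_rgb, score_hdr):
    """Per-pixel weighted blend of two modality feature maps (C, H, W).

    Hypothetical stand-in for a dynamic aggregation module: a softmax over
    per-modality relevance scores yields weights summing to 1 at each pixel,
    so the more relevant modality dominates the fused feature there.
    """
    scores = np.stack([score_rgb, score_hdr])            # (2, H, W)
    w = softmax(scores, axis=0)                          # weights sum to 1
    return w[0][None] * feat_rgb + w[1][None] * feat_hdr # (C, H, W)

def fvc(mask):
    """Fractional vegetation coverage: vegetation pixels / total pixels."""
    return np.asarray(mask, dtype=bool).mean()
```

For example, a 2x2 mask with three vegetation pixels gives an FVC of 0.75; in the blend, a pixel whose RGB relevance score far exceeds its HDR score takes its fused feature almost entirely from the RGB map.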
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need for informing sustainable management in view of global environmental and climate change.
The journal is interdisciplinary, at the interface of ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.