How high are we? Large-scale building height estimation at 10 m using Sentinel-1 SAR and Sentinel-2 MSI time series
Authors: Ritu Yadav, Andrea Nascetti, Yifang Ban
Journal: Remote Sensing of Environment (Impact Factor 11.1; JCR Q1, Environmental Sciences)
Published: 2024-12-16
DOI: https://doi.org/10.1016/j.rse.2024.114556
Citations: 0
Abstract
Accurate building height estimation is essential to support urbanization monitoring, environmental impact analysis and sustainable urban planning. However, conducting large-scale building height estimation remains a significant challenge. While deep learning (DL) has proven effective for large-scale mapping tasks, there is a lack of advanced DL models specifically tailored for height estimation, particularly when using open-source Earth observation data. In this study, we propose T-SwinUNet, an advanced DL model for large-scale building height estimation leveraging Sentinel-1 SAR and Sentinel-2 multispectral time series. The T-SwinUNet model contains a feature extractor with local/global feature comprehension capabilities, a temporal attention module to learn the correlation between constant and variable features of building objects over time, and an efficient multitask decoder to predict building height at 10 m spatial resolution. The model is trained and evaluated on data from the Netherlands, Switzerland, Estonia, and Germany, and its generalizability is evaluated on an out-of-distribution (OOD) test set of ten additional cities in other European countries. Our study incorporates extensive model evaluations, ablation experiments, and comparisons with established models. T-SwinUNet predicts building height with a Root Mean Square Error (RMSE) of 1.89 m, outperforming state-of-the-art models at 10 m spatial resolution. Its strong generalization to the OOD test set (RMSE of 3.2 m) underscores its potential for low-cost building height estimation across Europe, with future scalability to other regions. Furthermore, the assessment at 100 m resolution reveals that T-SwinUNet (0.29 m RMSE, 0.75 R²) also outperforms the global building height product GHSL-Built-H R2023A (0.56 m RMSE, 0.37 R²). Our implementation is available at: https://github.com/RituYadav92/Building-Height-Estimation.
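The abstract reports RMSE and R² at both 10 m and 100 m resolution. As a minimal sketch of how such figures could be computed, the Python below defines the two metrics and a block-averaging step from a 10 m grid to a 100 m grid. This is not taken from the authors' repository: the function names (`rmse`, `r_squared`, `block_mean`, `flatten`) are hypothetical, and aggregating by the mean of non-overlapping 10 x 10 blocks is an assumption, since the abstract does not say how the 100 m assessment was produced.

```python
import math


def rmse(pred, ref):
    """Root mean square error between two equally sized flat sequences."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def r_squared(pred, ref):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    if ss_tot == 0:
        raise ValueError("reference has zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def block_mean(grid, factor=10):
    """Average non-overlapping factor x factor blocks of a 2D list.

    With factor=10 this turns a 10 m height grid into a 100 m grid.
    Mean aggregation is an assumption, not the paper's stated method.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for i in range(0, rows, factor):
        out_row = []
        for j in range(0, cols, factor):
            total = sum(
                grid[a][b]
                for a in range(i, i + factor)
                for b in range(j, j + factor)
            )
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out


def flatten(grid):
    """Flatten a 2D list row by row so it can be passed to the metrics."""
    return [value for row in grid for value in row]
```

To evaluate at the coarser resolution under these assumptions, both the predicted and the reference height grids would be passed through `block_mean` and `flatten` before calling `rmse` and `r_squared`.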
About the journal:
Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing.
The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques.
RSE serves as a vital platform for the exchange of knowledge and advancements in the dynamic field of remote sensing.