{"title":"ResNet50 in remote sensing and agriculture: evaluating image captioning performance for high spectral data","authors":"Chengping Zhang, Imran Iqbal, Uzair Aslam Bhatti, Jinru Liu, Emad Mahrous Awwad, Nadia Sarhan","doi":"10.1007/s12665-024-11950-2","DOIUrl":null,"url":null,"abstract":"<div><p>Remote sensing image captioning is crucial as it enables the automatic interpretation and description of complex images captured from satellite or aerial sensors, facilitating the efficient analysis and understanding of vast amounts of geospatial data. This capability is essential for various applications, including environmental monitoring, disaster management, urban planning, and agricultural assessment, where accurate and timely information is vital for decision-making and response. This paper aims to evaluate deep learning models for image captioning in the context of remote sensing data and specifically compares Vision Transformer (ViT) and ResNet50 architectures. Utilizing the BLEU score to evaluate the quality of generated captions, the research explores the models' capabilities across varying sample sizes: The amount of samples included 25, 50, 75, and 100 samples. As it is shown in the tables above, the Vision Transformer outperforms the ResNet50 model in most cases, with the highest BLEU score of 0. 5507 at 50 samples, which indicates the superiority in learning global dependencies for image understanding and text generation. Nonetheless, the performance of ViT decreases slightly when the number of samples is greater than 50, which might be attributed to overfitting or scalability. On the other hand, ResNet50 shows a gradual increase in BLEU score with the increase in sample size and attains the maximum BLEU score of 0. 4783 at 100 samples, meaning that it is most effective with large data sets where it can fully take advantage of the learning algorithm. This work also discusses the advantages and disadvantages of the two models and makes suggestions on when it is suitable to use which model for image captioning tasks in remote sensing, thus helps to advance the discussion on model selection and improvement for image captioning tasks.</p></div>","PeriodicalId":542,"journal":{"name":"Environmental Earth Sciences","volume":"83 23","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Environmental Earth Sciences","FirstCategoryId":"93","ListUrlMain":"https://link.springer.com/article/10.1007/s12665-024-11950-2","RegionNum":4,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENVIRONMENTAL SCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
Remote sensing image captioning is crucial as it enables the automatic interpretation and description of complex images captured from satellite or aerial sensors, facilitating the efficient analysis and understanding of vast amounts of geospatial data. This capability is essential for various applications, including environmental monitoring, disaster management, urban planning, and agricultural assessment, where accurate and timely information is vital for decision-making and response. This paper aims to evaluate deep learning models for image captioning in the context of remote sensing data and specifically compares the Vision Transformer (ViT) and ResNet50 architectures. Using the BLEU score to evaluate the quality of generated captions, the research explores the models' capabilities across varying sample sizes of 25, 50, 75, and 100. The results show that the Vision Transformer outperforms the ResNet50 model in most cases, achieving the highest BLEU score of 0.5507 at 50 samples, which indicates its superiority in learning global dependencies for image understanding and text generation. Nonetheless, the performance of ViT decreases slightly when the number of samples exceeds 50, which may be attributed to overfitting or scalability limitations. On the other hand, ResNet50 shows a gradual increase in BLEU score as the sample size grows and attains its maximum BLEU score of 0.4783 at 100 samples, indicating that it is most effective with larger datasets, where it can fully exploit its learning capacity. This work also discusses the advantages and disadvantages of the two models and offers guidance on when each model is suitable for image captioning tasks in remote sensing, thus helping to advance the discussion on model selection and improvement for such tasks.
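The abstract reports BLEU as the caption-quality metric for both architectures. As a minimal, hypothetical sketch (not the authors' code), the snippet below shows how a smoothed sentence-level BLEU-4 score could be computed for a generated remote sensing caption against a reference using NLTK; the caption strings and the helper function caption_bleu are invented for illustration.

```python
# Minimal sketch of BLEU-based caption evaluation, assuming NLTK is installed.
# Caption strings below are hypothetical, not taken from the paper's dataset.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def caption_bleu(reference_caption: str, generated_caption: str) -> float:
    """Return a smoothed sentence-level BLEU-4 score for one caption pair."""
    reference = [reference_caption.lower().split()]  # list of reference token lists
    candidate = generated_caption.lower().split()
    smoothing = SmoothingFunction().method1          # avoids zero scores on short captions
    return sentence_bleu(reference, candidate, smoothing_function=smoothing)

# Hypothetical usage: score one generated caption against its reference.
ref = "a large agricultural field crossed by an irrigation canal"
hyp = "an agricultural field with an irrigation canal"
print(f"BLEU: {caption_bleu(ref, hyp):.4f}")
```

In a full evaluation such as the one described, scores would typically be aggregated (e.g., averaged or computed corpus-level) over all 25 to 100 samples per model before comparison.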
Journal Introduction:
Environmental Earth Sciences is an international multidisciplinary journal concerned with all aspects of interaction between humans, natural resources, ecosystems, special climates or unique geographic zones, and the earth:
Water and soil contamination caused by waste management and disposal practices
Environmental problems associated with transportation by land, air, or water
Geological processes that may impact biosystems or humans
Man-made or naturally occurring geological or hydrological hazards
Environmental problems associated with the recovery of materials from the earth
Environmental problems caused by extraction of minerals, coal, and ores, as well as oil and gas, water and alternative energy sources
Environmental impacts of exploration and recultivation
Environmental impacts of hazardous materials
Management of environmental data and information in data banks and information systems
Dissemination of knowledge on techniques, methods, approaches and experiences to improve and remediate the environment
In pursuit of these topics, the geoscientific disciplines are invited to contribute their knowledge and experience. Major disciplines include: hydrogeology, hydrochemistry, geochemistry, geophysics, engineering geology, remediation science, natural resources management, environmental climatology and biota, environmental geography, soil science and geomicrobiology.