Toward Integrating ChatGPT Into Satellite Image Annotation Workflows: A Comparison of Label Quality and Costs of Human and Automated Annotators

IF 5.3 · Zone 2, Earth Sciences · JCR Q1 ENGINEERING, ELECTRICAL & ELECTRONIC · IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing · Pub Date: 2025-01-14 · DOI: 10.1109/JSTARS.2025.3528192

Jacob Beck, Lukas Malte Kemeter, Konrad Dürrbeck, Mohamed Hesham Ibrahim Abdalla, Frauke Kreuter

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 4366-4381, 2025. Article page: https://ieeexplore.ieee.org/document/10841407/

Citations: 0

Abstract

High-quality annotations are a critical success factor for machine learning (ML) applications. Traditionally, they have been produced by human annotators, which means working within limited budgets and with annotators who vary in task-specific expertise, cost, and availability. Since the emergence of large language models (LLMs), their use for generating automated annotations has grown, extending both the possibilities and the complexity of designing an efficient annotation strategy. Increasingly, computer vision capabilities are being integrated into general-purpose LLMs such as ChatGPT. This raises the question of how effectively LLMs can be used for satellite image annotation and how they compare with traditional annotator types. This study presents a comprehensive investigation and comparison of various human and automated annotators for image classification. We evaluate the feasibility and economic competitiveness of using the ChatGPT4-V model for a complex land-use annotation task and compare it with alternative human annotators. A set of satellite images is annotated by a domain expert and by 15 additional human and automated annotators that differ in expertise and cost. Our analyses examine the loss in annotation quality between the expert and the other annotators. The comparison proceeds in three steps: descriptive analyses, fitted linear probability models, and a comparison of F1-scores. Finally, we simulate annotation strategies in which samples are split according to an automatically assigned certainty score. Routing low-certainty images to human annotators can cut total annotation costs by over 50% with minimal impact on label quality. We discuss implications for the economic competitiveness of annotation strategies, prompt engineering, and the task-specificity of expertise.
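The certainty-based routing strategy summarized in the abstract can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the certainty score is computed or how costs are counted. The sketch therefore assumes that every image is first labeled by the LLM (which also yields a certainty score in [0, 1]), that images below a chosen threshold are additionally sent to a human annotator whose label replaces the LLM's, and that quality is measured as macro-averaged F1 against the expert labels. The names `Item`, `simulate_routing`, and `macro_f1`, the land-use classes, and the per-item costs are hypothetical toy values invented for illustration, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One image in a hypothetical annotation pool."""
    expert: str       # reference label from the domain expert
    llm: str          # label proposed by the LLM annotator
    certainty: float  # LLM-assigned certainty score, assumed to lie in [0, 1]
    human: str        # label a human annotator gives if the image is routed


def macro_f1(truth, pred):
    """Unweighted mean of per-class F1 over all classes seen in truth or pred."""
    labels = sorted(set(truth) | set(pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def simulate_routing(items, threshold, llm_cost, human_cost):
    """Return (total cost, macro-F1 vs. expert) for one routing threshold.

    Assumption: every image is labeled by the LLM first; images whose
    certainty falls below `threshold` are also sent to a human, whose
    label is kept instead of the LLM's.
    """
    final_labels = []
    total_cost = 0
    for it in items:
        total_cost += llm_cost
        if it.certainty < threshold:
            total_cost += human_cost
            final_labels.append(it.human)
        else:
            final_labels.append(it.llm)
    return total_cost, macro_f1([it.expert for it in items], final_labels)


# Toy data invented for illustration only (not from the study).
items = [
    Item("residential", "residential", 0.95, "residential"),
    Item("industrial", "residential", 0.40, "industrial"),
    Item("farmland", "farmland", 0.90, "farmland"),
    Item("forest", "farmland", 0.30, "forest"),
]

for t in (0.0, 0.5, 1.01):
    cost, f1 = simulate_routing(items, t, llm_cost=1, human_cost=10)
    print(f"threshold={t:.2f}  cost={cost}  macro-F1={f1:.3f}")
```

Sweeping `threshold` traces a cost-quality trade-off: a threshold of 0 keeps every LLM label at the lowest cost, while raising it buys back quality at the price of more human labels. In this toy pool, routing the two low-certainty images recovers every expert label for 24 cost units, versus 44 when every image is also sent to a human.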


Source Journal

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Geosciences: Imaging Science & Photographic Technology)

CiteScore: 9.30
Self-citation rate: 10.90%
Articles published: 563
Review time: 4.7 months

About the journal: The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws on the experience of the highly successful “IEEE Transactions on Geoscience and Remote Sensing” and provides a complementary medium for the wide range of topics in applied Earth observations. The ‘Applications’ areas encompass the societal benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these areas, such as biodiversity, health, and climate, are not traditionally addressed in the IEEE context. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. The journal therefore attracts a broad range of interests, serving current members in new ways and expanding IEEE's visibility into new areas.