ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models

Earth System Science Data | IF 11.2 | Zone 1 (Earth Sciences) | Q1 (Geosciences, Multidisciplinary) | Pub Date: 2024-06-27 | DOI: 10.5194/essd-2024-140
Zhenghang Yuan, Zhitong Xiong, Lichao Mou, Xiao Xiang Zhu
Citations: 0

ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models
Abstract. The rapid development of remote sensing technology has led to an exponential growth in satellite images, yet their inherent complexity often makes them difficult for non-expert users to understand. Natural language, as a carrier of human knowledge, can bridge common users and complicated satellite imagery. Additionally, when paired with visual data, natural language can be utilized to train large vision-language foundation models, significantly improving performance in various tasks. Despite these advancements, the remote sensing community still faces a challenge due to the lack of large-scale, high-quality vision-language datasets for satellite images. To address this challenge, we introduce a new image-text dataset, providing high-quality natural language descriptions for global-scale satellite data. Specifically, we utilize Sentinel-2 data for its global coverage as the foundational image source, employing semantic segmentation labels from the European Space Agency’s WorldCover project to enrich the descriptions of land covers. By conducting in-depth semantic analysis, we formulate detailed prompts to elicit rich descriptions from ChatGPT. We then include a manual verification process to enhance the dataset’s quality further. This step involves manual inspection and correction to refine the dataset. Finally, we offer the community ChatEarthNet, a large-scale image-text dataset characterized by global coverage, high quality, wide-ranging diversity, and detailed descriptions. ChatEarthNet consists of 163,488 image-text pairs with captions generated by ChatGPT-3.5 and an additional 10,000 image-text pairs with captions generated by ChatGPT-4V(ision). This dataset has significant potential for both training and evaluating vision-language geo-foundation models for remote sensing. 
The code is publicly available at https://doi.org/10.5281/zenodo.11004358 (Yuan et al., 2024b), and the ChatEarthNet dataset is at https://doi.org/10.5281/zenodo.11003436 (Yuan et al., 2024c).
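
The abstract describes turning WorldCover segmentation labels into prompts for ChatGPT. The sketch below illustrates one way such a step could look: it counts the land cover classes in a label map and places their proportions into a prompt string. The class codes follow the ESA WorldCover legend; the function names, the prompt wording, and the toy label map are assumptions made for illustration and are not taken from the paper or its released code.

```python
from collections import Counter

# ESA WorldCover class codes and names (from the WorldCover product legend).
WORLDCOVER_CLASSES = {
    10: "tree cover",
    20: "shrubland",
    30: "grassland",
    40: "cropland",
    50: "built-up",
    60: "bare / sparse vegetation",
    70: "snow and ice",
    80: "permanent water bodies",
    90: "herbaceous wetland",
    95: "mangroves",
    100: "moss and lichen",
}


def land_cover_proportions(label_map):
    """Return {class name: fraction of pixels}, largest fraction first."""
    counts = Counter(code for row in label_map for code in row)
    total = sum(counts.values())
    fractions = {
        # Codes outside the legend (e.g. no-data) are kept under a generic name.
        WORLDCOVER_CLASSES.get(code, f"unknown ({code})"): n / total
        for code, n in counts.items()
    }
    return dict(sorted(fractions.items(), key=lambda item: item[1], reverse=True))


def build_prompt(label_map):
    """Hypothetical prompt template; the abstract does not give the actual prompts."""
    proportions = land_cover_proportions(label_map)
    summary = ", ".join(f"{name}: {frac:.0%}" for name, frac in proportions.items())
    return (
        "Describe a satellite image using only the following land cover "
        f"statistics. Land cover proportions: {summary}."
    )


# Toy 2x4 label map: 4 cropland pixels, 2 tree cover pixels, 2 water pixels.
toy_labels = [
    [40, 40, 10, 80],
    [40, 40, 10, 80],
]
prompt = build_prompt(toy_labels)
print(prompt)
```

A real pipeline would read the label map from the WorldCover raster aligned with each Sentinel-2 patch and would likely add spatial cues (for example, which classes dominate which part of the image); this sketch covers only the proportion summary.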
Source journal
Earth System Science Data
Categories: Geosciences, Multidisciplinary; Meteorology & Atmospheric Sciences
CiteScore: 18.00
Self-citation rate: 5.30%
Articles published: 231
Review time: 35 weeks
About the journal: Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar, among others.