Correctness Comparison of ChatGPT-4, Gemini, Claude-3, and Copilot for Spatial Tasks

Impact factor 2.3 | CAS Zone 3 (Earth Sciences) | JCR Q2 (Geography) | Transactions in GIS | Published 2024-08-13 | DOI: 10.1111/tgis.13233

Hartwig H. Hochmair, Levente Juhász, Takoda Kemp


Citations: 0

Abstract

Generative AI, including large language models (LLMs), has recently gained significant interest in the geoscience community through its versatile task-solving capabilities, including programming, arithmetic reasoning, generation of sample data, time-series forecasting, toponym recognition, and image classification. Existing performance assessments of LLMs for spatial tasks have primarily focused on ChatGPT, whereas other chatbots have received less attention. To narrow this research gap, this study conducts a zero-shot correctness evaluation of 76 spatial tasks across seven task categories, assigned to four prominent chatbots: ChatGPT-4, Gemini, Claude-3, and Copilot. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and the interpretation of programming code and functions, but revealed weaknesses in mapping, code writing, and spatial reasoning. Furthermore, the correctness of results differed significantly between the four chatbots. Responses to repeated tasks assigned to each chatbot showed a high level of consistency, with matching rates of over 80% for most task categories across the four chatbots.
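The abstract reports consistency as a per-category "matching rate" between repeated runs of the same task. The paper's exact definition is not given here. The following is a minimal sketch that assumes a "match" means both runs of a task received the same correctness grade. The function name `matching_rates` and the sample grades are hypothetical illustrations, not data or code from the study.

```python
from collections import defaultdict


def matching_rates(records):
    """Compute the share of repeated tasks whose grades match, per category.

    Assumption (not from the paper): each record is
    (category, grade_of_first_run, grade_of_second_run), and a "match"
    means the two grades are equal. Grades may be booleans or any
    comparable labels (e.g. "correct", "partially correct").
    """
    totals = defaultdict(int)   # number of repeated tasks per category
    matches = defaultdict(int)  # number of tasks whose two grades agree
    for category, first_run, second_run in records:
        totals[category] += 1
        if first_run == second_run:
            matches[category] += 1
    return {category: matches[category] / totals[category] for category in totals}


# Hypothetical grades for illustration only (not results from the study).
example = [
    ("spatial literacy", True, True),
    ("spatial literacy", True, True),
    ("spatial literacy", False, True),
    ("spatial literacy", True, True),
    ("spatial reasoning", False, False),
    ("spatial reasoning", True, False),
]

rates = matching_rates(example)
for category, rate in rates.items():
    print(f"{category}: {rate:.0%}")
```

With these made-up grades, the sketch prints a 75% matching rate for spatial literacy and 50% for spatial reasoning. Under the abstract's 80% reference point, neither category would count as highly consistent.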


Source journal: Transactions in GIS (Geography)

CiteScore: 4.60

Self-citation rate: 8.30%

Articles published: 116

About the journal: Transactions in GIS is an international journal that provides a forum for high-quality, original research articles, review articles, short notes, and book reviews that focus on:
- practical and theoretical issues influencing the development of GIS
- the collection, analysis, modelling, interpretation, and display of spatial data within GIS
- the connections between GIS and related technologies
- new GIS applications which help to solve problems affecting the natural or built environments, or business