流利但不真实:ChatGPT与其他人工智能聊天机器人在人文科学写作中的熟练程度和独创性的比较分析

IF 2.8 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Future Internet Pub Date : 2023-10-13 DOI:10.3390/fi15100336

Edisa Lozić, Benjamin Štular

{"title":"流利但不真实:ChatGPT与其他人工智能聊天机器人在人文科学写作中的熟练程度和独创性的比较分析","authors":"Edisa Lozić, Benjamin Štular","doi":"10.3390/fi15100336","DOIUrl":null,"url":null,"abstract":"Historically, mastery of writing was deemed essential to human progress. However, recent advances in generative AI have marked an inflection point in this narrative, including for scientific writing. This article provides a comprehensive analysis of the capabilities and limitations of six AI chatbots in scholarly writing in the humanities and archaeology. The methodology was based on tagging AI-generated content for quantitative accuracy and qualitative precision by human experts. Quantitative accuracy assessed the factual correctness in a manner similar to grading students, while qualitative precision gauged the scientific contribution similar to reviewing a scientific article. In the quantitative test, ChatGPT-4 scored near the passing grade (−5) whereas ChatGPT-3.5 (−18), Bing (−21) and Bard (−31) were not far behind. Claude 2 (−75) and Aria (−80) scored much lower. In the qualitative test, all AI chatbots, but especially ChatGPT-4, demonstrated proficiency in recombining existing knowledge, but all failed to generate original scientific content. As a side note, our results suggest that with ChatGPT-4, the size of large language models has reached a plateau. Furthermore, this paper underscores the intricate and recursive nature of human research. This process of transforming raw data into refined knowledge is computationally irreducible, highlighting the challenges AI chatbots face in emulating human originality in scientific writing. Our results apply to the state of affairs in the third quarter of 2023. In conclusion, while large language models have revolutionised content generation, their ability to produce original scientific contributions in the humanities remains limited. We expect this to change in the near future as current large language model-based AI chatbots evolve into large language model-powered software.","PeriodicalId":37982,"journal":{"name":"Future Internet","volume":"53 1","pages":"0"},"PeriodicalIF":2.8000,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Fluent but Not Factual: A Comparative Analysis of ChatGPT and Other AI Chatbots’ Proficiency and Originality in Scientific Writing for Humanities\",\"authors\":\"Edisa Lozić, Benjamin Štular\",\"doi\":\"10.3390/fi15100336\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Historically, mastery of writing was deemed essential to human progress. However, recent advances in generative AI have marked an inflection point in this narrative, including for scientific writing. This article provides a comprehensive analysis of the capabilities and limitations of six AI chatbots in scholarly writing in the humanities and archaeology. The methodology was based on tagging AI-generated content for quantitative accuracy and qualitative precision by human experts. Quantitative accuracy assessed the factual correctness in a manner similar to grading students, while qualitative precision gauged the scientific contribution similar to reviewing a scientific article. In the quantitative test, ChatGPT-4 scored near the passing grade (−5) whereas ChatGPT-3.5 (−18), Bing (−21) and Bard (−31) were not far behind. Claude 2 (−75) and Aria (−80) scored much lower. In the qualitative test, all AI chatbots, but especially ChatGPT-4, demonstrated proficiency in recombining existing knowledge, but all failed to generate original scientific content. As a side note, our results suggest that with ChatGPT-4, the size of large language models has reached a plateau. Furthermore, this paper underscores the intricate and recursive nature of human research. This process of transforming raw data into refined knowledge is computationally irreducible, highlighting the challenges AI chatbots face in emulating human originality in scientific writing. Our results apply to the state of affairs in the third quarter of 2023. In conclusion, while large language models have revolutionised content generation, their ability to produce original scientific contributions in the humanities remains limited. We expect this to change in the near future as current large language model-based AI chatbots evolve into large language model-powered software.\",\"PeriodicalId\":37982,\"journal\":{\"name\":\"Future Internet\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2023-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Internet\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/fi15100336\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Internet","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/fi15100336","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 1

摘要

从历史上看，掌握文字被认为是人类进步的必要条件。然而，生成式人工智能的最新进展标志着这一叙事的转折点，包括科学写作。本文全面分析了六种人工智能聊天机器人在人文科学和考古学学术写作中的能力和局限性。该方法基于对人工智能生成的内容进行标记，以便由人类专家进行定量准确性和定性精度。定量准确性评估事实正确性的方式类似于给学生打分，而定性精度评估科学贡献的方式类似于审查一篇科学文章。在定量测试中，ChatGPT-4得分接近及格(- 5)，而ChatGPT-3.5 (- 18)， Bing(- 21)和Bard(- 31)紧随其后。Claude 2(- 75)和Aria(- 80)得分要低得多。在定性测试中，所有AI聊天机器人，尤其是ChatGPT-4，都表现出对现有知识重组的熟练程度，但都未能产生原创的科学内容。作为旁注，我们的结果表明，使用ChatGPT-4，大型语言模型的规模已经达到了一个平台。此外，本文强调了人类研究的复杂性和递归性。这种将原始数据转化为精炼知识的过程在计算上是不可约的，这凸显了人工智能聊天机器人在模仿人类科学写作原创性方面面临的挑战。我们的结果适用于2023年第三季度的情况。总之，尽管大型语言模型彻底改变了内容生成，但它们在人文科学领域做出原创科学贡献的能力仍然有限。随着目前基于大型语言模型的人工智能聊天机器人进化成基于大型语言模型的软件，我们预计这种情况将在不久的将来发生变化。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fluent but Not Factual: A Comparative Analysis of ChatGPT and Other AI Chatbots’ Proficiency and Originality in Scientific Writing for Humanities

Historically, mastery of writing was deemed essential to human progress. However, recent advances in generative AI have marked an inflection point in this narrative, including for scientific writing. This article provides a comprehensive analysis of the capabilities and limitations of six AI chatbots in scholarly writing in the humanities and archaeology. The methodology was based on tagging AI-generated content for quantitative accuracy and qualitative precision by human experts. Quantitative accuracy assessed the factual correctness in a manner similar to grading students, while qualitative precision gauged the scientific contribution similar to reviewing a scientific article. In the quantitative test, ChatGPT-4 scored near the passing grade (−5) whereas ChatGPT-3.5 (−18), Bing (−21) and Bard (−31) were not far behind. Claude 2 (−75) and Aria (−80) scored much lower. In the qualitative test, all AI chatbots, but especially ChatGPT-4, demonstrated proficiency in recombining existing knowledge, but all failed to generate original scientific content. As a side note, our results suggest that with ChatGPT-4, the size of large language models has reached a plateau. Furthermore, this paper underscores the intricate and recursive nature of human research. This process of transforming raw data into refined knowledge is computationally irreducible, highlighting the challenges AI chatbots face in emulating human originality in scientific writing. Our results apply to the state of affairs in the third quarter of 2023. In conclusion, while large language models have revolutionised content generation, their ability to produce original scientific contributions in the humanities remains limited. We expect this to change in the near future as current large language model-based AI chatbots evolve into large language model-powered software.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Future Internet Computer Science-Computer Networks and Communications

CiteScore

7.10

自引率

5.90%

发文量

303

审稿时长

11 weeks

期刊介绍： Future Internet is a scholarly open access journal which provides an advanced forum for science and research concerned with evolution of Internet technologies and related smart systems for “Net-Living” development. The general reference subject is therefore the evolution towards the future internet ecosystem, which is feeding a continuous, intensive, artificial transformation of the lived environment, for a widespread and significant improvement of well-being in all spheres of human life (private, public, professional). Included topics are: • advanced communications network infrastructures • evolution of internet basic services • internet of things • netted peripheral sensors • industrial internet • centralized and distributed data centers • embedded computing • cloud computing • software defined network functions and network virtualization • cloud-let and fog-computing • big data, open data and analytical tools • cyber-physical systems • network and distributed operating systems • web services • semantic structures and related software tools • artificial and augmented intelligence • augmented reality • system interoperability and flexible service composition • smart mission-critical system architectures • smart terminals and applications • pro-sumer tools for application design and development • cyber security compliance • privacy compliance • reliability compliance • dependability compliance • accountability compliance • trust compliance • technical quality of basic services.