A corpus-driven comparative analysis of AI in academic discourse: Investigating ChatGPT-generated academic texts in social sciences

Lingua · IF 1.1 · Zone 3 (Literature) · LANGUAGE & LINGUISTICS · Pub Date: 2024-11-09 · DOI: 10.1016/j.lingua.2024.103838
Giordano Tudino, Yan Qin
Lingua, Volume 312, Article 103838. Journal Article. https://www.sciencedirect.com/science/article/pii/S0024384124001694
Citation count: 0

Abstract

Since its release in 2022, ChatGPT has found widespread application across various disciplines. While previous studies of generative AI’s capabilities have concentrated predominantly on content quality assessment, little attention has been paid to how the model’s linguistic patterns compare with human-generated language. To address this gap, we built two specialized corpora of social science academic texts generated by ChatGPT-4o mini and selected the Elsevier OA CC-BY Corpus as a reference for comparison. The aims were to identify commonalities and differences between AI-generated and human academic language and to determine whether academic language instructions improve the model’s output in terms of formal rigor. The findings revealed limitations in ChatGPT’s handling of academic discourse in three respects: overuse of infrequent “academic” vocabulary, limited use of subordination, and syntactic and semantic homogeneity. Moreover, the effect of specific language-oriented prompts is primarily reflected in minor lexical adjustments. This study expands the scope of corpus linguistics research by incorporating AI-generated texts into the analytical framework and lays the groundwork for future improvements in the language model’s genre discrimination.
Source journal: Lingua
CiteScore: 2.50
Self-citation rate: 9.10%
Articles published: 93
Review time: 24 weeks
Journal description: Lingua publishes papers of any length, if justified, as well as review articles surveying developments in the various fields of linguistics, and occasional discussions. A considerable number of pages in each issue are devoted to critical book reviews. Lingua also publishes three special formats: Lingua Franca articles, which consist of provocative exchanges expressing strong opinions on central topics in linguistics; The Decade In articles, which are educational pieces offering the nonspecialist linguist an overview of a given area of study; and Taking up the Gauntlet special issues, which are composed of a set number of papers examining one set of data and exploring whose theory offers the most insight with a minimal set of assumptions and a maximum of arguments.