Title: A corpus-driven comparative analysis of AI in academic discourse: Investigating ChatGPT-generated academic texts in social sciences
Authors: Giordano Tudino, Yan Qin
Journal: Lingua, volume 312, Article 103838 (impact factor 1.1; JCR category: Language & Linguistics)
Publication date: 2024-11-09
Publication type: Journal Article
DOI: 10.1016/j.lingua.2024.103838
URL: https://www.sciencedirect.com/science/article/pii/S0024384124001694
Open access: no
Citations: 0
Abstract
Since its release in 2022, ChatGPT has found widespread application across various disciplines. While previous studies on Generative AI’s capabilities have predominantly concentrated on content quality assessments, little attention has been directed toward investigating the model’s linguistic patterns compared with human-generated language. To address this gap, we built two specialized corpora composed of academic texts in social sciences generated by ChatGPT-4o mini and selected the Elsevier OA CC-BY Corpus as a reference for comparison, with a view to identifying commonalities and differences between AI-generated and human academic language and determining whether academic language instructions improve the model’s output in terms of formal rigor. The findings revealed limitations in ChatGPT’s handling of academic discourse in the following respects: overuse of infrequent “academic” vocabulary, limited use of subordination, and syntactic and semantic homogeneity. In addition, the effect of specific language-oriented prompts is primarily reflected in minor lexical adjustments. This study expands the scope of corpus linguistics research by incorporating AI-generated texts into the analytical framework and lays the groundwork for future improvements in the language model’s genre discrimination.
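The abstract does not say which statistics the authors used to detect overused vocabulary. As an illustration only, the sketch below shows one common corpus-linguistics approach to comparing a study corpus with a reference corpus: log-likelihood keyness. The tokenizer, the function names (`tokenize`, `log_likelihood`, `keywords`), and the toy sentences are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Naive whitespace tokenizer with punctuation stripping (illustrative assumption).
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def log_likelihood(a, b, c, d):
    # a: word frequency in the study corpus, b: frequency in the reference corpus,
    # c: study corpus size in tokens, d: reference corpus size in tokens.
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, ref_tokens, top=5):
    # Rank words that are relatively more frequent in the study corpus.
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        if a / c > b / d:  # keep only words overused relative to the reference
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: -pair[1])[:top]

if __name__ == "__main__":
    # Hypothetical toy texts standing in for an AI-generated and a human corpus.
    ai_text = tokenize("We delve into the data and delve again")
    human_text = tokenize("We look into the data and look again")
    print(keywords(ai_text, human_text))
```

In a real comparison the two token lists would come from the full corpora, and a significance threshold on the log-likelihood score would typically be applied before a word is treated as overused.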
About the journal:
Lingua publishes papers of any length, if justified, as well as review articles surveying developments in the various fields of linguistics, and occasional discussions. A considerable number of pages in each issue are devoted to critical book reviews. Lingua also publishes Lingua Franca articles, consisting of provocative exchanges expressing strong opinions on central topics in linguistics; The Decade In articles, which are educational articles offering the nonspecialist linguist an overview of a given area of study; and Taking up the Gauntlet special issues, composed of a set number of papers examining one set of data and exploring whose theory offers the most insight with a minimal set of assumptions and a maximum of arguments.