The imitation game: Detecting human and AI-generated texts in the era of ChatGPT and BARD

IF 1.8, Zone 4 (Management Science), JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS), Journal of Information Science, Pub Date: 2024-02-14, DOI: 10.1177/01655515241227531

Kadhim Hayawi, Sakib Shahriar, Sujith Samuel Mathew


Citations: 0

Abstract

The potential of artificial intelligence (AI)-based large language models (LLMs) holds considerable promise in revolutionising education, research and practice. However, distinguishing between human-written and AI-generated text has become a significant task. This article presents a comparative study, introducing a novel dataset of human-written and LLM-generated texts in different genres: essays, stories, poetry and Python code. We employ several machine learning models to classify the texts. Results demonstrate the efficacy of these models in discerning between human and AI-generated text, despite the dataset’s limited sample size. However, the task becomes more challenging when classifying GPT-generated text, particularly in story writing. The results indicate that the models exhibit superior performance in binary classification tasks, such as distinguishing human-generated text from a specific LLM, compared with the more complex multiclass tasks that involve discerning among human-generated and multiple LLMs. Our findings provide insightful implications for AI text detection, while our dataset paves the way for future research in this evolving area.
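The difference between the binary and multiclass settings described in the abstract can be illustrated with a minimal sketch. The abstract does not name the models used, so the multinomial Naive Bayes classifier below is a swapped-in stand-in, not the authors' method. The sample texts, the labels `llm_a` and `llm_b`, and the helper `train` are all hypothetical and exist only for illustration; they are not drawn from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a crude stand-in for real feature extraction."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing over bag-of-words counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy samples (invented for this sketch, not from the paper).
TOY_SAMPLES = [
    ("honestly i dunno why my loop keeps breaking lol", "human"),
    ("we hiked all day and my boots are wrecked", "human"),
    ("certainly here is a concise overview of the topic", "llm_a"),
    ("in conclusion it is important to note the following", "llm_a"),
    ("sure thing here are three quick ideas for you", "llm_b"),
    ("great question here are some quick tips to consider", "llm_b"),
]


def train(samples, binary=False):
    """Fit a classifier; with binary=True every non-human label collapses to 'ai'."""
    texts = [text for text, _ in samples]
    labels = [
        ("human" if label == "human" else "ai") if binary else label
        for _, label in samples
    ]
    return NaiveBayesTextClassifier().fit(texts, labels)


multiclass_model = train(TOY_SAMPLES)            # human vs llm_a vs llm_b
binary_model = train(TOY_SAMPLES, binary=True)   # human vs ai
```

In the binary setting the model only has to separate human text from machine text, whereas the multiclass model must also tell the generators apart, which is the harder task the abstract refers to. A sample this small says nothing about real detection accuracy.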




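Source journal

Journal of Information Science (Engineering and Technology - Computer Science: Information Systems)

CiteScore: 6.80

Self-citation rate: 8.30%

Articles published: 121

Review time: 4 months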

About the journal: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.