ChatGPT’s scorecard after the performance in a series of tests conducted at the multi-country level: A pattern of responses of generative artificial intelligence or large language models

Current Research in Biotechnology (IF 3.6; Q2, Biotechnology & Applied Microbiology); Pub Date: 2024-01-01; DOI: 10.1016/j.crbiot.2024.100194

Manojit Bhattacharya, Soumen Pal, Srijan Chatterjee, Abdulrahman Alshammari, Thamer H. Albekairi, Supriya Jagga, Elijah Ige Ohimain, Hatem Zayed, Siddappa N. Byrareddy, Sang-Soo Lee, Zhi-Hong Wen, Govindasamy Agoramoorthy, Prosun Bhattacharya, Chiranjib Chakraborty

Citations: 0

Abstract

Researchers have recently raised concerns about ChatGPT-derived answers. Here, individual researchers in multiple countries conducted a series of tests with ChatGPT to understand the pattern of its answer accuracy, reproducibility, answer length, plagiarism, and depth, using two questionnaires (the first with 15 multiple-choice questions (MCQs) and the second with 15 KBQs). Among the 15 MCQ-generated answers, 13 $\pm$ 70 were correct (median: 82.5; coefficient variance: 4.85) and 3 $\pm$ 0.77 were incorrect (median: 3; coefficient variance: 25.81); answers 1 to 10 were reproducible, and answers 11 to 15 were not. Among the 15 KBQs, the answer length for each question (in words) is about 294.5 $\pm$ 97.60 (means range from 138.7 to 438.09), and the mean similarity index (in words) is about 29.53 $\pm$ 11.40 (coefficient variance: 38.62) for each question. Statistical models were also developed from the analyzed answer parameters. The study shows a pattern of correct and incorrect ChatGPT-derived answers and calls for an error-free, next-generation LLM to avoid misguiding users.
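
The "coefficient variance" figures in the abstract can be roughly cross-checked against the reported mean and spread values. The sketch below assumes two things the abstract does not state: that "coefficient variance" means the coefficient of variation (standard deviation divided by mean, as a percentage), and that each value after $\pm$ is a standard deviation. The function name is illustrative and does not come from the paper.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return SD / mean as a percentage (assumed meaning of 'coefficient variance')."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return sd / mean * 100.0


# Mean and spread values exactly as printed in the abstract.
similarity_cv = coefficient_of_variation(29.53, 11.40)  # KBQ similarity index
incorrect_cv = coefficient_of_variation(3.0, 0.77)      # incorrect MCQ answers

print(f"similarity index CV: {similarity_cv:.2f}%")
print(f"incorrect answers CV: {incorrect_cv:.2f}%")
```

Under these assumptions the two results (about 38.60% and 25.67%) land close to the reported 38.62 and 25.81. The small gaps would be consistent with the abstract printing rounded means and deviations, though the abstract alone cannot confirm that.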

Source journal

Current Research in Biotechnology (Biochemistry, Genetics and Molecular Biology: Biotechnology)

CiteScore: 6.70

Self-citation rate: 3.60%

Review time: 38 days

About the journal: Current Research in Biotechnology (CRBIOT) is a new primary research, gold open access journal from Elsevier. CRBIOT publishes original papers, reviews, and short communications (including viewpoints and perspectives) resulting from research in biotechnology and biotech-associated disciplines. It is peer-reviewed, and upon acceptance all articles are permanently and freely available. It is a companion to the highly regarded review journal Current Opinion in Biotechnology (2018 CiteScore 8.450) and is part of the Current Opinion and Research (CO+RE) suite of journals. All CO+RE journals leverage the Current Opinion legacy of editorial excellence, high impact, and global reach to ensure they are a widely read resource that is integral to scientists' workflow.