ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research.

Journal of Medical Internet Research (Impact Factor 6.0; JCR Q1, Health Care Sciences & Services; Zone 2, Medicine). Pub Date: 2025-02-07. DOI: 10.2196/63550

Michael R Ruta, Tony Gaidici, Chase Irwin, Jonathan Lifshitz

Citation: Journal of Medical Internet Research, 2025, volume 27, e63550. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11845875/pdf/

Abstract

Background: ChatGPT, a conversational artificial intelligence developed by OpenAI, has rapidly become an invaluable tool for researchers. With the recent integration of Python code interpretation into the ChatGPT environment, there has been a significant increase in the potential utility of ChatGPT as a research tool, particularly in terms of data analysis applications.

Objective: This study aimed to assess ChatGPT as a data analysis tool and provide researchers with a framework for applying ChatGPT to data management tasks, descriptive statistics, and inferential statistics.

Methods: A subset of the National Inpatient Sample was extracted. Data analysis trials were divided into data processing, categorization, and tabulation, as well as descriptive and inferential statistics. For data processing, categorization, and tabulation assessments, ChatGPT was prompted to reclassify variables, subset variables, and present data, respectively. Descriptive statistics assessments included mean, SD, median, and IQR calculations. Inferential statistics assessments were conducted at varying levels of prompt specificity ("Basic," "Intermediate," and "Advanced"). Specific tests included chi-square, Pearson correlation, independent 2-sample t test, 1-way ANOVA, Fisher exact, Spearman correlation, Mann-Whitney U test, and Kruskal-Wallis H test. Outcomes from consecutive prompt-based trials were assessed against expected statistical values calculated in Python (Python Software Foundation), SAS (SAS Institute), and RStudio (Posit PBC).
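To make the idea of an "expected statistical value" concrete, the following is a minimal sketch of how reference values for the descriptive statistics and for one of the listed tests (chi-square) could be computed with the Python standard library alone. It is not the authors' code: the function names `describe` and `chi_square_2x2`, the sample values, and the choice of the "inclusive" quartile method are assumptions made for illustration. Quartile definitions differ between software packages, so the IQR in particular depends on the method chosen.

```python
import math
import statistics


def describe(values):
    """Return mean, sample SD, median, and IQR for a list of numbers."""
    # Assumption: quartiles use the "inclusive" method; other packages
    # may default to a different definition and give a different IQR.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square upper tail reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical values for illustration only (not National Inpatient Sample data).
length_of_stay_days = [1, 2, 3, 4, 5, 6, 7, 8]
print(describe(length_of_stay_days))
print(chi_square_2x2([[10, 20], [20, 10]]))
```

Switching `method="inclusive"` to `method="exclusive"` changes the IQR for the same data, which is one reason a reference value should be tied to a stated definition before an AI-generated result is judged correct or incorrect.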

Results: ChatGPT accurately performed data processing, categorization, and tabulation across all trials. For descriptive statistics, it provided accurate means, SDs, medians, and IQRs across all trials. Inferential statistics accuracy against expected statistical values varied with prompt specificity: 32.5% accuracy for "Basic" prompts, 81.3% for "Intermediate" prompts, and 92.5% for "Advanced" prompts.
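The accuracy percentages above come from comparing each trial's output with an expected value. A rough sketch of that kind of scoring is shown below. The abstract does not state the matching criterion, so the relative tolerance, the function name `accuracy_by_level`, and the example trials are all assumptions for illustration, not the study's procedure or data.

```python
import math
from collections import defaultdict


def accuracy_by_level(trials, rel_tol=1e-3):
    """Percent of trials per prompt level whose reported value matches.

    `trials` is an iterable of (level, reported, expected) tuples. A
    reported value of None stands for a trial with no usable statistic.
    The tolerance is an assumed criterion, not one taken from the study.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for level, reported, expected in trials:
        totals[level] += 1
        if reported is not None and math.isclose(
            reported, expected, rel_tol=rel_tol
        ):
            hits[level] += 1
    return {level: 100 * hits[level] / totals[level] for level in totals}


# Hypothetical illustration data, not results from the study.
example_trials = [
    ("Basic", None, 6.667),
    ("Basic", 6.667, 6.667),
    ("Advanced", 6.6667, 6.667),
    ("Advanced", 2.449, 2.449),
]
print(accuracy_by_level(example_trials))
```

A tolerance-based comparison treats rounding differences as matches while still flagging a wrong test or a wrong variable, which is typically the failure of interest when prompts are underspecified.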

Conclusions: ChatGPT shows promise as a tool for exploratory data analysis, particularly for researchers with some statistical knowledge and limited programming expertise. However, its application requires careful prompt construction and human oversight to ensure accuracy. As a supplementary tool, ChatGPT can enhance data analysis efficiency and broaden research accessibility.

Source journal

Journal of Medical Internet Research (Medicine: Health Care)

CiteScore: 14.40

Self-citation rate: 5.40%

Articles published: 654

Review time: 1 month

About the journal: The Journal of Medical Internet Research (JMIR) is a publication in the field of health informatics and health services. Founded in 1999, it has been active in the field for over two decades. The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.