Tim Reason, William Rawlinson, Julia Langham, Andy Gimblett, Bill Malcolm, Sven Klijn
{"title":"人工智能自动化卫生经济模型:评估大型语言模型潜在应用的案例研究》。","authors":"Tim Reason, William Rawlinson, Julia Langham, Andy Gimblett, Bill Malcolm, Sven Klijn","doi":"10.1007/s41669-024-00477-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Current generation large language models (LLMs) such as Generative Pre-Trained Transformer 4 (GPT-4) have achieved human-level performance on many tasks including the generation of computer code based on textual input. This study aimed to assess whether GPT-4 could be used to automatically programme two published health economic analyses.</p><p><strong>Methods: </strong>The two analyses were partitioned survival models evaluating interventions in non-small cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We developed prompts which instructed GPT-4 to programme the NSCLC and RCC models in R, and which provided descriptions of each model's methods, assumptions and parameter values. The results of the generated scripts were compared to the published values from the original, human-programmed models. The models were replicated 15 times to capture variability in GPT-4's output.</p><p><strong>Results: </strong>GPT-4 fully replicated the NSCLC model with high accuracy: 100% (15/15) of the artificial intelligence (AI)-generated NSCLC models were error-free or contained a single minor error, and 93% (14/15) were completely error-free. GPT-4 closely replicated the RCC model, although human intervention was required to simplify an element of the model design (one of the model's fifteen input calculations) because it used too many sequential steps to be implemented in a single prompt. With this simplification, 87% (13/15) of the AI-generated RCC models were error-free or contained a single minor error, and 60% (9/15) were completely error-free. Error-free model scripts replicated the published incremental cost-effectiveness ratios to within 1%.</p><p><strong>Conclusion: </strong>This study provides a promising indication that GPT-4 can have practical applications in the automation of health economic model construction. Potential benefits include accelerated model development timelines and reduced costs of development. Further research is necessary to explore the generalisability of LLM-based automation across a larger sample of models.</p>","PeriodicalId":19770,"journal":{"name":"PharmacoEconomics Open","volume":null,"pages":null},"PeriodicalIF":2.0000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10884386/pdf/","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence to Automate Health Economic Modelling: A Case Study to Evaluate the Potential Application of Large Language Models.\",\"authors\":\"Tim Reason, William Rawlinson, Julia Langham, Andy Gimblett, Bill Malcolm, Sven Klijn\",\"doi\":\"10.1007/s41669-024-00477-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Current generation large language models (LLMs) such as Generative Pre-Trained Transformer 4 (GPT-4) have achieved human-level performance on many tasks including the generation of computer code based on textual input. This study aimed to assess whether GPT-4 could be used to automatically programme two published health economic analyses.</p><p><strong>Methods: </strong>The two analyses were partitioned survival models evaluating interventions in non-small cell lung cancer (NSCLC) and renal cell carcinoma (RCC). 
We developed prompts which instructed GPT-4 to programme the NSCLC and RCC models in R, and which provided descriptions of each model's methods, assumptions and parameter values. The results of the generated scripts were compared to the published values from the original, human-programmed models. The models were replicated 15 times to capture variability in GPT-4's output.</p><p><strong>Results: </strong>GPT-4 fully replicated the NSCLC model with high accuracy: 100% (15/15) of the artificial intelligence (AI)-generated NSCLC models were error-free or contained a single minor error, and 93% (14/15) were completely error-free. GPT-4 closely replicated the RCC model, although human intervention was required to simplify an element of the model design (one of the model's fifteen input calculations) because it used too many sequential steps to be implemented in a single prompt. With this simplification, 87% (13/15) of the AI-generated RCC models were error-free or contained a single minor error, and 60% (9/15) were completely error-free. Error-free model scripts replicated the published incremental cost-effectiveness ratios to within 1%.</p><p><strong>Conclusion: </strong>This study provides a promising indication that GPT-4 can have practical applications in the automation of health economic model construction. Potential benefits include accelerated model development timelines and reduced costs of development. Further research is necessary to explore the generalisability of LLM-based automation across a larger sample of models.</p>\",\"PeriodicalId\":19770,\"journal\":{\"name\":\"PharmacoEconomics Open\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2024-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10884386/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PharmacoEconomics Open\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s41669-024-00477-8\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/2/10 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"ECONOMICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PharmacoEconomics Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41669-024-00477-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/2/10 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"ECONOMICS","Score":null,"Total":0}
Artificial Intelligence to Automate Health Economic Modelling: A Case Study to Evaluate the Potential Application of Large Language Models.
Background: Current generation large language models (LLMs) such as Generative Pre-Trained Transformer 4 (GPT-4) have achieved human-level performance on many tasks including the generation of computer code based on textual input. This study aimed to assess whether GPT-4 could be used to automatically programme two published health economic analyses.
Methods: The two analyses were partitioned survival models evaluating interventions in non-small cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We developed prompts which instructed GPT-4 to programme the NSCLC and RCC models in R, and which provided descriptions of each model's methods, assumptions and parameter values. The results of the generated scripts were compared to the published values from the original, human-programmed models. The models were replicated 15 times to capture variability in GPT-4's output.
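To illustrate the kind of analysis being replicated, the sketch below shows a generic three-state partitioned survival model in R, ending in an incremental cost-effectiveness ratio (ICER). It is a minimal, hypothetical example: the survival curves, costs, utilities and discount rate are invented for illustration and are not taken from the published NSCLC or RCC models or from the study's prompts.

```r
# Minimal sketch of a three-state partitioned survival model in R.
# All parameter values are illustrative and are NOT those of the
# published NSCLC or RCC analyses described in the abstract.

run_psm <- function(pfs_rate, os_rate, annual_cost_tx, annual_cost_prog,
                    utility_pfs, utility_prog,
                    horizon_years = 10, cycle_length = 1 / 12,
                    disc_rate = 0.035) {
  times <- seq(0, horizon_years, by = cycle_length)

  # Partitioned survival approach: state occupancy is read directly from
  # the PFS and OS curves (simple exponential curves used here).
  pfs        <- exp(-pfs_rate * times)   # progression-free and alive
  os         <- exp(-os_rate * times)    # alive in any state
  progressed <- pmax(os - pfs, 0)        # progressed but alive

  disc <- 1 / (1 + disc_rate) ^ times    # discount weights by time in years

  # Per-cycle costs and QALYs (annual values scaled by cycle length).
  cycle_costs <- (pfs * annual_cost_tx + progressed * annual_cost_prog) * cycle_length
  cycle_qalys <- (pfs * utility_pfs + progressed * utility_prog) * cycle_length

  list(costs = sum(cycle_costs * disc), qalys = sum(cycle_qalys * disc))
}

# Hypothetical comparator and intervention arms.
comparator   <- run_psm(pfs_rate = 0.9, os_rate = 0.50,
                        annual_cost_tx = 24000, annual_cost_prog = 12000,
                        utility_pfs = 0.75, utility_prog = 0.55)
intervention <- run_psm(pfs_rate = 0.6, os_rate = 0.35,
                        annual_cost_tx = 60000, annual_cost_prog = 12000,
                        utility_pfs = 0.78, utility_prog = 0.55)

# ICER: incremental cost per QALY gained.
icer <- (intervention$costs - comparator$costs) /
        (intervention$qalys - comparator$qalys)
print(icer)
```

In the study itself, a natural-language description of this kind of model structure, its assumptions and its parameter values was supplied to GPT-4 in a prompt, and the model asked to produce the corresponding R script.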
Results: GPT-4 fully replicated the NSCLC model with high accuracy: 100% (15/15) of the artificial intelligence (AI)-generated NSCLC models were error-free or contained a single minor error, and 93% (14/15) were completely error-free. GPT-4 closely replicated the RCC model, although human intervention was required to simplify an element of the model design (one of the model's fifteen input calculations) because it used too many sequential steps to be implemented in a single prompt. With this simplification, 87% (13/15) of the AI-generated RCC models were error-free or contained a single minor error, and 60% (9/15) were completely error-free. Error-free model scripts replicated the published incremental cost-effectiveness ratios to within 1%.
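As an illustration of the acceptance criterion described here, the fragment below shows how a generated model's ICER could be compared with the published value using a 1% relative-error threshold. The two ICER values are hypothetical placeholders, not results from the study.

```r
published_icer <- 150000  # hypothetical published ICER
generated_icer <- 149200  # hypothetical ICER from an AI-generated script

relative_error <- abs(generated_icer - published_icer) / published_icer
relative_error <= 0.01    # TRUE if replicated to within 1%
```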
Conclusion: This study provides a promising indication that GPT-4 can have practical applications in the automation of health economic model construction. Potential benefits include accelerated model development timelines and reduced costs of development. Further research is necessary to explore the generalisability of LLM-based automation across a larger sample of models.
Journal description:
PharmacoEconomics - Open focuses on applied research on the economic implications and health outcomes associated with drugs, devices and other healthcare interventions. The journal includes, but is not limited to, the following research areas: economic analysis of healthcare interventions; health outcomes research; cost-of-illness studies; and quality-of-life studies. Additional digital features (including animated abstracts, video abstracts, slide decks, audio slides, instructional videos, infographics, podcasts and animations) can be published with articles; these are designed to increase the visibility, readership and educational value of the journal's content. In addition, articles published in PharmacoEconomics - Open may be accompanied by plain language summaries to assist readers who have some knowledge of, but not in-depth expertise in, the area to understand important medical advances. All manuscripts are subject to peer review by international experts. Letters to the Editor are welcomed and will be considered for publication.