{"title":"人工智能用于泌尿学研究:数据科学的圣杯还是错误信息的潘多拉魔盒?","authors":"Ryan M Blake, Johnathan A Khusid","doi":"10.1089/end.2023.0703","DOIUrl":null,"url":null,"abstract":"<p><p><b><i>Introduction:</i></b> Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Utilization of these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. <b><i>Materials and Methods:</i></b> We obtained reference values from two published studies, which used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate a customized computer code, which could perform the calculation on downloaded data sets. <b><i>Results:</i></b> While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard provided either accurate results or inaccurate and inconsistent results. For example, Bard's \"calculations\" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published number was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations, however, when prompted further, admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a <i>de novo</i> calculation. Both LLMs were able to produce a code (Python) to use on the downloaded NHANES data sets, however, these would not readily execute. <b><i>Conclusions:</i></b> ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading, and results were inconsistent.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence for Urology Research: The Holy Grail of Data Science or Pandora's Box of Misinformation?\",\"authors\":\"Ryan M Blake, Johnathan A Khusid\",\"doi\":\"10.1089/end.2023.0703\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b><i>Introduction:</i></b> Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Utilization of these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. <b><i>Materials and Methods:</i></b> We obtained reference values from two published studies, which used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. 
We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate a customized computer code, which could perform the calculation on downloaded data sets. <b><i>Results:</i></b> While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard provided either accurate results or inaccurate and inconsistent results. For example, Bard's \\\"calculations\\\" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published number was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations, however, when prompted further, admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a <i>de novo</i> calculation. Both LLMs were able to produce a code (Python) to use on the downloaded NHANES data sets, however, these would not readily execute. <b><i>Conclusions:</i></b> ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading, and results were inconsistent.</p>\",\"PeriodicalId\":15723,\"journal\":{\"name\":\"Journal of endourology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of endourology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1089/end.2023.0703\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/5/15 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of endourology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1089/end.2023.0703","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/5/15 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Artificial Intelligence for Urology Research: The Holy Grail of Data Science or Pandora's Box of Misinformation?
Introduction: Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Using these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. Materials and Methods: We obtained reference values from two published studies that used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and perform the calculations independently. Second, we instructed the interfaces to generate customized computer code that could perform the calculations on downloaded data sets. Results: While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard's results were sometimes accurate and sometimes inaccurate and inconsistent. For example, Bard's "calculations" of the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published figure was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations; however, when prompted further, it admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a de novo calculation. Both LLMs were able to produce Python code to run on the downloaded NHANES data sets; however, this code would not readily execute. Conclusions: ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as its claims about its capabilities were convincingly misleading and its results were inconsistent.
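For context, below is a minimal sketch, in Python with pandas, of the kind of prevalence calculation the LLMs were asked to generate. The file names (DEMO_J.XPT, KIQ_U_J.XPT), the questionnaire variable (KIQ026, "Ever had kidney stones?"), and the weight column (WTINT2YR) follow public NHANES 2017-2018 documentation but are assumptions for illustration; this is not the reference studies' actual analysis code.

```python
# Sketch: weighted prevalence of self-reported kidney stones from
# downloaded NHANES 2017-2018 files. File and variable names are
# assumptions taken from public NHANES documentation and should be
# verified against the cycle actually analyzed.
import pandas as pd

# NHANES files are distributed as SAS transport (.XPT) files.
demo = pd.read_sas("DEMO_J.XPT", format="xport")   # demographics + survey weights
kiq = pd.read_sas("KIQ_U_J.XPT", format="xport")   # kidney conditions questionnaire

# Merge on the respondent sequence number shared by all NHANES files.
df = demo.merge(kiq[["SEQN", "KIQ026"]], on="SEQN")

# KIQ026: "Ever had kidney stones?" (1 = Yes, 2 = No; 7/9 = refused/don't know).
df = df[df["KIQ026"].isin([1, 2])]
df["stones"] = (df["KIQ026"] == 1).astype(float)

# Weighted prevalence using the 2-year interview weight.
prevalence = (df["stones"] * df["WTINT2YR"]).sum() / df["WTINT2YR"].sum()
print(f"Weighted prevalence of kidney stone history: {prevalence:.1%}")
```

Note that this sketch stops at the weighted point estimate; a design-based 95% CI of the kind reported in the reference studies would additionally require the survey strata and PSU variables (SDMVSTRA, SDMVPSU) and a survey-analysis package.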
About the Journal:
Journal of Endourology, JE Case Reports, and Videourology are, respectively, the leading peer-reviewed journal, case-reports publication, and innovative videojournal companion covering all aspects of minimally invasive urology research, applications, and clinical outcomes.
The leading journal of minimally invasive urology for over 30 years, Journal of Endourology is the essential publication for practicing surgeons who want to keep up with the latest surgical technologies in endoscopic, laparoscopic, robotic, and image-guided procedures as they apply to benign and malignant diseases of the genitourinary tract. This flagship journal includes the companion videojournal Videourology™ with every subscription. While Journal of Endourology remains focused on publishing rigorously peer-reviewed articles, Videourology accepts original videos containing material that has not been reported elsewhere, except in the form of an abstract or a conference presentation.
Journal of Endourology coverage includes:
The latest laparoscopic, robotic, endoscopic, and image-guided techniques for treating both benign and malignant conditions
Pioneering research articles
Controversial cases in endourology
Techniques in endourology with accompanying videos
Reviews and epochs in endourology
An Endourology Survey section highlighting endourology-relevant manuscripts published in other journals.