The Rise of AI in Endourology and Robotic Surgery.
Prokar Dasgupta, Ashok Hemal, Glenn Preminger, Roger Sur
Pub Date: 2024-08-01. DOI: 10.1089/end.2024.32789.pd
{"title":"The Rise of AI in Endourology and Robotic Surgery.","authors":"Prokar Dasgupta, Ashok Hemal, Glenn Preminger, Roger Sur","doi":"10.1089/end.2024.32789.pd","DOIUrl":"https://doi.org/10.1089/end.2024.32789.pd","url":null,"abstract":"","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141909807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial Intelligence for Urology Research: The Holy Grail of Data Science or Pandora's Box of Misinformation?
Ryan M Blake, Johnathan A Khusid
Pub Date: 2024-08-01. Epub Date: 2024-05-15. DOI: 10.1089/end.2023.0703
Introduction: Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Using these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. Materials and Methods: We obtained reference values from two published studies that used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate customized computer code that could perform the calculation on downloaded data sets. Results: While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard's results were sometimes accurate but otherwise inaccurate and inconsistent. For example, Bard's "calculations" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published figure was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations; however, when prompted further, it admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a de novo calculation. Both LLMs were able to produce code (Python) to run on the downloaded NHANES data sets; however, this code would not readily execute. Conclusions: ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading and its results were inconsistent.
{"title":"Artificial Intelligence for Urology Research: The Holy Grail of Data Science or Pandora's Box of Misinformation?","authors":"Ryan M Blake, Johnathan A Khusid","doi":"10.1089/end.2023.0703","DOIUrl":"10.1089/end.2023.0703","url":null,"abstract":"<p><p><b><i>Introduction:</i></b> Artificial intelligence tools such as the large language models (LLMs) Bard and ChatGPT have generated significant research interest. Utilization of these LLMs to study the epidemiology of a target population could benefit urologists. We investigated whether Bard and ChatGPT can perform a large-scale calculation of the incidence and prevalence of kidney stone disease. <b><i>Materials and Methods:</i></b> We obtained reference values from two published studies, which used the National Health and Nutrition Examination Survey (NHANES) database to calculate the prevalence and incidence of kidney stone disease. We then tested the capability of Bard and ChatGPT to perform similar calculations using two different methods. First, we instructed the LLMs to access the data sets and independently perform the calculation. Second, we instructed the interfaces to generate a customized computer code, which could perform the calculation on downloaded data sets. <b><i>Results:</i></b> While ChatGPT denied the ability to access and perform calculations on the NHANES database, Bard intermittently claimed the ability to do so. Bard provided either accurate results or inaccurate and inconsistent results. For example, Bard's \"calculations\" for the incidence of kidney stones from 2015 to 2018 were 2.1% (95% CI 1.5-2.7), 1.75% (95% CI 1.6-1.9), and 0.8% (95% CI 0.7-0.9), while the published number was 2.1% (95% CI 1.5-2.7). Bard provided discrete mathematical details of its calculations, however, when prompted further, admitted to having obtained the numbers from online sources, including our chosen reference articles, rather than from a <i>de novo</i> calculation. Both LLMs were able to produce a code (Python) to use on the downloaded NHANES data sets, however, these would not readily execute. <b><i>Conclusions:</i></b> ChatGPT and Bard are currently incapable of performing epidemiologic calculations and lack transparency and accountability. Caution should be used, particularly with Bard, as claims of its capabilities were convincingly misleading, and results were inconsistent.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140305822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of UroSAM: A Machine Learning Model to Automatically Identify Kidney Stone Composition from Endoscopic Video.
Jixuan Leng, Junfei Liu, Galen Cheng, Haohan Wang, Scott Quarrier, Jiebo Luo, Rajat Jain
Pub Date: 2024-08-01. Epub Date: 2024-05-31. DOI: 10.1089/end.2023.0740
Introduction: Chemical composition analysis is important in prevention counseling for kidney stone disease. Advances in laser technology have made dusting techniques more prevalent, but dusting leaves no consistent way to collect enough material to send for chemical analysis, leading many to forgo this test. We developed a novel machine learning (ML) model to effectively assess stone composition based on intraoperative endoscopic video data. Methods: Two endourologists performed ureteroscopy for kidney stones ≥ 10 mm. Representative videos were recorded intraoperatively. Individual frames were extracted from the videos, and the stone was outlined by human tracing. An ML model, UroSAM, was built and trained to automatically identify kidney stones in the images and predict the majority stone composition as one of the following: calcium oxalate monohydrate (COM), dihydrate (COD), calcium phosphate (CAP), or uric acid (UA). UroSAM was built on top of the publicly available Segment Anything Model (SAM) and incorporated a U-Net convolutional neural network (CNN). Discussion: A total of 78 ureteroscopy videos were collected; 50 were used for the model after exclusions (32 COM, 8 COD, 8 CAP, 2 UA). The ML model segmented the images with 94.77% precision. The Dice coefficient (0.9135) and Intersection over Union (0.8496) confirmed good segmentation performance of the ML model. A video-wise evaluation demonstrated 60% correct classification of stone composition. Subgroup analysis showed correct classification in 84.4% of COM videos. A post hoc adaptive threshold technique was used to mitigate biasing of the model toward COM because of data imbalance; this improved the overall correct classification to 62% while improving the classification of COD, CAP, and UA videos. Conclusions: This study demonstrates the effective development of UroSAM, an ML model that precisely identifies kidney stones from natural endoscopic video data. More high-quality video data will improve the performance of the model in classifying the majority stone composition.
{"title":"Development of UroSAM: A Machine Learning Model to Automatically Identify Kidney Stone Composition from Endoscopic Video.","authors":"Jixuan Leng, Junfei Liu, Galen Cheng, Haohan Wang, Scott Quarrier, Jiebo Luo, Rajat Jain","doi":"10.1089/end.2023.0740","DOIUrl":"10.1089/end.2023.0740","url":null,"abstract":"<p><p><b><i>Introduction:</i></b> Chemical composition analysis is important in prevention counseling for kidney stone disease. Advances in laser technology have made dusting techniques more prevalent, but this offers no consistent way to collect enough material to send for chemical analysis, leading many to forgo this test. We developed a novel machine learning (ML) model to effectively assess stone composition based on intraoperative endoscopic video data. <b><i>Methods:</i></b> Two endourologists performed ureteroscopy for kidney stones ≥ 10 mm. Representative videos were recorded intraoperatively. Individual frames were extracted from the videos, and the stone was outlined by human tracing. An ML model, UroSAM, was built and trained to automatically identify kidney stones in the images and predict the majority stone composition as follows: calcium oxalate monohydrate (COM), dihydrate (COD), calcium phosphate (CAP), or uric acid (UA). UroSAM was built on top of the publicly available Segment Anything Model (SAM) and incorporated a U-Net convolutional neural network (CNN). <b><i>Discussion:</i></b> A total of 78 ureteroscopy videos were collected; 50 were used for the model after exclusions (32 COM, 8 COD, 8 CAP, 2 UA). The ML model segmented the images with 94.77% precision. Dice coefficient (0.9135) and Intersection over Union (0.8496) confirmed good segmentation performance of the ML model. A video-wise evaluation demonstrated 60% correct classification of stone composition. Subgroup analysis showed correct classification in 84.4% of COM videos. A <i>post hoc</i> adaptive threshold technique was used to mitigate biasing of the model toward COM because of data imbalance; this improved the overall correct classification to 62% while improving the classification of COD, CAP, and UA videos. <b><i>Conclusions:</i></b> This study demonstrates the effective development of UroSAM, an ML model that precisely identifies kidney stones from natural endoscopic video data. More high-quality video data will improve the performance of the model in classifying the majority stone composition.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140957832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ChatGPT in Urology: Bridging Knowledge and Practice for Tomorrow's Healthcare, a Comprehensive Review.
Catalina Solano, Nick Tarazona, Gabriela Prieto Angarita, Andrea Ascencio Medina, Saralia Ruiz, Valentina Melo Pedroza, Olivier Traxer
Pub Date: 2024-08-01. Epub Date: 2024-07-03. DOI: 10.1089/end.2023.0700
Background: Among emerging AI technologies, the Chat-Generative Pre-Trained Transformer (ChatGPT) stands out as a notable language model developed through artificial intelligence research. Its proven versatility across domains, from language translation to health care data processing, underscores its promise for medical documentation, diagnostics, research, and education. This comprehensive review aimed to investigate the utility of ChatGPT in urology education and practice and to highlight its potential limitations. Methods: The authors conducted a comprehensive literature review of ChatGPT and its applications in urology education, research, and practice. Through a systematic review of the literature, with a search strategy using databases such as PubMed and Embase, we analyzed the advantages and limitations of using ChatGPT in urology and evaluated its potential impact. Results: A total of 78 records were eligible for inclusion. The benefits of ChatGPT were frequently cited across various contexts. Educational/academic benefits were mentioned in 21 records (87.5%), in which ChatGPT showed the ability to assist urologists by offering precise information and responding to inquiries derived from patient data analysis, thereby supporting decision making; 18 records (75%) cited advantages comprising personalized medicine, predictive capabilities for disease risks and outcomes, streamlined clinical workflows, and improved diagnostics. Nevertheless, apprehensions were expressed regarding potential misinformation, underscoring the necessity for human supervision to guarantee patient safety and address ethical concerns. Conclusion: The potential applications of ChatGPT hold the capacity to bring about transformative changes in urology education, research, and practice. AI technology can serve as a useful tool to augment human intelligence; however, it is essential to use it in a responsible and ethical manner.
{"title":"ChatGPT in Urology: Bridging Knowledge and Practice for Tomorrow's Healthcare, a Comprehensive Review.","authors":"Catalina Solano, Nick Tarazona, Gabriela Prieto Angarita, Andrea Ascencio Medina, Saralia Ruiz, Valentina Melo Pedroza, Olivier Traxer","doi":"10.1089/end.2023.0700","DOIUrl":"10.1089/end.2023.0700","url":null,"abstract":"<p><p><b><i>Background:</i></b> Among emerging AI technologies, Chat-Generative Pre-Trained Transformer (ChatGPT) emerges as a notable language model, uniquely developed through artificial intelligence research. Its proven versatility across various domains, from language translation to healthcare data processing, underscores its promise within medical documentation, diagnostics, research, and education. The current comprehensive review aimed to investigate the utility of ChatGPT in urology education and practice and to highlight its potential limitations. <b><i>Methods:</i></b> The authors conducted a comprehensive literature review of the use of ChatGPT and its applications in urology education, research, and practice. Through a systematic review of the literature, with a search strategy using databases, such as PubMed and Embase, we analyzed the advantages and limitations of using ChatGPT in urology and evaluated its potential impact. <b><i>Results:</i></b> A total of 78 records were eligible for inclusion. The benefits of ChatGPT were frequently cited across various contexts. In educational/academic benefits mentioned in 21 records (87.5%), ChatGPT showed the ability to assist urologists by offering precise information and responding to inquiries derived from patient data analysis, thereby supporting decision making; in 18 records (75%), advantages comprised personalized medicine, predictive capabilities for disease risks and outcomes, streamlining clinical workflows and improved diagnostics. Nevertheless, apprehensions were expressed regarding potential misinformation, underscoring the necessity for human supervision to guarantee patient safety and address ethical concerns. <b><i>Conclusion:</i></b> The potential applications of ChatGPT hold the capacity to bring about transformative changes in urology education, research, and practice. AI technology can serve as a useful tool to augment human intelligence; however, it is essential to use it in a responsible and ethical manner.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141317401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Novel Machine-Learning Algorithm to Predict Stone Recurrence with 24-Hour Urine Data.
Kevin Shee, Andrew W Liu, Carter Chan, Heiko Yang, Wilson Sui, Manoj Desai, Sunita Ho, Thomas Chi, Marshall L Stoller
Pub Date: 2024-08-01. DOI: 10.1089/end.2023.0457
Objectives: The absence of predictive markers for kidney stone recurrence poses a challenge for the clinical management of stone disease. The unpredictability of stone events is also a significant limitation for clinical trials, in which many patients must be enrolled to obtain sufficient stone events for analysis. In this study, we sought to use machine learning methods to identify a novel algorithm to predict stone recurrence. Subjects/Patients and Methods: Patients enrolled in the Registry for Stones of the Kidney and Ureter (ReSKU), a registry of nephrolithiasis patients collected between 2015 and 2020, with at least one prospectively collected 24-hour urine test (Litholink 24-hour urine test; Labcorp) were included in the training set. A validation set was obtained from chart review of stone patients with 24-hour urine data who were not enrolled in ReSKU. Stone events were defined as either an office visit at which a patient reported symptomatic passage of stones or a surgical procedure for stone removal. Seven prediction classification methods were evaluated. Predictive analyses and receiver operating characteristic (ROC) curve generation were performed in R. Results: Prediction models were trained on a training set of 423 kidney stone patients with stone event data and 24-hour urine samples. The highest-performing prediction model was a logistic regression model with ElasticNet regularization (area under the curve [AUC] = 0.65). Restricting analysis to high-confidence predictions significantly improved model accuracy (AUC = 0.82). The prediction model was validated on a validation set of 172 stone patients with stone event data and 24-hour urine samples. Prediction accuracy in the validation set demonstrated moderate discriminative ability (AUC = 0.64). Repeat modeling was performed with four of the highest-scoring features, and ROC analyses demonstrated minimal loss in accuracy (AUC = 0.63). Conclusion: Machine-learning models based on 24-hour urine data can predict stone recurrences with a moderate degree of accuracy.
{"title":"A Novel Machine-Learning Algorithm to Predict Stone Recurrence with 24-Hour Urine Data.","authors":"Kevin Shee, Andrew W Liu, Carter Chan, Heiko Yang, Wilson Sui, Manoj Desai, Sunita Ho, Thomas Chi, Marshall L Stoller","doi":"10.1089/end.2023.0457","DOIUrl":"https://doi.org/10.1089/end.2023.0457","url":null,"abstract":"<p><p><b><i>Objectives:</i></b> The absence of predictive markers for kidney stone recurrence poses a challenge for the clinical management of stone disease. The unpredictability of stone events is also a significant limitation for clinical trials, where many patients must be enrolled to obtain sufficient stone events for analysis. In this study, we sought to use machine learning methods to identify a novel algorithm to predict stone recurrence. <b><i>Subjects/Patients and Methods:</i></b> Patients enrolled in the Registry for Stones of the Kidney and Ureter (ReSKU), a registry of nephrolithiasis patients collected between 2015-2020, with at least one prospectively collected 24-hour urine test (Litholink 24-hour urine test; Labcorp) were included in the training set. A validation set was obtained from chart review of stone patients not enrolled in ReSKU with 24-hour urine data. Stone events were defined as either an office visit where a patient reports symptomatic passage of stones or a surgical procedure for stone removal. Seven prediction classification methods were evaluated. Predictive analyses and receiver operator characteristics (ROC) curve generation were performed in R. <b><i>Results:</i></b> A training set of 423 kidney stone patients with stone event data and 24-hour urine samples were trained using the prediction classification methods. The highest performing prediction model was a Logistic Regression with ElasticNet machine learning model (area under curve [AUC] = 0.65). Restricting analysis to high confidence predictions significantly improved model accuracy (AUC = 0.82). The prediction model was validated on a validation set of 172 stone patients with stone event data and 24-hour urine samples. Prediction accuracy in the validation set demonstrated moderate discriminative ability (AUC = 0.64). Repeat modeling was performed with four of the highest scoring features, and ROC analyses demonstrated minimal loss in accuracy (AUC = 0.63). <b><i>Conclusion:</i></b> Machine-learning models based on 24-hour urine data can predict stone recurrences with a moderate degree of accuracy.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141909806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trends of "Artificial Intelligence, Machine Learning, Virtual Reality, and Radiomics in Urolithiasis" over the Last 30 Years (1994-2023) as Published in the Literature (PubMed): A Comprehensive Review.
Carlotta Nedbal, Clara Cerrato, Victoria Jahrreiss, Amelia Pietropaolo, Andrea Benedetto Galosi, Daniele Castellani, Bhaskar Kumar Somani
Pub Date: 2024-08-01. DOI: 10.1089/end.2023.0263
Purpose: To analyze the bibliometric publication trend for the application of "Artificial Intelligence (AI) and its subsets (Machine Learning-ML, Virtual Reality-VR, Radiomics) in Urolithiasis" over three decades. We looked at the publication trends associated with AI and stone disease, including both clinical and surgical applications, and training in endourology. Methods: Through a MeSH-term search on PubMed, we performed a comprehensive review from 1994 to 2023 of all published articles on "AI, ML, VR, and Radiomics." Articles were divided into three categories as follows: A-Clinical (Nonsurgical), B-Clinical (Surgical), and C-Training articles, and were then assigned to the following three periods: Period-1 (1994-2003), Period-2 (2004-2013), and Period-3 (2014-2023). Results: A total of 343 articles were noted (Groups A-129, B-163, and C-51), and publications increased from Period-1 to Period-2 by 123% (p = 0.009) and to Period-3 by 453% (p = 0.003). The increase from Period-2 to Period-3 for Groups A, B, and C was 476% (p = 0.019), 616% (p = 0.001), and 185% (p < 0.001), respectively. Group A showed a rise in articles on "stone characteristics" (+2100%; p = 0.011), "renal function" (p = 0.002), "stone diagnosis" (+192%), "prediction of stone passage" (+400%), and "quality of life" (+1000%). Group B showed a rise in articles on "URS" (+2650%, p = 0.008), "PCNL" (+600%, p = 0.001), and "SWL" (+650%, p = 0.018). Articles on "Targeting" (+453%, p < 0.001), "Outcomes" (+850%, p = 0.013), and "Technological Innovation" (p = 0.0311) also showed rising trends. Group C showed a rise in articles on "PCNL" (+300%, p = 0.039) and "URS" (+188%, p = 0.003). Conclusion: Publications on AI and its subset areas for urolithiasis have seen an exponential increase over the last decade, with growth in surgical and nonsurgical clinical areas as well as in training. Future AI-related growth in the field of endourology and urolithiasis is likely to improve training, patient-centered decision making, and clinical outcomes.
{"title":"Trends of \"Artificial Intelligence, Machine Learning, Virtual Reality, and Radiomics in Urolithiasis\" over the Last 30 Years (1994-2023) as Published in the Literature (PubMed): A Comprehensive Review.","authors":"Carlotta Nedbal, Clara Cerrato, Victoria Jahrreiss, Amelia Pietropaolo, Andrea Benedetto Galosi, Daniele Castellani, Bhaskar Kumar Somani","doi":"10.1089/end.2023.0263","DOIUrl":"10.1089/end.2023.0263","url":null,"abstract":"<p><p><b><i>Purpose:</i></b> To analyze the bibliometric publication trend on the application of \"Artificial Intelligence (AI) and its subsets (Machine Learning-ML, Virtual reality-VR, Radiomics) in Urolithiasis\" over 3 decades. We looked at the publication trends associated with AI and stone disease, including both clinical and surgical applications, and training in endourology. <b><i>Methods:</i></b> Through a MeshTerms research on PubMed, we performed a comprehensive review from 1994-2023 for all published articles on \"AI, ML, VR, and Radiomics.\" Articles were then divided into three categories as follows: A-Clinical (Nonsurgical), B-Clinical (Surgical), and C-Training articles, and articles were then assigned to following three periods: Period-1 (1994-2003), Period-2 (2004-2013), and Period-3 (2014-2023). <b><i>Results:</i></b> A total of 343 articles were noted (Groups A-129, B-163, and C-51), and trends increased from Period-1 to Period-2 at 123% (<i>p</i> = 0.009) and to period-3 at 453% (<i>p</i> = 0.003). This increase from Period-2 to Period-3 for groups A, B, and C was 476% (<i>p</i> = 0.019), 616% (0.001), and 185% (<i>p</i> < 0.001), respectively. Group A articles included rise in articles on \"stone characteristics\" (+2100%; <i>p</i> = 0.011), \"renal function\" (<i>p</i> = 0.002), \"stone diagnosis\" (+192%), \"prediction of stone passage\" (+400%), and \"quality of life\" (+1000%). Group B articles included rise in articles on \"URS\" (+2650%, <i>p</i> = 0.008), \"PCNL\"(+600%, <i>p</i> = 0.001), and \"SWL\" (+650%, <i>p</i> = 0.018). Articles on \"Targeting\" (+453%, <i>p</i> < 0.001), \"Outcomes\" (+850%, <i>p</i> = 0.013), and \"Technological Innovation\" (<i>p</i> = 0.0311) had rising trends. Group C articles included rise in articles on \"PCNL\" (+300%, <i>p</i> = 0.039) and \"URS\" (+188%, <i>p</i> = 0.003). <b><i>Conclusion:</i></b> Publications on AI and its subset areas for urolithiasis have seen an exponential increase over the last decade, with an increase in surgical and nonsurgical clinical areas, as well as in training. Future AI related growth in the field of endourology and urolithiasis is likely to improve training, patient centered decision-making, and clinical outcomes.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"54229309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of the Current Status of Artificial Intelligence for Endourology Patient Education: A Blind Comparison of ChatGPT and Google Bard Against Traditional Information Resources.
Christopher Connors, Kavita Gupta, Johnathan A Khusid, Raymond Khargi, Alan J Yaghoubian, Micah Levy, Blair Gallante, William Atallah, Mantu Gupta
Pub Date: 2024-08-01. Epub Date: 2024-05-17. DOI: 10.1089/end.2023.0696
Introduction: Artificial intelligence (AI) platforms such as ChatGPT and Bard are increasingly utilized to answer patient health care questions. We present the first study to blindly evaluate AI-generated responses to common endourology patient questions against official patient education materials. Methods: Thirty-two questions and answers spanning kidney stones, ureteral stents, benign prostatic hyperplasia (BPH), and upper tract urothelial carcinoma were extracted from official Urology Care Foundation (UCF) patient education documents. The same questions were input into ChatGPT 4.0 and Bard, limiting responses to within ±10% of the word count of the corresponding UCF response to ensure fair comparison. Six endourologists blindly evaluated responses from each platform using Likert scales for accuracy, clarity, comprehensiveness, and patient utility. Reviewers identified which response they believed was not AI generated. Finally, Flesch-Kincaid Reading Grade Level formulas assessed the readability of each platform response. Ratings were compared using analysis of variance (ANOVA) and chi-square tests. Results: ChatGPT responses were rated the highest across all categories, including accuracy, comprehensiveness, clarity, and patient utility, while UCF answers were consistently scored the lowest, all p < 0.01. A subanalysis revealed that this trend was consistent across question categories (i.e., kidney stones, BPH, etc.). However, AI-generated responses were more likely to be classified at an advanced reading level, while UCF responses showed improved readability (college or higher reading level: ChatGPT = 100%, Bard = 66%, and UCF = 19%), p < 0.001. When asked to identify which answer was not AI generated, 54.2% of responses indicated ChatGPT, 26.6% indicated Bard, and only 19.3% correctly identified it as the UCF response. Conclusions: In a blind evaluation, AI-generated responses from ChatGPT and Bard surpassed the quality of official patient education materials in endourology, suggesting that current AI platforms are already a reliable resource for basic urologic care information. AI-generated responses do, however, tend to require a higher reading level, which may limit their applicability to a broader audience.
{"title":"Evaluation of the Current Status of Artificial Intelligence for Endourology Patient Education: A Blind Comparison of ChatGPT and Google Bard Against Traditional Information Resources.","authors":"Christopher Connors, Kavita Gupta, Johnathan A Khusid, Raymond Khargi, Alan J Yaghoubian, Micah Levy, Blair Gallante, William Atallah, Mantu Gupta","doi":"10.1089/end.2023.0696","DOIUrl":"10.1089/end.2023.0696","url":null,"abstract":"<p><p><b><i>Introduction:</i></b> Artificial intelligence (AI) platforms such as ChatGPT and Bard are increasingly utilized to answer patient health care questions. We present the first study to blindly evaluate AI-generated responses to common endourology patient questions against official patient education materials. <b><i>Methods:</i></b> Thirty-two questions and answers spanning kidney stones, ureteral stents, benign prostatic hyperplasia (BPH), and upper tract urothelial carcinoma were extracted from official Urology Care Foundation (UCF) patient education documents. The same questions were input into ChatGPT 4.0 and Bard, limiting responses to within ±10% of the word count of the corresponding UCF response to ensure fair comparison. Six endourologists blindly evaluated responses from each platform using Likert scales for accuracy, clarity, comprehensiveness, and patient utility. Reviewers identified which response they believed was not AI generated. Finally, Flesch-Kincaid Reading Grade Level formulas assessed the readability of each platform response. Ratings were compared using analysis of variance (ANOVA) and chi-square tests. <b><i>Results:</i></b> ChatGPT responses were rated the highest across all categories, including accuracy, comprehensiveness, clarity, and patient utility, while UCF answers were consistently scored the lowest, all <i>p</i> < 0.01. A subanalysis revealed that this trend was consistent across question categories (i.e., kidney stones, BPH, etc.). However, AI-generated responses were more likely to be classified at an advanced reading level, while UCF responses showed improved readability (college or higher reading level: ChatGPT = 100%, Bard = 66%, and UCF = 19%), <i>p</i> < 0.001. When asked to identify which answer was not AI generated, 54.2% of responses indicated ChatGPT, 26.6% indicated Bard, and only 19.3% correctly identified it as the UCF response. <b><i>Conclusions:</i></b> In a blind evaluation, AI-generated responses from ChatGPT and Bard surpassed the quality of official patient education materials in endourology, suggesting that current AI platforms are already a reliable resource for basic urologic care information. AI-generated responses do, however, tend to require a higher reading level, which may limit their applicability to a broader audience.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140028206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Analysis of Split Kidney Function from CT Scans Using Deep Learning and Delta Radiomics.
Ramon Luis Correa-Medero, Jiwoong Jeong, Bhavik Patel, Imon Banerjee, Haidar Abdul-Muhsin
Pub Date: 2024-08-01. Epub Date: 2024-05-16. DOI: 10.1089/end.2023.0488
Background: Differential kidney function assessment is an important part of the preoperative evaluation of various urological interventions. It is obtained through dedicated nuclear medicine imaging and is not yet implemented through conventional imaging. Objective: We assessed whether differential kidney function can be obtained through evaluation of contrast-enhanced computed tomography (CT) using a combination of deep learning and (2D and 3D) radiomic features. Methods: All patients who underwent kidney nuclear scanning at Mayo Clinic sites between 2018 and 2022 were collected. CT scans of the kidneys obtained within a 3-month interval before or after the nuclear scans were extracted. Patients who underwent a urological or radiological intervention within this time frame were excluded. A segmentation model was used to segment both kidneys. 2D and 3D radiomics features were extracted and compared between the two kidneys to compute delta radiomics and assess its ability to predict differential kidney function. Performance was reported using receiver operating characteristics, sensitivity, and specificity. Results: Studies from Arizona and Rochester formed our internal dataset (n = 1,159). Studies from Florida were separately processed as an external test set to validate generalizability. We obtained 323 studies from our internal sites and 39 studies from external sites. The best results were obtained by a random forest model trained on 3D delta radiomics features. This model achieved an area under the curve (AUC) of 0.85 and 0.81 on the internal and external test sets, respectively, while specificity and sensitivity were 0.84 and 0.68 on the internal set and 0.70 and 0.65 on the external set. Conclusion: This proposed automated pipeline can derive important differential kidney function information from contrast-enhanced CT and reduce the need for dedicated nuclear scans for early-stage differential kidney function assessment. Clinical Impact: We establish a machine learning methodology for assessing differential kidney function from routine CT without the need for expensive and radioactive nuclear medicine scans.
{"title":"Automated Analysis of Split Kidney Function from CT Scans Using Deep Learning and Delta Radiomics.","authors":"Ramon Luis Correa-Medero, Jiwoong Jeong, Bhavik Patel, Imon Banerjee, Haidar Abdul-Muhsin","doi":"10.1089/end.2023.0488","DOIUrl":"10.1089/end.2023.0488","url":null,"abstract":"<p><p><b><i>Background:</i></b> Differential kidney function assessment is an important part of preoperative evaluation of various urological interventions. It is obtained through dedicated nuclear medical imaging and is not yet implemented through conventional Imaging. <b><i>Objective:</i></b> We assess if differential kidney function can be obtained through evaluation of contrast-enhanced computed tomography(CT) using a combination of deep learning and (2D and 3D) radiomic features. <b><i>Methods:</i></b> All patients who underwent kidney nuclear scanning at Mayo Clinic sites between 2018-2022 were collected. CT scans of the kidneys were obtained within a 3-month interval before or after the nuclear scans were extracted. Patients who underwent a urological or radiological intervention within this time frame were excluded. A segmentation model was used to segment both kidneys. 2D and 3D radiomics features were extracted and compared between the two kidneys to compute delta radiomics and assess its ability to predict differential kidney function. Performance was reported using receiver operating characteristics, sensitivity, and specificity. <b><i>Results:</i></b> Studies from Arizona & Rochester formed our internal dataset (<i>n</i> = 1,159). Studies from Florida were separately processed as an external test set to validate generalizability. We obtained 323 studies from our internal sites and 39 studies from external sites. The best results were obtained by a random forest model trained on 3D delta radiomics features. This model achieved an area under curve (AUC) of 0.85 and 0.81 on internal and external test sets, while specificity and sensitivity were 0.84,0.68 on the internal set, 0.70, and 0.65 on the external set. <b><i>Conclusion:</i></b> This proposed automated pipeline can derive important differential kidney function information from contrast-enhanced CT and reduce the need for dedicated nuclear scans for early-stage differential kidney functional assessment. <b><i>Clinical Impact:</i></b> We establish a machine learning methodology for assessing differential kidney function from routine CT without the need for expensive and radioactive nuclear medicine scans.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140863086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating the Performance of ChatGPT in Urology: A Comparative Study of Knowledge Interpretation and Patient Guidance.
Bahadır Şahin, Yunus Emre Genç, Kader Doğan, Tarık Emre Şener, Çağrı Akın Şekerci, Yılören Tanıdır, Selçuk Yücel, Tufan Tarcan, Haydar Kamil Çam
Pub Date: 2024-08-01. DOI: 10.1089/end.2023.0413
Background/Aim: To evaluate the performance of the Chat Generative Pre-trained Transformer (ChatGPT), a large language model trained by OpenAI. Materials and Methods: This study had three main steps to evaluate the effectiveness of ChatGPT in the urologic field. The first step involved 35 questions from our institution's experts, each with at least 10 years of experience in their field. The responses of the ChatGPT versions were qualitatively compared with the responses of urology residents to the same questions. The second step assessed the reliability of the ChatGPT versions in answering current debate topics. The third step assessed the reliability of the ChatGPT versions in providing medical recommendations and directives for questions commonly asked by patients in the outpatient and inpatient clinic. Results: In the first step, version 4 provided correct answers to 25 of the 35 questions, while version 3.5 provided only 19 (71.4% vs 54%). Residents in their last year of education in our clinic also provided a mean of 25 correct answers, and 4th-year residents provided a mean of 19.3 correct responses. The second step involved evaluating the responses of both versions to debate situations in urology, and both versions provided variable and inappropriate results. In the last step, both versions had a similar success rate in providing recommendations and guidance to patients based on expert ratings. Conclusion: The difference between the two versions on the 35 questions in the first step of the study was thought to be due to the improvement in ChatGPT's literature and data synthesis abilities. It may be reasonable to use ChatGPT versions to give quick and safe answers to questions from non-health care providers, but they should not be used as a diagnostic tool or to choose among different treatment modalities.
{"title":"Evaluating the Performance of ChatGPT in Urology: A Comparative Study of Knowledge Interpretation and Patient Guidance.","authors":"Bahadır Şahin, Yunus Emre Genç, Kader Doğan, Tarık Emre Şener, Çağrı Akın Şekerci, Yılören Tanıdır, Selçuk Yücel, Tufan Tarcan, Haydar Kamil Çam","doi":"10.1089/end.2023.0413","DOIUrl":"10.1089/end.2023.0413","url":null,"abstract":"<p><p><b><i>Background/Aim:</i></b> To evaluate the performance of Chat Generative Pre-trained Transformer (ChatGPT), a large language model trained by Open artificial intelligence. <b><i>Materials and Methods:</i></b> This study has three main steps to evaluate the effectiveness of ChatGPT in the urologic field. The first step involved 35 questions from our institution's experts, who have at least 10 years of experience in their fields. The responses of ChatGPT versions were qualitatively compared with the responses of urology residents to the same questions. The second step assesses the reliability of ChatGPT versions in answering current debate topics. The third step was to assess the reliability of ChatGPT versions in providing medical recommendations and directives to patients' commonly asked questions during the outpatient and inpatient clinic. <b><i>Results:</i></b> In the first step, version 4 provided correct answers to 25 questions out of 35 while version 3.5 provided only 19 (71.4% <i>vs</i> 54%). It was observed that residents in their last year of education in our clinic also provided a mean of 25 correct answers, and 4th year residents provided a mean of 19.3 correct responses. The second step involved evaluating the response of both versions to debate situations in urology, and it was found that both versions provided variable and inappropriate results. In the last step, both versions had a similar success rate in providing recommendations and guidance to patients based on expert ratings. <b><i>Conclusion:</i></b> The difference between the two versions of the 35 questions in the first step of the study was thought to be due to the improvement of ChatGPT's literature and data synthesis abilities. It may be a logical approach to use ChatGPT versions to inform the nonhealth care providers' questions with quick and safe answers but should not be used to as a diagnostic tool or make a choice among different treatment modalities.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141179798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predictive Modeling of Urinary Stone Composition Using Machine Learning and Clinical Data: Implications for Treatment Strategies and Pathophysiological Insights.
John A Chmiel, Gerrit A Stuivenberg, Jennifer F W Wong, Linda Nott, Jeremy P Burton, Hassan Razvi, Jennifer Bjazevic
Pub Date: 2024-08-01. Epub Date: 2024-05-30. DOI: 10.1089/end.2023.0446
Purpose: Preventative strategies and surgical treatments for urolithiasis depend on stone composition. However, stone composition is often unknown until the stone is passed or surgically managed. Given that stone composition likely reflects the physiological parameters during its formation, we used clinical data from stone formers to predict stone composition. Materials and Methods: Data on stone composition, 24-hour urine, serum biochemistry, patient demographics, and medical history were prospectively collected from 777 kidney stone patients. Data were used to train gradient boosted machine and logistic regression models to distinguish calcium vs noncalcium, calcium oxalate monohydrate vs dihydrate, and calcium oxalate vs calcium phosphate vs uric acid stone types. Model performance was evaluated using the kappa score, and the influence of each predictor variable was assessed. Results: The calcium vs noncalcium model differentiated stone types with a kappa of 0.5231. The most influential predictors were 24-hour urine calcium, blood urate, and phosphate. The calcium oxalate monohydrate vs dihydrate model is the first of its kind and could discriminate stone types with a kappa of 0.2042. The key predictors were 24-hour urine urea, calcium, and oxalate. The multiclass model had a kappa of 0.3023 and the top predictors were age and 24-hour urine calcium and creatinine. Conclusions: Clinical data can be leveraged with machine learning algorithms to predict stone composition, which may help urologists determine stone type and guide their management plan before stone treatment. Investigating the most influential predictors of each classifier may improve the understanding of key clinical features of urolithiasis and shed light on pathophysiology.
{"title":"Predictive Modeling of Urinary Stone Composition Using Machine Learning and Clinical Data: Implications for Treatment Strategies and Pathophysiological Insights.","authors":"John A Chmiel, Gerrit A Stuivenberg, Jennifer F W Wong, Linda Nott, Jeremy P Burton, Hassan Razvi, Jennifer Bjazevic","doi":"10.1089/end.2023.0446","DOIUrl":"10.1089/end.2023.0446","url":null,"abstract":"<p><p><b><i>Purpose:</i></b> Preventative strategies and surgical treatments for urolithiasis depend on stone composition. However, stone composition is often unknown until the stone is passed or surgically managed. Given that stone composition likely reflects the physiological parameters during its formation, we used clinical data from stone formers to predict stone composition. <b><i>Materials and Methods:</i></b> Data on stone composition, 24-hour urine, serum biochemistry, patient demographics, and medical history were prospectively collected from 777 kidney stone patients. Data were used to train gradient boosted machine and logistic regression models to distinguish calcium <i>vs</i> noncalcium, calcium oxalate monohydrate <i>vs</i> dihydrate, and calcium oxalate <i>vs</i> calcium phosphate <i>vs</i> uric acid stone types. Model performance was evaluated using the kappa score, and the influence of each predictor variable was assessed. <b><i>Results:</i></b> The calcium <i>vs</i> noncalcium model differentiated stone types with a kappa of 0.5231. The most influential predictors were 24-hour urine calcium, blood urate, and phosphate. The calcium oxalate monohydrate <i>vs</i> dihydrate model is the first of its kind and could discriminate stone types with a kappa of 0.2042. The key predictors were 24-hour urine urea, calcium, and oxalate. The multiclass model had a kappa of 0.3023 and the top predictors were age and 24-hour urine calcium and creatinine. <b><i>Conclusions:</i></b> Clinical data can be leveraged with machine learning algorithms to predict stone composition, which may help urologists determine stone type and guide their management plan before stone treatment. Investigating the most influential predictors of each classifier may improve the understanding of key clinical features of urolithiasis and shed light on pathophysiology.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136397694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}