
Latest Publications in JMIR AI

Leveraging Large Language Models and Machine Learning for Success Analysis in Robust Cancer Crowdfunding Predictions: Quantitative Study.
IF 2.0 | Pub Date: 2025-11-19 | DOI: 10.2196/73448
Runa Bhaumik, Abhishikta Roy, Vineet Srivastava, Lokesh Boggavarapu, Ranganathan Chandrasekaran, Edward K Mensah, John Galvin

Background: Recent advances in large language models (LLMs) such as GPT-4o offer a transformative opportunity to extract nuanced linguistic, emotional, and social features from medical crowdfunding campaign texts at scale. These models enable a deeper understanding of the factors influencing campaign success far beyond what structured data alone can reveal. Given these advancements, there is a pressing need for an integrated modeling framework that leverages both LLM-derived features and machine learning algorithms to more accurately predict and explain success in medical crowdfunding.

Objective: This study addressed a gap in prior work: the failure to capture the deeper psychosocial and clinical nuances that influence campaign success. It leveraged cutting-edge machine learning techniques alongside state-of-the-art LLMs such as GPT-4o to automatically generate and extract nuanced linguistic, social, and clinical features from campaign narratives. By combining these features with ensemble learning approaches, the proposed methodology offers a novel and more comprehensive strategy for understanding and predicting crowdfunding success in the medical domain.

Methods: We used GPT-4o to extract linguistic and social determinants of health features from cancer crowdfunding campaign narratives. A random forest model with permutation importance was applied to rank features based on their contribution to predicting campaign success. Four machine learning algorithms (random forest, gradient boosting, logistic regression, and elastic net) were evaluated using stratified 10-fold cross-validation, with performance measured through accuracy, sensitivity, and specificity.
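The evaluation pipeline described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the data are synthetic stand-ins for the GPT-4o-derived campaign features, and the elastic net is expressed as an elastic-net-penalized logistic regression since the task is binary classification.

```python
# Sketch: compare four classifiers under stratified 10-fold cross-validation
# (scoring on recall, i.e., sensitivity), then rank features by permutation
# importance on a random forest, as in the Methods described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic placeholder for the LLM-extracted feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "elastic_net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, max_iter=5000),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="recall").mean()
          for name, m in models.items()}  # recall == sensitivity

# Rank features by their contribution to predictions (permutation importance).
rf = RandomForestClassifier(random_state=0).fit(X, y)
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]  # most important first
```

In practice the comparison would also track accuracy and specificity per fold; `scoring="recall"` is shown because sensitivity is the metric the Results section emphasizes.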

Results: Gradient boosting consistently outperformed the other algorithms in terms of sensitivity (0.786 to 0.798), indicating its superior ability to identify successful crowdfunding campaigns using linguistic and social determinants of health features. The permutation importance scores revealed that severe medical conditions, income loss, chemotherapy treatment, clear and effective communication, cognitive understanding, family involvement, empathy, and social behaviors play an important role in the success of campaigns.

Conclusions: This study demonstrates that LLMs such as GPT-4o can effectively extract nuanced linguistic and social features from crowdfunding narratives, offering deeper insights than traditional methods. These features, when combined with machine learning, significantly improve the identification of key predictors of campaign success, such as medical severity, financial hardship, and empathetic communication. Our findings underscore the potential of LLMs to enhance predictive modeling in health-related crowdfunding and support more targeted policy and communication strategies to reduce financial vulnerability among patients with cancer.

Citations: 0
Detection of Medical Misinformation in Hemangioma Patient Education: Comparative Study of ChatGPT-4o and DeepSeek-R1 Large Language Models.
IF 2.0 | Pub Date: 2025-11-18 | DOI: 10.2196/76372
Guoyong Wang, Ye Zhang, Weixin Wang, Yingjie Zhu, Wei Lu, Chaonan Wang, Hui Bi, Xiaonan Yang

Background: This study examines the capability of large language models (LLMs) in detecting medical rumors, using hemangioma-related information as an example. It compares the performances of ChatGPT-4o and DeepSeek-R1.

Objective: This study aimed to evaluate and compare the accuracy, stability, and expert-rated reliability of 2 LLMs, ChatGPT-4o and DeepSeek-R1, in classifying medical information related to hemangiomas as either "rumors" or "accurate information."

Methods: We collected 82 publicly available texts from social media platforms, medical education websites, international guidelines, and journals. Of the 82 items, 47/82 (57%) were labeled as "rumors," and 35/82 (43%) were labeled as "accurate information." Three vascular anomaly specialists with extensive clinical experience independently annotated the texts in a double-blinded manner, and disagreements were resolved by arbitration to ensure labeling reliability. Subsequently, these texts were input into ChatGPT-4o and DeepSeek-R1, with each model generating 2 rounds of results under identical instructions. Output stability was assessed using semantic similarity scores based on bidirectional encoder representations from transformers (BERT). Classification accuracy, precision, recall, and F1-score were calculated to evaluate performance. Additionally, 2 medical experts independently rated the model outputs using a 5-point scale based on clinical guidelines. Statistical analyses included paired t tests, Wilcoxon signed-rank tests, and bootstrap resampling to compute confidence intervals.
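Two of the evaluation steps above are mechanical enough to sketch: computing precision/recall/F1 from binary rumor labels, and a percentile bootstrap confidence interval for a mean difference. This is a minimal stdlib-only sketch under assumed label encodings (1 = rumor, 0 = accurate), not the authors' code.

```python
# Sketch: binary classification metrics and a percentile bootstrap CI,
# as used in the Methods described above.
import random
from statistics import mean

def prf1(y_true, y_pred):
    """Precision, recall, and F1-score for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(diffs, k=len(diffs)))
                   for _ in range(n_boot))
    return (means[int(alpha / 2 * n_boot)],
            means[int((1 - alpha / 2) * n_boot) - 1])
```

Resampling the paired per-item score differences (rather than the raw scores) is what yields a CI for the mean difference, as reported in the Results.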

Results: In terms of semantic stability, the similarity distributions for the 2 models largely overlapped, with no statistically significant difference observed (mean difference=-0.003, 95% CI -0.011 to 0.005; P=.30). Regarding classification performance, DeepSeek-R1 achieved higher accuracy (0.963) compared to ChatGPT-4o (0.910), and also performed better in terms of precision (0.978 vs 0.940), recall (0.957 vs 0.894), and F1-score (0.967 vs 0.916). Expert evaluations revealed that DeepSeek-R1 significantly outperformed ChatGPT-4o on both "rumor" items (mean difference=0.431; P<.001; Cohen dz=0.594) and "accurate information" items (mean difference=0.264; P=.045; Cohen dz=0.352), with a particularly pronounced advantage in rumor detection.

Conclusions: DeepSeek-R1 demonstrated greater accuracy and rationale in detecting medical rumors compared with ChatGPT-4o. This study provides empirical support for the application of LLMs and recommends optimizing accuracy and incorporating real-time verification mechanisms to mitigate the harmful impact of misleading information on patient health.

Citations: 0
Standardizing and Scaffolding Health Care AI-Chatbot Evaluation: Systematic Review.
IF 2.0 | Pub Date: 2025-11-07 | DOI: 10.2196/69006
Yining Hua, Winna Xia, David Bates, George Luke Hartstein, Hyungjin Tom Kim, Michael Li, Benjamin W Nelson, Charles Stromeyer Iv, Darlene King, Jina Suh, Li Zhou, John Torous

Background: Health care chatbots are rapidly proliferating, while generative artificial intelligence (AI) outpaces existing evaluation standards.

Objective: We aimed to develop a structured, stakeholder-informed framework to standardize evaluation of health care chatbots.

Methods: PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses)-guided searches across multiple databases identified 266 records; 152 were screened, 21 full texts were assessed, and 11 frameworks were included. We extracted 356 questions (refined to 271 by deduplication and relevance review), mapped items to Coalition for Health AI constructs, and organized them with iterative input from clinicians, patients, developers, epidemiologists, and policymakers.

Results: We developed the Health Care AI Chatbot Evaluation Framework (HAICEF), a hierarchical framework with 3 priority domains (safety, privacy, and fairness; trustworthiness and usefulness; and design and operational effectiveness) and 18 second-level and 60 third-level constructs covering 271 questions. Emphasis includes data provenance and harm control; Health Insurance Portability and Accountability Act/General Data Protection Regulation-aligned privacy and security; bias management; and reliability, transparency, and workflow integration. Question distribution across domains is as follows: design and operational effectiveness, 40%; trustworthiness and usefulness, 39%; and safety, privacy and fairness, 21%. The framework accommodates both patient-facing and back-office use cases.

Conclusions: HAICEF provides an adaptable scaffold for standardized evaluation and responsible implementation of health care chatbots. Planned next steps include prospective validation across settings and a Delphi consensus to extend accountability and accessibility assurances.

Citations: 0
AI in Health Care Service Quality: Systematic Review.
IF 2.0 | Pub Date: 2025-11-05 | DOI: 10.2196/69209
Eman Alghareeb, Najla Aljehani

Background: Artificial intelligence (AI) is a rapidly evolving technology with the potential to revolutionize the health care industry. In Saudi Arabia, the health care sector has adopted AI technologies over the past decade to enhance service efficiency and quality, aligning with the country's technological thrust under the Saudi Vision 2030 program.

Objective: This review aims to systematically examine the impact of AI on health care quality in Saudi Arabian hospitals.

Methods: A comprehensive systematic literature review was undertaken to identify studies investigating AI's impact on health care in Saudi Arabia. We collected several studies from selected databases, including PubMed, Google Scholar, and the Saudi Digital Library. The search terms used were "Artificial Intelligence," "health care," "health care quality," "AI in Saudi Arabia," "AI in health care," and "health care providers." The review focused on studies published in the past 10 years, ensuring the inclusion of the most recent and relevant research on the effects of AI on Saudi Arabian health care organizations. The review included quantitative and qualitative analyses, providing a robust and comprehensive understanding of the topic.

Results: A systematic review of 12 studies explored AI's influence on health care services in Saudi Arabia, highlighting notable advancements in diagnostic accuracy, patient management, and operational efficiency. AI-driven models demonstrate high precision in disease prediction and early diagnosis, while machine learning optimizes telehealth, electronic health record compliance, and workflow efficiency, despite adoption challenges like connectivity limitations. Additionally, AI strengthens data security, reduces costs, and facilitates personalized treatment, ultimately enhancing health care delivery.

Conclusions: The review underscores that AI technologies have significantly improved diagnostic accuracy, patient management, and operational efficiency in Saudi Arabia's health care system. However, challenges such as data privacy, algorithmic bias, and robust regulations require attention to ensure successful AI integration in health care.

Citations: 0
Examining Transparency in Kidney Transplant Recipient Selection Criteria: Nationwide Cross-Sectional Study.
IF 2.0 | Pub Date: 2025-11-04 | DOI: 10.2196/74066
Belen Rivera, Stalin Canizares, Gabriel Cojuc-Konigsberg, Olena Holub, Alex Nakonechnyi, Ritah R Chumdermpadetsuk, Keren Ladin, Devin E Eckhoff, Rebecca Allen, Aditya Pawar

Background: Choosing a transplant program impacts a patient's likelihood of receiving a kidney transplant. Most patients are unaware of the factors influencing their candidacy. As patients increasingly rely on online resources for health care decisions, this study quantifies the available online patient-level information on kidney transplant recipient (KTR) selection criteria across US kidney transplant centers.

Objective: We aimed to use natural language processing and a large language model to quantify the available online patient-level information regarding the guideline-recommended KTR selection criteria reported by US transplant centers.

Methods: A cross-sectional study using natural language processing and a large language model was conducted to review the websites of US kidney transplant centers from June to August 2024. Links were explored up to 3 levels deep, and information on 31 guideline-recommended KTR selection criteria was collected from each transplant center.
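The "links explored up to 3 levels deep" step above amounts to a bounded breadth-first crawl from each center's homepage. The sketch below is a hypothetical illustration, not the study's code: `fetch_links` is a stand-in for real HTML fetching and link extraction, exercised here on a toy site graph.

```python
# Sketch: breadth-first traversal that follows links up to max_depth hops
# from a starting page, as in the website review described above.
from collections import deque

def crawl(start_url, fetch_links, max_depth=3):
    """Return all URLs reachable within max_depth link hops of start_url."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # pages at the depth limit are kept but not expanded
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen

# Toy site graph standing in for one transplant center's website.
site = {"home": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": ["e"]}
pages = crawl("home", lambda url: site.get(url, []))
# "e" lies 4 hops from "home", so it is outside the 3-level limit.
```

In the study's setting, the collected pages would then be passed to the language model to extract mentions of the 31 guideline-recommended criteria.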

Results: A total of 255 US kidney transplant centers were analyzed, comprising 10,508 web pages and 9,113,753 words. Among the kidney transplant guideline-recommended KTR selection criteria, only 2.6% (206/7905) of the information was present on the transplant center web pages. Socioeconomic and behavioral criteria were mentioned more than those related to the patient's medical conditions and comorbidities. Of the 31 criteria, finances and health insurance was the most frequently mentioned, appearing in 25.5% (65/255) of the transplant centers. Other socioeconomic and behavioral criteria, such as family and social support systems, adherence, and psychosocial assessment, were addressed in less than 4% (9/255) of the transplant centers. No information was found on any web page for 45.2% (14/31) of the KTR selection criteria. Geographically, disparities in reporting were observed, with the South Atlantic division showing the highest number of distinct criteria, while New England had the fewest.

Conclusions: Most transplant center websites do not disclose patient-level KTR selection criteria online. The lack of transparency in the evaluation and listing process for kidney transplantation may limit patients in choosing their most suitable transplant center and successfully receiving a kidney transplant.

{"title":"Examining Transparency in Kidney Transplant Recipient Selection Criteria: Nationwide Cross-Sectional Study.","authors":"Belen Rivera, Stalin Canizares, Gabriel Cojuc-Konigsberg, Olena Holub, Alex Nakonechnyi, Ritah R Chumdermpadetsuk, Keren Ladin, Devin E Eckhoff, Rebecca Allen, Aditya Pawar","doi":"10.2196/74066","DOIUrl":"10.2196/74066","url":null,"abstract":"<p><strong>Background: </strong>Choosing a transplant program impacts a patient's likelihood of receiving a kidney transplant. Most patients are unaware of the factors influencing their candidacy. As patients increasingly rely on online resources for health care decisions, this study quantifies the available online patient-level information on kidney transplant recipient (KTR) selection criteria across US kidney transplant centers.</p><p><strong>Objective: </strong>We aimed to use natural language processing and a large language model to quantify the available online patient-level information regarding the guideline-recommended KTR selection criteria reported by US transplant centers.</p><p><strong>Methods: </strong>A cross-sectional study using natural language processing and a large language model was conducted to review the websites of US kidney transplant centers from June to August 2024. Links were explored up to 3 levels deep, and information on 31 guideline-recommended KTR selection criteria was collected from each transplant center.</p><p><strong>Results: </strong>A total of 255 US kidney transplant centers were analyzed, comprising 10,508 web pages and 9,113,753 words. Among the kidney transplant guideline-recommended KTR selection criteria, only 2.6% (206/7905) of the information was present on the transplant center web pages. Socioeconomic and behavioral criteria were mentioned more than those related to the patient's medical conditions and comorbidities. 
Of the 31 criteria, finances and health insurance was the most frequently mentioned, appearing in 25.5% (65/255) of the transplant centers. Other socioeconomic and behavioral criteria, such as family and social support systems, adherence, and psychosocial assessment, were addressed in less than 4% (9/255) of the transplant centers. No information was found on any web page for 45.2% (14/31) of the KTR selection criteria. Geographically, disparities in reporting were observed, with the South Atlantic division showing the highest number of distinct criteria, while New England had the fewest.</p><p><strong>Conclusions: </strong>Most transplant center websites do not disclose patient-level KTR selection criteria online. The lack of transparency in the evaluation and listing process for kidney transplantation may limit patients in choosing their most suitable transplant center and successfully receiving a kidney transplant.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":" ","pages":"e74066"},"PeriodicalIF":2.0,"publicationDate":"2025-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12627972/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Predetermined Change Control Plans: Guiding Principles for Advancing Safe, Effective, and High-Quality AI-ML Technologies.
IF 2 | Pub Date: 2025-10-31 | DOI: 10.2196/76854
Eduardo Carvalho, Miguel Mascarenhas, Francisca Pinheiro, Ricardo Correia, Sandra Balseiro, Guilherme Barbosa, Ana Guerra, Dulce Oliveira, Rita Moura, André Martins Dos Santos, Nilza Ramião

Unlabelled: The adaptive nature of artificial intelligence (AI), with its ability to improve performance through continuous learning, offers substantial benefits across various sectors. However, current regulatory frameworks are not intended to accommodate this adaptive nature, and they have prolonged approval timelines, sometimes exceeding one year for some AI-enabled devices. This creates significant challenges for manufacturers who must deal with lengthy waits and submit multiple approval requests for AI-enabled device software functions as they are updated. In response, regulatory agencies like the US Food and Drug Administration (FDA) have introduced guidelines to better support the approval process for continuously evolving AI technologies. This article explores the FDA's concept of predetermined change control plans and how they can streamline regulatory oversight by reducing the need for repeated approvals, while ensuring safety and compliance. This can help reduce the burden for regulatory bodies and decrease waiting times for approval decisions, therefore fostering innovation, increasing market uptake, and exploiting the benefits of artificial intelligence and machine learning technologies.

JMIR AI, vol 4, e76854 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12577744/pdf/
Citations: 0
Balancing Innovation and Control: The European Union AI Act in an Era of Global Uncertainty.
IF 2 | Pub Date: 2025-10-30 | DOI: 10.2196/75527
Elena Giovanna Bignami, Michele Russo, Federico Semeraro, Valentina Bellini

Unlabelled: The European Union's Artificial Intelligence Act (EU AI Act), adopted in 2024, establishes a landmark regulatory framework for artificial intelligence (AI) systems, with significant implications for health care. The Act classifies medical AI as "high-risk," imposing stringent requirements for transparency, data governance, and human oversight. While these measures aim to safeguard patient safety, they may also hinder innovation, particularly for smaller health care providers and startups. Concurrently, geopolitical instability-marked by rising military expenditures, trade tensions, and supply chain disruptions-threatens health care innovation and access. This paper examines the challenges and opportunities posed by the AI Act in health care within a volatile geopolitical landscape. It evaluates the intersection of Europe's regulatory approach with competing priorities, including technological sovereignty, ethical AI, and equitable health care, while addressing unintended consequences such as reduced innovation and supply chain vulnerabilities. The study employs a comprehensive review of the EU AI Act's provisions, geopolitical trends, and their implications for health care. It analyzes regulatory documents, stakeholder statements, and case studies to assess compliance burdens, innovation barriers, and geopolitical risks. The paper also synthesizes recommendations from multidisciplinary experts to propose actionable solutions. 
Key findings include: (1) the AI Act's high-risk classification for medical AI could improve patient safety but risks stifling innovation due to compliance costs (eg, €29,277 annually per AI unit) and certification burdens (€16,800-23,000 per unit); (2) geopolitical factors-such as United States-China semiconductor tariffs and EU rearmament-exacerbate supply chain vulnerabilities and divert funding from health care innovation; (3) the dominance of "superstar" firms in AI development may marginalize smaller players, further concentrating innovation in well-resourced organizations; and (4) regulatory sandboxes, AI literacy programs, and international collaboration emerge as viable strategies to balance innovation and compliance. The EU AI Act provides a critical framework for ethical AI in health care, but its success depends on mitigating regulatory burdens and geopolitical risks. Proactive measures-such as multidisciplinary task forces, resilient supply chains, and human-augmented AI systems-are essential to foster innovation while ensuring patient safety. Policymakers, clinicians, and technologists must collaborate to navigate these challenges in an era of global uncertainty.
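The compliance figures in finding (1) imply a simple cost model. A hedged sketch of that arithmetic (the euro figures come from the abstract; treating €29,277 as recurring per year and the certification fee as a one-off range per unit is an interpretation for illustration):

```python
def ai_act_compliance_cost(units, years, annual_per_unit=29_277,
                           cert_low=16_800, cert_high=23_000):
    """Return a (low, high) range in euros: recurring annual compliance
    cost for each high-risk AI unit plus a one-off certification per unit."""
    recurring = units * years * annual_per_unit
    return recurring + units * cert_low, recurring + units * cert_high
```

For a single AI unit over one year, this gives roughly €46,077 to €52,277, which illustrates why the abstract flags these burdens as prohibitive for smaller providers and startups.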

JMIR AI, vol 4, e75527 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12574960/pdf/
Citations: 0
ETHICS of AI Adoption and Deployment in Health Care: Progress, Challenges, and Next Steps.
IF 2 | Pub Date: 2025-10-30 | DOI: 10.2196/67626
Obinna O Oleribe, Andrew W Taylor-Robinson, Christian C Chimezie, Simon D Taylor-Robinson

Generative artificial intelligence (GenAI) is increasingly being integrated into health care, offering a wide array of benefits. Currently, GenAI applications are useful in disease risk prediction and preventive care, diagnostics via imaging, artificial intelligence (AI)-assisted devices and point-of-care tools, drug discovery and design, patient and disease monitoring, remote monitoring and wearables, integration of multimodal data and personalized medicine, on-site and remote patient and disease monitoring and device integration, robotic surgery, and health system efficiency and workflow optimization, among other aspects of disease prevention, control, diagnosis, and treatment. Recent breakthroughs have led to the development of reliable and safer GenAI systems capable of handling the complexity of health care data. The potential of GenAI to optimize resource use and enhance productivity underscores its critical role in patient care. However, the use of AI in health is not without critical gaps and challenges, including (but not limited to) AI-related environmental concerns, transparency and explainability, hallucinations, inclusiveness and inconsistencies, cost and clinical workflow integration, and safety and security of data (ETHICS). In addition, the governance and regulatory issues surrounding GenAI applications in health care highlight the importance of addressing these aspects for responsible and appropriate GenAI integration. Building on AI's promising start necessitates striking a balance between technical advancements and ethical, equity, and environmental concerns. Here, we highlight several ways in which the transformative power of GenAI is revolutionizing public health practice and patient care, acknowledge gaps and challenges, and indicate future directions for AI adoption and deployment.

JMIR AI, vol 4, e67626 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12616186/pdf/
Citations: 0
Evaluating the Reliability and Accuracy of an AI-Powered Search Engine in Providing Responses on Dietary Supplements: Quantitative and Qualitative Evaluation.
IF 2 | Pub Date: 2025-10-29 | DOI: 10.2196/78436
Mingxin Liu, Tsuyoshi Okuhara, Ritsuko Shirabe, Yuriko Nishiie, Yinghan Xu, Hiroko Okada, Takahiro Kiuchi
Background: The widespread adoption of artificial intelligence (AI)-powered search engines has transformed how people access health information. Microsoft Copilot, formerly Bing Chat, offers real-time web-sourced responses to user queries, raising concerns about the reliability of its health content. This is particularly critical in the domain of dietary supplements, where scientific consensus is limited and online misinformation is prevalent. Despite the popularity of supplements in Japan, little is known about the accuracy of AI-generated advice on their effectiveness for common diseases.

Objective: We aimed to evaluate the reliability and accuracy of Microsoft Copilot, an AI search engine, in responding to health-related queries about dietary supplements. Our findings can help consumers use large language models more safely and effectively when seeking information on dietary supplements and support developers in improving large language models' performance in this field.

Methods: We simulated typical consumer behavior by posing 180 questions (6 per supplement × 30 supplements) to Copilot's 3 response modes (creative, balanced, and precise) in Japanese. These questions addressed the effectiveness of supplements in treating 6 common conditions (cancer, diabetes, obesity, constipation, joint pain, and hypertension). We classified the AI search engine's answers as "effective," "uncertain," or "ineffective" and evaluated them for accuracy against evidence-based assessments conducted by licensed physicians. We conducted a qualitative content analysis of the response texts and systematically examined the types of sources cited in all responses.

Results: The proportion of Copilot responses claiming supplement effectiveness was 29.4% (53/180), 47.8% (86/180), and 45% (81/180) for the creative, balanced, and precise modes, respectively, whereas overall accuracy of the responses was low across all modes: 36.1% (65/180), 31.7% (57/180), and 31.7% (57/180) for creative, balanced, and precise, respectively. No significant difference was observed among the 3 modes (P=.59). Notably, 72.7% (2240/3081) of the citations came from unverified sources such as blogs, sales websites, and social media. Of the 540 responses analyzed, 54 (10%) contained at least 1 citation in which the cited source did not include or support the claim made by Copilot, indicating hallucinated content. Only 48.5% (262/540) of the responses included a recommendation to consult health care professionals. Among disease categories, the highest accuracy was found for cancer-related questions, likely due to lower misinformation prevalence.

Conclusions: This is the first study to assess Copilot's performance on dietary supplement information. Despite its authoritative appearance, Copilot frequently cited noncredible sources and provided ambiguous or inaccurate information. Its tendency to avoid definitive positions and to align with perceived user expectations poses a potential risk of health misinformation. These findings highlight the need to integrate health communication principles, such as transparency, audience empowerment, and informed choice, into the development and regulation of AI search engines to ensure safe public use.
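The scoring step in the Methods, mapping each Copilot answer to one of three labels and checking it against physician assessments, can be sketched as follows (label strings and helper names are illustrative, not the study's code):

```python
from collections import Counter

LABELS = ("effective", "uncertain", "ineffective")

def accuracy_against_reference(model_labels, physician_labels):
    """Share of responses whose label matches the physicians'
    evidence-based assessment."""
    pairs = list(zip(model_labels, physician_labels))
    return sum(m == p for m, p in pairs) / len(pairs)

def label_distribution(model_labels):
    """Proportion of responses per label, e.g., the share of
    responses claiming a supplement is 'effective'."""
    counts = Counter(model_labels)
    total = len(model_labels)
    return {lab: counts[lab] / total for lab in LABELS}
```

Applied per response mode, `label_distribution` yields figures like the 29.4% "effective" share for the creative mode, and `accuracy_against_reference` the per-mode accuracy.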
JMIR AI, vol 4, e78436 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12571200/pdf/
Citations: 0
AI Awareness and Tobacco Policy Messaging Among US Adults: Electronic Experimental Study.
IF 2 | Pub Date: 2025-10-27 | DOI: 10.2196/72987
Julia Mary Alber, David Askay, Anuraj Dhillon, Lauren Sandoval, Sofia Ramos, Katharine Santilena

Background: Despite public health efforts, tobacco use remains the leading cause of preventable death in the United States and continues to disproportionately affect underrepresented populations. Public policies are needed to improve health equity in tobacco-related health outcomes. One strategy for promoting public support for these policies is through health messaging. Improvements in artificial intelligence (AI) technology offer new opportunities to create tailored policy messages quickly; however, there is limited research on how the public might perceive the use of AI for public health messages.

Objective: This study aimed to examine how knowledge of AI use impacts perceptions of a tobacco control policy video.

Methods: A national sample of US adults (N=500) was shown the same AI-generated video that focused on a tobacco control policy. Participants were then randomly assigned to 1 of 4 conditions where they were (1) told the narrator of the video was AI, (2) told the narrator of the video was human, (3) told it was unknown whether the narrator was AI or human, or (4) not provided any information about the narrator.

Results: Perceived video rating, effectiveness, and credibility did not significantly differ among the conditions. However, the mean speaker rating was significantly higher (P=.001) when participants were told the narrator of the health message was human (mean 3.65, SD 0.91) compared to the other conditions. Notably, positive attitudes toward AI were highest among those not provided information about the narrator; however, this difference was not statistically significant (mean 3.04, SD 0.90).

Conclusions: Results suggest that AI may impact perceptions of the speaker of a video; however, more research is needed to understand if these impacts would occur over time and after multiple exposures to content. Further qualitative research may help explain why potential differences may have occurred in speaker ratings. Public health professionals and researchers should further consider the ethics and cost-effectiveness of using AI for health messaging.
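The design in the Methods, randomly assigning each participant to 1 of 4 narrator-disclosure conditions and comparing mean speaker ratings, can be sketched as follows (condition names are abbreviations of the 4 conditions above; a seeded balanced shuffle stands in for the study's actual randomization procedure):

```python
import random
from statistics import mean

CONDITIONS = ("told_ai", "told_human", "told_unknown", "no_info")

def assign_conditions(n_participants, seed=0):
    """Balanced random assignment: each condition receives an equal
    share of participants, in shuffled order."""
    pool = [CONDITIONS[i % len(CONDITIONS)] for i in range(n_participants)]
    random.Random(seed).shuffle(pool)
    return pool

def mean_rating_by_condition(assignments, ratings):
    """Mean speaker rating within each condition."""
    by_cond = {}
    for cond, rating in zip(assignments, ratings):
        by_cond.setdefault(cond, []).append(rating)
    return {cond: mean(vals) for cond, vals in by_cond.items()}
```

With N=500, this yields 125 participants per condition; a test such as one-way ANOVA would then compare the per-condition means, as in the reported speaker-rating difference (P=.001).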

JMIR AI, vol 4, e72987 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12558419/pdf/