
Latest Publications in JMIR AI

Observer-Independent Assessment of Content Overlap in Mental Health Questionnaires: Large Language Model-Based Study.
IF 2.0 Pub Date: 2025-12-11 DOI: 10.2196/79868
Annkathrin Böke, Hannah Hacker, Millennia Chakraborty, Luise Baumeister-Lingens, Jasper Vöckel, Julian Koenig, David Hv Vogel, Theresa Katharina Lichtenstein, Kai Vogeley, Lana Kambeitz-Ilankovic, Joseph Kambeitz

Background: Mental disorders are frequently evaluated using questionnaires, which have been developed over the past decades for the assessment of different conditions. Despite the rigorous validation of these tools, high levels of content divergence have been reported for questionnaires measuring the same construct of psychopathology. Previous studies that examined the content overlap required manual symptom labeling, which is observer-dependent and time-consuming.

Objective: In this study, we used large language models (LLMs) to analyze content overlap of mental health questionnaires in an observer-independent way and compare our results with clinical expertise.

Methods: We analyzed questionnaires from a range of mental health conditions, including adult depression (n=7), childhood depression (n=15), clinical high risk for psychosis (CHR-P; n=11), mania (n=7), obsessive-compulsive disorder (n=7), and sleep disorder (n=12). Two different LLM-based approaches were tested. First, we used sentence Bidirectional Encoder Representations from Transformers (sBERT) to derive numerical representations (embeddings) for each questionnaire item, which were then clustered using k-means to group semantically similar symptoms. Second, questionnaire items were passed as prompts to a Generative Pretrained Transformer (GPT) to identify underlying symptom clusters. Clustering results were compared with a manual categorization by experts using the adjusted Rand index. Further, we assessed the content overlap within each diagnostic domain based on LLM-derived clusters.
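The adjusted Rand index used here to compare LLM-derived clusters against the expert categorization has a short closed form over the pairwise contingency counts. A minimal sketch in plain Python follows; the item labels are invented for illustration and are not the study's data:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items.

    1.0 means identical partitions; ~0.0 is chance-level agreement.
    """
    n = len(labels_a)
    # Contingency counts: how many items share cluster i in A and cluster j in B
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # chance agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical example: expert symptom labels vs. LLM cluster IDs for 6 items
expert = ["sleep", "sleep", "mood", "mood", "anxiety", "anxiety"]
llm = [0, 0, 1, 1, 1, 2]
print(round(adjusted_rand_index(expert, llm), 3))  # → 0.444
```

In practice one would obtain `llm` from k-means over sBERT embeddings (e.g. scikit-learn's `KMeans` on `sentence-transformers` vectors); the pure-Python metric above is kept library-free so the comparison step itself is transparent.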

Results: We observed varying degrees of similarity between expert-based and LLM-based clustering across diagnostic domains. Overall, agreement between experts was higher than between experts and LLMs. Among the 2 LLM approaches, GPT showed greater alignment with expert ratings than sBERT, ranging from weak to strong similarity depending on the diagnostic domain. Using GPT-based clustering of questionnaire items to assess the content overlap within each diagnostic domain revealed a weak (CHR-P: 0.344) to moderate (adult depression: 0.574; childhood depression: 0.433; mania: 0.419; obsessive-compulsive disorder [OCD]: 0.450; sleep disorder: 0.445) content overlap of questionnaires. Compared with studies that manually investigated content overlap among these scales, the results of this study differed, though not substantially.
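The abstract does not state the exact overlap formula, but prior content-overlap work on symptom scales typically uses a Jaccard-style index over the symptom clusters each questionnaire covers, averaged over questionnaire pairs within a domain. A hedged sketch under that assumption (the questionnaires and cluster names below are hypothetical):

```python
from itertools import combinations

def mean_pairwise_jaccard(questionnaires):
    """Mean Jaccard overlap between the sets of symptom clusters
    covered by each questionnaire in a diagnostic domain."""
    scores = [
        len(a & b) / len(a | b)
        for a, b in combinations(questionnaires, 2)
    ]
    return sum(scores) / len(scores)

# Hypothetical example: three depression questionnaires, each represented
# by the set of LLM-derived clusters its items were assigned to
q1 = {"sad_mood", "sleep", "fatigue", "appetite"}
q2 = {"sad_mood", "sleep", "guilt"}
q3 = {"sad_mood", "fatigue", "concentration"}
print(round(mean_pairwise_jaccard([q1, q2, q3]), 3))  # → 0.333
```

Under this reading, the reported values (e.g. 0.574 for adult depression vs. 0.344 for CHR-P) would mean depression scales share a larger fraction of their symptom clusters than CHR-P scales do.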

Conclusions: These findings demonstrate the feasibility of using LLMs to objectively assess content overlap in diagnostic questionnaires. Notably, the GPT-based approach showed particular promise in aligning with expert-derived symptom structures.

Citations: 0
Transparent Reporting of AI in Systematic Literature Reviews: Development of the PRISMA-trAIce Checklist.
IF 2.0 Pub Date: 2025-12-10 DOI: 10.2196/80247
Dirk Holst, Keno Moenck, Julian Koch, Ole Schmedemann, Thorsten Schüppstuhl

Background: Systematic literature reviews (SLRs) build the foundation for evidence synthesis, but they are exceptionally demanding in terms of time and resources. While recent advances in artificial intelligence (AI), particularly large language models, offer the potential to accelerate this process, their use introduces challenges to transparency and reproducibility. Reporting guidelines such as the PRISMA-AI (Preferred Reporting Items for Systematic Reviews and Meta-Analyses-Artificial Intelligence Extension) primarily focus on AI as a subject of research, not as a tool in the review process itself.

Objective: To address the gap in reporting standards, this study aimed to develop and propose a discipline-agnostic checklist extension to the PRISMA 2020 statement. The goal was to ensure transparent reporting when AI is used as a methodological tool in evidence synthesis, fostering trust in the next generation of SLRs.

Methods: The proposed checklist, named PRISMA-trAIce (PRISMA-Transparent Reporting of Artificial Intelligence in Comprehensive Evidence Synthesis), was developed through a systematic process. We conducted a literature search to identify established, consensus-based AI reporting guidelines (eg, CONSORT-AI [Consolidated Standards of Reporting Trials-Artificial Intelligence] and TRIPOD-AI [Transparent Reporting of a Multivariable Prediction Model of Individual Prognosis or Diagnosis-Artificial Intelligence]). Relevant items from these frameworks were extracted, analyzed, and thematically synthesized to form a modular checklist that integrated with the PRISMA 2020 structure.

Results: The primary result of this work is the PRISMA-trAIce checklist, a comprehensive set of reporting items designed to document the use of AI in SLRs. The checklist covers the entire structure of an SLR, from title and abstract to methods and discussion, and includes specific items for identifying AI tools, describing human-AI interaction, reporting performance evaluation, and discussing limitations.

Conclusions: PRISMA-trAIce establishes an important framework to improve the transparency and methodological integrity of AI-assisted systematic reviews, enhancing the trust required for the responsible application of AI-assisted systematic reviews in evidence synthesis. We present this work as a foundational proposal, explicitly inviting the scientific community to join an open science process of consensus building. Through this collaborative refinement, we aim to evolve PRISMA-trAIce into a formally endorsed guideline, thereby ensuring the collective validation and scientific rigor of future AI-driven research.

Citations: 0
Predicting Spinal Cord Injury Prognosis Using Machine Learning: Systematic Review and Meta-Analysis.
IF 2.0 Pub Date: 2025-12-05 DOI: 10.2196/66233
Linxing Zhong, Qiying Huang, Hao Zhang, Liang Xue, Yehuang Chen, Jianwu Wu, Liangfeng Wei

Background: Spinal cord injury (SCI) is a complicated and varied condition that has received considerable attention. The prognosis of patients with SCI is increasingly being predicted using machine learning (ML) techniques.

Objective: This study aims to evaluate the performance and quality of ML models in forecasting SCI outcomes.

Methods: Literature searches were conducted in PubMed, Web of Science, Embase, PROSPERO, Scopus, Cochrane Library, China National Knowledge Infrastructure, China Biomedical Literature Service System, and Wanfang databases. Meta-analysis of the area under the receiver operating characteristic curve of ML models was performed to comprehensively evaluate their performance.

Results: A total of 1254 articles were retrieved, and 13 eligible studies were included. Predictive outcomes included spinal cord function prognosis, postoperative complications, independent living ability, and walking ability. For spinal cord function prognosis, the area under the curve (AUC) of the random forest algorithm was 0.832, the AUC of the logistic regression algorithm was 0.813 (95% CI 0.805-0.883), the AUC of the decision tree algorithm was 0.747 (95% CI 0.677-0.802), and the AUC of the XGBoost (extreme gradient boosting) algorithm was 0.867. For postoperative complications, the AUC of the random forest algorithm was 0.627 (95% CI 0.441-0.812), the AUC of the logistic regression algorithm was 0.747 (95% CI 0.597-0.896), and the AUC of the decision tree algorithm was 0.688. For independent living ability, the AUC of the classification and regression tree model was 0.813. For walking ability, the model based on the support vector machine algorithm was the most effective, with an AUC of 0.780.
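The AUC figures pooled above equal the Mann-Whitney probability that a randomly chosen positive case is ranked above a randomly chosen negative one. A minimal sketch of that equivalence, with invented risk scores rather than data from any included study:

```python
def auc(scores_pos, scores_neg):
    """AUC as the Mann-Whitney probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as half a win)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical example: predicted risk scores for patients with vs.
# without a poor functional outcome
poor = [0.9, 0.8, 0.6, 0.55]
good = [0.7, 0.5, 0.4, 0.3]
print(auc(poor, good))  # → 0.875
```

An AUC of 0.5 corresponds to chance-level discrimination, which puts the review's reported range (0.627 for random forests on complications up to 0.867 for XGBoost on function prognosis) in context.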

Conclusions: ML models predict SCI outcomes with relative accuracy, particularly for spinal cord function prognosis, with the XGBoost algorithm showing the best performance. They are expected to become important tools for clinicians in assessing the prognosis of patients with SCI. Prediction models should continue to advance as larger datasets become available and ML algorithms develop.

Citations: 0
Application of Deep Learning-Based Multimodal Data Fusion for the Diagnosis of Skin Neglected Tropical Diseases: Systematic Review.
IF 2.0 Pub Date: 2025-12-04 DOI: 10.2196/67584
G Yohannes Minyilu, Mohammed Abebe Yimer, Million Meshesha

Background: Neglected tropical diseases (NTDs) are among the most prevalent diseases and comprise 21 different conditions. One-half of these conditions have skin manifestations, known as skin NTDs. The diagnosis of skin NTDs incorporates visual examination of patients, and deep learning (DL)-based diagnostic tools can be used to assist the diagnostic procedures. The use of advanced DL-based methods, including multimodal data fusion (MMDF) functionality, could be a potential approach to enhance the diagnostic procedures of these diseases. However, little has been done toward the application of such tools, as confirmed by the very few studies currently available that implemented MMDF for skin NTDs.

Objective: This article presents a systematic review regarding the use of DL-based MMDF methods for the diagnosis of skin NTDs and related diseases (non-NTD skin diseases), including the ethical risks and potential risk of bias.

Methods: The review was conducted based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) method using 6 parameters (research approach followed, disease[s] diagnosed, dataset[s] used, algorithm[s] applied, performance achieved, and future direction[s]).

Results: Initially, 437 articles were collected from 5 major groups of identified sources; 14 articles were selected for the final analysis. Results revealed that, compared with traditional methods, the MMDF methods improved model performances for the diagnoses of skin NTDs and non-NTD skin diseases. Algorithmically, convolutional neural network (CNN)-based models were the predominantly used DL architectures (9/14 studies, 64% ), providing feature extraction, feature fusion, and disease classification, which were also conducted with transformer-based methods (1/14, 7%). Furthermore, recurrent neural networks were used in combination with CNN-based feature extractors to fuse multimodal data (1/14, 7%) and with generative models (1/14, 7%). The remaining studies used study-specific algorithms using transformers (1/14, 7%) and generative models (1/14, 7%).

Conclusions: Finally, this article suggests that further studies should be conducted about using DL-based MMDF methods for skin NTDs, considering model efficiency, data scarcity, algorithm selection and use, fusion strategies of multiple modalities, and the possible adoption of such tools for resource-constrained areas.

Citations: 0
Rethinking AI Workflows: Guidelines for Scientific Evaluation in Digital Health Companies.
IF 2.0 Pub Date: 2025-12-04 DOI: 10.2196/71798
Kelsey Lynn McAlister, Lee Gonzales, Jennifer Huberty

Unlabelled: Artificial intelligence (AI) is revolutionizing digital health, driving innovation in care delivery and operational efficiency. Despite its potential, many AI systems fail to meet real-world expectations due to limited evaluation practices that focus narrowly on short-term metrics like efficiency and technical accuracy. Ignoring factors such as usability, trust, transparency, and adaptability hinders AI adoption, scalability, and long-term impact in health care. This paper emphasizes the importance of embedding scientific evaluation as a core operational layer throughout the AI life cycle. We outline practical guidelines for digital health companies to improve AI integration and evaluation, informed by over 35 years of experience in science, the digital health industry, and AI development. It describes a multistep approach, including stakeholder analysis, real-time monitoring, and iterative improvement, that digital health companies can adopt to ensure robust AI integration. Key recommendations include assessing stakeholder needs, designing AI systems that can check its own work, conducting testing to address usability and biases, and ensuring continuous improvement to keep systems user-centered and adaptable. By integrating these guidelines, digital health companies can improve AI reliability, scalability, and trustworthiness, driving better health care delivery and stakeholder alignment.

Citations: 0
The Perceived Roles of AI in Clinical Practice: National Survey of 941 Academic Physicians.
IF 2 Pub Date : 2025-12-04 DOI: 10.2196/72535
Anshul Ratnaparkhi, Simon Moore, Abhinav Suri, Bayard Wilson, Jacob Alderete, T J Florence, David Zarrin, David Berin, Rami Abuqubo, Kirstin Cook, Matiar Jafari, Joseph S Bell, Luke Macyszyn, Andrew C Vivas, Joel Beckett

Background: Artificial intelligence (AI) and machine learning models are frequently developed in medical research to optimize patient care, yet they remain rarely used in clinical practice.

Objective: This study aims to understand the disconnect between model development and implementation by surveying physicians of all specialties across the United States.

Methods: The present survey was distributed to residency coordinators at Accreditation Council for Graduate Medical Education-accredited residency programs to disseminate among attending physicians and resident physicians affiliated with their academic institution. Respondents were asked to identify and quantify the extent of their training and specialization, as well as the type and location of their practice. Physicians were then asked follow-up questions regarding AI in their practice, including whether its use is permitted, whether they would use it if made available, primary reasons for using or not using AI, elements that would encourage its use, and ethical concerns.

Results: Of the 941 physicians who responded to the survey, 384 (40.8%) were attending physicians and 557 (59.2%) were resident physicians. The majority of the physicians (651/795, 81.9%) indicated that they would adopt AI in clinical practice if given the opportunity. The most cited intended uses for AI were risk stratification, image analysis or segmentation, and disease prognosis. The most common reservations were concerns about clinical errors made by AI and the potential to replicate human biases.

Conclusions: To date, this study comprises the largest and most diverse dataset of physician perspectives on AI. Our results emphasize that most academic physicians in the United States are open to adopting AI in their clinical practice. However, for AI to become clinically relevant, developers and physicians must work synergistically to design models that are accurate, accessible, and intuitive while thoroughly addressing ethical concerns associated with the implementation of AI in medicine.

JMIR AI. 2025;4:e72535. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12715463/pdf/
Citations: 0
Clinical Large Language Model Evaluation by Expert Review (CLEVER): Framework Development and Validation.
IF 2 Pub Date : 2025-12-04 DOI: 10.2196/72153
Veysel Kocaman, Mustafa Aytuğ Kaya, Andrei Marian Feier, David Talby

Background: The proliferation of both general purpose and health care-specific large language models (LLMs) has intensified the challenge of effectively evaluating and comparing them. Data contamination undermines the validity of public benchmarks, self-preference distorts LLM-as-a-judge approaches, and there is a gap between the tasks used to test models and those used in clinical practice.

Objective: In response, we propose CLEVER (Clinical Large Language Model Evaluation-Expert Review), a methodology for blind, randomized, preference-based evaluation by practicing medical doctors on specific tasks.

Methods: We demonstrate the methodology by comparing GPT-4o (OpenAI) against 2 health care-specific LLMs, with 8 billion and 70 billion parameters, over 3 tasks: clinical text summarization, clinical information extraction, and question answering on biomedical research.

Results: Medical doctors preferred the small health care-specific LLM trained by John Snow Labs over GPT-4o 45% to 92% more often on the dimensions of factuality, clinical relevance, and conciseness.

Conclusions: The models show comparable performance on open-ended medical question answering, suggesting that health care-specific LLMs can outperform much larger general purpose LLMs in tasks that require understanding of clinical context. We tested the validity of CLEVER evaluations through interannotator agreement, intraclass correlation, and washout period analyses.
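The interannotator agreement check mentioned in the conclusions can be illustrated with Cohen's kappa over two raters' blinded preference labels. This is a minimal sketch with invented labels (not the study's data); the "small"/"gpt4o" label names are hypothetical stand-ins for the two compared models:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical labels."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    po = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters labeled independently with these frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    pe = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical blinded preferences: "small" = health care-specific LLM, "gpt4o" = GPT-4o.
rater_1 = ["small", "small", "gpt4o", "small", "gpt4o", "small", "small", "gpt4o"]
rater_2 = ["small", "gpt4o", "gpt4o", "small", "gpt4o", "small", "small", "gpt4o"]
kappa = cohen_kappa(rater_1, rater_2)  # raters disagree on 1 of 8 items
```

Kappa, unlike raw percent agreement, discounts the agreement two raters would reach by chance given their label frequencies, which is why preference-based evaluations report it.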

JMIR AI. 2025;4:e72153. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12677871/pdf/
Citations: 0
Machine Learning-Enhanced Quantitative Structure-Activity Relationship Modeling for DNA Polymerase Inhibitor Discovery: Algorithm Development and Validation.
IF 2 Pub Date : 2025-12-03 DOI: 10.2196/77890
Samuel Kakraba, Srinivas Ayyadevara, Aayire Yadem Clement, Kuukua Egyinba Abraham, Cesar M Compadre, Robert J Shmookler Reis

Background: Cisplatin resistance remains a significant obstacle in cancer therapy, frequently driven by translesion DNA synthesis mechanisms that use specialized polymerases such as human DNA polymerase η (hpol η). Although small-molecule inhibitors such as PNR-7-02 have demonstrated potential in disrupting hpol η activity, current compounds often lack sufficient potency and specificity to effectively combat chemoresistance. The vastness of chemical space further limits traditional drug discovery approaches, underscoring the need for advanced computational strategies such as machine learning (ML)-enhanced quantitative structure-activity relationship (QSAR) modeling.

Objective: This study aimed to develop and validate ML-augmented QSAR models to accurately predict hpol η inhibition by indole thio-barbituric acid analogs, with the goal of accelerating the discovery of potent and selective inhibitors that could overcome cisplatin resistance.

Methods: A curated library of 85 indole thio-barbituric acid analogs with validated hpol η inhibition data was used, excluding outliers to ensure data integrity. Molecular descriptors spanning 1D to 4D were computed in MAESTRO, resulting in 220 features. In total, 17 ML algorithms, including random forest, extreme gradient boosting (XGBoost), and neural networks, were trained on 80% of the data and evaluated with 14 performance metrics. Robustness was ensured through hyperparameter optimization and 5-fold cross-validation.

Results: Ensemble methods outperformed other algorithms, with random forest achieving near-perfect predictive performance (training: mean square error=0.0002, R²=0.9999; testing: mean square error=0.0003, R²=0.9998). Shapley additive explanations analysis revealed that electronic properties, lipophilicity, and topological atomic distances were the most important predictors of hpol η inhibition. Linear models exhibited higher error rates, highlighting the nonlinear relationship between molecular descriptors and inhibitory activity.

Conclusions: Integrating ML with QSAR modeling provides a robust framework for optimizing hpol η inhibition, offering both high predictive accuracy and biochemical interpretability. This approach accelerates the identification of potent selective inhibitors and represents a promising strategy for overcoming cisplatin resistance, thereby advancing precision oncology.
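The core pipeline the Methods describe — a random forest regressor scored by 5-fold cross-validation, with permutation importance ranking descriptors — can be sketched with scikit-learn. The descriptor matrix and activity values below are simulated stand-ins (85 rows to echo the analog library, 10 features rather than the paper's 220), not the study's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
# Simulated stand-in: 85 "compounds" x 10 molecular descriptors.
X = rng.normal(size=(85, 10))
# Simulated inhibition activity driven mainly by descriptors 0 and 1, plus noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=85)

model = RandomForestRegressor(n_estimators=200, random_state=0)
cv_r2 = cross_val_score(model, X, y,
                        cv=KFold(n_splits=5, shuffle=True, random_state=0),
                        scoring="r2")

# Fit on the full set, then rank descriptors by permutation importance.
model.fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]  # most important descriptor first
```

Permutation importance shuffles one descriptor column at a time and measures the drop in model score, which is how a ranking like the paper's (electronic properties, lipophilicity, topological distances) is obtained.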

JMIR AI. 2025;4:e77890. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12675996/pdf/
Citations: 0
Exploring Primary Care Patients' Perspectives on Artificial Intelligence: Systematic Literature Review and Qualitative Meta-Synthesis.
IF 2 Pub Date : 2025-11-19 DOI: 10.2196/72211
Alisa Mundzic, Robin Bogdanffy, David Sundemo, Pär-Daniel Sundvall, Jonathan Widén, Peter Nymberg, Carl Wikberg, Anna Moberg, Ronny Gunnarsson, Artin Entezarjou

Background: The introduction of artificial intelligence (AI) in health care holds great promise, offering the potential to alleviate physicians' workloads and allocate more time for patient interactions. After the emergence of large language models (LLMs), interest in AI has surged in the health care sector, including within primary care. However, patients have expressed concerns about the ethical implications and use of AI in primary care. Understanding patients' perspectives on using AI in primary care is crucial for its effective integration. Despite this, few studies have addressed patients' perspectives on using AI in primary care.

Objective: This study aimed to synthesize qualitative research on primary care patients' perspectives regarding the use of AI, including LLMs, in primary care.

Methods: A qualitative systematic review, using thematic analysis, was performed in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases, including PubMed, Scopus, Web of Science, CINAHL, and PsycINFO, were searched from inception to February 5, 2024. Eligible studies (1) used a qualitative interview research design, (2) explored primary care patients' perspectives on the use of AI in primary care, (3) were written in English, and (4) were published in peer-reviewed scientific journals. Quantitative studies, gray literature, surveys, and studies lacking depth in qualitative analysis were excluded. The Critical Appraisal Skills Program (CASP) checklist was used for quality assessment.

Results: Of 1004 studies screened, 6 were included, comprising 170 patients aged 13-91 years from 3 countries. Three themes emerged: "The Relationship with and Actions of AI Systems," "Implementing AI responsibly," and "Training Physicians and Artificial Minds." Patients acknowledged AI's potential benefits but advocated for clinician oversight, safety frameworks, and the preservation of patient autonomy.

Conclusions: This systematic review provides an understanding of patients' perspectives on AI in primary care. We identified heterogeneity in AI definitions across studies. Further research is needed on patients' perspectives across different countries. Notably, our synthesis revealed a significant research gap, as none of the included studies particularly explored patients' perspectives on LLMs, highlighting an important area for future research.

JMIR AI. 2025;4:e72211. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12629519/pdf/
Citations: 0
Leveraging Large Language Models and Machine Learning for Success Analysis in Robust Cancer Crowdfunding Predictions: Quantitative Study.
IF 2 Pub Date : 2025-11-19 DOI: 10.2196/73448
Runa Bhaumik, Abhishikta Roy, Vineet Srivastava, Lokesh Boggavarapu, Ranganathan Chandrasekaran, Edward K Mensah, John Galvin

Background: Recent advances in large language models (LLMs) such as GPT-4o offer a transformative opportunity to extract nuanced linguistic, emotional, and social features from medical crowdfunding campaign texts at scale. These models enable a deeper understanding of the factors influencing campaign success far beyond what structured data alone can reveal. Given these advancements, there is a pressing need for an integrated modeling framework that leverages both LLM-derived features and machine learning algorithms to more accurately predict and explain success in medical crowdfunding.

Objective: This study addressed the failure of existing approaches to capture the deeper psychosocial and clinical nuances that influence campaign success. It leveraged cutting-edge machine learning techniques alongside state-of-the-art LLMs such as GPT-4o to automatically generate and extract nuanced linguistic, social, and clinical features from campaign narratives. By combining these features with ensemble learning approaches, the proposed methodology offers a novel and more comprehensive strategy for understanding and predicting crowdfunding success in the medical domain.

Methods: We used GPT-4o to extract linguistic and social determinants of health features from cancer crowdfunding campaign narratives. A random forest model with permutation importance was applied to rank features based on their contribution to predicting campaign success. Four machine learning algorithms-random forest, gradient boosting, logistic regression, and elastic net-were evaluated using stratified 10-fold cross-validation, with performance measured through accuracy, sensitivity, and specificity.

Results: Gradient boosting consistently outperformed the other algorithms in terms of sensitivity (0.786 to 0.798), indicating its superior ability to identify successful crowdfunding campaigns using linguistic and social determinants of health features. Permutation importance scores revealed that severe medical conditions, income loss, chemotherapy treatment, clear and effective communication, cognitive understanding, family involvement, empathy, and social behaviors play an important role in the success of campaigns.

Conclusions: This study demonstrates that LLMs such as GPT-4o can effectively extract nuanced linguistic and social features from crowdfunding narratives, offering deeper insights than traditional methods. These features, when combined with machine learning, significantly improve the identification of key predictors of campaign success, such as medical severity, financial hardship, and empathetic communication. Our findings underscore the potential of LLMs to enhance predictive modeling in health-related crowdfunding and support more targeted policy and communication strategies to reduce financial vulnerability among patients with cancer.
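The evaluation loop described in the Methods — stratified 10-fold cross-validation reporting per-fold sensitivity and specificity for a gradient boosting classifier — can be sketched as follows. The feature matrix and success labels are simulated for illustration (in the study, features came from GPT-4o extraction of campaign narratives):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 6))
# Simulated signal: two features (stand-ins for, e.g., severity and empathy) drive success.
y = (1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

sens, spec = [], []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = GradientBoostingClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    tn, fp, fn, tp = confusion_matrix(y[test_idx], clf.predict(X[test_idx])).ravel()
    sens.append(tp / (tp + fn))  # sensitivity: successful campaigns correctly flagged
    spec.append(tn / (tn + fp))  # specificity: unsuccessful campaigns correctly flagged
```

Stratification keeps the success/failure ratio roughly constant across folds, so the per-fold sensitivity values are comparable — which is what makes a tight range like the reported 0.786-0.798 meaningful.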

Leveraging Large Language Models and Machine Learning for Success Analysis in Robust Cancer Crowdfunding Predictions: Quantitative Study.
IF 2 Pub Date : 2025-11-19 DOI : 10.2196/73448
Runa Bhaumik, Abhishikta Roy, Vineet Srivastava, Lokesh Boggavarapu, Ranganathan Chandrasekaran, Edward K Mensah, John Galvin

Background: Recent advances in large language models (LLMs) such as GPT-4o offer a transformative opportunity to extract nuanced linguistic, emotional, and social features from medical crowdfunding campaign texts at scale. These models enable a deeper understanding of the factors influencing campaign success, far beyond what structured data alone can reveal. Given these advancements, there is a pressing need for an integrated modeling framework that leverages both LLM-derived features and machine learning algorithms to more accurately predict and explain success in medical crowdfunding.

Objective: This study addressed the failure of existing approaches to capture the deeper psychosocial and clinical nuances that influence campaign success. It leveraged cutting-edge machine learning techniques alongside state-of-the-art LLMs such as GPT-4o to automatically generate and extract nuanced linguistic, social, and clinical features from campaign narratives. By combining these features with ensemble learning approaches, the proposed methodology offers a novel and more comprehensive strategy for understanding and predicting crowdfunding success in the medical domain.

Methods: We used GPT-4o to extract linguistic and social determinants of health features from cancer crowdfunding campaign narratives. A random forest model with permutation importance was applied to rank features based on their contribution to predicting campaign success. Four machine learning algorithms (random forest, gradient boosting, logistic regression, and elastic net) were evaluated using stratified 10-fold cross-validation, with performance measured through accuracy, sensitivity, and specificity.

Results: Gradient boosting consistently outperformed the other algorithms in terms of sensitivity (0.786 to 0.798), indicating its superior ability to identify successful crowdfunding campaigns using linguistic and social determinants of health features. The permutation importance scores revealed that for severe medical conditions, income loss, chemotherapy treatment, clear and effective communication, cognitive understanding, family involvement, empathy, and social behaviors play an important role in the success of campaigns.

Conclusions: This study demonstrates that LLMs such as GPT-4o can effectively extract nuanced linguistic and social features from crowdfunding narratives, offering deeper insights than traditional methods. These features, when combined with machine learning, significantly improve the identification of key predictors of campaign success, such as medical severity, financial hardship, and empathetic communication. Our findings underscore the potential of LLMs to enhance predictive modeling in health-related crowdfunding and support more targeted policy and communication strategies to reduce financial vulnerability among patients with cancer.

JMIR AI. 2025;4:e73448. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12629620/pdf/
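The Methods describe two standard scikit-learn steps: ranking features with a random forest plus permutation importance, then comparing classifiers under stratified 10-fold cross-validation on sensitivity. A minimal sketch of that pipeline, using synthetic data and illustrative feature names in place of the study's GPT-4o-derived features (all names and parameters below are assumptions, not from the paper):

```python
# Sketch of the abstract's modeling pipeline on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for LLM-derived campaign features; names are illustrative.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
feature_names = ["medical_severity", "income_loss", "chemotherapy",
                 "clear_communication", "cognitive_understanding",
                 "family_involvement", "empathy", "social_behavior"]

# Step 1: rank features via random forest + permutation importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
ranking = sorted(zip(feature_names, perm.importances_mean),
                 key=lambda t: t[1], reverse=True)
for name, score in ranking[:3]:
    print(f"{name}: {score:.3f}")

# Step 2: compare classifiers under stratified 10-fold cross-validation.
# Sensitivity is recall on the positive (successful campaign) class.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "elastic_net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, max_iter=5000),
}
for label, model in models.items():
    recall = cross_val_score(model, X, y, cv=cv, scoring="recall").mean()
    print(f"{label}: mean sensitivity {recall:.3f}")
```

Permutation importance measures how much shuffling one feature degrades model performance, which avoids the bias of impurity-based importances toward high-cardinality features; elastic net is expressed here as logistic regression with an `elasticnet` penalty, one common reading of that model for a classification task.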
Citations: 0