JMIR Medical Informatics: Latest Articles

Detection of Polyphonic Alarm Sounds From Medical Devices Using Frequency-Enhanced Deep Learning: Simulation Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2025-11-12; DOI: 10.2196/35987
Kazumasa Kishimoto, Tadamasa Takemura, Osamu Sugiyama, Ryosuke Kojima, Masahiro Yakami, Goshiro Yamamoto, Tomohiro Kuroda

Background: Although an increasing number of bedside medical devices are equipped with wireless connections for reliable notifications, many nonnetworked devices remain effective at detecting abnormal patient conditions and alerting medical staff through auditory alarms. Staff members, however, can miss these notifications, especially when in distant areas or other private rooms. In contrast, the signal-to-noise ratio of alarm systems for medical devices in the neonatal intensive care unit is 0 dB or higher. A feasible system for automatic sound identification with high accuracy is needed to prevent alarm sounds from being missed by the staff.

Objective: The purpose of this study was to design a method for classifying multiple alarm sounds collected with a monaural microphone in a noisy environment.

Methods: Features of 7 alarm sounds were extracted using a mel filter bank and incorporated into a classifier using convolutional and recurrent neural networks. To estimate its clinical usefulness, the classifier was evaluated with mixtures of up to 7 alarm sounds and hospital ward noise.
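
The evaluation described above mixes alarm sounds with hospital ward noise at a stated signal-to-noise ratio. The abstract does not describe the mixing procedure, so the following is only a minimal sketch of one common approach: rescale the noise so that the ratio of signal power to noise power matches the target. The helper names `mean_power` and `mix_at_snr`, the sampling rate, and the toy waveforms are all hypothetical.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power matches `snr_db`,
    then add it to `signal`. Returns (mixture, noise_gain)."""
    target_ratio = 10 ** (snr_db / 10)  # dB -> linear power ratio
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * target_ratio))
    mixture = [s + gain * n for s, n in zip(signal, noise)]
    return mixture, gain


# Toy stand-ins (assumptions): a 1 kHz tone as the "alarm" and a
# deterministic two-tone signal as the "ward noise".
rate = 16000
alarm = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate)]
ward_noise = [
    0.3 * math.sin(2 * math.pi * 137 * t / rate)
    + 0.1 * math.sin(2 * math.pi * 2911 * t / rate)
    for t in range(rate)
]

mixture, gain = mix_at_snr(alarm, ward_noise, snr_db=0.0)

# Check the SNR that was actually achieved after scaling the noise.
scaled_noise_power = mean_power([gain * n for n in ward_noise])
achieved_snr_db = 10 * math.log10(mean_power(alarm) / scaled_noise_power)
```

At 0 dB the scaled noise carries the same average power as the alarm, which is the condition reported in the Results.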

Results: The proposed convolutional recurrent neural network model was evaluated using a simulation dataset of 7 alarm sounds mixed with hospital ward noise. At a signal-to-noise ratio of 0 dB, the best-performing model (convolutional neural network 3+bidirectional gated recurrent unit) achieved an event-based F1-score of 0.967, with a precision of 0.944 and a recall of 0.991. When the venous foot pump class was excluded, the classwise recall of the classifier ranged from 0.990 to 1.000.
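
As a quick consistency check on the figures above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the function name `f1_score` is illustrative and is not taken from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported at a signal-to-noise ratio of 0 dB.
f1 = f1_score(0.944, 0.991)
print(round(f1, 3))  # 0.967
```

The recomputed value matches the reported event-based F1-score of 0.967.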

Conclusions: The proposed classifier was found to be highly accurate in detecting alarm sounds. Although the performance of the proposed classifier in a clinical environment can be improved, the classifier could be incorporated into an alarm sound detection system. The classifier, combined with network connectivity, could improve the notification of abnormal status detected by unconnected medical devices.

JMIR Medical Informatics. 2025;13:e35987. Published 2025-11-12. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12611226/pdf/
Citations: 0
Model for Predicting Serious Hematological Adverse Events in Individuals With Ovarian Cancer Receiving Poly (Adenosine Diphosphate Ribose) Polymerase Inhibitor Treatment: Prospective Cohort Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2025-11-12; DOI: 10.2196/72994
Xiaotong Lian, Yu Lei

Background: Predicting serious hematological adverse events (SHAEs) from poly (adenosine diphosphate ribose) polymerase inhibitors (PARPis) would allow us to prioritize patients with ovarian cancer at higher risk for more intensive care, ultimately lowering morbidity and preventing them from premature termination of medication.

Objective: This study aimed to explore the risk factors for SHAEs in patients with ovarian cancer receiving PARPi treatment and develop a risk prediction model for such events.

Methods: Prospective clinical data were collected on patients with ovarian cancer who received PARPi treatment at the Guangxi Medical University Affiliated Tumor Hospital from December 2018 to August 2024. They were divided into a SHAE group and a no-SHAE group based on the occurrence of SHAEs. Variable differences were screened using the chi-square test or Fisher exact test. Multivariate logistic regression was used to determine independent factors influencing SHAEs in patients with ovarian cancer. A predictive model for serious blood-related complications in ovarian cancer treatment was developed from identified independent risk factors using the R software. The model's clinical utility was assessed through decision curve analysis (net benefit), calibration (calibration curve), and discrimination (receiver operating characteristic curve).

Results: A total of 70 patients with ovarian cancer receiving PARPi treatment were included in this study. Of these 70 patients, 16 (23%) experienced SHAEs, with decreases in red blood cell (RBC) count and hemoglobin levels being the most common. Multiple logistic regression analysis identified 4 independent predictors of PARPi-associated SHAEs in patients with ovarian cancer: lymph node metastasis (odds ratio [OR] 6.733, 95% CI 1.197-37.873; P=.03), creatinine clearance rate of ≤60 mL per minute (OR 23.722, 95% CI 3.121-180.303; P=.002), RBC count of ≤3.3×10¹² per liter (OR 4.847, 95% CI 1.020-23.041; P=.047), and combination therapy with vascular endothelial growth factor inhibitors (OR 6.749, 95% CI 1.313-34.689; P=.02). The internal validation yielded an area under the curve of 0.874 (95% CI 0.793-0.955), indicating moderate clinical utility and accuracy for the risk prediction model incorporating these predictors.
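
To illustrate how the odds ratios, confidence intervals, and P values above relate, the sketch below backs out the logistic regression coefficient and its standard error from each reported OR and 95% CI, then computes a two-sided Wald P value. It assumes the intervals are Wald intervals (symmetric on the log-odds scale), which the abstract does not state; `wald_from_or` and the dictionary keys are hypothetical names.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or(or_point, ci_low, ci_high):
    """Recover (beta, se, z, p) from an odds ratio and its 95% CI,
    assuming a Wald interval: exp(beta +/- Z_95 * se)."""
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# ORs and 95% CIs as reported in the abstract.
reported = {
    "lymph_node_metastasis": (6.733, 1.197, 37.873),
    "creatinine_clearance_le_60": (23.722, 3.121, 180.303),
    "rbc_count_le_3.3e12": (4.847, 1.020, 23.041),
    "vegf_inhibitor_combination": (6.749, 1.313, 34.689),
}

p_values = {name: wald_from_or(*vals)[3] for name, vals in reported.items()}
for name, p in p_values.items():
    print(f"{name}: P={p:.3f}")
```

Under that assumption, the recovered P values appear to agree with the reported ones at the precision given in the abstract. The very wide intervals reflect the small number of events (16 of 70 patients).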

Conclusions: Lymph node metastasis, creatinine clearance rate of ≤60 mL per minute, RBC count of ≤3.3×10¹² per liter, and combination therapy with vascular endothelial growth factor inhibitors are independent risk factors for PARPi SHAEs in patients with ovarian cancer. The risk prediction model established based on these factors demonstrated moderate predictive value.

JMIR Medical Informatics. 2025;13:e72994. Published 2025-11-12. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12658389/pdf/
Citations: 0
Predicting Delayed Extubation After General Anesthesia in Postanesthesia Care Unit Patients Using Machine Learning: Model Development Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2025-11-11; DOI: 10.2196/72602
Jianwei Luo, Shaoman Lin, Liman Wang, Huanfan Ji, Jingcong Zheng, Tingkang Wang, Lin Chen, Ziqi Lin, Zhongqi Liu, Ning Liufu

Background: Delayed extubation after general anesthesia increases complications and can lead to longer hospital stays and higher mortality. Current risk assessments often rely on subjective judgment or simple tools, whereas machine learning offers potential for real-time evaluation, though research is limited and typically uses single-algorithm models.

Objective: The aims of this study were to identify risk factors for delayed extubation after general anesthesia in the sample and to construct a risk prediction model for delayed extubation in this population.

Methods: Data from 4779 patients admitted to the postanesthesia care unit between September 2023 and May 2024 were used to develop prediction models for delayed extubation using k-nearest neighbor, decision tree, extreme gradient boosting, random forest, a light gradient boosting machine, and an artificial neural network. Model performance was assessed by calculating the area under the receiver operating characteristic curve, sensitivity, specificity, accuracy, F1-score, and Brier score. Calibration performance was evaluated using calibration curves generated with 100-bin quantile calibration and Loess smoothing to provide bias-corrected and smoothed visual assessment. Additionally, the Hosmer-Lemeshow goodness-of-fit test was performed to quantitatively evaluate calibration, with P values >.05 indicating good calibration.
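
Several of the metrics listed above are simple to compute from predicted probabilities and observed outcomes. The following is a minimal sketch, not the study's evaluation code: the function names, the 0.5 decision threshold, and the toy predictions are assumptions made purely for illustration.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def sensitivity_specificity(probs, labels, threshold=0.5):
    """Sensitivity and specificity after thresholding the probabilities."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictions for 6 patients (1 = delayed extubation).
probs = [0.9, 0.2, 0.7, 0.1, 0.4, 0.05]
labels = [1, 0, 1, 0, 1, 0]

brier = brier_score(probs, labels)
sensitivity, specificity = sensitivity_specificity(probs, labels)
```

A lower Brier score is better, but its scale depends on how common the outcome is, so it is usually read alongside a calibration curve rather than on its own.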

Results: Among the 6 models evaluated, the extreme gradient boosting model demonstrated the best performance, with an area under the receiver operating characteristic curve of 0.750 (95% CI 0.703-0.796), a sensitivity of 0.734 (95% CI 0.635-0.827), and a specificity of 0.647 (95% CI 0.623-0.673). The model calibration was acceptable, with a Brier score of 0.0505 and a nonsignificant Hosmer-Lemeshow goodness-of-fit test (χ²=7.3, df=6; P=.287), indicating good calibration. Shapley additive explanations were used to rank feature importance.

Conclusions: These machine learning models enable early identification of delayed extubation risk, supporting personalized clinical decisions and optimizing postanesthesia care unit resource allocation.

JMIR Medical Informatics. 2025;13:e72602. Published 2025-11-11. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12604829/pdf/
Citations: 0
Analyzing Sleep Behavior Using BERT-BiLSTM and Fine-Tuned GPT-2 Sentiment Classification: Comparison Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2025-11-10; DOI: 10.2196/70753
Yihan Deng, Julia van der Meer, Athina Tzovara, Markus Schmidt, Claudio Bassetti, Kerstin Denecke

Background: The diagnosis of sleep disorders presents a challenging landscape, characterized by the complex nature of their assessment and the often divergent views between objective clinical assessment and subjective patient experience. This study explores the interplay between these perspectives, focusing on the variability of individual perceptions of sleep quality and latency.

Objective: Our primary goal was to investigate the alignment, or lack thereof, between subjective experiences and objective measures in the assessment of sleep disorders.

Methods: To study this, we developed an aspect-based sentiment analysis method for clinical narratives. Using large language models (Falcon 40B and Mixtral 8X7B), we identified entity groups for 3 aspects of sleep behavior (daytime sleepiness, sleep quality, and fatigue). We then assigned sentiment values between 0 and 1 to phrases referring to these aspects, using a BERT-BiLSTM-based approach (accuracy 78%) and a fine-tuned GPT-2 sentiment classifier (accuracy 87%).

Results: In a cohort of 100 patients with complete subjective (Karolinska Sleepiness Scale [KSS]) and objective (Multiple Sleep Latency Test [MSLT]) assessments, approximately 15% exhibited notable discrepancies between perceived and measured levels of daytime sleepiness. A paired-sample t test comparing KSS scores to MSLT latencies approached statistical significance (t=2.456, df=99; P=.06), suggesting a potential misalignment between subjective reports and physiological markers. In contrast, the comparison using text-derived sentiment scores revealed a statistically significant divergence (t=2.324, df=99; P=.047), indicating that clinical narratives may more reliably capture discrepancies in sleepiness perception. These results underscore the importance of integrating multiple subjective sources, with an emphasis on narrative free text, in the assessment of domains such as fatigue and daytime sleepiness, where standardized measures may not fully reflect the patient's lived experience.
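
For readers unfamiliar with the paired-sample t test mentioned above, the sketch below shows how the statistic is formed from per-patient differences. It is a generic illustration only: the abstract does not say how the two measures were placed on a comparable scale, and the function name and toy values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, df) for a paired-sample t test on two equal-length sequences."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical paired scores for 4 patients, already on a common scale.
subjective = [5, 6, 7, 8]
objective = [4, 4, 6, 5]

t_value, df = paired_t_statistic(subjective, objective)
```

With 100 patients, as in the cohort above, the same calculation yields 99 degrees of freedom.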

Conclusions: Our method has potential in uncovering critical insights into patient self-perception versus clinical evaluations, which enables clinicians to identify patients requiring objective verification of self-reported symptoms.

JMIR Medical Informatics. 2025;13:e70753. Published 2025-11-10. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12599995/pdf/
Citations: 0
Large Language Model Versus Manual Review for Clinical Data Curation in Breast Cancer: Retrospective Comparative Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2025-11-06; DOI: 10.2196/73605
Young-Joon Kang, Hocheol Lee, Jae Pak Yi, Hyobin Kim, Chang Ik Yoon, Jong Min Baek, Yong-Seok Kim, Ye Won Jeon, Jiyoung Rhu, Su Hyun Lim, Hoon Choi, Se Jeong Oh

Background: Manual review of electronic health records for clinical research is labor-intensive and prone to reviewer-dependent variations. Large language models (LLMs) offer potential for automated clinical data extraction; however, their feasibility in surgical oncology remains underexplored.

Objective: This study aimed to evaluate the feasibility and accuracy of LLM-based processing compared with manual physician review for extracting clinical data from breast cancer records.

Methods: We conducted a retrospective comparative study analyzing breast cancer records from 5 academic hospitals (January 2019-December 2019). Two data extraction pathways were compared: (1) manual physician review with direct electronic health record access (group 1: 1366/3100, 44.06%) and (2) LLM-based processing using Claude 3.5 Sonnet (Anthropic) on deidentified data automatically extracted through a clinical data warehouse platform (group 2: 1734/3100, 55.94%). The automated extraction system provided prestructured, deidentified data sheets organized by clinical domains, which were then processed by the LLM. The LLM prompt was developed through a 3-phase iterative process over 2 days. Primary outcomes included missing value rates, extraction accuracy, and concordance between groups. Secondary outcomes included comparison with the Korean Breast Cancer Society national registry data, processing time, and resource use. Validation involved 50 stratified random samples per group (900 data points each), assessed by 4 breast surgical oncologists. Statistical analysis included chi-square tests, 2-tailed t tests, Cohen κ, and intraclass correlation coefficients. The accuracy threshold was set at 90%.

Results: The LLM achieved 90.8% accuracy (817/900 data points) in validation analysis. Missing data patterns differed between groups: group 2 showed better lymph node documentation (missing: 152/1734, 8.76% vs 294/1366, 21.52%) but higher missing rates for cancer staging (211/1734, 12.17% vs 43/1366, 3.15%). Both groups demonstrated similar breast-conserving surgery rates (1107/1734, 63.84% vs 868/1366, 63.54%). Processing efficiency differed substantially: LLM processing required 12 days with 2 physicians versus 7 months with 5 physicians for manual review, representing a 91% reduction in physician hours (96 h vs 1025 h). The LLM group captured significantly more survival events (41 vs 11; P=.002). Stage distribution in the LLM group aligned better with national registry data (Cramér V=0.03 vs 0.07). Application programming interface costs totaled US $260 for 1734 cases (US $0.15 per case).

Conclusions: LLM-based curation of automatically extracted, deidentified clinical data demonstrated comparable effectiveness to manual physician review while reducing processing time by 95% and physician hours by 91%. This 2-step approach (automated data extraction followed by LLM curation) addresses both privacy concerns and efficiency needs. Despite limitations in integrating multiple clinical events, the approach offers a scalable solution for clinical data extraction in oncology research. The 90.8% accuracy and superior capture of survival events suggest that combining automated data extraction systems with LLM processing can accelerate retrospective clinical research while maintaining data quality and patient privacy.
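
The headline percentages in this abstract can be rechecked from the raw counts it reports. The short sketch below does that arithmetic; it assumes the accuracy count of 817 is out of the 900 validation data points described in the Methods, and the helper name `pct` is illustrative.

```python
def pct(numerator, denominator, digits=2):
    """Percentage rounded to the given number of decimal places."""
    return round(100 * numerator / denominator, digits)


group1_share = pct(1366, 3100)          # manual review share of all records
group2_share = pct(1734, 3100)          # LLM-processed share of all records
validation_accuracy = pct(817, 900, 1)  # assumes 817 of 900 data points correct
hours_reduction = round(100 * (1 - 96 / 1025))  # physician hours saved, in %
cost_per_case = round(260 / 1734, 2)    # API cost per case in US dollars

print(group1_share, group2_share)   # 44.06 55.94
print(validation_accuracy)          # 90.8
print(hours_reduction)              # 91
print(cost_per_case)                # 0.15
```

Each recomputed figure matches the value stated in the abstract.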
Citations: 0
Generative Models and Sentence Transformers for the Recognition and Normalization of Continuous and Discontinuous Phenotype Mentions: Model Development and Evaluation.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2025-11-05 DOI: 10.2196/68558
Areej Alhassan, Viktor Schlegel, Monira Aloud, Riza Batista-Navarro, Goran Nenadic

Background: Extracting genetic phenotype mentions from clinical reports and normalizing them to standardized concepts within the human phenotype ontology are essential for consistent interpretation and representation of genetic conditions. This is particularly important in fields such as dysmorphology and plays a key role in advancing personalized health care. However, modern clinical named entity recognition methods face challenges in accurately identifying discontinuous mentions (ie, entity spans that are interrupted by unrelated words), which can be found in these clinical reports.

Objective: This study aims to develop a system that can accurately extract and normalize genetic phenotypes, specifically from physical examination reports related to dysmorphology assessment. These mentions appear in both continuous and discontinuous lexical forms, with a focus on addressing challenging discontinuous entity spans.

Methods: We introduce DiscHPO, a 2-phase pipeline consisting of a sequence-to-sequence named entity recognition model for span extraction, and an entity normalizer that uses a sentence transformer biencoder for candidate generation and a cross-encoder reranker for selecting the best candidate as the normalized concept. This system was tested as part of our participation in Track 3 of the BioCreative VIII shared task.

Results: For overall performance on the test set, the top-performing model for entity normalization achieved an F1-score of 0.723, while the best span extraction model reached an F1-score of 0.665. Both scores surpassed those of 2 baseline models using the same dataset, indicating superior efficacy in handling both continuous and discontinuous spans. On the validation set, we were able to demonstrate our system's ability to recognize these mentions, with the model achieving an F1-score of 0.631 for exact match on discontinuous spans only.

Conclusions: The findings suggest that exact extraction of entity spans may not always be necessary for successful normalization. Partial mention matches can be sufficient as long as they capture the essential concept information, supporting the system's utility in clinical downstream tasks.
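The two-stage normalization described in the Methods (candidate generation followed by reranking) can be sketched as below. This is not DiscHPO itself. The miniature ontology, its `C1`-style identifiers, and both scoring functions are hypothetical stand-ins: bag-of-words cosine similarity plays the role of the sentence transformer biencoder, and Jaccard overlap plays the role of the cross-encoder reranker.

```python
import math
from collections import Counter

# Hypothetical miniature ontology; IDs and labels are illustrative only.
ONTOLOGY = {
    "C1": "low set ears",
    "C2": "wide nasal bridge",
    "C3": "short neck",
}


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def generate_candidates(mention, k=2):
    """Stage 1 (biencoder role): encode mention and concepts separately, keep the top k."""
    query = bag_of_words(mention)
    scored = [(cosine(query, bag_of_words(label)), cid) for cid, label in ONTOLOGY.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]


def rerank(mention, candidate_ids):
    """Stage 2 (cross-encoder role): score each (mention, concept) pair jointly."""
    mention_tokens = set(mention.lower().split())

    def pair_score(cid):
        concept_tokens = set(ONTOLOGY[cid].split())
        return len(mention_tokens & concept_tokens) / len(mention_tokens | concept_tokens)

    return max(candidate_ids, key=pair_score)


mention = "ears low set"  # e.g. the fragments of a discontinuous span, joined
candidates = generate_candidates(mention)
best = rerank(mention, candidates)
print(candidates, best)
```

The split keeps the expensive pairwise scorer off the full ontology: the cheap first stage narrows thousands of concepts to a short list, and only that list is scored jointly.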

Citations: 0
Attitudes Toward Common Data Models Among Chinese Biomedical Professionals: Cross-Sectional Survey.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2025-11-05 DOI: 10.2196/77603
Yexian Yu, Yongqi Zheng, Meng Zhang, Junqing Xie, Seng Chan You, Mengling Feng, Siyan Zhan, Feng Sun

Background: In the rapidly evolving landscape of health informatics, adopting a standardized common data model (CDM) is a pivotal strategy for harmonizing data from diverse sources within a cohesive framework. Transitioning regional databases to a CDM is important because it facilitates integration and analysis of vast and varied health datasets. This is particularly relevant in China, where unique demographic and epidemiologic profiles present a rich yet complex data landscape. The significance of this research from the perspective of the Chinese population lies in its potential to bridge gaps among disparate data sources, enabling more comprehensive insights into health trends and outcomes.

Objective: This study aimed to understand biomedical professionals' and trainees' acceptance of the CDM in medical data management in China and to explore potential advantages and challenges associated with its promotion, implementation, and development in the country.

Methods: We conducted a questionnaire survey using Sojump and distributed it on WeChat to evaluate the Chinese population's acceptance of transitioning from local databases to a standardized CDM. The survey assessed participants' understanding of the CDM and the Observational Medical Outcomes Partnership CDM, as well as their views on the importance of CDM for regional databases in China. Analysis of the survey results revealed the current state, challenges, and trends in CDM application within Chinese health care, providing a foundation for future efforts in data standardization and sharing. The reliability of the questionnaire data was assessed using Cronbach α and Guttman Lambda 6 to determine internal consistency.

Results: Our survey of 418 participants revealed that 41.9% (175/418) were aware of the CDM. Recognition of CDM increased with higher education levels and was notably higher among professionals in contract research organizations and the pharmaceutical industry. Knowledge of CDM was primarily gained through literature and conferences, with formal education less common. Logistic regression analysis indicated that individuals with doctoral degrees, researchers, executives, medical professionals, data engineers, Centers for Disease Control and Prevention staff, and statisticians were more likely to be aware of CDM. Subgroup analyses showed higher awareness among doctoral versus nondoctoral and Beijing-based versus non-Beijing respondents, while perceived necessity was broadly comparable across subgroups. Overall, 94.7% (396/418) of respondents believed CDM integration in China is necessary for standardization and efficiency. Although 60.7% (254/418) were optimistic about the Observational Medical Outcomes Partnership as the preferred CDM, challenges such as mapping traditional Chinese medicine or Chinese medical insurance remain.

Conclusions: A large proportion of respondents expressed positive attitudes toward implementing the CDM in regional databases in China, with strong support from doctoral degree holders and professionals in contract research organizations or the pharmaceutical industry; subgroup differences were concentrated mainly in awareness rather than perceived necessity. Participants recommended strengthening CDM-related education and establishing clear data-sharing regulations to support CDM development in China.
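The Methods report Cronbach α as a measure of internal consistency. A minimal sketch of the standard formula, α = k/(k-1) × (1 − Σ item variances / variance of total scores), is given below. The function name and the Likert-style responses are hypothetical and invented for illustration; they are not the survey's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per questionnaire item, respondents in the same order."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least 2 items")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("every item needs the same number (>= 2) of respondents")
    # Sum of per-item sample variances.
    item_variance_sum = sum(variance(item) for item in item_scores)
    # Variance of each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 3 items answered by 5 respondents on a 1-5 scale.
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items move together across respondents, which is the sense of "internal consistency" used in the abstract.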
Citations: 0
Performance of the Large Language Models on the Chinese National Nurse Licensure Examination: Cross-Sectional Evaluation Study.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2025-11-03 DOI: 10.2196/78279
Longhui Xu, Xiao Cong, Renxiu Wang, Na Li, Xinru Liu, Ronghui Wang, Cuiping Xu

Background: Large language models (LLMs) are increasingly explored in nursing education, but their capabilities in specialized, high-stakes, culturally specific examinations, such as the Chinese National Nurse Licensure Examination (CNNLE), remain underevaluated, making rigorous evaluation crucial before their adoption in nursing training and practice.

Objective: This study aimed to evaluate the performance, accuracy, repeatability, confidence, and robustness of 4 LLMs on the CNNLE.

Methods: Four LLMs (Sider Fusion [Vidline Inc], GPT-4o [OpenAI], Gemini 2.0 Pro [Google DeepMind], and DeepSeek V3) were tested on 237 multiple-choice questions from the 2024 CNNLE. Accuracy and repeatability were assessed using 2 prompting strategies. Confidence was evaluated via self-ratings (1-10 scale) and robustness via repeated adversarial prompting.

Results: DeepSeek V3 and Gemini 2.0 Pro demonstrated significantly higher overall accuracy (ranging from 199/237 to 209/237; >83%) compared to GPT-4o and Sider Fusion (ranging from 151/237 to 166/237; <71%). However, all LLMs showed suboptimal repeatability (highest at 206/237; <87% consistency). Critically, poor confidence calibration was evident; most models showed high confidence often mismatching actual accuracy (Sider Fusion: P=.01; GPT-4o: P=.03; and Gemini 2.0 Pro: P=.049). A stability-flexibility trade-off paradox was also observed.

Conclusions: While some LLMs show promising accuracy on the CNNLE, fundamental reliability limitations (poor confidence calibration and inconsistent repeatability) hinder safe application in nursing education and practice. Future LLM development must prioritize trustworthiness and calibrated reliability over surface accuracy.
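The percentages in the Results follow directly from the reported counts out of 237 questions, and the short sketch below reproduces that arithmetic. The `repeatability` function assumes repeatability means the share of questions answered identically in two runs; the abstract does not define the term, so that reading and the two toy answer sheets are assumptions.

```python
TOTAL_QUESTIONS = 237  # multiple-choice items from the 2024 CNNLE, as reported above


def accuracy(correct, total=TOTAL_QUESTIONS):
    return correct / total


def repeatability(run_a, run_b):
    """Share of questions that received the same answer in two runs (assumed definition)."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must cover the same non-empty question set")
    return sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)


# Counts quoted in the Results; the percentages are simply the ratios.
for label, correct in [
    ("higher-accuracy models, lower bound", 199),
    ("higher-accuracy models, upper bound", 209),
    ("lower-accuracy models, lower bound", 151),
    ("lower-accuracy models, upper bound", 166),
    ("best repeatability", 206),
]:
    print(f"{label}: {correct}/{TOTAL_QUESTIONS} = {accuracy(correct):.1%}")

# Hypothetical answer sheets for two runs of one model; not study data.
run_1 = ["A", "C", "B", "D", "A"]
run_2 = ["A", "C", "D", "D", "A"]
print(f"toy repeatability = {repeatability(run_1, run_2):.0%}")
```

Keeping accuracy and repeatability separate matters here because a model can be accurate on average while still changing its answer between runs on individual questions.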

Citations: 0
Predicting Postoperative Stress Urinary Incontinence After Prolapse Surgery via Machine Learning and Regression Models: Development and Validation Study.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2025-11-03 DOI: 10.2196/76021
Minna Su, Shuyu Wang, Xiaochun Liu

Background: Pelvic organ prolapse (POP) and stress urinary incontinence (SUI) often concurrently exist. The incontinence in some patients with POP resolves after POP surgery, but it persists in others. Some patients without SUI before surgery may develop de novo SUI. It is unclear whether a concomitant anti-incontinence procedure should be performed at the time of POP surgery to prevent postoperative incontinence. A prediction model is needed to guide clinical decision-making.

Objective: This study aimed to analyze the risk factors and develop prediction models for SUI after POP surgery based on machine learning to provide new tools for evaluating and predicting postoperative SUI.

Methods: Sample size calculation was performed using the Riley 4-step method. Data of patients undergoing prolapse surgery in Shanxi Bethune Hospital were prospectively collected from August 2022 to February 2025 and were retrospectively collected from January 2021 to August 2022. General clinical data, relevant laboratory test results, urodynamic examination findings, and pelvic floor ultrasound findings were collected. Lasso regression, univariate analysis, and logistic analysis were used to screen the predictors of SUI after prolapse surgery. Data were split randomly in a 7:3 ratio into training and validation sets. The training set was used to develop the prediction model involving Lasso regression, random forest, support vector machine (SVM), extreme gradient boosting (XGBoost), classification and regression tree (CART), and logistic regression, and the validation set was used for internal verification. The final implementation was achieved by developing a Shiny-based application for model deployment.

Results: A total of 286 patients were enrolled in this study, and 91 patients had postoperative SUI. The following 6 risk factors were identified through univariate, logistic, and Lasso regression analyses: preoperative SUI, urge urinary incontinence, urodynamic occult SUI, anti-incontinence surgery, genital hiatus, and anterior colporrhaphy. Five prediction models were constructed by using logistic regression, random forest, XGBoost, SVM, and CART. Based on a comprehensive evaluation of model discrimination, calibration, and clinical utility, the SVM model demonstrated optimal overall performance, with an area under the curve of 0.821 in the training set and 0.846 in the validation set.

Conclusions: This study developed 5 prediction models for postoperative SUI following prolapse surgery, which demonstrated good performance in internal validation. Among them, the SVM prediction model appeared to be the most promising. However, further external validation data are required to assess its generalizability. This model has the potential to become a high-quality clinical risk prediction tool for postoperative SUI in patients with prolapse, guiding clinical decision-making on whether concomitant prolapse and anti-incontinence surgery is needed.

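Two steps from the Methods and Results lend themselves to a short sketch: the random 7:3 split and the area under the curve used to compare models. The helper names, the fixed seed, and the rounding of the cut point are assumptions, since the abstract does not state how the authors implemented the split. The patient list is a placeholder of indices, not clinical data.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and cut it at 70% (rounding is an assumption)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample size, which is fine for a sketch.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Placeholder indices; 286 is the cohort size reported in the Results.
patients = list(range(286))
train, valid = split_70_30(patients)
print(len(train), len(valid))

# Hypothetical predicted risks for four validation cases (1 = postoperative SUI).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(auc(toy_labels, toy_scores))
```

Because the validation set comes from the same hospital and period as the training set, an AUC computed this way is internal validation only, which matches the abstract's call for external validation.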
Citations: 0
Multimodal Multitask Learning for Predicting Depression Severity and Suicide Risk Using Pretrained Audio and Text Embeddings: Methodology Development and Application.
IF 3.8 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2025-10-30 DOI: 10.2196/66907
Ya-Han Hu, Ruei-Yan Wu, Min-Yi Su, I-Li Lin, Cheng-Che Shen

Background: Depression is a critical psychological disorder necessitating urgent assessment and treatment, given its strong association with increased suicide risk (SR). Effective management hinges on promptly identifying individuals with high depression severity (DS) and SR. While machine learning and deep learning have advanced the identification of DS and SR, research focusing on both aspects simultaneously remains limited and requires further refinement.

Objective: This study aimed to evaluate whether our proposed methods, which integrate multitask learning (MTL), multimodal learning, and transfer learning, enhance the efficacy of deep learning models in the joint classification of DS and SR.

Methods: This study proposed a multitask framework employing a multimodal fusion strategy for pretrained audio and text embeddings to concurrently assess DS and SR. Data encompassing Chinese audio recordings and clinical questionnaire scores from 100 patients with depression and 100 healthy controls were used. Preprocessed audio and text data were transformed into pretrained embeddings and integrated using concatenation and hard parameter sharing. Single-task learning (STL) models (DS and SR tasks) were evaluated with different embeddings and further compared with the MTL models.

Results: The STL models demonstrated exceptional DS prediction (area under the curve [AUC]=0.878) using wav2vec 2.0 combined with ERNIE-health, and SR prediction (AUC=0.876) using HuBERT combined with ERNIE-health. The MTL models significantly improved SR prediction over DS prediction, achieving the highest DS classification (AUC=0.887) with wav2vec 2.0 combined with ERNIE-health, and SR classification (AUC=0.883) with HuBERT combined with ERNIE-health.
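
For readers unfamiliar with the metric reported above, the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred participants; larger datasets would normally use a rank-based computation instead.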

Conclusions: The findings of this study underscore the effectiveness of the proposed MTL models using specific pretrained audio and text embeddings in enhancing model performance. However, we advocate for cautious implementation of MTL to mitigate potential negative transfer effects. Our research presents a method that is both promising and effective, offering an objective approach for accurate clinical decision support in the parallel diagnosis of DS and SR.

JMIR Medical Informatics, vol. 13, e66907. Journal Article. Published 2025-10-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12574750/pdf/
Citations: 0