Doris Yang, Doudou Zhou, Steven Cai, Ziming Gan, Michael Pencina, Paul Avillach, Tianxi Cai, Chuan Hong
Background: Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large-scale cohort studies are both time- and resource-intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.
Objective: We propose SONAR (Semantic and Distribution-Based Harmonization) as a method for harmonizing variables across cohort studies to facilitate multicohort studies.
Methods: SONAR used semantic learning from variable descriptions and distribution learning from study participant data. Our method learned an embedding vector for each variable and used pairwise cosine similarity to score the similarity between variables. This approach was built on 3 National Institutes of Health cohorts: the Cardiovascular Health Study, the Multi-Ethnic Study of Atherosclerosis, and the Women's Health Initiative. We also used gold standard labels to further refine the embeddings in a supervised manner.
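As a rough illustration of the scoring step only (not the authors' SONAR implementation), the sketch below computes pairwise cosine similarities between variable embedding vectors and ranks candidate matches across two cohorts; the variable names and random embeddings are placeholders.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row-wise embedding vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Hypothetical embeddings: one vector per variable, standing in for vectors learned
# from variable descriptions and participant-level distributions.
rng = np.random.default_rng(0)
cohort_a = {"sbp_mmhg": rng.normal(size=64), "hdl_mgdl": rng.normal(size=64)}
cohort_b = {"systolic_bp": rng.normal(size=64), "hdl_chol": rng.normal(size=64)}

names = list(cohort_a) + list(cohort_b)
emb = np.vstack([*cohort_a.values(), *cohort_b.values()])
sim = cosine_similarity_matrix(emb)

# For each cohort A variable, rank cohort B variables by similarity (a top-k view).
offset = len(cohort_a)
for i, name in enumerate(cohort_a):
    order = np.argsort(sim[i, offset:])[::-1]
    ranked = [(names[offset + j], round(float(sim[i, offset + j]), 3)) for j in order]
    print(name, "->", ranked)
```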
Results: The method was evaluated using manually curated gold standard labels from the 3 National Institutes of Health cohorts. We evaluated both the intracohort and intercohort variable harmonization performance. The supervised SONAR method outperformed existing benchmark methods for almost all intracohort and intercohort comparisons using area under the curve and top-k accuracy metrics. Notably, SONAR was able to significantly improve harmonization of concepts that were difficult for existing semantic methods to harmonize.
Conclusions: SONAR achieves accurate variable harmonization within and between cohort studies by harnessing the complementary strengths of semantic learning and variable distribution learning.
{"title":"Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study.","authors":"Doris Yang, Doudou Zhou, Steven Cai, Ziming Gan, Michael Pencina, Paul Avillach, Tianxi Cai, Chuan Hong","doi":"10.2196/54133","DOIUrl":"https://doi.org/10.2196/54133","url":null,"abstract":"<p><strong>Background: </strong>Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large scale cohort studies are both time and resource intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.</p><p><strong>Objective: </strong>We propose SONAR (Semantic and Distribution-Based Harmonization) as a method for harmonizing variables across cohort studies to facilitate multicohort studies.</p><p><strong>Methods: </strong>SONAR used semantic learning from variable descriptions and distribution learning from study participant data. Our method learned an embedding vector for each variable and used pairwise cosine similarity to score the similarity between variables. This approach was built off 3 National Institutes of Health cohorts, including the Cardiovascular Health Study, the Multi-Ethnic Study of Atherosclerosis, and the Women's Health Initiative. We also used gold standard labels to further refine the embeddings in a supervised manner.</p><p><strong>Results: </strong>The method was evaluated using manually curated gold standard labels from the 3 National Institutes of Health cohorts. We evaluated both the intracohort and intercohort variable harmonization performance. The supervised SONAR method outperformed existing benchmark methods for almost all intracohort and intercohort comparisons using area under the curve and top-k accuracy metrics. Notably, SONAR was able to significantly improve harmonization of concepts that were difficult for existing semantic methods to harmonize.</p><p><strong>Conclusions: </strong>SONAR achieves accurate variable harmonization within and between cohort studies by harnessing the complementary strengths of semantic learning and variable distribution learning.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e54133"},"PeriodicalIF":3.1,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143026011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Health data typically include patient-generated data and clinical medical data. Different types of data contribute to disease prevention, precision medicine, and the overall improvement of health care. With the introduction of regulations such as the Health Insurance Portability and Accountability Act (HIPAA), individuals play a key role in the sharing and application of personal health data.
Objective: This study aims to explore the impact of different types of health data on users' willingness to share. Additionally, it analyzes the effect of data control and delay discounting rate on this process.
Methods: The results of a web-based survey were analyzed to examine individuals' perceptions of sharing different types of health data and how data control and delay discounting rates influenced their decisions. We recruited participants for our study through the web-based platform "Wenjuanxing." After screening, we obtained 257 valid responses. Regression analysis was used to investigate the impact of data control, delay discounting, and mental accounting on the public's willingness to share different types of health care data.
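A minimal sketch of the kind of regression analysis described, assuming simulated data and illustrative variable names rather than the study's actual survey items:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated responses; the columns are stand-ins for the constructs named in the
# abstract (data control, delay discounting, mental accounting), not the survey items.
rng = np.random.default_rng(1)
n = 257
df = pd.DataFrame({
    "data_control": rng.normal(size=n),
    "delay_discounting": rng.normal(size=n),
    "mental_accounting": rng.normal(size=n),
})
df["willingness_to_share"] = (
    0.4 * df["data_control"] - 0.3 * df["delay_discounting"]
    + rng.normal(scale=0.5, size=n)
)

model = smf.ols(
    "willingness_to_share ~ data_control + delay_discounting + mental_accounting",
    data=df,
).fit()
print(model.summary())
```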
Results: Our findings indicate that the type of health data does not significantly affect the perceived benefits of data sharing. Instead, it negatively influences willingness to share by indirectly affecting data acquisition costs and perceived risks. Our results also show that data control reduces the perceived risks associated with sharing, while higher delay discounting rates lead to an overestimation of data acquisition costs and perceived risks.
Conclusions: Individuals' willingness to share data is primarily influenced by costs. To promote the acquisition and development of personal health data, stakeholders should strengthen individuals' control over their data or provide direct short-term incentives.
{"title":"The Impact of Data Control and Delayed Discounting on the Public's Willingness to Share Different Types of Health Care Data: Empirical Study.","authors":"Dongle Wei, Pan Gao, Yunkai Zhai","doi":"10.2196/66444","DOIUrl":"https://doi.org/10.2196/66444","url":null,"abstract":"<p><strong>Background: </strong>Health data typically include patient-generated data and clinical medical data. Different types of data contribute to disease prevention, precision medicine, and the overall improvement of health care. With the introduction of regulations such as the Health Insurance Portability and Accountability Act (HIPAA), individuals play a key role in the sharing and application of personal health data.</p><p><strong>Objective: </strong>This study aims to explore the impact of different types of health data on users' willingness to share. Additionally, it analyzes the effect of data control and delay discounting rate on this process.</p><p><strong>Methods: </strong>The results of a web-based survey were analyzed to examine individuals' perceptions of sharing different types of health data and how data control and delay discounting rates influenced their decisions. We recruited participants for our study through the web-based platform \"Wenjuanxing.\" After screening, we obtained 257 valid responses. Regression analysis was used to investigate the impact of data control, delayed discounting, and mental accounting on the public's willingness to share different types of health care data.</p><p><strong>Results: </strong>Our findings indicate that the type of health data does not significantly affect the perceived benefits of data sharing. Instead, it negatively influences willingness to share by indirectly affecting data acquisition costs and perceived risks. Our results also show that data control reduces the perceived risks associated with sharing, while higher delay discounting rates lead to an overestimation of data acquisition costs and perceived risks.</p><p><strong>Conclusions: </strong>Individuals' willingness to share data is primarily influenced by costs. To promote the acquisition and development of personal health data, stakeholders should strengthen individuals' control over their data or provide direct short-term incentives.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e66444"},"PeriodicalIF":3.1,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas C Cardamone, Mark Olfson, Timothy Schmutte, Lyle Ungar, Tony Liu, Sara W Cullen, Nathaniel J Williams, Steven C Marcus
Background: Prediction models have demonstrated a range of applications across medicine, including using electronic health record (EHR) data to identify hospital readmission and mortality risk. Large language models (LLMs) can transform unstructured EHR text into structured features, which can then be integrated into statistical prediction models, ensuring that the results are both clinically meaningful and interpretable.
Objective: This study aims to compare the classification decisions made by clinical experts with those generated by a state-of-the-art LLM, using terms extracted from a large EHR data set of individuals with mental health disorders seen in emergency departments (EDs).
Methods: Using a dataset from the EHR systems of more than 50 health care provider organizations in the United States from 2016 to 2021, we extracted all clinical terms that appeared in at least 1000 records of individuals admitted to the ED for a mental health-related problem from a source population of over 6 million ED episodes. Two experienced mental health clinicians (one medically trained psychiatrist and one clinical psychologist) reached consensus on the classification of EHR terms and diagnostic codes into categories. We evaluated an LLM's agreement with clinical judgment across three classification tasks as follows: (1) classify terms into "mental health" or "physical health", (2) classify mental health terms into 1 of 42 prespecified categories, and (3) classify physical health terms into 1 of 19 prespecified broad categories.
Results: There was high agreement between the LLM and clinical experts when categorizing 4553 terms as "mental health" or "physical health" (κ=0.77, 95% CI 0.75-0.80). However, there was still considerable variability in LLM-clinician agreement on the classification of mental health terms (κ=0.62, 95% CI 0.59-0.66) and physical health terms (κ=0.69, 95% CI 0.67-0.70).
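One common way to obtain a κ estimate with a bootstrap 95% CI for this kind of LLM-versus-clinician agreement is sketched below; the labels are simulated placeholders, not the study data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(42)
categories = ["mental_health", "physical_health"]

# Placeholder labels: clinician consensus vs LLM assignment for each of 4553 terms.
clinician = rng.choice(categories, size=4553)
llm = np.where(rng.random(4553) < 0.9, clinician, rng.choice(categories, size=4553))

kappa = cohen_kappa_score(clinician, llm)

# Nonparametric bootstrap over terms for an approximate 95% CI.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(clinician), len(clinician))
    boot.append(cohen_kappa_score(clinician[idx], llm[idx]))
low, high = np.percentile(boot, [2.5, 97.5])
print(f"kappa = {kappa:.2f}, 95% CI {low:.2f}-{high:.2f}")
```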
Conclusions: The LLM displayed high agreement with clinical experts when categorizing EHR terms as mental health or physical health. However, agreement with clinical experts varied considerably within both sets of mental and physical health term categories. Importantly, the use of LLMs offers an alternative to manual human coding, with great potential for creating interpretable features for prediction models.
{"title":"Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study.","authors":"Nicholas C Cardamone, Mark Olfson, Timothy Schmutte, Lyle Ungar, Tony Liu, Sara W Cullen, Nathaniel J Williams, Steven C Marcus","doi":"10.2196/65454","DOIUrl":"https://doi.org/10.2196/65454","url":null,"abstract":"<p><strong>Background: </strong>Prediction models have demonstrated a range of applications across medicine, including using electronic health record (EHR) data to identify hospital readmission and mortality risk. Large language models (LLMs) can transform unstructured EHR text into structured features, which can then be integrated into statistical prediction models, ensuring that the results are both clinically meaningful and interpretable.</p><p><strong>Objective: </strong>This study aims to compare the classification decisions made by clinical experts with those generated by a state-of-the-art LLM, using terms extracted from a large EHR data set of individuals with mental health disorders seen in emergency departments (EDs).</p><p><strong>Methods: </strong>Using a dataset from the EHR systems of more than 50 health care provider organizations in the United States from 2016 to 2021, we extracted all clinical terms that appeared in at least 1000 records of individuals admitted to the ED for a mental health-related problem from a source population of over 6 million ED episodes. Two experienced mental health clinicians (one medically trained psychiatrist and one clinical psychologist) reached consensus on the classification of EHR terms and diagnostic codes into categories. We evaluated an LLM's agreement with clinical judgment across three classification tasks as follows: (1) classify terms into \"mental health\" or \"physical health\", (2) classify mental health terms into 1 of 42 prespecified categories, and (3) classify physical health terms into 1 of 19 prespecified broad categories.</p><p><strong>Results: </strong>There was high agreement between the LLM and clinical experts when categorizing 4553 terms as \"mental health\" or \"physical health\" (κ=0.77, 95% CI 0.75-0.80). However, there was still considerable variability in LLM-clinician agreement on the classification of mental health terms (κ=0.62, 95% CI 0.59-0.66) and physical health terms (κ=0.69, 95% CI 0.67-0.70).</p><p><strong>Conclusions: </strong>The LLM displayed high agreement with clinical experts when classifying EHR terms into certain mental health or physical health term categories. However, agreement with clinical experts varied considerably within both sets of mental and physical health term categories. Importantly, the use of LLMs presents an alternative to manual human coding, presenting great potential to create interpretable features for prediction models.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e65454"},"PeriodicalIF":3.1,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aoyu Li, Jingwen Li, Yishan Hu, Yan Geng, Yan Qiang, Juanjuan Zhao
Background: The prompt and accurate identification of mild cognitive impairment (MCI) is crucial for preventing its progression into more severe neurodegenerative diseases. However, current diagnostic solutions, such as biomarkers and cognitive screening tests, are costly, time-consuming, and invasive, hindering patient compliance and the accessibility of these tests. Therefore, exploring a more cost-effective, efficient, and noninvasive method to aid clinicians in detecting MCI is necessary.
Objective: This study aims to develop an ensemble learning framework that adaptively integrates multimodal physiological data collected from wearable wristbands and digital cognitive metrics recorded on tablets, thereby improving the accuracy and practicality of MCI detection.
Methods: We recruited 843 participants aged 60 years and older from the geriatrics and neurology departments of our collaborating hospitals, who were randomly divided into a development dataset (674/843 participants) and an internal test dataset (169/843 participants) at a 4:1 ratio. In addition, 226 older adults were recruited from 3 external centers to form an external test dataset. We measured their physiological signals (eg, electrodermal activity and photoplethysmography) and digital cognitive parameters (eg, reaction time and test scores) using the clinically certified Empatica 4 wristband and a tablet cognitive screening tool. The collected data underwent rigorous preprocessing, during which features in the time, frequency, and nonlinear domains were extracted from individual physiological signals. To address the challenges (eg, the curse of dimensionality and increased model complexity) posed by high-dimensional features, we developed a dynamic adaptive feature selection optimization algorithm to identify the most impactful subset of features for classification performance. Finally, the accuracy and efficiency of the classification model were improved by optimizing the combination of base learners.
Results: The experimental results indicate that the proposed MCI detection framework achieved classification accuracies of 88.4%, 85.5%, and 84.5% on the development, internal test, and external test datasets, respectively. The area under the curve for the binary classification task was 0.945 (95% CI 0.903-0.986), 0.912 (95% CI 0.859-0.965), and 0.904 (95% CI 0.846-0.962) on these datasets. Furthermore, a statistical analysis of feature subsets during the iterative modeling process revealed that the decay time of skin conductance response, the percentage of continuous normal-to-normal intervals exceeding 50 milliseconds, the ratio of low-frequency to high-frequency (LF/HF) components in heart rate variability, and cognitive time features emerged as the most prevalent and effective indicators. Specifically, compared with healthy individuals, patients with MCI exhibited a longer skin conductance response decay time during the cognitive tests.
Conclusions: The developed MCI detection framework demonstrated exemplary performance and stability in large-scale validation. It establishes a new benchmark for noninvasive, effective early MCI detection that can be integrated into routine wearable- and tablet-based assessments. Furthermore, the framework enables continuous and convenient self-screening at home or in nonspecialist settings, effectively alleviating constraints from under-resourced health care and geographic location, making it an important tool in the current effort against neurodegenerative diseases.
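For reference, the two heart rate variability indicators highlighted above (the percentage of successive normal-to-normal differences over 50 ms and the LF/HF ratio) can be computed from an RR-interval series roughly as follows; the resampling rate and band limits are conventional choices and not necessarily the authors' exact pipeline, and the RR series is simulated.

```python
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d
from scipy.integrate import trapezoid

def pnn50(rr_ms: np.ndarray) -> float:
    """Percentage of successive normal-to-normal interval differences > 50 ms."""
    diffs = np.abs(np.diff(rr_ms))
    return 100.0 * np.mean(diffs > 50.0)

def lf_hf_ratio(rr_ms: np.ndarray, fs: float = 4.0) -> float:
    """LF (0.04-0.15 Hz) to HF (0.15-0.40 Hz) power ratio from a Welch PSD."""
    t = np.cumsum(rr_ms) / 1000.0                       # beat times in seconds
    t_even = np.arange(t[0], t[-1], 1.0 / fs)           # resample to an even grid
    rr_even = interp1d(t, rr_ms, kind="cubic")(t_even)
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs, nperseg=min(256, len(rr_even)))
    lf_band = (f >= 0.04) & (f < 0.15)
    hf_band = (f >= 0.15) & (f < 0.40)
    return trapezoid(pxx[lf_band], f[lf_band]) / trapezoid(pxx[hf_band], f[hf_band])

# Simulated RR intervals (ms) standing in for wristband-derived recordings.
rng = np.random.default_rng(3)
rr = 800 + 50 * np.sin(np.linspace(0, 20, 300)) + rng.normal(0, 30, 300)
print(f"pNN50 = {pnn50(rr):.1f}%, LF/HF = {lf_hf_ratio(rr):.2f}")
```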
{"title":"A Dynamic Adaptive Ensemble Learning Framework for Noninvasive Mild Cognitive Impairment Detection: Development and Validation Study.","authors":"Aoyu Li, Jingwen Li, Yishan Hu, Yan Geng, Yan Qiang, Juanjuan Zhao","doi":"10.2196/60250","DOIUrl":"https://doi.org/10.2196/60250","url":null,"abstract":"<p><strong>Background: </strong>The prompt and accurate identification of mild cognitive impairment (MCI) is crucial for preventing its progression into more severe neurodegenerative diseases. However, current diagnostic solutions, such as biomarkers and cognitive screening tests, prove costly, time-consuming, and invasive, hindering patient compliance and the accessibility of these tests. Therefore, exploring a more cost-effective, efficient, and noninvasive method to aid clinicians in detecting MCI is necessary.</p><p><strong>Objective: </strong>This study aims to develop an ensemble learning framework that adaptively integrates multimodal physiological data collected from wearable wristbands and digital cognitive metrics recorded on tablets, thereby improving the accuracy and practicality of MCI detection.</p><p><strong>Methods: </strong>We recruited 843 participants aged 60 years and older from the geriatrics and neurology departments of our collaborating hospitals, who were randomly divided into a development dataset (674/843 participants) and an internal test dataset (169/843 participants) at a 4:1 ratio. In addition, 226 older adults were recruited from 3 external centers to form an external test dataset. We measured their physiological signals (eg, electrodermal activity and photoplethysmography) and digital cognitive parameters (eg, reaction time and test scores) using the clinically certified Empatica 4 wristband and a tablet cognitive screening tool. The collected data underwent rigorous preprocessing, during which features in the time, frequency, and nonlinear domains were extracted from individual physiological signals. To address the challenges (eg, the curse of dimensionality and increased model complexity) posed by high-dimensional features, we developed a dynamic adaptive feature selection optimization algorithm to identify the most impactful subset of features for classification performance. Finally, the accuracy and efficiency of the classification model were improved by optimizing the combination of base learners.</p><p><strong>Results: </strong>The experimental results indicate that the proposed MCI detection framework achieved classification accuracies of 88.4%, 85.5%, and 84.5% on the development, internal test, and external test datasets, respectively. The area under the curve for the binary classification task was 0.945 (95% CI 0.903-0.986), 0.912 (95% CI 0.859-0.965), and 0.904 (95% CI 0.846-0.962) on these datasets. Furthermore, a statistical analysis of feature subsets during the iterative modeling process revealed that the decay time of skin conductance response, the percentage of continuous normal-to-normal intervals exceeding 50 milliseconds, the ratio of low-frequency to high-frequency (LF/HF) components in heart rate variability, and cognitive time features emerged as the most prevalent and effective indicators. 
Specifically, compared with healthy individuals, patients with MCI exhibited a longer skin conductance ","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e60250"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143017039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The health care industry is currently going through a transformation due to the integration of technologies and the shift toward value-based health care (VBHC). This article explores how digital health solutions play a role in advancing VBHC, highlighting both the challenges and opportunities associated with adopting these technologies. Digital health, which includes mobile health, wearable devices, telehealth, and personalized medicine, shows promise in improving diagnostic accuracy, treatment options, and overall health outcomes. The article delves into the concept of transformation in health care by emphasizing its potential to reform care delivery through data communication, patient engagement, and operational efficiency. Moreover, it examines the principles of VBHC, with a focus on patient outcomes, and emphasizes how digital platforms play a role in treatment among tertiary hospitals by using patient-reported outcome measures. The article discusses challenges that come with implementing VBHC, such as stakeholder engagement and standardization of patient-reported outcome measures. It also highlights the role played by health innovators in facilitating the transition toward VBHC models. Through real-life case examples, this article illustrates how digital platforms have had an impact on efficiencies, patient outcomes, and empowerment. In conclusion, it envisions directions for solutions in VBHC by emphasizing the need for interoperability, standardization, and collaborative efforts among stakeholders to fully realize the potential of digital transformation in health care. This research highlights the impact of digital health in creating a health care system that focuses on providing high-quality, efficient, and patient-centered care.
{"title":"Digital Health Innovations to Catalyze the Transition to Value-Based Health Care.","authors":"Lan Zhang, Christopher Bullen, Jinsong Chen","doi":"10.2196/57385","DOIUrl":"https://doi.org/10.2196/57385","url":null,"abstract":"<p><strong>Unlabelled: </strong>The health care industry is currently going through a transformation due to the integration of technologies and the shift toward value-based health care (VBHC). This article explores how digital health solutions play a role in advancing VBHC, highlighting both the challenges and opportunities associated with adopting these technologies. Digital health, which includes mobile health, wearable devices, telehealth, and personalized medicine, shows promise in improving diagnostic accuracy, treatment options, and overall health outcomes. The article delves into the concept of transformation in health care by emphasizing its potential to reform care delivery through data communication, patient engagement, and operational efficiency. Moreover, it examines the principles of VBHC, with a focus on patient outcomes, and emphasizes how digital platforms play a role in treatment among tertiary hospitals by using patient-reported outcome measures. The article discusses challenges that come with implementing VBHC, such as stakeholder engagement and standardization of patient-reported outcome measures. It also highlights the role played by health innovators in facilitating the transition toward VBHC models. Through real-life case examples, this article illustrates how digital platforms have had an impact on efficiencies, patient outcomes, and empowerment. In conclusion, it envisions directions for solutions in VBHC by emphasizing the need for interoperability, standardization, and collaborative efforts among stakeholders to fully realize the potential of digital transformation in health care. This research highlights the impact of digital health in creating a health care system that focuses on providing high-quality, efficient, and patient-centered care.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e57385"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Postpartum depression (PPD) is a prevalent mental health issue with significant impacts on mothers and families. Exploring reliable predictors is crucial for the early and accurate prediction of PPD, which remains challenging.
Objective: This study aimed to comprehensively collect variables from multiple aspects, develop and validate machine learning models to achieve precise prediction of PPD, and interpret the model to reveal clinical implications.
Methods: This study recruited pregnant women who delivered at the West China Second University Hospital, Sichuan University. Various variables were collected from electronic medical record data and screened using least absolute shrinkage and selection operator penalty regression. Participants were divided into training (1358/2055, 66.1%) and validation (697/2055, 33.9%) sets by random sampling. Machine learning-based predictive models were developed in the training cohort. Models were validated in the validation cohort with receiver operating characteristic curve and decision curve analysis. Multiple model interpretation methods were implemented to explain the optimal model.
Results: We recruited 2055 participants in this study. The extreme gradient boosting model was the optimal predictive model, with an area under the receiver operating characteristic curve of 0.849. Shapley Additive Explanations indicated that the most influential predictors of PPD were antepartum depression, lower fetal weight, elevated thyroid-stimulating hormone, decreased thyroid peroxidase antibodies, elevated serum ferritin, and older age.
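A rough sketch of the modeling workflow described (LASSO screening, an extreme gradient boosting classifier, AUC evaluation, and SHAP attribution), assuming the third-party xgboost and shap packages and synthetic placeholder data rather than the study's records:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import shap

# Synthetic stand-in for the EHR-derived candidate predictors and PPD labels.
X, y = make_classification(n_samples=2055, n_features=40, n_informative=8, random_state=0)

# LASSO-penalized screening: keep predictors with nonzero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
keep = np.flatnonzero(lasso.coef_ != 0)
if keep.size == 0:                      # fall back to all predictors if none survive
    keep = np.arange(X.shape[1])

X_train, X_val, y_train, y_val = train_test_split(
    X[:, keep], y, test_size=0.339, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05,
                      eval_metric="logloss").fit(X_train, y_train)
print("validation AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

# SHAP values rank predictors by mean absolute contribution to the prediction.
shap_values = shap.TreeExplainer(model).shap_values(X_val)
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]
print("most influential predictors (indices into the original feature set):", keep[ranking][:6])
```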
Conclusions: This study developed and validated a machine learning-based predictive model for PPD. Several significant risk factors and how they impact the prediction of PPD were revealed. These findings provide new insights into the early screening of individuals with high risk for PPD, emphasizing the need for comprehensive screening approaches that include both physiological and psychological factors.
{"title":"Interpretable Machine Learning Model for Predicting Postpartum Depression: Retrospective Study.","authors":"Ren Zhang, Yi Liu, Zhiwei Zhang, Rui Luo, Bin Lv","doi":"10.2196/58649","DOIUrl":"https://doi.org/10.2196/58649","url":null,"abstract":"<p><strong>Background: </strong>Postpartum depression (PPD) is a prevalent mental health issue with significant impacts on mothers and families. Exploring reliable predictors is crucial for the early and accurate prediction of PPD, which remains challenging.</p><p><strong>Objective: </strong>This study aimed to comprehensively collect variables from multiple aspects, develop and validate machine learning models to achieve precise prediction of PPD, and interpret the model to reveal clinical implications.</p><p><strong>Methods: </strong>This study recruited pregnant women who delivered at the West China Second University Hospital, Sichuan University. Various variables were collected from electronic medical record data and screened using least absolute shrinkage and selection operator penalty regression. Participants were divided into training (1358/2055, 66.1%) and validation (697/2055, 33.9%) sets by random sampling. Machine learning-based predictive models were developed in the training cohort. Models were validated in the validation cohort with receiver operating curve and decision curve analysis. Multiple model interpretation methods were implemented to explain the optimal model.</p><p><strong>Results: </strong>We recruited 2055 participants in this study. The extreme gradient boosting model was the optimal predictive model with the area under the receiver operating curve of 0.849. Shapley Additive Explanation indicated that the most influential predictors of PPD were antepartum depression, lower fetal weight, elevated thyroid-stimulating hormone, declined thyroid peroxidase antibodies, elevated serum ferritin, and older age.</p><p><strong>Conclusions: </strong>This study developed and validated a machine learning-based predictive model for PPD. Several significant risk factors and how they impact the prediction of PPD were revealed. These findings provide new insights into the early screening of individuals with high risk for PPD, emphasizing the need for comprehensive screening approaches that include both physiological and psychological factors.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e58649"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elizabeth Joyce, James McMullen, Xiaowen Kong, Connor O'Hare, Valerie Gavrila, Anthony Cuttitta, Geoffrey D Barnes, Colin F Greineder
Background: Studies suggest that less than 4% of patients with pulmonary embolisms (PEs) are managed in the outpatient setting. Strong evidence and multiple guidelines support the use of the Pulmonary Embolism Severity Index (PESI) for the identification of acute PE patients appropriate for outpatient management. However, calculating the PESI score can be inconvenient in a busy emergency department (ED). To facilitate integration into ED workflow, we created a 2023 Epic-compatible clinical decision support tool that automatically calculates the PESI score in real-time with patients' electronic health data (ePESI [Electronic Pulmonary Embolism Severity Index]).
Objective: The primary objectives of this study were to determine the overall accuracy of ePESI and its ability to correctly distinguish high- and low-risk PESI scores within the Epic 2023 software. The secondary objective was to identify variables that impact ePESI accuracy.
Methods: We collected ePESI scores on 500 consecutive patients at least 18 years old who underwent a computed tomography pulmonary embolism scan in the ED of our tertiary, academic health center between January 3 and February 15, 2023. We compared ePESI results to a PESI score calculated by 2 independent, medically trained abstractors blinded to the ePESI and each other's results. ePESI accuracy was calculated with a binomial test. The odds ratio (OR) was calculated using logistic regression.
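The two statistical steps named here could be sketched as follows: an exact binomial CI for accuracy using the reported 394/500 exact matches, and a logistic regression yielding odds ratios on simulated stand-in data (the predictor names are illustrative, not the study's variables).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import binomtest

# Exact binomial CI for overall exact-match accuracy (counts taken from the abstract).
res = binomtest(k=394, n=500)
ci = res.proportion_ci(confidence_level=0.95)
print(f"accuracy = {394 / 500:.3f}, 95% CI {ci.low:.3f}-{ci.high:.3f}")

# Logistic regression of exact match (1/0) on candidate predictors of accuracy;
# the data frame below is simulated, not the study's patient-level data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "exact_match": rng.integers(0, 2, 500),
    "low_risk_score": rng.integers(0, 2, 500),
    "no_prior_encounter": rng.integers(0, 2, 500),
})
X = sm.add_constant(df[["low_risk_score", "no_prior_encounter"]])
fit = sm.Logit(df["exact_match"], X).fit(disp=0)
print("odds ratios:")
print(np.exp(fit.params))
```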
Results: Of the 500 patients, a total of 203 (40.6%) and 297 (59.4%) patients had low- and high-risk PESI scores, respectively. The ePESI exactly matched the calculated PESI in 394 out of 500 cases, with an accuracy of 78.8% (95% CI 74.9%-82.3%), and correctly identified low- versus high-risk in 477 out of 500 (95.4%) cases. The accuracy of the ePESI was higher for low-risk scores (OR 2.96, P<.001) and lower when patients were without prior encounters in the health system (OR 0.42, P=.008).
Conclusions: In this single-center study, the ePESI was highly accurate in discriminating between low- and high-risk scores. The clinical decision support should facilitate real-time identification of patients who may be candidates for outpatient PE management.
{"title":"Performance of an Electronic Health Record-Based Automated Pulmonary Embolism Severity Index Score Calculator: Cohort Study in the Emergency Department.","authors":"Elizabeth Joyce, James McMullen, Xiaowen Kong, Connor O'Hare, Valerie Gavrila, Anthony Cuttitta, Geoffrey D Barnes, Colin F Greineder","doi":"10.2196/58800","DOIUrl":"https://doi.org/10.2196/58800","url":null,"abstract":"<p><strong>Background: </strong>Studies suggest that less than 4% of patients with pulmonary embolisms (PEs) are managed in the outpatient setting. Strong evidence and multiple guidelines support the use of the Pulmonary Embolism Severity Index (PESI) for the identification of acute PE patients appropriate for outpatient management. However, calculating the PESI score can be inconvenient in a busy emergency department (ED). To facilitate integration into ED workflow, we created a 2023 Epic-compatible clinical decision support tool that automatically calculates the PESI score in real-time with patients' electronic health data (ePESI [Electronic Pulmonary Embolism Severity Index]).</p><p><strong>Objective: </strong>The primary objectives of this study were to determine the overall accuracy of ePESI and its ability to correctly distinguish high- and low-risk PESI scores within the Epic 2023 software. The secondary objective was to identify variables that impact ePESI accuracy.</p><p><strong>Methods: </strong>We collected ePESI scores on 500 consecutive patients at least 18 years old who underwent a computerized tomography-pulmonary embolism scan in the ED of our tertiary, academic health center between January 3 and February 15, 2023. We compared ePESI results to a PESI score calculated by 2 independent, medically-trained abstractors blinded to the ePESI and each other's results. ePESI accuracy was calculated with binomial test. The odds ratio (OR) was calculated using logistic regression.</p><p><strong>Results: </strong>Of the 500 patients, a total of 203 (40.6%) and 297 (59.4%) patients had low- and high-risk PESI scores, respectively. The ePESI exactly matched the calculated PESI in 394 out of 500 cases, with an accuracy of 78.8% (95% CI 74.9%-82.3%), and correctly identified low- versus high-risk in 477 out of 500 (95.4%) cases. The accuracy of the ePESI was higher for low-risk scores (OR 2.96, P<.001) and lower when patients were without prior encounters in the health system (OR 0.42, P=.008).</p><p><strong>Conclusions: </strong>In this single-center study, the ePESI was highly accurate in discriminating between low- and high-risk scores. The clinical decision support should facilitate real-time identification of patients who may be candidates for outpatient PE management.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e58800"},"PeriodicalIF":3.1,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taehwan Kim, Jung-Yeon Choi, Myung Jin Ko, Kwang-Il Kim
Background: The two most commonly used methods to identify frailty are the frailty phenotype and the frailty index. However, both methods have limitations in clinical application. In addition, methods for measuring frailty have not yet been standardized.
Objective: We aimed to develop and validate a classification model for predicting frailty status using vocal biomarkers in community-dwelling older adults, based on voice recordings obtained from the picture description task (PDT).
Methods: We recruited 127 participants aged 50 years and older and collected clinical information through a short form of the Comprehensive Geriatric Assessment scale. Voice recordings were collected with a tablet device during the Korean version of the PDT, and we preprocessed audio data to remove background noise before feature extraction. Three artificial intelligence (AI) models were developed for identifying frailty status: SpeechAI (using speech data only), DemoAI (using demographic data only), and DemoSpeechAI (combining both data types).
Results: Our models were trained and evaluated on the basis of 5-fold cross-validation for 127 participants and compared. The SpeechAI model, using deep learning-based acoustic features, performed best in terms of accuracy and area under the receiver operating characteristic curve (AUC), at 80.4% (95% CI 76.89%-83.91%) and 0.89 (95% CI 0.86-0.92), respectively, while the model using only demographics showed an accuracy of 67.96% (95% CI 67.63%-68.29%) and an AUC of 0.74 (95% CI 0.73-0.75). The SpeechAI model significantly outperformed the demographics-only model in AUC (t4=8.705 [2-sided]; P<.001). The DemoSpeechAI model, which combined demographics with deep learning-based acoustic features, showed superior performance (accuracy 85.6%, 95% CI 80.03%-91.17% and AUC 0.93, 95% CI 0.89-0.97), but there was no significant difference in AUC between the SpeechAI and DemoSpeechAI models (t4=1.057 [2-sided]; P=.35). Compared with models using traditional acoustic features from the openSMILE toolkit, the SpeechAI model demonstrated superior performance (AUC 0.89) over traditional methods (logistic regression: AUC 0.62; decision tree: AUC 0.57; random forest: AUC 0.66).
Conclusions: Our findings demonstrate that vocal biomarkers derived from deep learning-based acoustic features can be effectively used to predict frailty status in community-dwelling older adults. The SpeechAI model showed promising accuracy and AUC, outperforming models based solely on demographic data or traditional acoustic features. Furthermore, while the combined DemoSpeechAI model showed slightly improved performance over the SpeechAI model, the difference was not statistically significant. These results suggest that speech-based AI models offer a noninvasive, scalable method for frailty detection, potentially streamlining assessments in clinical and community settings.
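A minimal sketch of a 5-fold cross-validated AUC comparison between a speech-feature model and a demographics-only model, using synthetic placeholder features rather than the study's acoustic embeddings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(7)
n = 127

# Placeholder feature sets: high-dimensional acoustic embeddings vs simple demographics.
X_speech, y = make_classification(n_samples=n, n_features=128, n_informative=12, random_state=7)
X_demo = np.column_stack([rng.integers(50, 90, n), rng.integers(0, 2, n)])  # age, sex

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
for name, X in [("SpeechAI-like", X_speech), ("DemoAI-like", X_demo)]:
    aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC {aucs.mean():.3f} (fold SD {aucs.std():.3f})")
```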
{"title":"Development and Validation of a Machine Learning Method Using Vocal Biomarkers for Identifying Frailty in Community-Dwelling Older Adults: Cross-Sectional Study.","authors":"Taehwan Kim, Jung-Yeon Choi, Myung Jin Ko, Kwang-Il Kim","doi":"10.2196/57298","DOIUrl":"10.2196/57298","url":null,"abstract":"<p><strong>Background: </strong>The two most commonly used methods to identify frailty are the frailty phenotype and the frailty index. However, both methods have limitations in clinical application. In addition, methods for measuring frailty have not yet been standardized.</p><p><strong>Objective: </strong>We aimed to develop and validate a classification model for predicting frailty status using vocal biomarkers in community-dwelling older adults, based on voice recordings obtained from the picture description task (PDT).</p><p><strong>Methods: </strong>We recruited 127 participants aged 50 years and older and collected clinical information through a short form of the Comprehensive Geriatric Assessment scale. Voice recordings were collected with a tablet device during the Korean version of the PDT, and we preprocessed audio data to remove background noise before feature extraction. Three artificial intelligence (AI) models were developed for identifying frailty status: SpeechAI (using speech data only), DemoAI (using demographic data only), and DemoSpeechAI (combining both data types).</p><p><strong>Results: </strong>Our models were trained and evaluated on the basis of 5-fold cross-validation for 127 participants and compared. The SpeechAI model, using deep learning-based acoustic features, outperformed in terms of accuracy and area under the receiver operating characteristic curve (AUC), 80.4% (95% CI 76.89%-83.91%) and 0.89 (95% CI 0.86-0.92), respectively, while the model using only demographics showed an accuracy of 67.96% (95% CI 67.63%-68.29%) and an AUC of 0.74 (95% CI 0.73-0.75). The SpeechAI model outperformed the model using only demographics significantly in AUC (t4=8.705 [2-sided]; P<.001). The DemoSpeechAI model, which combined demographics with deep learning-based acoustic features, showed superior performance (accuracy 85.6%, 95% CI 80.03%-91.17% and AUC 0.93, 95% CI 0.89-0.97), but there was no significant difference in AUC between the SpeechAI and DemoSpeechAI models (t4=1.057 [2-sided]; P=.35). Compared with models using traditional acoustic features from the openSMILE toolkit, the SpeechAI model demonstrated superior performance (AUC 0.89) over traditional methods (logistic regression: AUC 0.62; decision tree: AUC 0.57; random forest: AUC 0.66).</p><p><strong>Conclusions: </strong>Our findings demonstrate that vocal biomarkers derived from deep learning-based acoustic features can be effectively used to predict frailty status in community-dwelling older adults. The SpeechAI model showed promising accuracy and AUC, outperforming models based solely on demographic data or traditional acoustic features. Furthermore, while the combined DemoSpeechAI model showed slightly improved performance over the SpeechAI model, the difference was not statistically significant. 
These results suggest that speech-based AI models offer a noninvasive, scalable method for frailty detection, potentially streamlining assessments in clinical and comm","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e57298"},"PeriodicalIF":3.1,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143016957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Advances in genetics have underscored a strong association between genetic factors and health outcomes, leading to an increased demand for genetic counseling services. However, a shortage of qualified genetic counselors poses a significant challenge. Large language models (LLMs) have emerged as a potential solution for augmenting support in genetic counseling tasks. Despite the potential, Japanese genetic counseling LLMs (JGCLLMs) are underexplored. To advance a JGCLLM-based dialogue system for genetic counseling, effective domain adaptation methods require investigation.
Objective: This study aims to evaluate the current capabilities and identify challenges in developing a JGCLLM-based dialogue system for genetic counseling. The primary focus is to assess the effectiveness of prompt engineering, retrieval-augmented generation (RAG), and instruction tuning within the context of genetic counseling. Furthermore, we will establish an expert-evaluated dataset of responses generated by LLMs adapted to Japanese genetic counseling for the future development of JGCLLMs.
Methods: Two primary datasets were used in this study: (1) a question-answer (QA) dataset for LLM adaptation and (2) a genetic counseling question dataset for evaluation. The QA dataset included 899 QA pairs covering medical and genetic counseling topics, while the evaluation dataset contained 120 curated questions across 6 genetic counseling categories. Three enhancement techniques for LLMs (instruction tuning, RAG, and prompt engineering) were applied to a lightweight Japanese LLM to enhance its ability for genetic counseling. The performance of the adapted LLM was evaluated on the 120-question dataset by 2 certified genetic counselors and 1 ophthalmologist (SK, YU, and AY). Evaluation focused on four metrics: (1) inappropriateness of information, (2) sufficiency of information, (3) severity of harm, and (4) alignment with medical consensus.
Results: The evaluation by certified genetic counselors and an ophthalmologist revealed varied outcomes across different methods. RAG showed potential, particularly in enhancing critical aspects of genetic counseling. In contrast, instruction tuning and prompt engineering produced less favorable outcomes. This evaluation process facilitated the creation of an expert-evaluated dataset of responses generated by LLMs adapted with different combinations of these methods. Error analysis identified key ethical concerns, including inappropriate promotion of prenatal testing, criticism of relatives, and inaccurate probability statements.
Conclusions: RAG demonstrated notable improvements across all evaluation metrics, suggesting potential for further enhancement through the expansion of RAG data. The expert-evaluated dataset developed in this study provides valuable insights for future optimization efforts. However, the ethical issues observed ...
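A toy illustration of the retrieval-augmented generation setup evaluated here: retrieve the most similar QA pairs by embedding similarity and prepend them to the prompt before calling a Japanese LLM. The embed function, QA entries, and prompt wording are placeholders, not the study's implementation.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Placeholder embedding; a real system would call a sentence-embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

qa_pairs = [  # stand-ins for entries of the 899-pair QA corpus
    ("What does autosomal dominant inheritance mean?",
     "Each child of an affected parent has a 50% chance of inheriting the variant."),
    ("Should I consider carrier screening before pregnancy?",
     "That depends on family history; discuss options with a certified genetic counselor."),
]
index = np.vstack([embed(q + " " + a) for q, a in qa_pairs])

def build_rag_prompt(question: str, k: int = 1) -> str:
    """Retrieve the k most similar QA pairs and prepend them to the prompt."""
    sims = index @ embed(question)
    top = np.argsort(sims)[::-1][:k]
    context = "\n".join(f"Q: {qa_pairs[i][0]}\nA: {qa_pairs[i][1]}" for i in top)
    return f"Reference material:\n{context}\n\nCounselee question: {question}\nAnswer:"

print(build_rag_prompt("What is the chance my child inherits this condition?"))
# The assembled prompt would then be sent to the Japanese LLM's generation endpoint.
```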
{"title":"Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert-Evaluated Dataset.","authors":"Takuya Fukushima, Masae Manabe, Shuntaro Yada, Shoko Wakamiya, Akiko Yoshida, Yusaku Urakawa, Akiko Maeda, Shigeyuki Kan, Masayo Takahashi, Eiji Aramaki","doi":"10.2196/65047","DOIUrl":"https://doi.org/10.2196/65047","url":null,"abstract":"<p><strong>Background: </strong>Advances in genetics have underscored a strong association between genetic factors and health outcomes, leading to an increased demand for genetic counseling services. However, a shortage of qualified genetic counselors poses a significant challenge. Large language models (LLMs) have emerged as a potential solution for augmenting support in genetic counseling tasks. Despite the potential, Japanese genetic counseling LLMs (JGCLLMs) are underexplored. To advance a JGCLLM-based dialogue system for genetic counseling, effective domain adaptation methods require investigation.</p><p><strong>Objective: </strong>This study aims to evaluate the current capabilities and identify challenges in developing a JGCLLM-based dialogue system for genetic counseling. The primary focus is to assess the effectiveness of prompt engineering, retrieval-augmented generation (RAG), and instruction tuning within the context of genetic counseling. Furthermore, we will establish an experts-evaluated dataset of responses generated by LLMs adapted to Japanese genetic counseling for the future development of JGCLLMs.</p><p><strong>Methods: </strong>Two primary datasets were used in this study: (1) a question-answer (QA) dataset for LLM adaptation and (2) a genetic counseling question dataset for evaluation. The QA dataset included 899 QA pairs covering medical and genetic counseling topics, while the evaluation dataset contained 120 curated questions across 6 genetic counseling categories. Three enhancement techniques of LLMs-instruction tuning, RAG, and prompt engineering-were applied to a lightweight Japanese LLM to enhance its ability for genetic counseling. The performance of the adapted LLM was evaluated on the 120-question dataset by 2 certified genetic counselors and 1 ophthalmologist (SK, YU, and AY). Evaluation focused on four metrics: (1) inappropriateness of information, (2) sufficiency of information, (3) severity of harm, and (4) alignment with medical consensus.</p><p><strong>Results: </strong>The evaluation by certified genetic counselors and an ophthalmologist revealed varied outcomes across different methods. RAG showed potential, particularly in enhancing critical aspects of genetic counseling. In contrast, instruction tuning and prompt engineering produced less favorable outcomes. This evaluation process facilitated the creation an expert-evaluated dataset of responses generated by LLMs adapted with different combinations of these methods. Error analysis identified key ethical concerns, including inappropriate promotion of prenatal testing, criticism of relatives, and inaccurate probability statements.</p><p><strong>Conclusions: </strong>RAG demonstrated notable improvements across all evaluation metrics, suggesting potential for further enhancement through the expansion of RAG data. The expert-evaluated dataset developed in this study provides valuable insights for future optimization efforts. 
However, the ethical issues obser","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e65047"},"PeriodicalIF":3.1,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143016961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Residents of facilities for older people are vulnerable to COVID-19 outbreaks. Nevertheless, timely recognition by public health centers of outbreaks at facilities for older people has been impossible in Japan since May 8, 2023, when the Japanese government discontinued aggressive countermeasures against COVID-19 because of the waning severity of the dominant Omicron strain. The Facility for Elderly Surveillance System (FESSy) has been developed to improve information collection.
Objective: This study examined FESSy experiences and effectiveness in two public health center jurisdictions in Japan.
Methods: This study assessed three detection modes: automated detection by the AI component of FESSy (FESSy AI), manual detection by public health center staff using FESSy (FESSy staff), and direct reporting by facilities to the public health centers. We considered the following aspects: (1) diagnoses or symptoms, (2) numbers of patients as of the detection date, and (3) ultimate numbers of patients involved in incidents. Subsequently, effectiveness was assessed and compared based on detection modes. The study lasted from June 1, 2023, through January 2024.
Results: Across the two areas, this study examined 31 facilities at which 87 incidents were detected. FESSy (AI or staff) detected incidents involving significantly fewer patients than direct reporting to the public health center, both as of the detection date and in the ultimate number of patients.
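For instance, a comparison of per-incident patient counts at detection between FESSy-based and direct-report modes could be run as a nonparametric test along these lines; the counts are invented for illustration, and the abstract does not specify which test the authors used.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Invented patient counts per incident as of the detection date, by detection mode.
fessy_detected = np.array([1, 2, 1, 3, 2, 1, 2, 4, 1, 2])
direct_report = np.array([5, 8, 3, 7, 6, 9, 4, 10, 6, 5])

stat, p = mannwhitneyu(fessy_detected, direct_report, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, P = {p:.4f}")
```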
Conclusions: FESSy was superior to direct reporting from facilities for the number of patients as of the detection date and for the ultimate outbreak size.
{"title":"Effectiveness of the Facility for Elderly Surveillance System (FESSy) in Two Public Health Center Jurisdictions in Japan: Prospective Observational Study.","authors":"Junko Kurita, Motomi Hori, Sumiyo Yamaguchi, Aiko Ogiwara, Yurina Saito, Minako Sugiyama, Asami Sunadori, Tomoko Hayashi, Akane Hara, Yukari Kawana, Youichi Itoi, Tamie Sugawara, Yoshiyuki Sugishita, Fujiko Irie, Naomi Sakurai","doi":"10.2196/58509","DOIUrl":"10.2196/58509","url":null,"abstract":"<p><strong>Background: </strong>Residents of facilities for older people are vulnerable to COVID-19 outbreaks. Nevertheless, timely recognition of outbreaks at facilities for older people at public health centers has been impossible in Japan since May 8, 2023, when the Japanese government discontinued aggressive countermeasures against COVID-19 because of the waning severity of the dominant Omicron strain. The Facility for Elderly Surveillance System (FESSy) has been developed to improve information collection.</p><p><strong>Objective: </strong>This study examined FESSy experiences and effectiveness in two public health center jurisdictions in Japan.</p><p><strong>Methods: </strong>This study assessed the use by public health centers of the detection mode of an automated AI detection system (ie, FESSy AI), as well as manual detection by the public health centers' staff (ie, FESSy staff) and direct reporting by facilities to the public health centers. We considered the following aspects: (1) diagnoses or symptoms, (2) numbers of patients as of their detection date, and (3) ultimate numbers of patients involved in incidents. Subsequently, effectiveness was assessed and compared based on detection modes. The study lasted from June 1, 2023, through January 2024.</p><p><strong>Results: </strong>In both areas, this study examined 31 facilities at which 87 incidents were detected. FESSy (AI or staff) detected significantly fewer patients than non-FESSy methods, that is, direct reporting to the public health center of the detection date and ultimate number of patients.</p><p><strong>Conclusions: </strong>FESSy was superior to direct reporting from facilities for the number of patients as of the detection date and for the ultimate outbreak size.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e58509"},"PeriodicalIF":3.1,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11741194/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142973490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}