Shipra Taneja, Kamini Kalia, Terence Tang, Walter P Wodchis, Shelley Vanderhout
Background: Health systems are increasingly offering patient portals as tools for patients to access their health information with the goal of improving engagement in care. However, understanding health care providers' perspectives on patient portal implementation is crucial.
Objective: This study aimed to understand health care providers' experiences of implementing the MyChart patient portal, perspectives about its impact on patient care, clinical practice, and workload, and opportunities for improvement.
Methods: Using an explanatory sequential mixed methods approach, we conducted a web-based questionnaire and semistructured individual interviews with health care providers at a large Canadian community hospital, 6 months after MyChart was first offered to patients. We explored perspectives about the impact of MyChart on clinical practice, workload, and patient care. Data were analyzed using descriptive statistics and thematic analysis.
Results: In total, 261 health care providers completed the web-based questionnaire, and 15 also participated in interviews. Participants agreed that patients should have access to their health information through MyChart and identified benefits such as patients gaining a greater understanding of their own health, which could improve patient safety (160/255, 63%). While many health care providers agreed that MyChart supported better patient care (108/258, 42%), there was limited understanding of the features available to patients and of expectations for integrating MyChart into clinical routines. Concerns were raised about potential negative impacts of MyChart on patient-provider relationships because sensitive notes or results could be misinterpreted (109/251, 43%), and about a potential increase in workload if additional portal features were introduced. Suggested opportunities for improvement included support for both patients and health care providers to learn about MyChart and guidelines for health care providers on how to communicate information available in MyChart to patients.
Conclusions: While health care providers acknowledged that MyChart improved patients' access to health information, its implementation introduced some friction and concerns. To reduce the risk of these challenges, health systems can benefit from engaging health care providers early to identify effective patient portal implementation strategies.
{"title":"Examining Health Care Provider Experiences With Patient Portal Implementation: Mixed Methods Study.","authors":"Shipra Taneja, Kamini Kalia, Terence Tang, Walter P Wodchis, Shelley Vanderhout","doi":"10.2196/65967","DOIUrl":"10.2196/65967","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27","pages":"e65967"},"PeriodicalIF":5.8,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lin Guo, Yunwei Li, Kai Cheng, Ying Zhao, Wenqiang Yin, Ying Liu
Background: Depression is a widespread mental health issue affecting older adults globally, with substantial implications for their well-being. Although digital interventions have proven effective in high-income countries, research on the potential of internet usage to alleviate depression among older adults in low- and middle-income countries remains limited.
Objective: This study aimed to examine the impact of internet usage on depression among older adults in low- and middle-income countries by developing a comprehensive theoretical framework and testing key hypotheses.
Methods: Using data from the China Health and Retirement Longitudinal Study (CHARLS), a 2-stage instrumental variable approach was applied to address endogeneity and estimate causal relationships between internet usage and depression.
Results: The findings indicate that internet usage results in a 1.41% reduction in depression levels among older adults. This effect is mediated by four primary mechanisms: (1) enhanced social interaction, (2) increased physical activity, (3) improved intergenerational contact, and (4) expanded access to educational opportunities. A heterogeneity analysis revealed that these effects are more pronounced in urban areas, eastern regions, and regions with superior internet infrastructure.
Conclusions: Internet usage plays a crucial role in alleviating depression among older adults in low- and middle-income countries, with regional variations. The findings highlight the need for targeted policy interventions to improve internet access and digital literacy, which can mitigate depression and enhance the mental health of older adults.
{"title":"Impact of Internet Usage on Depression Among Older Adults: Comprehensive Study.","authors":"Lin Guo, Yunwei Li, Kai Cheng, Ying Zhao, Wenqiang Yin, Ying Liu","doi":"10.2196/65399","DOIUrl":"https://doi.org/10.2196/65399","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27","pages":"e65399"},"PeriodicalIF":5.8,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143074810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
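The Methods of the abstract above name a 2-stage instrumental variable approach but do not specify the instrument or the model. As a minimal sketch only, the following shows the general two-stage least squares idea on synthetic data with one exposure, one instrument, and an unobserved confounder. All variable names, coefficients, and the simulated data are assumptions made for illustration and are not taken from the study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Simple least squares of y on x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(y, x, z):
    """2SLS with a single endogenous regressor x and a single instrument z."""
    # Stage 1: predict the exposure from the instrument.
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the predicted exposure.
    return ols(x_hat, y)


# Synthetic data (hypothetical): u confounds both exposure and outcome,
# z affects the outcome only through the exposure. True effect is -0.5.
rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 - 0.5 * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

_, ols_slope = ols(x, y)                       # biased by the confounder
_, iv_slope = two_stage_least_squares(y, x, z)  # close to the true -0.5

print(f"naive OLS slope: {ols_slope:.3f}")
print(f"2SLS slope:      {iv_slope:.3f}")
```

In this simulation the naive regression is pulled away from the true effect by the confounder, while the two-stage estimate recovers it approximately. The point estimate alone is shown; valid standard errors for 2SLS require a correction that this sketch omits.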
Krysta Heaney-Huls, Rida Shams, Ruth Nwefo, Rachel Kane, Janna Gordon, Alison M Laffan, Scott Stare, Prashila Dullabh
Background: Poor health outcomes are well documented among patients with a non-English language preference (NELP). The use of interpreters can improve the quality of care for patients with NELP. Despite a growing and unmet need for interpretation services in the US health care system, rates of interpreter use in the care setting are consistently low. Standardized collection and exchange of patient interpretation needs can improve access to appropriate language assistance services.
Objective: This study aims to examine current practices for collecting, documenting, and exchanging information on a patient's self-reported preference for an interpreter in the electronic health record (EHR) and the implementation maturity and adoption level of available data standards. The paper identifies standards implementation; data collection workflows; use cases for collecting, documenting, and exchanging information on a patient's self-reported preference for an interpreter; challenges to data collection and use; and opportunities to advance standardization of the interpreter-needed data element to facilitate patient-centered care.
Methods: We conducted a narrative review to describe the availability of terminology standards to facilitate health care organization documentation of a patient's self-reported preference for an interpreter in the EHR. Key informant discussions with EHR developers, health systems, clinicians, a practice-based research organization, a national standards collaborative, a professional health care association, and federal agency representatives filled in gaps from the narrative review.
Results: The findings indicate that health care organizations value standardized collection and exchange of patient language assistance service needs and preferences. Informants identified three use cases for collecting, documenting, and exchanging information on a patient's self-reported preference for an interpreter: (1) person-centered care, (2) transitions of care, and (3) health care administration. The discussions revealed that EHR developers provide a data field for documenting interpreter-needed data, which are routinely collected across health care organizations through commonly used data collection workflows. However, this data element is not mapped to standard terminologies, such as Logical Observation Identifiers Names and Codes (LOINC) or Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), which limits the opportunities to electronically share these data between health systems and community-based organizations. The narrative review and key informant discussions identified three potential challenges to using information on a patient's self-reported preference for an interpreter for person-centered care and quality improvement: (1) lack of adoption of available data standards, (2) limited electronic e
{"title":"Electronic Health Record Data Collection Practices to Advance Standardization and Interoperability of Patient Preferences for Interpretation Services: Qualitative Study.","authors":"Krysta Heaney-Huls, Rida Shams, Ruth Nwefo, Rachel Kane, Janna Gordon, Alison M Laffan, Scott Stare, Prashila Dullabh","doi":"10.2196/62670","DOIUrl":"10.2196/62670","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27","pages":"e62670"},"PeriodicalIF":5.8,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caroline A Figueroa, Helma Torkamaan, Ananya Bhattacharjee, Hanna Hauptmann, Kathleen W Guan, Gayane Sedrakyan
Health recommender systems (HRS) can improve human-centered care and prevention by personalizing content, such as health interventions or health information. HRS, an emerging and developing field, can play a unique role in the digital health field as they can offer relevant recommendations, not only based on what users themselves prefer and may be receptive to, but also using data about wider spheres of influence over human behavior, including peers, families, communities, and societies. We identify and discuss how HRS could play a unique role in decreasing health inequities. We use the socioecological model, which provides representations of how multiple, nested levels of influence (eg, community, institutional, and policy factors) interact to shape individual health. This perspective helps illustrate how HRS could address not just individual health factors but also the structural barriers (such as access to health care, social support, and access to healthy food) that shape health outcomes at various levels. Based on this analysis, we then discuss the challenges and future research priorities. We find that despite the potential for targeting more complex systemic challenges to obtaining good health, current HRS are still focused on individual health behaviors, often do not integrate the lived experiences of users in the design, and have had limited reach and effectiveness for individuals from low socioeconomic status and racial or ethnic minoritized backgrounds. In this viewpoint, we argue that a new design paradigm is necessary in which HRS focus on incorporating structural barriers to good health in addition to user preferences. HRS should be designed with an emphasis on health systems, which also includes incorporating decolonial perspectives of well-being that challenge prevailing medical models. Furthermore, potential lies in evaluating the health equity effects of HRS and leveraging collected data to influence policy. With changes in practices and with an intentional equity focus, HRS could play a crucial role in health promotion and decreasing health inequities.
{"title":"Designing Health Recommender Systems to Promote Health Equity: A Socioecological Perspective.","authors":"Caroline A Figueroa, Helma Torkamaan, Ananya Bhattacharjee, Hanna Hauptmann, Kathleen W Guan, Gayane Sedrakyan","doi":"10.2196/60138","DOIUrl":"https://doi.org/10.2196/60138","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27","pages":"e60138"},"PeriodicalIF":5.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yannik Terhorst, Eva-Maria Messner, Kennedy Opoku Asare, Christian Montag, Christopher Kannen, Harald Baumeister
Background: Unobtrusively collected objective sensor data from everyday devices like smartphones provide a novel paradigm to infer mental health symptoms. This process, called smart sensing, allows a fine-grained assessment of various features (eg, time spent at home based on the GPS sensor). Based on its prevalence and impact, depression is a promising target for smart sensing. However, it is currently unclear which sensor-based features should be used in depression severity prediction and whether they hold an incremental benefit over established fine-grained assessments like the ecological momentary assessment (EMA).
Objective: The aim of this study was to investigate various features based on the smartphone screen, app usage, and call sensor alongside EMA to infer depression severity. Bivariate, cluster-wise, and cluster-combined analyses were conducted to determine the incremental benefit of smart sensing features compared to each other and EMA in parsimonious regression models for depression severity.
Methods: In this exploratory observational study, participants were recruited from the general population. Participants needed to be at least 18 years of age, provide written informed consent, and own an Android-based smartphone. Sensor data and EMA were collected via the INSIGHTS app. Depression severity was assessed using the 8-item Patient Health Questionnaire. Missing data were handled by multiple imputation. Correlation analyses were conducted for bivariate associations; stepwise linear regression analyses were used to find the best prediction models for depression severity. Models were compared by adjusted R². All analyses were pooled across the imputed datasets according to Rubin's rules.
Results: A total of 107 participants were included in the study. Ages ranged from 18 to 56 (mean 22.81, SD 7.32) years, and 78% of the participants identified as female. Depression severity was subclinical on average (mean 5.82, SD 4.44; Patient Health Questionnaire score ≥10: 18.7%). Small to medium correlations were found for depression severity and EMA (eg, valence: r=-0.55, 95% CI -0.67 to -0.41), and there were small correlations with sensing features (eg, screen duration: r=0.37, 95% CI 0.20 to 0.53). EMA features explained 35.28% (95% CI 20.73% to 49.64%) of the variance, and sensing features explained 20.45% (adjusted R², 95% CI 7.81% to 35.59%). The best regression model contained EMA and sensing features (R²=45.15%, 95% CI 30.39% to 58.53%).
Conclusions: Our findings underline the potential of smart sensing and EMA to infer depression severity as isolated paradigms and when combined. Although these could become important parts of clinical decision support systems for depression diagnostics and treatment in the future, confirmatory studies are needed before they can be applied to routine care. Furthermore, privacy, ethical, a
{"title":"Investigating Smartphone-Based Sensing Features for Depression Severity Prediction: Observation Study.","authors":"Yannik Terhorst, Eva-Maria Messner, Kennedy Opoku Asare, Christian Montag, Christopher Kannen, Harald Baumeister","doi":"10.2196/55308","DOIUrl":"https://doi.org/10.2196/55308","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27","pages":"e55308"},"PeriodicalIF":5.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
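The abstract above states that analyses were pooled across multiply imputed datasets using Rubin's approach. As a hedged illustration of that general technique, and not of the authors' actual pipeline, the sketch below pools one scalar estimate (for example, a regression coefficient) and its variance across imputations. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors.

    Returns (pooled_estimate, total_variance), where
    total_variance = W + (1 + 1/m) * B, with W the mean within-imputation
    variance and B the between-imputation variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least 2 imputations")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divides by m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates from 3 imputed datasets.
pooled, total = pool_rubin([0.30, 0.40, 0.50], [0.01, 0.01, 0.01])
print(f"pooled estimate: {pooled:.3f}, pooled SE: {sqrt(total):.3f}")
```

The between-imputation term is what widens the pooled standard error to reflect uncertainty due to the missing data. Quantities that are not approximately normal are often transformed before pooling, which this sketch does not do.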
Yanong Li, Yixuan He, Yawei Liu, Bingchen Wang, Bo Li, Xiaoguang Qiu
Background: Primary intracranial germ cell tumors (iGCTs) are highly malignant brain tumors that predominantly occur in children and adolescents, with an incidence rate ranking third among primary brain tumors in East Asia (8%-15%). Due to their insidious onset and impact on critical functional areas of the brain, these tumors often result in irreversible abnormalities in growth and development, as well as cognitive and motor impairments in affected children. Therefore, early diagnosis through advanced screening techniques is vital for improving patient outcomes and quality of life.
Objective: This study aimed to investigate the application of facial recognition technology in the early detection of iGCTs in children and adolescents.
Methods: A multicenter, phased approach was adopted for the development and validation of a deep learning model, GVisageNet, designed to distinguish midline brain tumors from normal controls (NCs) and iGCTs from other midline brain tumors. Datasets were collected and divided into training (n=847, iGCTs=358, NCs=300, other midline brain tumors=189) and testing (n=212, iGCTs=79, NCs=70, other midline brain tumors=63) sets, with an additional independent validation dataset (n=336, iGCTs=130, NCs=100, other midline brain tumors=106) sourced from 4 medical institutions. A regression model using clinically relevant, statistically significant data was developed and combined with GVisageNet outputs to create a hybrid model. This integration sought to assess the incremental value of clinical data. The model's predictive mechanisms were explored through correlation analyses with endocrine indicators and stratified evaluations based on the degree of hypothalamic-pituitary-target axis damage. Performance metrics included area under the curve (AUC), accuracy, sensitivity, and specificity.
Results: On the independent validation dataset, GVisageNet achieved an AUC of 0.938 (P<.01) in distinguishing midline brain tumors from NCs. Further, GVisageNet demonstrated significant diagnostic capability in distinguishing iGCTs from the other midline brain tumors, achieving an AUC of 0.739, which is superior to the regression model alone (AUC=0.632, P<.001) but lower than the hybrid model (AUC=0.789, P=.04). Significant correlations were found between GVisageNet's outputs and 7 endocrine indicators. Performance varied with the degree of hypothalamic-pituitary-target axis damage, which offers further insight into the working mechanism of GVisageNet.
Conclusions: GVisageNet, capable of high accuracy both independently and with clinical data, shows substantial potential for early iGCT detection, highlighting the importance of combining deep learning with clinical insights for personalized health care.
{"title":"Identification of Intracranial Germ Cell Tumors Based on Facial Photos: Exploratory Study on the Use of Deep Learning for Software Development.","authors":"Yanong Li, Yixuan He, Yawei Liu, Bingchen Wang, Bo Li, Xiaoguang Qiu","doi":"10.2196/58760","DOIUrl":"https://doi.org/10.2196/58760","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e58760"},"PeriodicalIF":5.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanqi Kou, Shicai Ye, Yuan Tian, Ke Yang, Ling Qin, Zhe Huang, Botao Luo, Yanping Ha, Liping Zhan, Ruyin Ye, Yujie Huang, Qing Zhang, Kun He, Mouji Liang, Jieming Zheng, Haoyuan Huang, Chunyi Wu, Lei Ge, Yuping Yang
Background: Gastrointestinal bleeding (GIB) is a severe and potentially life-threatening complication in patients with acute myocardial infarction (AMI), significantly affecting prognosis during hospitalization. Early identification of high-risk patients is essential to reduce complications, improve outcomes, and guide clinical decision-making.
Objective: This study aimed to develop and validate a machine learning (ML)-based model for predicting in-hospital GIB in patients with AMI, identify key risk factors, and evaluate the clinical applicability of the model for risk stratification and decision support.
Methods: A multicenter retrospective cohort study was conducted, including 1910 patients with AMI from the Affiliated Hospital of Guangdong Medical University (2005-2024). Patients were divided into training (n=1575) and testing (n=335) cohorts based on admission dates. For external validation, 1746 patients with AMI were drawn from the publicly available MIMIC-IV (Medical Information Mart for Intensive Care IV) database. Propensity score matching adjusted for demographics, and the Boruta algorithm identified key predictors. A total of 7 ML algorithms (logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest [RF], extreme gradient boosting, and neural networks) were trained using 10-fold cross-validation. The models were evaluated using the area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, recall, F1-score, and decision curve analysis. Shapley additive explanations analysis ranked variable importance. Kaplan-Meier survival analysis evaluated the impact of GIB on short-term survival. Multivariate logistic regression assessed the relationship between coronary heart disease (CHD) and in-hospital GIB after adjusting for clinical variables.
Results: The RF model outperformed other ML models, achieving an area under the receiver operating characteristic curve of 0.77 in the training cohort, 0.77 in the testing cohort, and 0.75 in the validation cohort. Key predictors included red blood cell count, hemoglobin, maximal myoglobin, hematocrit, CHD, and other variables, all of which were strongly associated with GIB risk. Decision curve analysis demonstrated the clinical utility of the RF model for early risk stratification. Kaplan-Meier survival analysis showed no significant differences in 7- and 15-day survival rates between patients with AMI with and without GIB (P=.83 for 7-day survival and P=.87 for 15-day survival). Multivariate logistic regression showed that CHD was an independent risk factor for in-hospital GIB (odds ratio 2.79, 95% CI 2.09-3.74). Stratified analyses by sex, age, occupation, marital status, and other subgroups consistently showed that the association between CHD and GIB remained robust across all subgroups.
Conclusions: The ML-based
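To make the reported effect measure concrete, the sketch below shows how an odds ratio and its 95% CI are formed from a 2x2 table using the Wald interval on the log-odds scale. This is a simplification and an assumption: the study reports an adjusted odds ratio from multivariate logistic regression, whereas this example is unadjusted. The function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (exposure = CHD, outcome = GIB).
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
print(f"OR {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as in the abstract's 2.09-3.74, is what supports calling the exposure a statistically significant risk factor.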
{"title":"Risk Factors for Gastrointestinal Bleeding in Patients With Acute Myocardial Infarction: Multicenter Retrospective Cohort Study.","authors":"Yanqi Kou, Shicai Ye, Yuan Tian, Ke Yang, Ling Qin, Zhe Huang, Botao Luo, Yanping Ha, Liping Zhan, Ruyin Ye, Yujie Huang, Qing Zhang, Kun He, Mouji Liang, Jieming Zheng, Haoyuan Huang, Chunyi Wu, Lei Ge, Yuping Yang","doi":"10.2196/67346","DOIUrl":"https://doi.org/10.2196/67346","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e67346"},"PeriodicalIF":5.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143065582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diana Zhu, Aimee L Dordevic, Zoe E Davidson, Simone Gibson
Background: eHealth interventions can favorably impact health outcomes and encourage health-promoting behaviors in children. More insight is needed from the perspective of children and their families regarding eHealth interventions, including features influencing program effectiveness.
Objective: This review aimed to explore families' experiences with family-focused web-based interventions for improving health.
Methods: Five databases were searched on October 26, 2022 (updated on October 24, 2023) for studies reporting qualitative data on participating children or their caregivers' experiences with web-based programs. Study identification was performed in duplicate and studies were independently appraised for quality. Thematic synthesis was undertaken on qualitative data extracted from the results section of each included article.
Results: Of 5524 articles identified, 28 articles were included. The studies examined the experiences of school-aged children (aged 5-18 years) and their caregivers (mostly mothers) with 26 web-based interventions that were developed to manage 17 different health conditions or influence health-supporting behaviors. Six themes were identified on families' experiences: connecting with others, agency of learning, program reputability or credibility, program flexibility, meeting participants' needs regarding program content or delivery, and impact on lifestyle.
Conclusions: Families positively perceived family-focused web-based interventions, finding value in quality connections and experiencing social support; intervention features aligned with behavioral and self-management principles. Key considerations were highlighted for program developers and health care professionals on ways to adapt eHealth elements to meet families' health-related needs. Continued research examining families' experiences with eHealth interventions is needed, including the experiences of families from diverse populations and distinguishing the perspectives of children, their caregivers, and other family members, to inform the expansion of family-focused eHealth interventions in health care systems.
Trial registration: PROSPERO CRD42022363874; https://tinyurl.com/3xxa8enz.
{"title":"Families' Experiences With Family-Focused Web-Based Interventions for Improving Health: Qualitative Systematic Literature Review.","authors":"Diana Zhu, Aimee L Dordevic, Zoe E Davidson, Simone Gibson","doi":"10.2196/58774","DOIUrl":"https://doi.org/10.2196/58774","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e58774"},"PeriodicalIF":5.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lorien C Abroms, Artin Yousefi, Christina N Wysota, Tien-Chin Wu, David A Broniatowski
Background: Large language model (LLM) artificial intelligence chatbots using generative language can offer smoking cessation information and advice. However, little is known about the reliability of the information provided to users.
Objective: This study aims to examine whether 3 ChatGPT chatbots (the World Health Organization's Sarah, BeFreeGPT, and BasicGPT) provide reliable information on how to quit smoking.
Methods: A list of 12 quit-smoking queries was generated from frequent Google searches related to "how to quit smoking." Each query was given to each chatbot, and responses were analyzed for their adherence to an index developed from the US Preventive Services Task Force public health guidelines for quitting smoking and counseling principles. Responses were independently coded by 2 reviewers, and differences were resolved by a third coder.
Results: Across chatbots and queries, on average, chatbot responses were rated as being adherent to 57.1% of the items on the adherence index. Sarah's adherence (72.2%) was significantly higher than that of BeFreeGPT (50%) and BasicGPT (47.8%; P<.001). The majority of chatbot responses had clear language (97.3%) and included a recommendation to seek out professional counseling (80.3%). About half of the responses included the recommendation to consider using nicotine replacement therapy (52.7%), the recommendation to seek out social support from friends and family (55.6%), and information on how to deal with cravings when quitting smoking (44.4%). The least common was information about considering the use of non-nicotine replacement therapy prescription drugs (14.1%). Finally, some types of misinformation were present in 22% of responses. Specific queries that were most challenging for the chatbots included queries on "how to quit smoking cold turkey," "...with vapes," "...with gummies," "...with a necklace," and "...with hypnosis." All chatbots showed resilience to adversarial attacks that were intended to derail the conversation.
Conclusions: LLM chatbots varied in their adherence to quit-smoking guidelines and counseling principles. While chatbots reliably provided some types of information, they omitted other types and occasionally provided misinformation, especially for queries about less evidence-based methods of quitting. LLM chatbot instructions can be revised to compensate for these weaknesses.
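The coding and scoring procedure described in the Methods can be sketched as follows. This is an illustrative reading, not the authors' procedure: the function names, the representation of each index item as a boolean, and the example codes are assumptions. The sketch resolves disagreements between 2 reviewers with a third coder's judgment, then expresses adherence as the percentage of index items met, averaged across responses.

```python
from typing import List


def resolve_codes(reviewer_a: List[bool], reviewer_b: List[bool],
                  third_coder: List[bool]) -> List[bool]:
    """Keep a code where the 2 reviewers agree; otherwise use the third coder's code."""
    if not (len(reviewer_a) == len(reviewer_b) == len(third_coder)):
        raise ValueError("all coders must rate the same index items")
    return [a if a == b else c
            for a, b, c in zip(reviewer_a, reviewer_b, third_coder)]


def adherence_percent(item_codes: List[bool]) -> float:
    """Percentage of index items that one response adheres to."""
    return 100.0 * sum(item_codes) / len(item_codes)


def mean_adherence(responses: List[List[bool]]) -> float:
    """Average adherence percentage across all coded responses."""
    return sum(adherence_percent(r) for r in responses) / len(responses)


# Hypothetical codes for one response on a 4-item index (True = adherent).
final_codes = resolve_codes(
    [True, True, False, False],   # reviewer A
    [True, False, False, False],  # reviewer B disagrees on item 2
    [True, True, False, True],    # third coder, consulted only for item 2
)
print(final_codes)                                   # item 2 follows the third coder
print(mean_adherence([final_codes, [True, False, False, False]]))
```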
{"title":"Assessing the Adherence of ChatGPT Chatbots to Public Health Guidelines for Smoking Cessation: Content Analysis.","authors":"Lorien C Abroms, Artin Yousefi, Christina N Wysota, Tien-Chin Wu, David A Broniatowski","doi":"10.2196/66896","DOIUrl":"https://doi.org/10.2196/66896","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e66896"},"PeriodicalIF":5.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
William Trevena, Xiang Zhong, Michelle Alvarado, Alexander Semenov, Alp Oktay, Devin Devlin, Aarya Yogesh Gohil, Sai Harsha Chittimouju
Background: The implementation of large language models (LLMs), such as BART (Bidirectional and Auto-Regressive Transformers) and GPT-4, has revolutionized the extraction of insights from unstructured text. These advancements have expanded into health care, allowing analysis of social media for public health insights. However, the detection of drug discontinuation events (DDEs) remains underexplored. Identifying DDEs is crucial for understanding medication adherence and patient outcomes.
Objective: The aim of this study is to provide a flexible framework for investigating various clinical research questions in data-sparse environments. We provide an example of the utility of this framework by identifying DDEs and their root causes in an open-source web-based forum, MedHelp, and by releasing the first open-source DDE datasets to aid further research in this domain.
Methods: We used several LLMs, including GPT-4 Turbo, GPT-4o, DeBERTa (Decoding-Enhanced Bidirectional Encoder Representations from Transformer with Disentangled Attention), and BART, among others, to detect and determine the root causes of DDEs in user comments posted on MedHelp. Our study design included the use of zero-shot classification, which allows these models to make predictions without task-specific training. We split user comments into sentences and applied different classification strategies to assess the performance of these models in identifying DDEs and their root causes.
Results: Among the selected models, GPT-4o performed the best at determining the root causes of DDEs, predicting only 12.9% of root causes incorrectly (Hamming loss). Among the open-source models tested, BART demonstrated the best performance in detecting DDEs, achieving an F1-score of 0.86, a false positive rate of 2.8%, and a false negative rate of 6.5%, all without any fine-tuning. The dataset included 10.7% (107/1000) DDEs, emphasizing the models' robustness in an imbalanced data context.
Conclusions: This study demonstrated the effectiveness of open- and closed-source LLMs, such as GPT-4o and BART, for detecting DDEs and their root causes from publicly accessible data through zero-shot classification. The robust and scalable framework we propose can aid researchers in addressing data-sparse clinical research questions. The launch of open-access DDE datasets has the potential to stimulate further research and novel discoveries in this field.
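The metrics reported in the Results (Hamming loss for multilabel root-cause prediction, and F1-score, false positive rate, and false negative rate for binary DDE detection) follow standard definitions, sketched below. The function names and the toy labels are assumptions for illustration and do not reproduce the study's data or code.

```python
from typing import Dict, List


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of individual label slots predicted incorrectly (multilabel)."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each prediction must cover the same labels")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """F1-score, false positive rate, and false negative rate for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "f1": 2 * tp / (2 * tp + fp + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical root-cause labels: 2 sentences, 3 candidate causes each.
cause_true = [[1, 0, 0], [0, 1, 1]]
cause_pred = [[1, 0, 1], [0, 1, 1]]
print(hamming_loss(cause_true, cause_pred))  # 1 wrong slot out of 6

# Hypothetical DDE detection labels (1 = sentence describes a discontinuation).
dde_true = [1, 1, 1, 0, 0, 0, 0, 0]
dde_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(dde_true, dde_pred))
```

Because only about 1 in 10 sentences in the dataset is a DDE, the false positive and false negative rates are more informative here than raw accuracy, which a model could inflate by predicting "no DDE" everywhere.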
{"title":"Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study.","authors":"William Trevena, Xiang Zhong, Michelle Alvarado, Alexander Semenov, Alp Oktay, Devin Devlin, Aarya Yogesh Gohil, Sai Harsha Chittimouju","doi":"10.2196/54601","DOIUrl":"https://doi.org/10.2196/54601","url":null,"PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e54601"},"PeriodicalIF":5.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143065649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}