Jiarui Yao, Zinaida Perova, Tushar Mandloi, Elizabeth Lewis, Helen Parkinson, Guergana Savova
Background: Patient-derived cancer models (PDCMs) have become essential tools in cancer research and preclinical studies. Consequently, the number of publications on PDCMs has increased significantly over the past decade. Advances in artificial intelligence, particularly in large language models (LLMs), offer promising solutions for extracting knowledge from scientific literature at scale.
Objective: This study aims to investigate LLM-based systems, focusing specifically on prompting techniques for the automated extraction of PDCM-related entities from scientific texts.
Methods: We explore 2 LLM-prompting approaches. The classic method, direct prompting, involves manually designing a prompt. Our direct prompt consists of an instruction, entity-type definitions, gold examples, and a query. In addition, we experiment with a novel and underexplored prompting strategy, soft prompting. Unlike direct prompts, soft prompts are trainable continuous vectors that learn from provided data. We evaluate both prompting approaches across state-of-the-art proprietary and open LLMs.
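As a rough illustration of the direct prompt described in the Methods, the four components (instruction, entity-type definitions, gold examples, and query) can be concatenated into a single text prompt. This is only a minimal sketch: the function name, the entity types, their definitions, and the example sentences are hypothetical and are not taken from the study.

```python
def build_direct_prompt(instruction, entity_definitions, gold_examples, query):
    """Concatenate the four prompt components into one text prompt."""
    lines = [instruction, "", "Entity types:"]
    for name, definition in entity_definitions.items():
        lines.append(f"- {name}: {definition}")
    lines += ["", "Examples:"]
    for text, entities in gold_examples:
        lines.append(f"Text: {text}")
        lines.append(f"Entities: {entities}")
    # The query is left open so that the model completes the entity list.
    lines += ["", f"Text: {query}", "Entities:"]
    return "\n".join(lines)


# Hypothetical inputs, for illustration only.
instruction = "Extract all entities of the listed types from the text."
entity_definitions = {
    "model_type": "the kind of patient-derived model, e.g. xenograft or organoid",
    "cancer_type": "the cancer the model was derived from",
}
gold_examples = [
    ("We established organoids from colorectal cancer samples.",
     "model_type: organoids; cancer_type: colorectal cancer"),
]
query = "Xenografts were generated from breast cancer biopsies."

prompt = build_direct_prompt(instruction, entity_definitions, gold_examples, query)
print(prompt)
```

A soft prompt, by contrast, would not be written as text at all: it would be a set of trainable vectors placed in front of the input embeddings and tuned on the training set.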
Results: We manually annotated 100 abstracts of PDCM-relevant papers, focusing on PDCM papers with data deposited in the CancerModels.Org platform. The resulting gold annotations span 15 entity types for a total of 3313 entity mentions, which we split across training (2089 entities), development (542 entities), and held-out, eye-off test (682 entities) sets. Evaluation includes the standard metrics of precision (positive predictive value), recall (sensitivity), and F1-score (harmonic mean of precision and recall) in 2 settings: an exact match setting, where the spans of gold and predicted annotations have to match exactly, and an overlapping match setting, where the spans of gold and predicted annotations only have to overlap. GPT4-o with direct prompting achieved F1-scores of 50.48 and 71.36 in the exact and overlapping match settings, respectively. In both evaluation settings, LLaMA3 soft prompting improved performance over LLaMA3 direct prompting (F1-score from 7.06 to 46.68 in the exact match setting and from 12.0 to 71.80 in the overlapping match setting). Results with LLaMA3 soft prompting are slightly higher than those with GPT4-o direct prompting in the overlapping match setting.
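The difference between the two match settings can be made concrete with a short sketch. It rests on assumptions the abstract does not spell out: spans are half-open character offsets, a prediction only counts if its entity type agrees with the gold type, and precision and recall are counted separately over predictions and gold mentions. The function names and the toy spans are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open (start, end) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def score(gold, pred, mode="exact"):
    """Return (precision, recall, F1) for lists of (start, end, entity_type)."""
    def match(g, p):
        if g[2] != p[2]:  # assumption: the entity type must agree
            return False
        if mode == "exact":
            return g[:2] == p[:2]
        return overlaps(g[:2], p[:2])

    # Predictions that match some gold mention, and gold mentions that are found.
    matched_pred = sum(any(match(g, p) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy annotations: the second prediction overlaps its gold span without matching it exactly.
gold = [(0, 12, "model_type"), (20, 33, "cancer_type")]
pred = [(0, 12, "model_type"), (24, 33, "cancer_type"), (40, 45, "model_type")]

exact_scores = score(gold, pred, mode="exact")
overlap_scores = score(gold, pred, mode="overlap")
print(exact_scores, overlap_scores)
```

On this toy input the overlapping setting credits the partially correct span, which is why overlapping F1-scores are never lower than exact ones for the same predictions.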
Conclusions: We investigated LLM-prompting techniques for the automatic extraction of PDCM-relevant entities from scientific texts, comparing the traditional direct prompting approach with the emerging soft prompting method. In our experiments, GPT4-o demonstrated strong performance with direct prompting, maintaining competitive results. Meanwhile, soft prompting significantly enhanced the performance of smaller open LLMs. Our findings suggest that training soft prompts on smaller open models can achieve performance levels comparable to those of proprietary very large language models.
Title: Extracting Knowledge From Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation Study. Journal: JMIR bioinformatics and biotechnology, volume 6, e70706. Published: 2025-06-30. DOI: 10.2196/70706. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12232492/pdf/
David Agustriawan, Adithama Mulia, Marlinda Vasty Overbeek, Vincent Kurniawan, Jheno Syechlo, Moeljono Widjaja, Muhammad Imran Ahmad
Background: Previous machine learning approaches for prostate cancer detection using gene expression data have shown remarkable classification accuracies. However, prior studies overlook the influence of racial diversity within the population and the importance of selecting outlier genes based on expression profiles.
Objective: To develop a classification method for diagnosing prostate cancer using gene expression in specific populations.
Methods: This research uses Differentially Expressed Gene (DEG) analysis, Receiver Operating Characteristic (ROC) analysis, and MSigDB verification as a feature selection framework to identify genes for constructing Support Vector Machine (SVM) models.
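The ROC step of such a feature selection framework can be sketched as ranking each gene by how well its expression alone separates tumor from normal samples. This is a minimal sketch and not the authors' pipeline: the AUC threshold of 0.9, the gene names, and the expression values are hypothetical, and the DEG analysis, MSigDB verification, and SVM training are not shown.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def select_genes(expression, labels, threshold=0.9):
    """Keep genes whose single-gene AUC reaches the (hypothetical) threshold.

    expression: {gene: [value per sample]}; labels: 1 = tumor, 0 = normal.
    """
    selected = {}
    for gene, values in expression.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        a = auc(pos, neg)
        a = max(a, 1.0 - a)  # a gene that is lower in tumors is equally discriminative
        if a >= threshold:
            selected[gene] = a
    return selected


# Toy data with made-up gene names and values.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "GENE_A": [5.1, 6.0, 5.7, 2.0, 2.5, 1.9],
    "GENE_B": [3.0, 1.0, 2.0, 2.5, 1.5, 3.5],
}
selected = select_genes(expression, labels)
print(selected)
```

Running the selection separately on each population's samples would be one way to obtain population-specific gene sets, although the abstract does not state how the authors did this.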
Results: Among the models evaluated, the highest observed accuracy was achieved using 139 gene features without oversampling, resulting in 98% accuracy for white patients and 97% for African American patients, based on 388 training samples and 92 testing samples. Notably, another model achieved similarly strong performance (97% accuracy for white patients and 95% for African American patients) while using only 9 gene features, trained on 374 samples and tested on 138 samples.
Conclusions: The findings identify a race-specific diagnosis method for prostate cancer detection using enhanced feature selection and machine learning. This approach emphasizes the potential for developing unbiased diagnostic tools in specific populations.
Title: A Framework for Race-Specific Prostate Cancer Detection Using Machine Learning Through Gene Expression Data: Feature Selection Optimization Approach. Journal: JMIR bioinformatics and biotechnology. Published: 2025-06-20. DOI: 10.2196/72423. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12314727/pdf/
William Sanchez, Ananya Dewan, Eve Budd, M Eifler, Robert C Miller, Jeffery Kahn, Mario Macis, Marielle Gross
Background: Biobank privacy policies strip patient identifiers from donated specimens, undermining transparency, utility, and value for patients, scientists, and society. We are advancing decentralized biobanking apps that reconnect patients with biospecimens and facilitate engagement through a privacy-preserving nonfungible token (NFT) digital twin framework. The decentralized biobanking platform was first piloted for breast cancer biobank members.
Objective: This study aimed to demonstrate the technical feasibility of (1) patient-friendly biobanking apps, (2) integration with institutional biobanks, and (3) establishing the foundation of an NFT digital twin framework for decentralized biobanking.
Methods: We designed, developed, and deployed a decentralized biobanking mobile app for a feasibility pilot from 2021 to 2023 in the setting of a breast cancer biobank at a National Cancer Institute comprehensive cancer center. The Flutter app was integrated with the biobank's laboratory information management systems via an institutional review board-approved mechanism leveraging authorized, secure devices and anonymous ID codes and complemented with a nontransferable ERC-721 NFT representing the soul-bound connection between an individual and their specimens. Biowallet NFTs were held within a custodial wallet, whereas the user experiences simulated token-gated access to personalized feedback about collection and use of individual and collective deidentified specimens. Quantified app user journeys and NFT deployment data demonstrate technical feasibility complemented with design workshop feedback.
Results: The decentralized biobanking app incorporated key features: "biobank" (learn about biobanking), "biowallet" (track personal biospecimens), "labs" (follow research), and "profile" (share data and preferences). In total, 405 pilot participants downloaded the app, including 361 (89.1%) biobank members. A total of 4 central user journeys were captured. First, all app users were oriented to the ≥60,000-biospecimen collection, and 37.8% (153/405) completed research profiles, collectively enhancing annotations for 760 unused specimens. NFTs were minted for 94.6% (140/148) of app users with specimens at an average cost of US $4.51 (SD US $2.54; range US $1.84-$11.23) per token, projected to US $17,769.40 (SD US $159.52; range US $7265.62-$44,229.27) for the biobank population. In total, 89.3% (125/140) of the users successfully claimed NFTs during the pilot, thereby tracking 1812 personal specimens, including 202 (11.2%) distributed under 42 unique research protocols. Participants embraced the opportunity for direct feedback, community engagement, and potential health benefits, although user onboarding requires further refinement.
Conclusions: Decentralized biobanking apps demonstrate technical feasibility for empowering patients to track donated
Title: Decentralized Biobanking Apps for Patient Tracking of Biospecimen Research: Real-World Usability and Feasibility Study. Journal: JMIR bioinformatics and biotechnology, volume 6, e70463. Published: 2025-04-10. DOI: https://doi.org/10.2196/70463. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12022527/pdf/
Background: The number of survivors of cancer is growing, and they often experience negative long-term behavioral outcomes due to cancer treatments. There is a need for better computational methods to handle and predict these outcomes so that physicians and health care providers can implement preventive treatments.
Objective: This study aimed to create a new feature selection algorithm to improve the performance of machine learning classifiers to predict negative long-term behavioral outcomes in survivors of cancer.
Methods: We devised a hybrid deep learning-based feature selection approach to support early detection of negative long-term behavioral outcomes in survivors of cancer. Within a data-driven, clinical domain-guided framework to select the best set of features among cancer treatments, chronic health conditions, and socioenvironmental factors, we developed a 2-stage feature selection algorithm, that is, a multimetric, majority-voting filter and a deep dropout neural network, to dynamically and automatically select the best set of features for each behavioral outcome. We also conducted an experimental case study on existing study data with 102 survivors of acute lymphoblastic leukemia (aged 15-39 years at evaluation and >5 years postcancer diagnosis) who were treated in a public hospital in Hong Kong. Finally, we designed and implemented radial charts to illustrate the significance of the selected features on each behavioral outcome to support clinical professionals' future treatment and diagnoses.
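The first stage, a multimetric majority-voting filter, can be sketched as follows: every metric votes for its top-ranked features, and a feature passes the filter if more than half of the metrics vote for it. This is only one plausible reading of "majority voting"; the metric names, feature names, scores, and the `top_k` cutoff are hypothetical, and the second stage (the deep dropout neural network) is not shown.

```python
def majority_vote_filter(metric_scores, top_k):
    """Keep features that appear in the top_k of more than half of the metrics.

    metric_scores: {metric_name: {feature: score}}, where higher means more relevant.
    """
    votes = {}
    for scores in metric_scores.values():
        ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
        for feature in ranked:
            votes[feature] = votes.get(feature, 0) + 1
    needed = len(metric_scores) / 2
    return sorted(f for f, v in votes.items() if v > needed)


# Hypothetical relevance scores from three unnamed metrics.
metric_scores = {
    "metric_a": {"dose": 0.9, "age": 0.2, "sleep": 0.6},
    "metric_b": {"dose": 0.7, "age": 0.8, "sleep": 0.1},
    "metric_c": {"dose": 0.5, "age": 0.1, "sleep": 0.9},
}
kept = majority_vote_filter(metric_scores, top_k=2)
print(kept)
```

Features that survive the vote would then be passed to the second stage for the final, outcome-specific selection.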
Results: In this pilot study, we demonstrated that our approach outperforms traditional statistical and computational methods, including linear and nonlinear feature selectors, for the addressed top-priority behavioral outcomes. Overall, our approach has higher F1, precision, and recall scores than existing feature selection methods. The models in this study select several significant clinical and socioenvironmental variables as risk factors associated with the development of behavioral problems in young survivors of acute lymphoblastic leukemia.
Conclusions: Our novel feature selection algorithm has the potential to improve machine learning classifiers' capability to predict adverse long-term behavioral outcomes in survivors of cancer.
Title: A Hybrid Deep Learning-Based Feature Selection Approach for Supporting Early Detection of Long-Term Behavioral Outcomes in Survivors of Cancer: Cross-Sectional Study. Authors: Tracy Huang, Chun-Kit Ngan, Yin Ting Cheung, Madelyn Marcotte, Benjamin Cabrera. Journal: JMIR bioinformatics and biotechnology, volume 6, e65001. Published: 2025-03-13. DOI: 10.2196/65001. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11950700/pdf/
Maria Eduarda Goes Job, Heidge Fukumasu, Tathiane Maistro Malta, Pedro Luiz Porfirio Xavier
Background: Multiple correspondence analysis (MCA) is an unsupervised data science methodology that aims to identify and represent associations between categorical variables. Gliomas are an aggressive type of cancer characterized by diverse molecular and clinical features that serve as key prognostic factors. Thus, advanced computational approaches are essential to enhance the analysis and interpretation of the associations between clinical and molecular features in gliomas.
Objective: This study aims to apply MCA to identify associations between glioma prognostic factors and also explore their associations with stemness phenotype.
Methods: Clinical and molecular data from 448 patients with brain tumors were obtained from the Cancer Genome Atlas. The DNA methylation stemness index, derived from DNA methylation patterns, was built using a one-class logistic regression. Associations between variables were evaluated using the χ² test with k degrees of freedom, followed by analysis of the adjusted standardized residuals (ASRs >1.96 indicate a significant association between variables). MCA was used to uncover associations between glioma prognostic factors and stemness.
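The χ² statistic and the adjusted standardized residuals mentioned in the Methods can be computed from a contingency table as in the sketch below. It uses the standard formula, ASR = (O − E) / sqrt(E · (1 − row total/N) · (1 − column total/N)), where O and E are the observed and expected counts. The table values are made up for illustration, and the function name is hypothetical.

```python
import math


def adjusted_standardized_residuals(table):
    """Return (chi-square statistic, matrix of ASRs) for a table of observed counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    asr = []
    for i, row in enumerate(table):
        out = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
            denom = math.sqrt(exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            out.append((obs - exp) / denom)
        asr.append(out)
    return chi2, asr


# Made-up 2x2 table: rows = stemness (high, low), columns = a binary prognostic factor.
table = [[30, 10], [10, 30]]
chi2, asr = adjusted_standardized_residuals(table)
print(round(chi2, 3), [[round(v, 3) for v in row] for row in asr])
```

A cell with an ASR above 1.96 is over-represented relative to independence at roughly the two-sided 5% level, which is the cutoff the abstract uses.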
Results: Our analysis revealed significant associations among molecular and clinical characteristics in gliomas. Additionally, we demonstrated the capability of MCA to identify associations between stemness and these prognostic factors. Our results showed a strong association between a higher DNA methylation stemness index and features related to poorer prognosis, such as glioblastoma cancer type (ASR: 8.507), grade 4 (ASR: 8.507), isocitrate dehydrogenase wild type (ASR: 15.904), unmethylated MGMT (methylguanine methyltransferase) promoter (ASR: 9.983), and telomerase reverse transcriptase expression (ASR: 3.351), demonstrating the utility of MCA as an analytical tool for elucidating potential prognostic factors.
Conclusions: MCA is a valuable tool for understanding the complex interdependence of prognostic markers in gliomas. MCA facilitates the exploration of large-scale datasets and enhances the identification of significant associations.
Title: Investigating Associations Between Prognostic Factors in Gliomas: Unsupervised Multiple Correspondence Analysis. Journal: JMIR bioinformatics and biotechnology, volume 6, e65645. Published: 2025-03-12. DOI: 10.2196/65645. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11922494/pdf/
Jayaram Thimmapuram, Kamlesh D Patel, Deepti Bhatt, Ajay Chauhan, Divya Madhusudhan, Kashyap K Bhatt, Snehal Deshpande, Urvi Budhbhatti, Chaitanya Joshi
Background: Health care students often experience high levels of stress, anxiety, and mental health issues, making it crucial to address these challenges. Variations in stress levels may be associated with changes in dehydroepiandrosterone sulfate (DHEA-S) and interleukin-6 (IL-6) levels and gene expression. Meditative practices have demonstrated effectiveness in reducing stress and improving mental well-being.
Objective: This study aims to assess the effects of Heartfulness meditation on mental well-being, DHEA-S, IL-6, and gene expression profile.
Methods: The 78 enrolled participants were randomly assigned to the Heartfulness meditation (n=42, 54%) and control (n=36, 46%) groups. The participants completed the Perceived Stress Scale (PSS) and Depression Anxiety Stress Scale (DASS-21) at baseline and after week 12. Gene expression with messenger RNA sequencing and DHEA-S and IL-6 levels were also measured at baseline and the completion of the 12 weeks. Statistical analysis included descriptive statistics, paired t test, and 1-way ANOVA with Bonferroni correction.
Results: The Heartfulness group exhibited a significant 17.35% reduction in PSS score (from mean 19.71, SD 5.09 to mean 16.29, SD 4.83; P<.001) compared to a nonsignificant 6% reduction in the control group (P=.31). DASS-21 scores decreased significantly by 27.14% in the Heartfulness group (from mean 21.15, SD 9.56 to mean 15.41, SD 7.87; P<.001), whereas they increased nonsignificantly by 17% in the control group (P=.04). For the DASS-21 subcomponents, the Heartfulness group showed a statistically significant 28.53% reduction in anxiety (P=.006) and 27.38% reduction in stress (P=.002) versus a nonsignificant 22% increase in anxiety (P=.02) and 6% increase in stress (P=.47) in the control group. Further, DHEA-S levels showed a significant 20.27% increase in the Heartfulness group (from mean 251.71, SD 80.98 to mean 302.74, SD 123.56; P=.002) compared to a nonsignificant 9% increase in the control group (from mean 285.33, SD 112.14 to mean 309.90, SD 136.90; P=.10). IL-6 levels showed a statistically significant difference in both groups (from mean 4.93, SD 1.35 to mean 3.67, SD 1.0; 28.6%; P<.001 [Heartfulness group] and from mean 4.52, SD 1.40 to mean 2.72, SD 1.74; 40%; P<.001 [control group]). Notably, group comparison at 12 weeks revealed a significant difference in perceived stress, DASS-21 and its subcomponents, and IL-6 (all P<.05/4). The gene expression profile with messenger RNA sequencing identified 875 upregulated genes and 1539 downregulated genes in the Heartfulness group compared to baseline, and there were 292 upregulated genes and 1180 downregulated genes in the Heartfulness group compared to the control group after the intervention.
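The percentage changes for the Heartfulness group and the corrected threshold "P<.05/4" can be reproduced with a few lines of arithmetic. The sketch below uses only the group means reported above and assumes that the 4 in ".05/4" is the number of comparisons in the Bonferroni correction; the function and variable names are hypothetical.

```python
def percent_change(before, after):
    """Relative change from baseline, in percent (negative means a reduction)."""
    return (after - before) / before * 100


# Bonferroni correction: divide the significance level by the number of comparisons.
alpha = 0.05
n_comparisons = 4  # assumption: taken from "P<.05/4" in the abstract
threshold = alpha / n_comparisons

# Group means reported for the Heartfulness group (baseline, week 12).
pss_change = percent_change(19.71, 16.29)
dass_change = percent_change(21.15, 15.41)
dhea_change = percent_change(251.71, 302.74)

print(round(pss_change, 2), round(dass_change, 2), round(dhea_change, 2), threshold)
```

These values agree with the reported 17.35% and 27.14% reductions and the 20.27% increase; a between-group P value has to fall below the corrected threshold of .0125 to count as significant.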
Conclusions: Heartfulness practice was associated with decreased depression, anxiety, and stress scores and improved health measur
Jayaram Thimmapuram, Kamlesh D Patel, Deepti Bhatt, Ajay Chauhan, Divya Madhusudhan, Kashyap K Bhatt, Snehal Deshpande, Urvi Budhbhatti, Chaitanya Joshi. Effect of a Web-Based Heartfulness Program on the Mental Well-Being, Biomarkers, and Gene Expression Profile of Health Care Students: Randomized Controlled Trial. JMIR Bioinformatics and Biotechnology, volume 5, e65506. Published 2024-12-16. DOI: 10.2196/65506. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11686021/pdf/
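The significance labels in the Heartfulness results are easier to follow once the Bonferroni threshold is written out. The sketch below assumes that the "4" in "P<.05/4" is the number of comparisons and that the same corrected threshold was applied to the within-group tests; the abstract states neither point explicitly. The dictionary labels are illustrative, not the authors' variable names.

```python
# Sketch: Bonferroni-corrected threshold applied to P values quoted in the abstract.
# Assumption (not confirmed by the source): 4 comparisons, taken from "P<.05/4".
ALPHA = 0.05
N_COMPARISONS = 4


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Return the per-test significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


def is_significant(p_value: float,
                   alpha: float = ALPHA,
                   n_comparisons: int = N_COMPARISONS) -> bool:
    """True when p_value falls below the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_comparisons)


# P values as reported in the abstract; the keys are illustrative labels.
reported_p = {
    "Heartfulness anxiety reduction": 0.006,
    "Heartfulness stress reduction": 0.002,
    "control DASS-21 increase": 0.04,
    "control anxiety increase": 0.02,
    "control stress increase": 0.47,
}

if __name__ == "__main__":
    print(f"corrected threshold = {bonferroni_threshold(ALPHA, N_COMPARISONS):.4f}")
    for label, p in reported_p.items():
        verdict = "significant" if is_significant(p) else "not significant"
        print(f"{label}: P={p} -> {verdict}")
```

Under this assumption the threshold is .0125, which would be consistent with the abstract describing P=.04 and P=.02 in the control group as nonsignificant.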
Amy Marie Campbell, Chris Hauton, Ronny van Aerle, Jaime Martinez-Urtaza
Background: Environmentally sensitive pathogens exhibit ecological and evolutionary responses to climate change that result in the emergence and global expansion of well-adapted variants. It is imperative to understand the mechanisms that facilitate pathogen emergence and expansion, as well as the drivers behind the mechanisms, to understand and prepare for future pandemic expansions.
Objective: The unique, rapid, global expansion of a clonal complex of Vibrio parahaemolyticus (a marine bacterium causing gastroenteritis infections) named Vibrio parahaemolyticus sequence type 3 (VpST3) provides an opportunity to explore the eco-evolutionary drivers of pathogen expansion.
Methods: The global expansion of VpST3 was reconstructed using VpST3 genomes, which were then classified into metrics characterizing the stages of this expansion process, indicative of the stages of emergence and establishment. We used machine learning, specifically a random forest classifier, to test a range of ecological and evolutionary drivers for their potential in predicting VpST3 expansion dynamics.
Results: We identified a range of evolutionary features, including mutations in the core genome and accessory gene presence, associated with expansion dynamics. A range of random forest classifier approaches were tested to predict expansion classification metrics for each genome. The highest predictive accuracies (ranging from 0.722 to 0.967) were achieved for models using a combined eco-evolutionary approach. While population structure and the difference between introduced and established isolates could be predicted to a high accuracy, our model reported multiple false positives when predicting the success of an introduced isolate, suggesting potential limiting factors not represented in our eco-evolutionary features. Regional models produced for 2 countries reporting the most VpST3 genomes had varying success, reflecting the impacts of class imbalance.
Conclusions: These novel insights into evolutionary features and ecological conditions related to the stages of VpST3 expansion showcase the potential of machine learning models using genomic data and will contribute to the future understanding of the eco-evolutionary pathways of climate-sensitive pathogens.
Amy Marie Campbell, Chris Hauton, Ronny van Aerle, Jaime Martinez-Urtaza. Eco-Evolutionary Drivers of Vibrio parahaemolyticus Sequence Type 3 Expansion: Retrospective Machine Learning Approach. JMIR Bioinformatics and Biotechnology, volume 5, e62747. Published 2024-11-28. DOI: 10.2196/62747. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11638695/pdf/
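The VpST3 abstract reports false positives when predicting the success of an introduced isolate and attributes the uneven regional results to class imbalance. The sketch below shows, with made-up confusion-matrix counts, how overall accuracy can look high while precision for a rare class stays low. It is not the authors' random forest pipeline, and every name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix counts for a binary classifier (hypothetical values)."""
    tp: int  # rare class predicted and correct
    fp: int  # rare class predicted but wrong (false positive)
    fn: int  # rare class missed
    tn: int  # common class predicted and correct

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(c: BinaryCounts) -> float:
    return (c.tp + c.tn) / c.total


def precision(c: BinaryCounts) -> float:
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: BinaryCounts) -> float:
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


def majority_baseline_accuracy(c: BinaryCounts) -> float:
    """Accuracy of always predicting the more common class."""
    positives = c.tp + c.fn
    negatives = c.fp + c.tn
    return max(positives, negatives) / c.total


# Hypothetical imbalanced case: 10 rare-class genomes out of 200.
counts = BinaryCounts(tp=8, fp=12, fn=2, tn=178)

if __name__ == "__main__":
    print(f"accuracy          = {accuracy(counts):.3f}")
    print(f"precision         = {precision(counts):.3f}")
    print(f"recall            = {recall(counts):.3f}")
    print(f"majority baseline = {majority_baseline_accuracy(counts):.3f}")
```

With these invented counts, accuracy is 0.93 while precision is only 0.40, and always predicting the common class would score 0.95. This is one reason accuracy alone can be hard to interpret for imbalanced classes.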
Background: An increasing body of literature highlights the integration of machine learning with genomic data in psychiatry, particularly for complex mental health disorders such as schizophrenia. These advanced techniques offer promising potential for uncovering various facets of these disorders. A comprehensive review of the current applications of machine learning in conjunction with genomic data within this context can significantly enhance our understanding of the current state of research and its future directions.
Objective: This study aims to conduct a systematic scoping review of the use of machine learning algorithms with genomic data in the field of schizophrenia.
Methods: To conduct a systematic scoping review, a search was performed in the electronic databases MEDLINE, Web of Science, PsycNet (PsycINFO), and Google Scholar from 2013 to 2024. Studies at the intersection of schizophrenia, genomic data, and machine learning were evaluated.
Results: The literature search identified 2437 eligible articles after removing duplicates. Following abstract screening, 143 full-text articles were assessed, and 121 were subsequently excluded. Therefore, 21 studies were thoroughly assessed. Various machine learning algorithms were used in the identified studies, with support vector machines being the most common. The studies notably used genomic data to predict schizophrenia, identify schizophrenia features, discover drugs, classify schizophrenia amongst other mental health disorders, and predict the quality of life of patients.
Conclusions: Several high-quality studies were identified. Yet, the application of machine learning with genomic data in the context of schizophrenia remains limited. Future research is essential to further evaluate the portability of these models and to explore their potential clinical applications.
Alexandre Hudon, Mélissa Beaudoin, Kingsada Phraxayavong, Stéphane Potvin, Alexandre Dumais. Exploring the Intersection of Schizophrenia, Machine Learning, and Genomics: Scoping Review. JMIR Bioinformatics and Biotechnology, volume 5, e62752. Published 2024-11-15. DOI: 10.2196/62752. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11607571/pdf/
The integration of chatbots in oncology underscores the pressing need for human-centered artificial intelligence (AI) that addresses patient and family concerns with empathy and precision. Human-centered AI emphasizes ethical principles, empathy, and user-centric approaches, ensuring technology aligns with human values and needs. This review critically examines the ethical implications of using large language models (LLMs) like GPT-3 and GPT-4 (OpenAI) in oncology chatbots. It examines how these models replicate human-like language patterns, impacting the design of ethical AI systems. The paper identifies key strategies for ethically developing oncology chatbots, focusing on potential biases arising from extensive datasets and neural networks. Specific datasets, such as those sourced from predominantly Western medical literature and patient interactions, may introduce biases by overrepresenting certain demographic groups. Moreover, the training methodologies of LLMs, including fine-tuning processes, can exacerbate these biases, leading to outputs that may disproportionately favor affluent or Western populations while neglecting marginalized communities. By providing examples of biased outputs in oncology chatbots, the review highlights the ethical challenges LLMs present and the need for mitigation strategies. The study emphasizes integrating human-centric values into AI to mitigate these biases, ultimately advocating for the development of oncology chatbots that are aligned with ethical principles and capable of serving diverse patient populations equitably.
James C L Chow, Kay Li. Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models. JMIR Bioinformatics and Biotechnology, e64406. Published 2024-11-06. DOI: 10.2196/64406. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11579624/pdf/
Younga Heather Lee, Yingzhe Zhang, Chris J Kennedy, Travis T Mallard, Zhaowen Liu, Phuong Linh Vu, Yen-Chen Anne Feng, Tian Ge, Maria V Petukhova, Ronald C Kessler, Matthew K Nock, Jordan W Smoller
Background: Despite growing interest in the clinical translation of polygenic risk scores (PRSs), it remains uncertain to what extent genomic information can enhance the prediction of psychiatric outcomes beyond the data collected during clinical visits alone.
Objective: This study aimed to assess the clinical utility of incorporating PRSs into a suicide risk prediction model trained on electronic health records (EHRs) and patient-reported surveys among patients admitted to the emergency department.
Methods: Study participants were recruited from the psychiatric emergency department at Massachusetts General Hospital. There were 333 adult patients of European ancestry who had high-quality genotype data available through their participation in the Mass General Brigham Biobank. Multiple neuropsychiatric PRSs were added to a previously validated suicide prediction model in a prospective cohort enrolled between February 4, 2015, and March 13, 2017. Data analysis was performed from July 11, 2022, to August 31, 2023. Suicide attempt was defined using diagnostic codes from longitudinal EHRs combined with 6-month follow-up surveys. The clinical risk score for suicide attempt was calculated from an ensemble model trained using an EHR-based suicide risk score and a brief survey, and it was subsequently used to define the baseline model. We generated PRSs for depression, bipolar disorder, schizophrenia, suicide attempt, and externalizing traits using a Bayesian polygenic scoring method for European ancestry participants. Model performance was evaluated using area under the receiver operating characteristic curve (AUC), area under the precision-recall curve, and positive predictive values.
Results: Of the 333 patients (n=178, 53.5% male; mean age 36.8, SD 13.6 years; n=333, 100% non-Hispanic and n=324, 97.3% self-reported White), 28 (8.4%) had a suicide attempt within 6 months. Adding either the schizophrenia PRS or all PRSs to the baseline model resulted in the numerically highest discrimination (AUC 0.86, 95% CI 0.73-0.99) compared to the baseline model (AUC 0.84, 95% CI 0.70-0.98). However, the improvement in model performance was not statistically significant.
Conclusions: In this study, incorporating genomic information into clinical prediction models for suicide attempt did not improve patient risk stratification. Larger studies that include more diverse participants are required to validate whether the inclusion of psychiatric PRSs in clinical prediction models can enhance the stratification of patients at risk of suicide attempts.
Younga Heather Lee, Yingzhe Zhang, Chris J Kennedy, Travis T Mallard, Zhaowen Liu, Phuong Linh Vu, Yen-Chen Anne Feng, Tian Ge, Maria V Petukhova, Ronald C Kessler, Matthew K Nock, Jordan W Smoller. Enhancing Suicide Risk Prediction With Polygenic Scores in Psychiatric Emergency Settings: Prospective Study. JMIR Bioinformatics and Biotechnology, volume 5, e58357. Published 2024-10-23. DOI: 10.2196/58357. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11541145/pdf/
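The suicide risk study evaluates its models by AUC and positive predictive value. As a reminder of what those two quantities measure, the sketch below computes them on a tiny invented set of risk scores. The AUC here is the pairwise (Mann-Whitney) formulation, which may differ from the software the authors used. The labels, scores, and the 0.5 threshold are all hypothetical and unrelated to the study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def positive_predictive_value(labels: Sequence[int],
                              scores: Sequence[float],
                              threshold: float) -> float:
    """Share of cases flagged at or above the threshold that are true positives."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    if not flagged:
        raise ValueError("no cases flagged at this threshold")
    return sum(flagged) / len(flagged)


# Hypothetical outcomes (1 = event) and model risk scores.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]

if __name__ == "__main__":
    print(f"AUC = {roc_auc(labels, scores):.3f}")
    print(f"PPV at 0.5 = {positive_predictive_value(labels, scores, 0.5):.3f}")
```

In this toy example, 14 of the 15 positive-negative pairs are ranked correctly, and 2 of the 3 flagged cases are true positives. When the outcome is rare (28 of 333 patients in the study), PPV depends strongly on the chosen threshold, so it is usually read alongside the AUC.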