Pub Date: 2026-01-01; Epub Date: 2025-12-05; DOI: 10.1016/j.jbi.2025.104966
Rabeya Tus Sadia, Md Atik Ahamed, Qiang Cheng
Understanding gene expression within a spatial context requires the effective integration of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) data. However, existing approaches often perform suboptimally, with structural similarity typically falling below 60%. We identify the neglect of causal gene relationships as a major limiting factor. To address this, we propose CausalGenDiff, a model that integrates diffusion and autoregressive processes to exploit these underlying causal dependencies. Our approach extends the Causal Attention Transformer, originally designed for image generation, to high-dimensional gene expression data, enabling it to capture gene regulatory mechanisms without relying on predefined relationships. We further incorporate VAE-based pretraining and fine-tuning strategies to enhance performance, supported by thorough ablation studies. Evaluated on 10 tissue datasets, our method consistently outperforms state-of-the-art baselines across four standard metrics, achieving improvements of 5%–32% in Pearson correlation and structural similarity, thereby contributing to both technical advancement and biological insight.
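The abstract does not spell out CausalGenDiff's attention layers. As a generic illustration of what causal (autoregressive) attention means for an ordered sequence of gene tokens, the sketch below implements single-head scaled dot-product self-attention with a lower-triangular mask in plain Python. The token ordering, the toy 2-dimensional embeddings, and the function name `causal_self_attention` are assumptions for illustration; this is not the authors' implementation.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal (lower-triangular) mask.

    q, k, v: lists of n token vectors (lists of floats). Token i attends only to
    tokens 0..i, so its output never depends on tokens later in the ordering.
    Illustrative sketch only, not CausalGenDiff's actual layer.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Scores for visible positions 0..i; positions j > i are masked out entirely.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(i + 1)]
        m = max(scores)  # subtract the max before exponentiating, for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output = attention-weighted sum of the visible value vectors.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Three hypothetical gene tokens with toy 2-dimensional embeddings.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(causal_self_attention(tokens, tokens, tokens))
```

Because each position only sees earlier tokens, values can be generated one step at a time, each conditioned on those already generated. How CausalGenDiff orders genes and combines this with diffusion is not covered by this sketch.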
Title: CausalGenDiff: Generative causal diffusion bridges scRNA-seq and spatial transcriptomics. Journal of Biomedical Informatics, vol. 173, Article 104966.
Objective:
Traditional deep learning models for multivariate time-series data often fall short in capturing long-range temporal dependencies critical for early prediction of the onset of acute respiratory distress syndrome (ARDS). To address this gap, we introduce Graph-spa, a dynamic Spatiotemporal Graph Neural Network (STGNN)-based framework that not only improves ARDS prediction by modeling evolving interactions among clinical variables but also enhances interpretability through model-agnostic feature attribution.
Methods:
At its core, Graph-spa integrates temporal convolution layers with an STGNN model that dynamically updates the adjacency structure, capturing both local and non-local temporal dependencies. We evaluated it on three datasets (HiRID, MIMIC-IV, and eICU) and benchmarked it against four traditional deep learning models (GRU, LSTM, TCN, Transformer) and an STGNN baseline. To complement the prediction framework, we applied mask-based interpretability approaches to generate feature-time attribution scores. These scores guide a subsequent co-occurrence analysis that identifies clusters of sustained feature activations in the 12-h window preceding ARDS onset.
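The abstract describes a co-occurrence analysis over sustained feature activations in the 12 h before onset, but not its exact rules. A minimal sketch under stated assumptions: a feature counts as active in an hour when its attribution score is at or above a threshold, an activation is sustained when it lasts at least `min_run` consecutive hours, and two features co-occur in a patient when their sustained hours overlap. The threshold of 0.5, `min_run=3`, the feature names, and both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sustained_mask(scores, threshold, min_run):
    """Mark hours belonging to a run of >= min_run consecutive hours with score >= threshold."""
    mask = [False] * len(scores)
    run_start = None
    # A -inf sentinel at the end closes any run that reaches the last hour.
    for t, s in enumerate(scores + [float("-inf")]):
        if s >= threshold:
            if run_start is None:
                run_start = t
        else:
            if run_start is not None and t - run_start >= min_run:
                for u in range(run_start, t):
                    mask[u] = True
            run_start = None
    return mask


def co_occurring_pairs(patients, threshold=0.5, min_run=3):
    """Count, across patients, feature pairs whose sustained-activation hours overlap.

    patients: list of dicts mapping feature name -> hourly attribution scores
    (e.g. 12 values for the 12 h before onset). Hypothetical data layout.
    """
    counts = Counter()
    for attributions in patients:
        masks = {f: sustained_mask(s, threshold, min_run) for f, s in attributions.items()}
        for a, b in combinations(sorted(masks), 2):
            if any(x and y for x, y in zip(masks[a], masks[b])):
                counts[(a, b)] += 1
    return counts
```

Pairs with high counts are candidates for a composite risk profile; a real analysis would also need a baseline (for example, non-ARDS patients) to judge whether a count is unusual.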
Results:
Our experiments demonstrate that Graph-spa consistently outperforms the baseline models in both internal and external validation. On the AUC F1–MCC metric, chosen for this imbalanced classification task, Graph-spa achieves 50.02% vs 45.61% on HiRID, 48.52% vs 46.88% on MIMIC-IV, and 46.64% vs 45.41% on eICU-CRD compared with the STGNN baseline. Graph-spa also outperforms recurrent, convolutional, and attention-based models evaluated under identical settings (Wilcoxon signed-rank test; Holm-adjusted p-values < 0.05). The dynamic adjacency enhancement allows the model to capture complex, evolving feature interactions, as evidenced by more diverse connectivity patterns than the baseline. In addition, interpretability analysis reveals that sustained abnormalities in potassium levels, together with declining Glasgow Coma Scale scores, form a critical composite risk profile that may serve as an early indicator of ARDS.
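Holm's step-down procedure, used above to adjust the Wilcoxon signed-rank p-values, is standard: sort the m raw p-values, multiply the k-th smallest (k = 1, ..., m) by m - k + 1, take a running maximum so the adjusted values stay monotone, and cap at 1. A plain-Python sketch follows; the Wilcoxon tests themselves are not shown, and any example p-values are invented.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # rank is 0-based, so (m - rank) equals m - k + 1 for the k-th smallest p-value.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)  # cap at 1 and keep monotone order
    return adjusted
```

A comparison is declared significant at level 0.05 when its adjusted p-value is below 0.05, which controls the family-wise error rate across all baseline comparisons.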
Conclusion:
Graph-spa advances dynamic clinical event prediction and also offers significant promise for early detection of organ failure in acute care settings, illustrating an end-to-end approach that covers spatiotemporal modeling, interpretability, and discovery of sub-clinical signatures. Because its core modules (dynamic spatiotemporal graph construction, mask-based attribution, and co-occurrence mining) are model-agnostic, the framework can readily be extended to other dynamic classification or regression tasks in the ICU. The code is available at https://github.com/vsubbian/Graph-spa.
Title: Graph-spa: A Spatiotemporal Graph Neural Network based framework for ARDS prediction and interpretability. Journal of Biomedical Informatics, vol. 173, Article 104969.
Pub Date: 2026-01-01; DOI: 10.1016/j.jbi.2025.104969
Shashank Yadav, Molly Douglas, Jarrod Mosier, Vignesh Subbian
Pub Date: 2025-12-01; Epub Date: 2025-11-12; DOI: 10.1016/j.jbi.2025.104956
Yilu Fang, Jordan G. Nestor, Casey N. Ta, Jerard Z. Kneifati-Hayek, Chunhua Weng
Objective
Patients with acute kidney injury (AKI) are at high risk of developing chronic kidney disease (CKD), but identifying those at greatest risk remains challenging. We used electronic health record (EHR) data to dynamically track AKI patients’ clinical evolution and characterize AKI-to-CKD progression.
Methods
Post-AKI clinical states were identified by clustering patient vectors derived from longitudinal medical codes and creatinine measurements. Transition probabilities between states and progression to CKD were estimated using multi-state modeling. After identifying common post-AKI trajectories, CKD risk factors in AKI subpopulations were identified through survival analysis.
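The study estimates transition probabilities with multi-state modeling. As a much simpler illustration of the quantity involved, the sketch below computes an empirical discrete-time transition matrix from per-patient sequences of states by counting consecutive pairs. It ignores censoring and the timing of transitions, which proper multi-state models account for; the state labels and the function name are hypothetical.

```python
from collections import defaultdict


def transition_matrix(trajectories):
    """Empirical transition probabilities from per-patient state sequences.

    trajectories: list of state sequences, e.g. ["S1", "S1", "S4", "CKD"].
    Returns {from_state: {to_state: probability}} estimated from counts of
    consecutive (from, to) pairs. States never left (e.g. CKD) get no row.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in trajectories:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: n / total for b, n in row.items()}
    return probs
```

Each row sums to 1, and the entry for a target state such as "CKD" gives the estimated one-step probability of progressing to CKD from that post-AKI state.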
Results
Of 20,699 patients with AKI at admission, 3,491 (17%) developed CKD. We identified fifteen distinct post-AKI states, each with a different probability of CKD development. Most patients (75%, n = 15,607) remained in a single state or made only one transition during the study period. Both established (e.g., AKI severity, diabetes, hypertension, heart failure, liver disease) and novel CKD risk factors were identified, with their impact varying across these clinical states.
Conclusion
This study demonstrates a data-driven approach for identifying high-risk AKI patients, supporting the development of decision-support tools for early CKD detection and intervention.
Title: A method for characterizing disease progression from acute kidney injury to chronic kidney disease. Journal of Biomedical Informatics, vol. 172, Article 104956.
Pub Date: 2025-12-01; Epub Date: 2025-11-11; DOI: 10.1016/j.jbi.2025.104954
Biyi Shen, Yilin Zhang, Thomas G. Travison, Michelle Shardell, Rozalina G. McCoy, Takumi Saegusa, Jason Falvey, Chixiang Chen
Objective:
We propose JLNet, along with a companion R software package, as a systematic joint learning framework for analyzing data from national geriatric centralized networks, such as Medicare claims. JLNet addresses key challenges in real-world, large-scale healthcare datasets, including hospital-level clustering and heterogeneity, patient-level variability arising from high-dimensional covariates, and loss to follow-up, while remaining easy to implement so that it can ultimately support decision-making.
Methods:
JLNet proceeds in three steps: (1) fit a dynamic propensity score model to handle patient loss to follow-up; (2) fit a projection-based regularized regression to identify predictive patient-level features while adjusting for hospital-level confounding; and (3) perform hospital-level clustering using transformed residuals, enabling downstream analyses without sharing raw data. We applied JLNet to Medicare claims data to study post-fracture recovery among older adults with Alzheimer’s disease and related dementias (ADRD) following a hip fracture (2010–2018), and evaluated its performance via numerical experiments.
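Step (2) adjusts for hospital-level confounding through a projection, but the abstract does not give the projection's form. One standard construction, shown here as an assumption rather than JLNet's actual method, is to subtract each hospital's mean from its patients' values. This is the projection onto the orthogonal complement of the hospital indicator columns, so it removes any additive hospital effect, measured or not. Applying it to the outcome and to every covariate before fitting a penalized regression such as the lasso gives a fixed-effects-style analysis. The function name is hypothetical.

```python
from collections import defaultdict


def within_hospital_center(values, hospitals):
    """Subtract each hospital's mean from its patients' values.

    values: one number per patient (an outcome or a single covariate column).
    hospitals: the hospital ID for each patient, aligned with values.
    Equivalent to projecting out the hospital indicator columns, which removes
    any additive hospital-level shift.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, h in zip(values, hospitals):
        sums[h] += v
        counts[h] += 1
    return [v - sums[h] / counts[h] for v, h in zip(values, hospitals)]
```

Because any constant added to all patients of one hospital cancels out, the centered data no longer carry hospital-level confounding of that additive form; residuals from the subsequent regression can then be summarized per hospital for clustering, as in step (3).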
Results:
JLNet identified clinically meaningful patient-level variables (e.g., age, weight loss, peripheral vascular disease) and distinct hospital clusters associated with variation in post-discharge recovery, measured by days at home, among patients with ADRD. Numerical experiments showed that JLNet outperformed existing approaches in variable selection and hospital clustering in settings involving high-dimensional covariates and unmeasured hospital-level confounding.
Discussion and conclusion:
JLNet is a scalable, interpretable framework for analyzing centralized health data. It enhances identification of high-risk subcohorts and hospital clusters, supporting more precise resource allocation and personalized care strategies for high-risk older adults. Findings also inform the design of tailored interventions in real-world settings.
Title: A joint learning framework for analyzing data from national geriatric centralized networks: A new toolbox deciphering real-world complexity. Journal of Biomedical Informatics, vol. 172, Article 104954.
Pub Date: 2025-12-01; Epub Date: 2025-10-27; DOI: 10.1016/j.jbi.2025.104947
Luoting Zhuang, Seyed Mohammad Hossein Tabatabaei, Ramin Salehi-Rad, Linh M. Tran, Denise R. Aberle, Ashley E. Prosper, William Hsu
Objective:
Machine learning models have utilized semantic features, deep features, or both to assess lung nodule malignancy. However, their reliance on manual annotation during inference, limited interpretability, and sensitivity to imaging variations hinder their application in real-world clinical settings. Thus, this research aims to integrate semantic features derived from radiologists’ assessments of nodules, guiding the model to learn clinically relevant, robust, and explainable imaging features for predicting lung cancer.
Methods:
We obtained 938 low-dose CT scans from the National Lung Screening Trial (NLST) with 1,261 nodules and semantic features. Additionally, the Lung Image Database Consortium (LIDC) dataset contains 1,018 CT scans, with 2,625 lesions annotated for nodule characteristics. Three external datasets were obtained from UCLA Health, the LUNGx Challenge, and the Duke Lung Cancer Screening dataset. For imaging input, we extracted 2D nodule slices in nine directions from a 50 × 50 × 50 mm nodule crop. We converted structured semantic features into sentences using Gemini. We fine-tuned a pretrained Contrastive Language-Image Pretraining (CLIP) model with a parameter-efficient fine-tuning approach to align imaging and semantic text features and to predict a lung cancer diagnosis within one year.
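CLIP aligns images and text with a symmetric contrastive (InfoNCE) objective: within a batch, each image should score highest against its own caption and vice versa. The sketch below computes that loss in plain Python for toy embeddings standing in for encoder outputs. The temperature value is an assumption (CLIP learns it during training), and the encoders and the parameter-efficient fine-tuning method used in the paper are not shown.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for CLIP-style image-text alignment.

    image_emb[i] and text_emb[i] form a matched pair (e.g. a nodule image and
    the sentence describing its semantic features); all other pairings in the
    batch act as negatives.
    """
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
              for i in range(n)]

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]  # -log softmax probability of the matched pair
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))
```

Matched pairs that point in the same direction give a loss near zero, while swapped pairs give a large loss, which is the signal that pulls image and text features together during fine-tuning.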
Results:
Our model outperformed the state-of-the-art (SOTA) models in the NLST test set with an AUROC of 0.901 and AUPRC of 0.776. It also showed robust results in external datasets. Using CLIP, we also obtained predictions on semantic features through zero-shot inference, such as nodule margin (AUROC: 0.807), nodule consistency (0.812), and pleural attachment (0.840).
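Zero-shot prediction of semantic features with a CLIP-style model typically works by embedding one text prompt per category (for example, hypothetical prompts such as "a nodule with a smooth margin" and "a nodule with a spiculated margin") and taking a softmax over the image's cosine similarities to those prompts. The sketch below shows only that scoring step on toy vectors; the prompts, the temperature, and the function name are assumptions, not taken from the paper.

```python
import math


def zero_shot_probs(image_emb, prompt_embs, temperature=0.07):
    """Softmax over cosine similarities between one image and several text prompts."""

    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    img = normalize(image_emb)
    # One scaled cosine similarity per candidate prompt (category).
    sims = [sum(a * b for a, b in zip(img, normalize(p))) / temperature for p in prompt_embs]
    m = max(sims)  # stabilize the softmax
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]
```

The category with the highest probability is the zero-shot prediction; sweeping a threshold over one category's probability is what allows an AUROC to be computed for features such as nodule margin.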
Conclusion:
By incorporating semantic features into the vision-language model, our approach surpasses the SOTA models in predicting lung cancer from CT scans collected from diverse clinical settings. It provides explainable outputs, aiding clinicians in comprehending the underlying meaning of model predictions. The code is available at https://github.com/luotingzhuang/CLIP_nodule.
Title: Vision-language model-based semantic-guided imaging biomarker for lung nodule malignancy prediction. Journal of Biomedical Informatics, vol. 172, Article 104947.
Pub Date: 2025-12-01; Epub Date: 2025-11-21; DOI: 10.1016/j.jbi.2025.104961
Jian Liu, Yingzan Ren, Guodong Xiao, Ponian Li, Chuanqi Sun, Jiaxin Chen, Fubin Ma, Rui Gao, Jia Mi, Haiyan Cong, Mingyi Wang, Yusen Zhang
Objective
Efficient and comprehensive prioritization of cancer driver genes across individual patients, cancer cohorts, and pan-cancer analyses is crucial for advancing cancer diagnosis and treatment. Existing methods are effective, but their accuracy gains appear to have plateaued, and they lack broad-scale joint analysis, flexibility in adapting to different cancers, and interpretability.
Methods
Here, we introduce GenMorw, a heterogeneous network framework that computes a novel association score between patients and their mutated genes, enabling estimation of the likelihood that each mutated gene acts as a driver in a given patient. GenMorw flexibly integrates, or fully utilizes, the collected mutation, gene/miRNA expression, and methylation data and PPI networks to classify patient groups based on data-specific characteristics and to identify potential drivers at the individual, cancer, and pan-cancer levels.
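The abstract does not define GenMorw's patient-gene association score. Random walk with restart is a common network-propagation technique for scoring nodes by their proximity to a seed, and the sketch below applies it to a tiny patient-gene graph purely as an illustration of that general idea. The graph, node names, restart probability of 0.3, and function name are all hypothetical, and this is not GenMorw's scoring mechanism.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adj: dict node -> list of neighbours (every neighbour must also be a key,
    and every node should have at least one neighbour so no mass is lost).
    seeds: nodes the walker jumps back to (e.g. one patient node).
    Returns the stationary visiting probability of every node.
    """
    nodes = list(adj)
    restart_vec = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(restart_vec)
    for _ in range(max_iter):
        # With probability `restart` jump back to the seeds...
        nxt = {v: restart * restart_vec[v] for v in nodes}
        # ...otherwise move to a uniformly chosen neighbour.
        for u in nodes:
            if adj[u]:
                share = (1 - restart) * p[u] / len(adj[u])
                for w in adj[u]:
                    nxt[w] += share
        diff = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if diff < tol:
            break
    return p


if __name__ == "__main__":
    # Hypothetical bipartite graph: patients P1, P2 and the genes mutated in each.
    graph = {"P1": ["G1", "G2"], "P2": ["G2", "G3"],
             "G1": ["P1"], "G2": ["P1", "P2"], "G3": ["P2"]}
    print(random_walk_with_restart(graph, {"P1"}))
```

Seeding the walk at one patient ranks that patient's mutated genes by how strongly they are connected through the rest of the network, which is one way a patient-specific score can also reflect population-wide structure.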
Results
GenMorw outperforms existing algorithms, with an average cohort AUC improvement of 17.66% and higher overall accuracy under a cumulative ranking strategy in patient–gene heterogeneous networks. Beyond AUC, various other comparative evaluations consistently demonstrate the superior performance of GenMorw across multiple cancers. Some uniquely predicted genes, such as ANK3, CENPF, and COL7A1, which are absent from standard databases and not identified by other methods, were validated as highly cancer-related through literature review and survival analysis. The strongly connected components and cliques extracted from GenMorw-derived heterogeneous networks capture most of the predicted or known driver genes, further supporting driver gene prediction.
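The abstract mentions extracting strongly connected components (SCCs) and cliques from GenMorw-derived networks but does not say which algorithms are used. Tarjan's algorithm is one standard way to find SCCs in a directed graph; a compact recursive version is sketched below (adequate for small graphs, since recursion depth grows with path length). The toy graph and node names are invented, and clique extraction is not shown.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph: dict node -> list of successor nodes."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]  # mutable counter shared with the nested function

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off the stack
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(sorted(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return components
```

Nodes in the same SCC can all reach one another along directed edges, so a large SCC marks a tightly interlinked module in which predicted and known drivers may cluster.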
Conclusion
We conclude that GenMorw, with its novel gene–patient score mechanism, offers a significant advance in cancer driver gene discovery by capturing both population-wide and patient-specific network signals, thereby improving predictive power and enabling deeper insights into cancer heterogeneity.
Title: Multi-scale cancer driver gene prediction by flexible data selection and network topology guidance. Journal of Biomedical Informatics, vol. 172, Article 104961.
Pub Date: 2025-12-01; Epub Date: 2025-11-18; DOI: 10.1016/j.jbi.2025.104953
Yaozheng Zhou, Xingyu Shi, Lingfeng Wang, Jin Xu, Demin Li, Congzhou Chen
Title: Corrigendum to “Drug repositioning with metapath guidance and adaptive negative sampling enhancement” [J. Biomed. Inform. 171 (2025) 104916]. Journal of Biomedical Informatics, vol. 172, Article 104953.
Pub Date: 2025-12-01; Epub Date: 2025-11-25; DOI: 10.1016/j.jbi.2025.104962
YuHao Wu, Zhijie Xiang, Yuzhe Tan, Jiayue Hu, Desheng Chen, Jing Zhao, Haicheng Wei
To address the limited feature coverage of single-modal approaches and the class imbalance in knee osteoarthritis (KOA) classification, this study proposes a Multimodal Spatial-constraint Contrastive Learning (MSCL) model. First, dynamic and static plantar pressure data and human keypoint trajectories are acquired synchronously. The model feeds the dynamic plantar pressure and keypoint data into a multimodal spatial–temporal fusion branch, where graph convolutional networks and Transformers extract spatial–temporal representations of the human keypoints and the dynamic pressure patterns, respectively, followed by cross-attention fusion. Next, static plantar pressure is processed by a pyramid CNN architecture to generate coarse-grained spatial constraint vectors, which serve as anatomical priors to regularize the fused representations. Finally, a contrastive learning framework establishes an explicit mapping between the enhanced representations and the Kellgren–Lawrence (KL) grading system, enabling precise stratification of KOA severity. Experimental results show that the MSCL model achieves a macro-average accuracy of 0.94 in KL grading, with a 7% improvement in F1-score for minority categories with limited samples. This work establishes a novel paradigm for accurate KOA assessment through multimodal gait analysis.
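The abstract reports macro-averaged results and F1 gains for under-represented KL grades, without giving the exact metric definitions. Macro-F1 is the unweighted mean of per-class F1 scores, so a rare grade counts as much as a common one, whereas plain accuracy is dominated by the majority class. A minimal sketch, assuming KL grades are encoded as integer labels (0 to 4); the function names are hypothetical and this is not the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions (dominated by the most frequent classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, giving each KL grade equal weight."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, a classifier that always predicts the majority grade on a 10-sample set with 8 majority and 2 minority cases scores 0.8 accuracy but only about 0.44 macro-F1, which is why macro metrics are the more informative choice for imbalanced KL grades.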
{"title":"Study on multimodal spatially-constrained contrastive learning for knee osteoarthritis severity grading","authors":"YuHao Wu , Zhijie Xiang , Yuzhe Tan , Jiayue Hu , Desheng Chen , Jing Zhao , Haicheng Wei","doi":"10.1016/j.jbi.2025.104962","DOIUrl":"10.1016/j.jbi.2025.104962","url":null,"abstract":"<div><div>To address the limitations of single-modal feature coverage and class distribution imbalance in knee osteoarthritis (KOA) classification, this study proposes a Multimodal Spatial-constraint Contrastive Learning (MSCL) model. First, dynamic and static plantar pressure data and human keypoint trajectories are synchronously acquired. The model first feeds dynamic plantar pressure and keypoint data into a multimodal spatial–temporal fusion branch, where graph convolutional networks and Transformers extract spatial–temporal representations of human keypoints and dynamic pressure patterns respectively, followed by Cross Attention fusion. Subsequently, static plantar pressure is processed through a pyramid CNN architecture to generate coarse-grained spatial constraint vectors, which serve as anatomical priors to regularize the fused representations. Finally, a contrastive learning framework is integrated to establish explicit mapping between the enhanced representations and Kellgren–Lawrence (KL) grading system, enabling precise KOA severity stratification. Experimental results demonstrate that the MSCL model achieves 0.94 macro-average accuracy in KL grading, with 7% improvement in F1-scores for imbalanced categories with limited samples. 
This work establishes a novel paradigm for accurate KOA assessment through multimodal gait analysis.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"172 ","pages":"Article 104962"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145615386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
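The MSCL abstract above describes fusing keypoint and dynamic-pressure representations with cross-attention. Here is a minimal, plain-Python sketch of that fusion step. It makes several hypothetical assumptions:

- keypoint tokens act as queries, and pressure tokens act as keys and values;
- the learned query/key/value projections are omitted (treated as identity) for brevity;
- a residual connection adds the attended features back.

The function names are illustrative. This sketch is not the authors' implementation, which also places GCN and Transformer encoders upstream of this step.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: one row per token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention with queries from one modality and
    keys/values from another. Learned Q/K/V projections are omitted
    (identity), which is a simplifying assumption of this sketch."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))                  # (n_q x n_kv)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)                            # (n_q x d)


def fuse(keypoint_feats: Matrix, pressure_feats: Matrix) -> Matrix:
    """Hypothetical fusion step: keypoint tokens attend over pressure tokens,
    and the attended features are added back to the keypoint features."""
    attended = cross_attention(keypoint_feats, pressure_feats, pressure_feats)
    return [[k + a for k, a in zip(k_row, a_row)]
            for k_row, a_row in zip(keypoint_feats, attended)]
```

Consider `fuse([[0.0, 0.0], [1.0, 1.0]], [[1.0, 2.0], [1.0, 2.0]])`. Every query gives equal weight to the two identical pressure tokens, so the call returns `[[1.0, 2.0], [2.0, 3.0]]`. In a trained model, the projections and the regularization by the static-pressure constraint vectors would shape these weights.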
Pub Date: 2025-12-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.jbi.2025.104957
Iris Beerepoot, Sjaak Brinkkemper, Elke Huntink, Berfin Duman, Hajo A. Reijers, Nienke Bleijenberg
Objective:
To assess the feasibility of using a large language model (LLM) to generate structured event logs from conversational data in home-based nursing care, with the goal of reducing the documentation burden and enabling process analysis.
Methods:
We conducted an exploratory study of 27 audio-recorded home care visits between district nurses and patients. The recordings were transcribed and given as input to a Generative Pre-trained Transformer (GPT), which identified nursing interventions using the standardised Nursing Interventions Classification (NIC) system and constructed event logs from them. We applied and evaluated different prompts through an iterative, interdisciplinary process involving computer scientists and nurse researchers.
Results:
GPT demonstrated a reasonable ability to extract nursing interventions from conversational transcripts, especially when activities were discussed explicitly and the conversation was temporally aligned with the activities being performed. Challenges emerged when information was implicit, ambiguous, or not captured in the dialogue. We propose five guidelines for using LLMs in this context, addressing data source limitations, activity label selection, confidence calibration, hallucination handling, and stakeholder-specific output needs. These guidelines offer lessons that extend beyond home care to other domains where conversational data must be translated into structured process insights.
Conclusion:
LLMs show promise for transforming informal clinical dialogue into structured representations of care. Expert oversight and tailored prompts remain essential, although future model improvements may enhance reliability. Even so, real-world healthcare applications must be deployed cautiously to ensure accuracy, transparency, and stakeholder trust.
{"title":"Turning Dialogues Into Event Data: Lessons From GPT-Based Recognition of Nursing Actions","authors":"Iris Beerepoot , Sjaak Brinkkemper , Elke Huntink , Berfin Duman , Hajo A. Reijers , Nienke Bleijenberg","doi":"10.1016/j.jbi.2025.104957","DOIUrl":"10.1016/j.jbi.2025.104957","url":null,"abstract":"<div><h3>Objective:</h3><div>To assess the feasibility of using a large language model (LLM) to generate structured event logs from conversational data in home-based nursing care, with the goal of reducing the documentation burden and enabling process analysis.</div></div><div><h3>Methods:</h3><div>We conducted an exploratory study involving 27 audio-recorded home care visits between district nurses and patients. These recordings were transcribed and used as input for a Generative Pre-Trained Transformer (GPT) to identify nursing interventions and construct event logs, using the standardised Nursing Interventions Classification (NIC) system. We applied and evaluated different prompts through an iterative, interdisciplinary process involving computer scientists and nurse researchers.</div></div><div><h3>Results:</h3><div>GPT demonstrated reasonable ability to extract nursing interventions from conversational transcripts, especially when activities were discussed explicitly and temporally aligned. Challenges emerged when information was implicit, ambiguous, or not captured in the dialogue. We propose five guidelines for using LLMs in this context, addressing data source limitations, activity label selection, confidence calibration, hallucination handling, and stakeholder-specific output needs. These guidelines provide lessons that extend beyond home care to other domains where conversational data must be translated into structured process insights.</div></div><div><h3>Conclusion:</h3><div>LLMs show promise for transforming informal clinical dialogue into structured representations of care. 
While expert oversight and tailored prompts remain essential, future model improvements may enhance reliability. Still, applications in real-world healthcare contexts must be handled with care to ensure accuracy, transparency, and stakeholder trust.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"172 ","pages":"Article 104957"},"PeriodicalIF":4.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145517870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
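The nursing study above uses GPT to turn visit transcripts into event logs. It also highlights confidence calibration and hallucination handling. The abstract does not give an output schema, so this sketch assumes a hypothetical one: visit id, NIC label, time offset, model confidence, and a quoted evidence span. It then shows simple post-processing:

- labels outside an allowed vocabulary are dropped;
- evidence quotes that do not appear in the transcript are dropped;
- low-confidence rows are flagged for expert review;
- the remaining rows become case/activity/timestamp rows, the minimal structure that process-mining event logs rely on.

All field names, thresholds, and example labels are illustrative assumptions. This is not the authors' pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


@dataclass
class Extraction:
    """One intervention proposed by the LLM (hypothetical output schema)."""
    visit_id: str
    activity: str        # intended to be a NIC intervention label
    offset_s: int        # seconds from the start of the recording
    confidence: float    # model-reported confidence in [0, 1]
    evidence: str        # transcript quote the model cites


def build_event_log(extractions: List[Extraction],
                    transcripts: Dict[str, str],
                    visit_starts: Dict[str, datetime],
                    allowed_labels: Set[str],
                    min_confidence: float = 0.5) -> List[dict]:
    """Turn LLM extractions into event-log rows (case id, activity, timestamp).

    Rows are dropped when the label is outside the allowed vocabulary or the
    cited evidence does not occur in the transcript (simple hallucination
    guards). Low-confidence rows are kept but flagged for expert review.
    """
    rows = []
    for ex in extractions:
        if ex.activity not in allowed_labels:
            continue  # label not in the controlled vocabulary
        if ex.evidence.lower() not in transcripts.get(ex.visit_id, "").lower():
            continue  # cited quote is not in the transcript
        rows.append({
            "case_id": ex.visit_id,
            "activity": ex.activity,
            "timestamp": visit_starts[ex.visit_id] + timedelta(seconds=ex.offset_s),
            "needs_review": ex.confidence < min_confidence,
        })
    # Order the log by case, then by time within each case.
    rows.sort(key=lambda r: (r["case_id"], r["timestamp"]))
    return rows
```

The example labels in the test ("Wound Care", "Medication Administration") are used only for illustration. In practice, `allowed_labels` would hold whatever NIC subset the study team selected. Checking the quote against the transcript catches only fabricated quotes, not misclassified ones, so it complements expert review rather than replacing it.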