Medical Visual Question Answering (Med-VQA) is a critical multimodal task with the potential to alleviate the scarcity and imbalance of medical resources. However, most existing studies overlook two limitations that keep Med-VQA an open challenge: the inconsistency in information density between medical images and text, and the long-tail distribution in datasets. To address these issues, this study proposes a Language-Guided Progressive Fusion Network (LGPFN) with three key modules: Question-Guided Progressive Multimodal Fusion (QPMF), a Language-Gate Mechanism (LGM), and Triple Semantic Feature Alignment (TriSFA). QPMF progressively guides the fusion of visual and textual features using both global and local question representations. LGM, a linguistic rule-based module, distinguishes Closed-Ended (CE) from Open-Ended (OE) samples and directs the fused features to the appropriate classifier. Finally, TriSFA captures the rich semantic information of OE answers and mines the underlying associations among fused features, predicted answers, and ground truths, aligning them in a ternary semantic feature space. The proposed LGPFN framework outperforms existing state-of-the-art models, achieving the best overall accuracies of 80.39%, 84.07%, 75.74%, and 70.60% on the VQA-RAD, SLAKE, PathVQA, and VQA-Med 2019 datasets, respectively. These results demonstrate the effectiveness and generalizability of the proposed model and underscore its potential as a medical Artificial Intelligence (AI) agent that could contribute to universal health coverage.
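To make the gating idea concrete, the following is a minimal sketch, not the paper's actual implementation, of how a linguistic rule-based gate might separate CE from OE questions and route fused features to the corresponding classifier. The rule list, the function names, and the two toy classifier heads are illustrative assumptions introduced here; the paper's LGM may rely on different linguistic rules.

```python
# Illustrative sketch of a linguistic rule-based language gate (assumption:
# the concrete rules and interfaces below are NOT taken from the paper).

CLOSED_ENDED_STARTS = (
    "is", "are", "was", "were", "does", "do", "did",
    "can", "could", "has", "have", "will", "would", "should",
)

def is_closed_ended(question: str) -> bool:
    """Heuristic rule: a question opening with an auxiliary/copular verb is
    treated as Closed-Ended (typically yes/no); anything else as Open-Ended."""
    words = question.strip().lower().split()
    return bool(words) and words[0] in CLOSED_ENDED_STARTS

def language_gate(question: str, fused_features, ce_classifier, oe_classifier):
    """Route the fused multimodal features to the CE or OE answer classifier."""
    if is_closed_ended(question):
        return ce_classifier(fused_features)
    return oe_classifier(fused_features)

if __name__ == "__main__":
    # Toy stand-ins for the CE/OE answer heads.
    ce_head = lambda feats: "closed-ended head"
    oe_head = lambda feats: "open-ended head"
    print(language_gate("Is there a fracture in the left femur?", None, ce_head, oe_head))
    print(language_gate("What abnormality is seen in the chest X-ray?", None, ce_head, oe_head))
```

In this sketch the gate is purely rule-based and adds no trainable parameters, which matches the abstract's description of LGM as a linguistic rule-based routing step placed between the fusion module and the two answer classifiers.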