Pub Date: 2025-12-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.jbi.2025.104957
Iris Beerepoot, Sjaak Brinkkemper, Elke Huntink, Berfin Duman, Hajo A. Reijers, Nienke Bleijenberg
Objective:
To assess the feasibility of using a large language model (LLM) to generate structured event logs from conversational data in home-based nursing care, with the goal of reducing the documentation burden and enabling process analysis.
Methods:
We conducted an exploratory study involving 27 audio-recorded home care visits between district nurses and patients. These recordings were transcribed and used as input for a Generative Pre-Trained Transformer (GPT) to identify nursing interventions and construct event logs, using the standardised Nursing Interventions Classification (NIC) system. We applied and evaluated different prompts through an iterative, interdisciplinary process involving computer scientists and nurse researchers.
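The event-log idea can be made concrete with a minimal sketch. It assumes one visit corresponds to one case and that the model returns an NIC label, a time offset, and a confidence per detected intervention; the `Event` fields, the `min_confidence` threshold, and the sample labels are hypothetical illustrations, not the schema or prompts used in the study.

```python
from dataclasses import dataclass
import csv
import io


@dataclass
class Event:
    case_id: str       # one home care visit = one case (assumption)
    activity: str      # NIC intervention label proposed by the model
    timestamp: float   # seconds from the start of the recording (assumption)
    confidence: float  # model-reported confidence in [0, 1] (hypothetical field)


def to_event_log(events, min_confidence=0.5):
    """Keep sufficiently confident events and order them per case by time."""
    kept = [e for e in events if e.confidence >= min_confidence]
    return sorted(kept, key=lambda e: (e.case_id, e.timestamp))


def to_csv(events):
    """Serialise the log as CSV text with one row per event."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["case_id", "activity", "timestamp", "confidence"])
    for e in events:
        writer.writerow([e.case_id, e.activity, e.timestamp, e.confidence])
    return buf.getvalue()


# Illustrative, made-up model output for a single visit.
raw = [
    Event("visit-01", "Wound Care", 310.0, 0.9),
    Event("visit-01", "Vital Signs Monitoring", 45.0, 0.8),
    Event("visit-01", "Medication Administration", 600.0, 0.3),
]
log = to_event_log(raw)
print(to_csv(log))
```

Dropping low-confidence events is only one possible policy; routing them to a nurse for review instead would fit the expert oversight the abstract calls for.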
Results:
GPT demonstrated reasonable ability to extract nursing interventions from conversational transcripts, especially when activities were discussed explicitly and temporally aligned. Challenges emerged when information was implicit, ambiguous, or not captured in the dialogue. We propose five guidelines for using LLMs in this context, addressing data source limitations, activity label selection, confidence calibration, hallucination handling, and stakeholder-specific output needs. These guidelines provide lessons that extend beyond home care to other domains where conversational data must be translated into structured process insights.
Conclusion:
LLMs show promise for transforming informal clinical dialogue into structured representations of care. While expert oversight and tailored prompts remain essential, future model improvements may enhance reliability. Still, applications in real-world healthcare contexts must be handled with care to ensure accuracy, transparency, and stakeholder trust.
Title: Turning Dialogues Into Event Data: Lessons From GPT-Based Recognition of Nursing Actions. Journal of Biomedical Informatics, volume 172, Article 104957.
Pub Date: 2025-12-01 | Epub Date: 2025-10-24 | DOI: 10.1016/j.jbi.2025.104938
Zitao Shuai, Chenwei Wu, Zhengxu Tang, David Restrepo, Michael Morley, Luis Filipe Nakayama
Objective:
AI-based diabetic retinopathy (DR) screening is promising in low- and middle-income countries (LMICs), where limited human resources constrain access to specialist-led programs. However, current systems often degrade under real-world image-quality variations, especially with the portable devices that are vital in LMICs. This study aims to develop RetSyn, a synthetic-data augmentation framework that improves screening robustness across devices and imaging conditions.
Methods:
RetSyn leverages advanced diffusion models to generate synthetic retinal images with diverse device and imaging quality characteristics. To address the challenges of (1) portable device data scarcity, (2) disease and quality distribution imbalance, and (3) varying image quality, RetSyn uses class- and quality-conditioned diffusion for controllable synthesis, a group-balanced loss to increase coverage of minority (quality, disease) pairs, and a Direct Preference Optimization alignment step with a small paired smartphone–tabletop set. The synthesized images are then used to augment classifier training.
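The abstract does not define the group-balanced loss, so the following is only a sketch of one common way to balance (quality, disease) groups: inverse-frequency reweighting of per-sample losses. The function names, group labels, and loss values are illustrative assumptions, not RetSyn's formulation.

```python
from collections import Counter


def group_weights(groups):
    """Inverse-frequency weight per group, normalised so the mean sample weight is 1."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}


def group_balanced_loss(losses, groups):
    """Weighted mean of per-sample losses in which every group contributes equally."""
    w = group_weights(groups)
    return sum(w[g] * loss for loss, g in zip(losses, groups)) / len(losses)


# Made-up batch: a majority group with low loss and a minority group with high loss.
groups = [("high_quality", "no_dr")] * 6 + [("low_quality", "dr")] * 2
losses = [0.2] * 6 + [1.0] * 2

print(sum(losses) / len(losses))            # plain mean, dominated by the majority group
print(group_balanced_loss(losses, groups))  # equals the mean of the two group means
```

With these weights the minority (low quality, DR) pair counts as much as the majority pair, which is the effect the abstract attributes to the group-balanced term.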
Results:
The effectiveness of RetSyn-generated images was evaluated by training retinal diagnosis models on a combination of real and synthetic data. RetSyn yields consistent gains in-domain and out-of-domain. On low-quality tabletop images, F1 improves from 0.781 to 0.874 (binary) and 0.607 to 0.703 (three-class), while AUROC reaches 0.982 and 0.951, respectively. On out-of-domain portable images, RetSyn attains AUROC 0.813/F1 0.703 (binary) and AUROC 0.804/F1 0.609 (three-class), exceeding group-robustness baselines such as GroupDRO (binary: AUROC 0.786/F1 0.626; three-class: AUROC 0.789/F1 0.544).
Conclusion:
RetSyn presents an effective and scalable synthetic data framework that significantly enhances the robustness and generalizability of AI-based DR screening models in LMICs. By addressing the critical challenges posed by varying image quality and device characteristics, RetSyn facilitates more reliable deployment of AI diagnostics in underserved regions. Additionally, the release of the first publicly available paired smartphone-tabletop retinal image dataset will support further research into cross-device DR screening solutions.
Title: Enhancing AI-based diabetic retinopathy screening in low- and middle-income countries with synthetic data. Journal of Biomedical Informatics, volume 172, Article 104938.
Pub Date: 2025-12-01 | Epub Date: 2025-11-01 | DOI: 10.1016/j.jbi.2025.104949
Yilun Liang, Gongbo Zhang, Edward Sun, Betina Idnay, Yilu Fang, Fangyi Chen, Casey Ta, Yifan Peng, Chunhua Weng
Objective
Research profiles highlight scientists’ research focus, enabling talent discovery and fostering collaborations, but they are often outdated. Automated, scalable methods are urgently needed to keep these profiles current.
Methods
In this study, we design and evaluate two large language model (LLM)-based methods to generate scientific interest profiles—one summarizing researchers’ PubMed abstracts and the other generating a summary using their publications’ Medical Subject Headings (MeSH) terms—and compare these machine-generated profiles with researchers’ self-summarized interests. We collected the titles, MeSH terms, and abstracts of PubMed publications for 595 faculty members affiliated with Columbia University Irving Medical Center (CUIMC), for 167 of whom we obtained human-written online research profiles. Subsequently, GPT-4o-mini, a state-of-the-art LLM, was prompted to summarize each researcher’s interests. Both manual and automated evaluations were conducted to characterize the similarities and differences between the machine-generated and self-written research profiles.
Results
The similarity study showed low ROUGE-L, BLEU, and METEOR scores, reflecting little overlap between terminologies used in machine-generated and self-written profiles. BERTScore analysis revealed moderate semantic similarity between machine-generated and reference summaries (F1: 0.542 for MeSH-based, 0.555 for abstract-based), despite low lexical overlap. In validation, paraphrased summaries achieved a higher F1 of 0.851. A further comparison between the original and paraphrased manually written summaries indicates the limitations of such metrics. Kullback-Leibler (KL) Divergence of term frequency-inverse document frequency (TF-IDF) values (8.56 and 8.58 for profiles derived from MeSH terms and abstracts, respectively) suggests that machine-generated summaries employ different keywords than human-written summaries. Manual reviews further showed that 77.78% rated the overall impression of MeSH-based profiling as “good” or “excellent,” with readability receiving favorable ratings in 93.44% of cases, though granularity and factual accuracy varied. Overall, panel reviews favored 67.86% of machine-generated profiles derived from MeSH terms over those derived from abstracts.
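To make the KL-divergence comparison concrete, here is a minimal sketch that turns TF-IDF weights into probability distributions over a shared vocabulary and computes KL(P || Q). The smoothed IDF formula, the `eps` smoothing constant, and the two toy profiles are assumptions for illustration; the abstract does not state how the authors normalised or smoothed their values, so the magnitudes are not comparable with the reported 8.56 and 8.58.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF per tokenised document, using smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return weights


def kl_divergence(p_weights, q_weights, eps=1e-9):
    """KL(P || Q) after adding eps to every vocabulary term and normalising to sum 1."""
    vocab = set(p_weights) | set(q_weights)
    p = {t: p_weights.get(t, 0.0) + eps for t in vocab}
    q = {t: q_weights.get(t, 0.0) + eps for t in vocab}
    zp, zq = sum(p.values()), sum(q.values())
    return sum((p[t] / zp) * math.log((p[t] / zp) / (q[t] / zq)) for t in vocab)


# Toy self-written and machine-generated profiles (made up).
human = "cancer genomics and precision oncology".split()
machine = "tumor sequencing and targeted therapy".split()
h, m = tfidf([human, machine])

print(kl_divergence(h, h))  # identical distributions give zero divergence
print(kl_divergence(h, m))  # mostly disjoint vocabularies give a large divergence
```

Because terms missing from one profile are backed only by `eps`, the result is sensitive to the smoothing constant; this is one reason such scores mainly indicate that the keyword sets differ rather than by how much.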
Conclusion
LLMs promise to automate scientific interest profiling at scale. Profiles derived from MeSH terms have better readability than profiles derived from abstracts. Overall, machine-generated summaries differ from human-written ones in their choice of concepts, with the latter initiating more novel ideas.
Title: Scalable scientific interest profiling using large language models. Journal of Biomedical Informatics, volume 172, Article 104949.
Pub Date: 2025-12-01 | Epub Date: 2025-10-31 | DOI: 10.1016/j.jbi.2025.104942
Jiageng Wu, Xian Wu, Yefeng Zheng, Jie Yang
Objective:
Large language models (LLMs) offer promising potential in answering real-time medical queries, but they often produce lengthy, generic, and even hallucinatory responses. We aim to develop a reliable and interpretable medical dialogue system that incorporates clinical reasoning and thereby mitigates the risk of hallucination.
Methods:
Two large datasets of real-world online consultation, MedDG and KaMed, were used for evaluation. We proposed a Medical Dialogue System with Knowledge Enhancement and Clinical Pathway Encoding (MedKP), which integrates an external medical knowledge graph and encodes internal clinical pathways to model physician reasoning. Performance was compared with state-of-the-art baselines, including GPT-4o and LLaMA3.1-70B. A multi-dimensional evaluation framework assessed (1) clinical relevance (medical entity-based), (2) textual similarity (ROUGE, BLEU), (3) semantic alignment (BERTScore), and (4) hallucination and consistency via an external LLM-based judge, as well as parallel human evaluation.
Results:
Across both datasets, MedKP (6B) achieved the best overall performance, outperforming other advanced baselines and producing responses that align more closely with those of human physicians. For clinical relevance, MedKP reached a macro F1-score on medical entities of 31.41 on MedDG (previous best, DFMed: 24.76; improved 30.41%) and 26.62 on KaMed (previous best, LLaMA3.1-70B: 20.67; improved 25.62%). Consistent improvements were observed across other metrics. Ablation studies further validated the effectiveness of each model component.
Conclusion:
Our results highlight the critical role of clinical reasoning in advancing trustworthy AI for digital healthcare. By enhancing the reliability, coherence, and transparency of AI-generated responses, this pathway-aware approach bridges the gap between LLMs and real-world clinical workflows, improving the accessibility of high-quality telemedicine services, particularly benefiting underserved populations.
Title: Clinical pathway-aware large language models for reliable and transparent medical dialogue. Journal of Biomedical Informatics, volume 172, Article 104942.
Pub Date: 2025-12-01 | Epub Date: 2025-10-22 | DOI: 10.1016/j.jbi.2025.104941
Fan Ye, Xuan Hu, Yihao Ding, Feifei Liu
Objective:
Radiology report generation (RRG) is a transformative technology in the field of radiology imaging that aims to address the critical need for consistency and comprehensiveness in diagnostic interpretation. Although recent advances in graph-based representation learning have demonstrated excellent performance in disease progression modeling, their application in radiology report generation still suffers from three inherent limitations: (i) semantic separation between local image features and free-text descriptions, (ii) inherent noise in automated medical concept annotation, and (iii) lack of anatomical constraints in cross-modal attention mechanisms.
Method:
This study proposes a pseudo-label and knowledge-guided contrastive learning (PKCL) framework, which addresses the above issues through a novel fusion of dynamic query learning and knowledge-guided contrastive learning. The PKCL framework employs a trainable cross-modal query matrix (QM) to learn shared representations through parameter-sharing self-attention mechanisms between imaging and text encoders. The QM is used during training to query disease-related visual regions in reports and enables dynamic alignment between radiological features and textual descriptions during both training and inference. Additionally, this method combines pseudo labels with an adaptive top-k weighted feature fusion strategy to enhance learning from standard comparisons and leverages pre-built knowledge graphs via the XRayVision (Cohen et al., 2022) model to account for disease relationships and anatomical dependencies, thereby improving the clinical accuracy of generated reports.
Results:
Comprehensive evaluations on the IU-Xray and MIMIC-CXR datasets demonstrate that PKCL achieves state-of-the-art performance on both natural language generation metrics and clinical efficacy metrics. Specifically, it obtains 0.499 BLEU-1 and 0.374 ROUGE-L (RL) on IU-Xray, and 0.346 BLEU-1 and 0.277 RL on MIMIC-CXR, outperforming prior methods such as R2GEN and CMCL.
Furthermore, PKCL exhibited robust generalization on the out-of-domain Montgomery County X-ray Set, effectively handling its low-resource conditions and brief, diagnostic-level textual supervision.
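For readers unfamiliar with the ROUGE-L (RL) metric reported above, the sketch below computes it for one candidate/reference pair from the longest common subsequence (LCS) of tokens. It assumes whitespace tokenisation and a balanced F1; published evaluation toolkits may use a recall-weighted F-measure and different preprocessing, so this is not the evaluation script used in the paper, and the two report sentences are made up.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Balanced F1 of LCS-based precision and recall over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "no acute cardiopulmonary abnormality"
candidate = "no acute abnormality seen"
print(rouge_l_f1(candidate, reference))  # LCS is "no acute abnormality", 3 of 4 tokens each way
```

Because the score rewards shared word order rather than clinical correctness, it is usually reported alongside clinical efficacy metrics, as the abstract does.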
Conclusion:
The framework’s ability to maintain semantic consistency when generating clinically relevant reports represents a significant advancement over existing methods, particularly in capturing the subtle relationships between radiological findings and their textual descriptions.
Title: Pseudo-labeling and knowledge-guided contrastive learning for radiology report generation. Journal of Biomedical Informatics, volume 172, Article 104941.
Pub Date: 2025-12-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.jbi.2025.104958
Chia-Hsuan Chang, Brian Ondov, Bin Choi, Xueqing Peng, Huan He, Hua Xu
Objective
The rapid expansion of biomedical literature necessitates effective approaches for organizing and interpreting complex research topics. Existing embedding-based topic modeling techniques provide flat clusters at a single granularity, ignoring the complex hierarchies of real subjects. Our objective is instead to create a forest of topic trees, each of which starts from a broad area and drills down to narrow specialties.
Methods
We propose TopicForest, a new embedding-driven hierarchical clustering and labeling framework that involves: (1) embedding biomedical abstracts within a high-dimensional semantic space using contrastively trained LLMs, (2) manifold learning to reduce dimensionality for visual interpretation, (3) hierarchical clustering via binary partitioning and multi-level dendrogram cutting, and (4) recursive LLM-based topic summarization to efficiently generate concise and coherent labels from the smallest clusters up to broad subjects covering thousands of publications. We construct a corpus comprising 24,366 biomedical abstracts from Scientific Reports, leveraging its human-curated topic hierarchy as the gold standard for evaluation. We evaluate clustering performance using Adjusted Mutual Information (AMI) and Dasgupta’s cost, while labeling quality is evaluated based on diversity and hierarchical affinity.
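Dasgupta's cost, one of the two clustering metrics named above, can be illustrated with a small sketch. For a hierarchy over a set of items it sums, over every pair of items, their similarity multiplied by the number of leaves under the pair's lowest common ancestor, so lower is better. The nested-tuple tree encoding, the `sim` dictionary keyed by `frozenset` pairs, and the toy similarities are assumptions for illustration only.

```python
from itertools import combinations


def leaves(tree):
    """Leaves under a node; a node is either a leaf label or a tuple of child nodes."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def dasgupta_cost(tree, sim):
    """Sum over leaf pairs of similarity times the leaf count of their lowest common ancestor."""
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(ls) for ls in child_leaves)
    cost = 0.0
    # Pairs separated at this node have this node as their lowest common ancestor.
    for a, b in combinations(child_leaves, 2):
        for i in a:
            for j in b:
                cost += size * sim[frozenset((i, j))]
    return cost + sum(dasgupta_cost(child, sim) for child in tree)


# Four toy documents: 0 and 1 are similar, 2 and 3 are similar, all other pairs are not.
sim = {frozenset(pair): 0.1 for pair in combinations(range(4), 2)}
sim[frozenset((0, 1))] = 1.0
sim[frozenset((2, 3))] = 1.0

good_tree = ((0, 1), (2, 3))  # merges similar documents low in the tree
bad_tree = ((0, 2), (1, 3))   # separates similar documents at the root

print(dasgupta_cost(good_tree, sim))
print(dasgupta_cost(bad_tree, sim))
```

The tree that keeps similar documents together until deep in the hierarchy has the lower cost, which is why the metric suits evaluating a dendrogram as a whole rather than a single flat cut.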
Results
TopicForest’s dendrogram cutting achieves AMI scores comparable to or better than flat embedding-based clustering methods such as BERTopic (with K-means or HDBSCAN) across multiple dimension-reduction strategies (t-SNE and UMAP), while uniquely providing multi-scale topic granularity. It also outperforms the deep hierarchical topic model HyperMiner, yielding higher AMI scores and comparable Dasgupta’s costs. For labeling, the proposed LLM recursive labeling method surpasses both c-TF-IDF and HyperMiner, achieving higher label diversity and hierarchical affinity, while maintaining efficient token usage. Furthermore, TopicForest maintains stable clustering quality across different embedding models, demonstrating robustness and generalizability in hierarchical topic discovery.
Conclusion
Through novel integration of LLMs, dimension reduction, and advanced hierarchical clustering techniques, TopicForest provides effective and interpretable hierarchical topic modeling for biomedical literature, facilitating multi-scale exploration and visualization of literature corpora.
Title: TopicForest: embedding-driven hierarchical clustering and labeling for biomedical literature. Journal of Biomedical Informatics, volume 172, Article 104958.
Pub Date: 2025-12-01; Epub Date: 2025-11-07; DOI: 10.1016/j.jbi.2025.104928
Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm
Objectives
Access to real-world healthcare data is constrained by privacy regulations and data imbalances, hindering the development of fair and reliable clinical prediction models. Synthetic data offers a potential solution, yet existing methods often fail to maintain calibration or enable subgroup-specific augmentation. This study introduces Masked Clinical Modelling (MCM), an attention-based synthetic data generation framework designed to enhance survival model calibration in both global and stratified analyses.
Methods
MCM uses masked feature reconstruction to learn feature dependencies without explicitly training on survival objectives. It supports both standalone dataset synthesis and conditional data augmentation, enabling the generation of targeted synthetic subcohorts without retraining. Evaluated on a chronic kidney disease (CKD) electronic health record (EHR) dataset, MCM was benchmarked against eight baseline methods, including variational autoencoders, GANs, SMOTE variants, and a recent risk-aware distillation model. Model performance was assessed via calibration loss, Cox model consistency, and Kaplan–Meier fidelity.
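Kaplan–Meier fidelity compares time-to-event curves from real and synthetic cohorts. The sketch below shows the standard Kaplan–Meier estimator in plain Python, plus a hypothetical `max_curve_gap` helper that reports the largest vertical gap between two curves. The helper and its name are assumptions for illustration; the abstract does not specify how the authors score fidelity.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def survival_at(curve, t):
    """Evaluate the step function: last survival value at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def max_curve_gap(curve_a, curve_b):
    """Hypothetical fidelity measure: largest absolute gap between two curves."""
    times = {t for t, _ in curve_a} | {t for t, _ in curve_b}
    return max(
        (abs(survival_at(curve_a, t) - survival_at(curve_b, t)) for t in times),
        default=0.0,
    )
```

A smaller gap between the curve estimated on real data and the curve estimated on synthetic data would indicate closer agreement under this illustrative measure.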
Results
MCM-generated data closely replicated statistical properties of the real dataset, preserved hazard ratios, and matched time-to-event curves with high fidelity. Cox models trained on MCM-augmented data demonstrated improved calibration, reducing overall calibration loss by 15% and subgroup meta-calibration loss by 9% compared to unaugmented data. These improvements held across multiple high-risk subgroups including those with diabetes, renal dysfunction, and advanced age. Unlike competing methods, MCM achieved this without retraining or outcome-specific tuning.
Conclusions
MCM offers a practical and flexible framework for generating synthetic survival data that improves risk model calibration. By supporting both reproducible dataset synthesis and conditional subgroup augmentation, MCM bridges privacy-preserving data access with calibration-aware learning. This work highlights the role of synthetic data not just as a privacy tool, but as a vehicle for improving equity and reliability in clinical modelling.
Title: Attention-based synthetic data generation for calibration-enhanced survival analysis: A case study for chronic kidney disease using electronic health records. Journal: Journal of Biomedical Informatics, vol. 172, Article 104928. DOI: 10.1016/j.jbi.2025.104928
Pub Date: 2025-12-01; Epub Date: 2025-11-14; DOI: 10.1016/j.jbi.2025.104952
Xueqing Peng, Yutong Xie, Huan He, Brian Ondov, Kalpana Raja, Qijia Liu, Qiaozhu Mei, Hua Xu
Objective
The rapid growth of scientific literature necessitates robust methods to identify novel contributions. However, there is currently no widely-recognized measurement of novelty in biomedical research. Existing approaches typically quantify novelty using isolated article features, such as keywords, MeSH terms, or references, potentially losing important context and nuance from the semantic content of the text.
Methods
We propose SemNovel, a semantic novelty detection framework that leverages embeddings from Large Language Models (LLMs) to capture richer semantic content. Specifically, to construct the semantic universe we adopt LLM-embedder (BAAI/llm-embedder), a unified embedding model that integrates Llama2-7B-Chat as its foundation and BGE base as the embedding backbone. We employ t-distributed Stochastic Neighbor Embedding (t-SNE) for 2D visualization and project the entire PubMed library into a “semantic universe”. A SemNovel score is calculated for each article based on its distance from prior publications. We validated SemNovel’s effectiveness through its correlation with future research impact and its ability to distinguish groundbreaking studies. We further explored its potential for analyzing trends in research trajectories and interdisciplinary collaboration. To enhance usability, we developed an interactive interface for users to analyze SemNovel scores.
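To make the idea of a distance-based novelty score concrete, here is a minimal sketch in plain Python. It assumes each article is already a 2D point and scores a new article by its mean Euclidean distance to its `k` nearest earlier articles. The function name `novelty_score`, the parameter `k`, and the nearest-neighbour averaging are all illustrative assumptions; the abstract states only that the score is based on distance from prior publications.

```python
import math


def novelty_score(point, prior_points, k=3):
    """Mean Euclidean distance from `point` to its k nearest prior points.

    point:        (x, y) coordinates of the article being scored
    prior_points: coordinates of articles published earlier
    k:            number of nearest neighbours to average (assumed value)
    """
    if not prior_points:
        raise ValueError("need at least one prior publication")
    distances = sorted(math.dist(point, p) for p in prior_points)
    nearest = distances[:k]  # uses fewer than k if the prior set is small
    return sum(nearest) / len(nearest)


# Usage: an article far from earlier work scores higher than one inside a cluster.
prior = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
far_score = novelty_score((3.0, 4.0), prior)
near_score = novelty_score((0.5, 0.5), prior)
```

Restricting `prior_points` to articles with an earlier publication date is what would make the score a measure of novelty at the time of publication rather than of general isolation.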
Results
The SemNovel score exhibited a positive correlation with future research impact, as measured by citation counts (ρ = 0.1782, p < 0.001, Spearman rank correlation), independent of factors such as journal impact factors (JIFs), publication years, and author counts, and outperformed previous semantic novelty indicators. It effectively identified highly novel papers, including Nobel Prize-winning studies (p < 0.001, Kolmogorov-Smirnov test). SemNovel also revealed trends in the evolution of scientific research, exemplified in the PD-1/PD-L1 field, and underscored the role of interdisciplinary collaboration in enhancing biomedical research novelty.
Conclusion
SemNovel represents a scalable and robust method for quantifying semantic novelty in biomedical literature. It provides a powerful tool for uncovering groundbreaking research, tracking scientific progress, and analyzing trends in innovation.
Title: SemNovel – A new approach to detecting semantic novelty of biomedical publications using embeddings of large language models. Journal: Journal of Biomedical Informatics, vol. 172, Article 104952. DOI: 10.1016/j.jbi.2025.104952
Pub Date: 2025-12-01; Epub Date: 2025-11-15; DOI: 10.1016/j.jbi.2025.104959
Xu Wang, Andrea Preston, Jonathan Aning, Shang-Ming Zhou
Objective
The rising incidence and mortality of bladder cancer (BC) underscore the importance of identifying associated features. Current reliance on haematuria as a primary indicator for BC proves inadequate. While mining electronic health records (EHRs) offers the potential to identify BC-related signals, traditional data-driven methods struggle with high-dimensional datasets. This study aims to uncover novel BC-associated clinical signals by developing the Parsimony-driven cAtegory-balaNced binary Signal extractor for Primary Care EHRs (PanSPICE), tailored to extremely high-dimensional data linked from multiple centres.
Methods
We collected BC cases and control patients (n = 64,884) linked at the patient level from Welsh nationwide databases, yielding 48,261 features in primary care settings. The PanSPICE approach begins with information gain to pre-rank features, then applies Retentive Stickiness Binary Particle Swarm Optimisation (RSBPSO) combined with a C5.0 classification tree to overcome computational barriers in feature selection. A two-layer optimisation treated clinical signals in care processes (POC), diagnoses (DIAG), and medications (MED) separately to prevent feature masking. A tailored fitness function for RSBPSO simultaneously optimised model performance and feature sparsity. Associations of the selected features were interpreted using logistic regression models adjusted for deprivation indices.
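Two ingredients of this pipeline can be illustrated in a few lines of plain Python: information gain for pre-ranking binary features, and a fitness function that rewards classifier performance while penalising the number of selected features. The information gain formula is the standard entropy-based definition. The `fitness` function, its weight `alpha`, and its linear form are hypothetical stand-ins, since the abstract does not give the authors' tailored fitness function.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        gain -= len(subset) / total * entropy(subset)
    return gain


def rank_features(columns, labels):
    """Return column indices ordered from highest to lowest information gain."""
    gains = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: gains[i], reverse=True)


def fitness(score, mask, alpha=0.9):
    """Hypothetical trade-off between performance and sparsity.

    score: classifier performance in [0, 1], e.g. AUC on a validation split
    mask:  binary particle position, 1 = feature selected
    alpha: assumed weight on performance versus sparsity
    """
    selected_fraction = sum(mask) / len(mask)
    return alpha * score + (1 - alpha) * (1 - selected_fraction)
```

Under this illustrative form, two particles with the same classifier score are separated by how few features they select, which is the behaviour a parsimony-driven search needs.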
Results
The PanSPICE identified 38 optimal features (AUC (area under the curve) = 0.81, 95 % CI: 0.80–0.82), including urinary tract infections (OR = 2.19, 95 % CI: 2.05–2.14) and inverse associations with stroke (OR = 0.64, 95 % CI: 0.54–0.74) and dementia (OR = 0.25, 95 % CI: 0.17–0.35). Gender stratification revealed female-specific urine glucose testing association (OR = 1.24, 95 % CI: 1.08–1.43). Certain medications, such as trimethoprim, were positively associated with BC, while others, including ramipril and prednisolone, showed protective effects.
Conclusion
The PanSPICE enables efficient high-dimensional EHR analysis, revealing under-recognised potential BC risk profiles and protective comorbidities. Gender-specific differences in BC associations highlight the importance of gender-stratified analyses, while computational advances provide a template for EHR-based clinical discovery. Findings warrant further mechanistic research into neurological protective pathways.
Title: Unveiling novel bladder cancer associations from multicentred primary and secondary care electronic health records by machine learning: a case-control study. Journal: Journal of Biomedical Informatics, vol. 172, Article 104959. DOI: 10.1016/j.jbi.2025.104959
Pub Date: 2025-12-01; Epub Date: 2025-10-27; DOI: 10.1016/j.jbi.2025.104948
Yuxin Lin, Jing Ma, Suyu Dong, Chaoyu Sun, Wanting Cong, Kuanquan Wang, Gongning Luo, Wei Wang
Objective:
Existing generative models for electrocardiogram (ECG) synthesis often lack fine-grained, interpretable control, limiting their utility for addressing data scarcity and imbalance. This study aims to develop a model capable of producing diverse and semantically controllable synthetic ECGs to fill this critical gap.
Methods:
We propose TransDiffECG, a novel Transformer-based diffusion model that integrates semantic information injection and global temporal modeling to enable fine-grained control over ECG synthesis. The model allows user-controllable generation of ECG signals with customized physiological details. We establish a comprehensive evaluation protocol, including downstream segmentation and classification tasks, to rigorously assess the authenticity and utility of the generated signals. Extensive experiments are conducted on both single-lead (QTDB) and multi-lead (LUDB) ECG datasets.
Results:
TransDiffECG significantly outperforms state-of-the-art baselines. On the multi-lead LUDB dataset, it achieved superior signal quality (MMD: 3.21 × 10⁻²; Pearson Correlation: 0.6177). The utility of the synthetic data was confirmed in downstream tasks, where data augmentation improved atrial fibrillation classification to an AUROC of 0.9451. Moreover, a segmentation model trained solely on our synthetic data rivaled one trained on real data (e.g., ~98% precision/recall on QTDB).
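The two signal-quality metrics named above can be sketched in plain Python. `pearson` is the standard correlation coefficient between two equal-length signals. `mmd_squared` is a biased estimate of squared Maximum Mean Discrepancy with a Gaussian (RBF) kernel. The kernel choice, the `gamma` bandwidth, and the use of the squared form are assumptions, as the abstract does not say which MMD variant was reported.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length signals (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rbf(a, b, gamma):
    """Gaussian kernel between two signals treated as vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two sets of signals.

    xs, ys: lists of equal-length signals (e.g. real and synthetic beats)
    gamma:  assumed RBF bandwidth parameter
    """
    def mean_kernel(u, v):
        return sum(rbf(a, b, gamma) for a in u for b in v) / (len(u) * len(v))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)
```

Lower MMD means the synthetic and real sets are harder to tell apart as distributions, while higher Pearson correlation means individual waveforms track each other more closely.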
Conclusion:
TransDiffECG represents a significant advancement in synthetic medical signal generation by bridging the gap between clinical interpretability and generative flexibility. Its ability to generate semantically controllable and clinically valid ECGs greatly expands the application potential of generative models in healthcare research and practice.
Title: TransDiffECG: Semantically controllable ECG synthesis via transformer-based diffusion modeling. Journal: Journal of Biomedical Informatics, vol. 172, Article 104948. DOI: 10.1016/j.jbi.2025.104948