Journal of Biomedical Informatics: Latest Articles

Enhancing AI-based diabetic retinopathy screening in low- and middle-income countries with synthetic data
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-10-24 DOI: 10.1016/j.jbi.2025.104938
Zitao Shuai , Chenwei Wu , Zhengxu Tang , David Restrepo , Michael Morley , Luis Filipe Nakayama

Objective:

AI-based diabetic retinopathy (DR) screening is promising in low- and middle-income countries (LMICs), where limited human resources constrain access to specialist-led programs. However, current systems often degrade under real-world image-quality variations, especially with the portable devices that are vital in LMICs. This study aims to develop RetSyn, a synthetic-data augmentation framework that improves screening robustness across devices and imaging conditions.

Methods:

RetSyn leverages advanced diffusion models to generate synthetic retinal images with diverse device and imaging quality characteristics. To address the challenges of (1) portable device data scarcity, (2) disease and quality distribution imbalance, and (3) varying image quality, RetSyn uses class and quality-conditioned diffusion for controllable synthesis, a group-balanced loss to increase coverage of minority (quality, disease) pairs, and a Direct Preference Optimization alignment step with a small paired smartphone–tabletop set. The synthesized images are then used to augment classifier training.
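
The group-balancing idea described above can be illustrated with a small sketch. The abstract does not give the form of the loss, so the inverse-frequency weighting below is an assumption chosen for illustration and is not RetSyn's actual formulation; the function names and the toy (quality, disease) labels are hypothetical.

```python
from collections import Counter


def group_balanced_weights(groups):
    """Return one weight per sample so that every (quality, disease) group
    contributes the same total weight (inverse-frequency weighting)."""
    counts = Counter(groups)
    n_samples = len(groups)
    n_groups = len(counts)
    # Each group receives total weight n_samples / n_groups,
    # split evenly among the samples that belong to it.
    return [n_samples / (n_groups * counts[g]) for g in groups]


def weighted_mean_loss(losses, weights):
    """Weighted average of per-sample losses."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)


# Hypothetical toy batch: low-quality DR images are the rarest group.
groups = (
    [("high", "no_dr")] * 6
    + [("high", "dr")] * 2
    + [("low", "no_dr")] * 3
    + [("low", "dr")] * 1
)
weights = group_balanced_weights(groups)
per_sample_losses = [1.0] * len(groups)
balanced_loss = weighted_mean_loss(per_sample_losses, weights)
```

With this weighting the single ("low", "dr") sample carries as much total weight as the six ("high", "no_dr") samples combined, which is one simple way to raise the influence of minority pairs during training.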

Results:

The effectiveness of RetSyn-generated images was evaluated by training retinal diagnosis models on a combination of real and synthetic data. RetSyn yields consistent gains in-domain and out-of-domain. On low-quality tabletop images, F1 improves from 0.781 to 0.874 (binary) and 0.607 to 0.703 (three-class), while AUROC reaches 0.982 and 0.951, respectively. On out-of-domain portable images, RetSyn attains AUROC 0.813/F1 0.703 (binary) and AUROC 0.804/F1 0.609 (three-class), exceeding group-robustness baselines such as GroupDRO (binary: AUROC 0.786/F1 0.626; three-class: AUROC 0.789/F1 0.544).

Conclusion:

RetSyn presents an effective and scalable synthetic data framework that significantly enhances the robustness and generalizability of AI-based DR screening models in LMICs. By addressing the critical challenges posed by varying image quality and device characteristics, RetSyn facilitates more reliable deployment of AI diagnostics in underserved regions. Additionally, the release of the first publicly available paired smartphone-tabletop retinal image dataset will support further research into cross-device DR screening solutions.
Journal of Biomedical Informatics, Volume 172, Article 104938.
Citations: 0
Scalable scientific interest profiling using large language models
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-11-01 DOI: 10.1016/j.jbi.2025.104949
Yilun Liang , Gongbo Zhang , Edward Sun , Betina Idnay , Yilu Fang , Fangyi Chen , Casey Ta , Yifan Peng , Chunhua Weng

Objective

Research profiles highlight scientists’ research focus, enabling talent discovery and fostering collaborations, but they are often outdated. Automated, scalable methods are urgently needed to keep these profiles current.

Methods

In this study, we design and evaluate two large language model (LLM)-based methods for generating scientific interest profiles, one summarizing researchers’ PubMed abstracts and the other summarizing their publications’ Medical Subject Headings (MeSH) terms, and compare these machine-generated profiles with researchers’ self-summarized interests. We collected the titles, MeSH terms, and abstracts of PubMed publications for 595 faculty members affiliated with Columbia University Irving Medical Center (CUIMC), for 167 of whom we obtained human-written online research profiles. Subsequently, GPT-4o-mini, a state-of-the-art LLM, was prompted to summarize each researcher’s interests. Both manual and automated evaluations were conducted to characterize the similarities and differences between the machine-generated and self-written research profiles.

Results

The similarity study showed low ROUGE-L, BLEU, and METEOR scores, reflecting little overlap between the terminology used in machine-generated and self-written profiles. BERTScore analysis revealed moderate semantic similarity between machine-generated and reference summaries (F1: 0.542 for MeSH-based, 0.555 for abstract-based), despite low lexical overlap. In validation, paraphrased summaries achieved a higher F1 of 0.851. A further comparison between the original and paraphrased manually written summaries indicates the limitations of such metrics. The Kullback-Leibler (KL) divergence of term frequency-inverse document frequency (TF-IDF) values (8.56 and 8.58 for profiles derived from MeSH terms and abstracts, respectively) suggests that machine-generated summaries employ different keywords than human-written summaries. Manual review further showed that 77.78% of reviews rated the overall impression of MeSH-based profiling as “good” or “excellent,” with readability receiving favorable ratings in 93.44% of cases, though granularity and factual accuracy varied. Overall, panel reviews favored the MeSH-derived profile over the abstract-derived profile in 67.86% of cases.
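
The KL-divergence comparison of TF-IDF values reported above can be sketched as follows. The abstract does not state how the TF-IDF vectors were normalized or smoothed, so the smoothed IDF, the additive `eps`, and the direction KL(machine || human) are all assumptions made for illustration; the two toy profiles are hypothetical.

```python
import math
from collections import Counter


def tfidf(doc_tokens, corpus):
    """TF-IDF scores for one tokenized document.
    Uses a smoothed IDF: ln((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def to_distribution(scores, vocab, eps=1e-6):
    """Turn TF-IDF scores into a probability distribution over a shared
    vocabulary; eps keeps unseen terms from producing log(0)."""
    raw = [scores.get(term, 0.0) + eps for term in vocab]
    total = sum(raw)
    return [value / total for value in raw]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical toy profiles (already tokenized).
human = "cancer genomics precision oncology".split()
machine = "neoplasms genomics sequencing oncology".split()
corpus = [human, machine]
vocab = sorted(set(human) | set(machine))

p_machine = to_distribution(tfidf(machine, corpus), vocab)
p_human = to_distribution(tfidf(human, corpus), vocab)
kl = kl_divergence(p_machine, p_human)
```

A larger value means the machine-generated profile places its TF-IDF mass on keywords that the human-written profile rarely or never uses; identical distributions give zero.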

Conclusion

LLMs promise to automate scientific interest profiling at scale. Profiles derived from MeSH terms have better readability than profiles derived from abstracts. Overall, machine-generated summaries differ from human-written ones in their choice of concepts, with the latter initiating more novel ideas.
Journal of Biomedical Informatics, Volume 172, Article 104949.
Citations: 0
Clinical pathway-aware large language models for reliable and transparent medical dialogue
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-10-31 DOI: 10.1016/j.jbi.2025.104942
Jiageng Wu , Xian Wu , Yefeng Zheng , Jie Yang

Objective:

Large language models (LLMs) show promise for answering real-time medical queries, but they often produce lengthy, generic, and even hallucinated responses. We aim to develop a reliable and interpretable medical dialogue system that incorporates clinical reasoning and thereby mitigates the risk of hallucination.

Methods:

Two large datasets of real-world online consultation, MedDG and KaMed, were used for evaluation. We proposed a Medical Dialogue System with Knowledge Enhancement and Clinical Pathway Encoding (MedKP), which integrates an external medical knowledge graph and encodes internal clinical pathways to model physician reasoning. Performance was compared with state-of-the-art baselines, including GPT-4o and LLaMA3.1-70B. A multi-dimensional evaluation framework assessed (1) clinical relevance (medical entity-based), (2) textual similarity (ROUGE, BLEU), (3) semantic alignment (BERTScore), and (4) hallucination and consistency via an external LLM-based judge, as well as parallel human evaluation.

Results:

Across both datasets, MedKP (6B) achieved the best overall performance, outperforming other advanced baselines and producing responses that align more closely with those of human physicians. For clinical relevance, MedKP reached a medical-entity macro F1-score of 31.41 on MedDG (previous best DFMed: 24.76, improved 30.41%) and 26.62 on KaMed (previous best LLaMA3.1-70B: 20.67, improved 25.62%). Consistent improvements were observed across other metrics. Ablation studies further validated the effectiveness of each model component.

Conclusion:

Our results highlight the critical role of clinical reasoning in advancing trustworthy AI for digital healthcare. By enhancing the reliability, coherence, and transparency of AI-generated responses, this pathway-aware approach bridges the gap between LLMs and real-world clinical workflows, improving the accessibility of high-quality telemedicine services, particularly benefiting underserved populations.
Journal of Biomedical Informatics, Volume 172, Article 104942.
Citations: 0
Pseudo-labeling and knowledge-guided contrastive learning for radiology report generation
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-10-22 DOI: 10.1016/j.jbi.2025.104941
Fan Ye , Xuan Hu , Yihao Ding , Feifei Liu

Objective:

Radiology report generation (RRG) is a transformative technology in the field of radiology imaging that aims to address the critical need for consistency and comprehensiveness in diagnostic interpretation. Although recent advances in graph-based representation learning have demonstrated excellent performance in disease progression modeling, their application in radiology report generation still suffers from three inherent limitations: (i) semantic separation between local image features and free-text descriptions, (ii) inherent noise in automated medical concept annotation, and (iii) lack of anatomical constraints in cross-modal attention mechanisms.

Method:

This study proposes a pseudo-label and knowledge-guided contrastive learning (PKCL) framework, which addresses the above issues through a novel fusion of dynamic query learning and knowledge-guided contrastive learning. The PKCL framework employs a trainable cross-modal query matrix (QM) to learn shared representations through parameter-sharing self-attention mechanisms between imaging and text encoders. The QM is used during training to query disease-related visual regions in reports and enables dynamic alignment between radiological features and textual descriptions during both training and inference. Additionally, this method combines pseudo labels with an adaptive top-k weighted feature fusion strategy to enhance learning from standard comparisons and leverages pre-built knowledge graphs via the XRayVision (Cohen et al., 2022) model to account for disease relationships and anatomical dependencies, thereby improving the clinical accuracy of generated reports.
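
For readers unfamiliar with contrastive learning between images and reports, a generic objective can be sketched as below. This is a plain image-to-text InfoNCE loss, named here as a stand-in: the abstract does not give PKCL's loss, and the knowledge-guided and pseudo-label components are not modeled. The function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_embs, text_embs, temperature=0.1):
    """Image-to-text InfoNCE: the i-th report is the positive for the i-th
    image, and every other report in the batch acts as a negative."""
    total = 0.0
    for i, image in enumerate(image_embs):
        logits = [cosine(image, text) / temperature for text in text_embs]
        # Numerically stable log-sum-exp over all reports in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(image_embs)


# Hypothetical 2-D embeddings for a batch of two image/report pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_reports = [[1.0, 0.0], [0.0, 1.0]]
swapped_reports = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce(images, matched_reports)
loss_swapped = info_nce(images, swapped_reports)
```

The loss is near zero when each image is closest to its own report and grows when the pairing is wrong, which is the signal that pulls matched image and text features together.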

Results:

Comprehensive evaluations on the IU-Xray and MIMIC-CXR datasets demonstrate that PKCL achieves state-of-the-art performance on both natural language generation metrics and clinical efficacy metrics. Specifically, it obtains 0.499 BLEU-1 and 0.374 ROUGE-L (RL) on IU-Xray, and 0.346 BLEU-1 and 0.277 RL on MIMIC-CXR, outperforming prior methods such as R2GEN and CMCL.
Furthermore, PKCL exhibited robust generalization on the out-of-domain Montgomery County X-ray Set, effectively handling its low-resource conditions and brief, diagnostic-level textual supervision.

Conclusion:

The framework’s ability to maintain semantic consistency when generating clinically relevant reports represents a significant advancement over existing methods, particularly in capturing the subtle relationships between radiological findings and their textual descriptions.
Journal of Biomedical Informatics, Volume 172, Article 104941.
Citations: 0
TopicForest: embedding-driven hierarchical clustering and labeling for biomedical literature
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-11-14 DOI: 10.1016/j.jbi.2025.104958
Chia-Hsuan Chang , Brian Ondov , Bin Choi , Xueqing Peng , Huan He , Hua Xu

Objective

The rapid expansion of biomedical literature necessitates effective approaches for organizing and interpreting complex research topics. Existing embedding-based topic modeling techniques provide flat clusters at a single granularity, ignoring the complex hierarchies that subjects actually form. Our objective is instead to create a forest of topic trees, each of which starts from a broad area and drills down to narrow specialties.

Methods

We propose TopicForest, a new embedding-driven hierarchical clustering and labeling framework that involves: (1) embedding biomedical abstracts within a high-dimensional semantic space using contrastively trained LLMs, (2) manifold learning to reduce dimensionality for visual interpretation, (3) hierarchical clustering via binary partitioning and multi-level dendrogram cutting, and (4) recursive LLM-based topic summarization to efficiently generate concise and coherent labels from the smallest clusters up to broad subjects covering thousands of publications. We construct a corpus comprising 24,366 biomedical abstracts from Scientific Reports, leveraging its human-curated topic hierarchy as the gold standard for evaluation. We evaluate clustering performance using Adjusted Mutual Information (AMI) and Dasgupta’s cost, while labeling quality is evaluated based on diversity and hierarchical affinity.
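
Of the two clustering metrics named above, Dasgupta's cost is the less familiar, so a minimal sketch may help. It scores a binary hierarchy by summing, over every pair of items, their similarity times the number of leaves under their lowest common ancestor; lower is better. The nested-tuple tree encoding, the toy similarity function, and the four-document example are assumptions for illustration and do not reflect how TopicForest stores its dendrograms.

```python
def leaves(tree):
    """Leaves of a binary tree encoded as nested 2-tuples.
    Any non-tuple value is treated as a leaf."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def dasgupta_cost(tree, similarity):
    """Sum over leaf pairs of similarity(i, j) times the number of leaves
    under the pair's lowest common ancestor."""
    if not isinstance(tree, tuple):
        return 0.0
    left, right = leaves(tree[0]), leaves(tree[1])
    size = len(left) + len(right)
    # Pairs separated at this node have this node as their lowest common ancestor.
    split_cost = sum(similarity(i, j) * size for i in left for j in right)
    return (
        split_cost
        + dasgupta_cost(tree[0], similarity)
        + dasgupta_cost(tree[1], similarity)
    )


# Hypothetical corpus: documents 0 and 1 share a topic, as do 2 and 3.
topic_of = {0: "A", 1: "A", 2: "B", 3: "B"}


def toy_similarity(i, j):
    return 1.0 if topic_of[i] == topic_of[j] else 0.1


good_tree = ((0, 1), (2, 3))  # merges same-topic documents first
bad_tree = ((0, 2), (1, 3))   # merges cross-topic documents first

good_cost = dasgupta_cost(good_tree, toy_similarity)
bad_cost = dasgupta_cost(bad_tree, toy_similarity)
```

The hierarchy that merges similar documents low in the tree gets the lower cost, because highly similar pairs are charged only for a small subtree.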

Results

TopicForest’s dendrogram cutting achieves AMI scores comparable to or better than flat embedding-based clustering methods such as BERTopic (with K-means or HDBSCAN) across multiple dimension-reduction strategies (t-SNE and UMAP), while uniquely providing multi-scale topic granularity. It also outperforms the deep hierarchical topic model HyperMiner, yielding higher AMI scores at a comparable Dasgupta’s cost. For labeling, the proposed recursive LLM labeling method surpasses both c-TF-IDF and HyperMiner, achieving higher label diversity and hierarchical affinity while maintaining efficient token usage. Furthermore, TopicForest maintains stable clustering quality across different embedding models, demonstrating robustness and generalizability in hierarchical topic discovery.

Conclusion

Through novel integration of LLMs, dimension reduction, and advanced hierarchical clustering techniques, TopicForest provides effective and interpretable hierarchical topic modeling for biomedical literature, facilitating multi-scale exploration and visualization of literature corpora.
Journal of Biomedical Informatics, Volume 172, Article 104958.
Citations: 0
Attention-based synthetic data generation for calibration-enhanced survival analysis: A case study for chronic kidney disease using electronic health records
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-11-07 DOI: 10.1016/j.jbi.2025.104928
Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm

Objectives

Access to real-world healthcare data is constrained by privacy regulations and data imbalances, hindering the development of fair and reliable clinical prediction models. Synthetic data offers a potential solution, yet existing methods often fail to maintain calibration or enable subgroup-specific augmentation. This study introduces Masked Clinical Modelling (MCM), an attention-based synthetic data generation framework designed to enhance survival model calibration in both global and stratified analyses.

Methods

MCM uses masked feature reconstruction to learn feature dependencies without explicitly training on survival objectives. It supports both standalone dataset synthesis and conditional data augmentation, enabling the generation of targeted synthetic subcohorts without retraining. Evaluated on a chronic kidney disease (CKD) electronic health record (EHR) dataset, MCM was benchmarked against eight baseline methods, including variational autoencoders, GANs, SMOTE variants, and a recent risk-aware distillation model. Model performance was assessed via calibration loss, Cox model consistency, and Kaplan–Meier fidelity.

Results

MCM-generated data closely replicated statistical properties of the real dataset, preserved hazard ratios, and matched time-to-event curves with high fidelity. Cox models trained on MCM-augmented data demonstrated improved calibration, reducing overall calibration loss by 15% and subgroup meta-calibration loss by 9% compared to unaugmented data. These improvements held across multiple high-risk subgroups, including those with diabetes, renal dysfunction, and advanced age. Unlike competing methods, MCM achieved this without retraining or outcome-specific tuning.

Conclusions

MCM offers a practical and flexible framework for generating synthetic survival data that improves risk model calibration. By supporting both reproducible dataset synthesis and conditional subgroup augmentation, MCM bridges privacy-preserving data access with calibration-aware learning. This work highlights the role of synthetic data not just as a privacy tool, but as a vehicle for improving equity and reliability in clinical modelling.
Citations: 0
SemNovel – A new approach to detecting semantic novelty of biomedical publications using embeddings of large language models
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-11-14 DOI: 10.1016/j.jbi.2025.104952
Xueqing Peng, Yutong Xie, Huan He, Brian Ondov, Kalpana Raja, Qijia Liu, Qiaozhu Mei, Hua Xu

Objective

The rapid growth of scientific literature necessitates robust methods to identify novel contributions. However, there is currently no widely recognized measure of novelty in biomedical research. Existing approaches typically quantify novelty using isolated article features, such as keywords, MeSH terms, or references, potentially losing important context and nuance from the semantic content of the text.

Methods

We propose SemNovel, a semantic novelty detection framework that leverages embeddings from Large Language Models (LLMs) to capture richer semantic content. Specifically, we adopt LLM-embedder (BAAI/llm-embedder) for semantic universe construction, a unified embedding model that integrates Llama2-7B-Chat as its foundation and BGE base as the embedding backbone. We employ t-distributed Stochastic Neighbor Embedding (t-SNE) for 2D visualization and project the entire PubMed library into a “semantic universe”. A SemNovel score is calculated for each article based on its distance from prior publications. We validated SemNovel’s effectiveness through its correlation with future research impact and its ability to distinguish groundbreaking studies. We further explored its potential for analyzing trends in research trajectories and interdisciplinary collaboration. To enhance usability, we developed an interactive interface for users to analyze SemNovel scores.
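
The abstract says only that the score is based on an article's distance from prior publications; it does not give the formula. As a rough illustration of that idea, the sketch below assumes the score is the mean Euclidean distance to the k nearest earlier points in the 2D projection. The function name `novelty_score`, the parameter `k`, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
import math

def novelty_score(point, prior_points, k=3):
    """Mean Euclidean distance from `point` to its k nearest prior points.

    Assumption for illustration only: the published SemNovel formula
    may differ from this nearest-neighbour average.
    """
    if not prior_points:
        raise ValueError("need at least one prior publication")
    dists = sorted(math.dist(point, p) for p in prior_points)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# Toy 2D "semantic universe": four earlier papers on a unit square.
prior = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

incremental = novelty_score((0.5, 0.5), prior, k=2)  # sits among prior work
distant = novelty_score((5.0, 5.0), prior, k=2)      # far from prior work
```

Under this assumption, an article that lands far from everything published before it receives a larger score than one that lands inside an existing cluster.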

Results

The SemNovel score exhibited a positive correlation with future research impact, as measured by citation counts (ρ = 0.1782, p < 0.001, Spearman rank correlation), independent of factors such as journal impact factors (JIFs), publication years, and author counts, and outperformed previous semantic novelty indicators. It effectively identified highly novel papers, including Nobel Prize-winning studies (p < 0.001, Kolmogorov-Smirnov test). SemNovel also revealed trends in the evolution of scientific research, exemplified in the PD-1/PD-L1 field, and underscored the role of interdisciplinary collaboration in enhancing biomedical research novelty.

Conclusion

SemNovel represents a scalable and robust method for quantifying semantic novelty in biomedical literature. It provides a powerful tool for uncovering groundbreaking research, tracking scientific progress, and analyzing trends in innovation.
Citations: 0
Unveiling novel bladder cancer associations from multicentred primary and secondary care electronic health records by machine learning: a case-control study
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-11-15 DOI: 10.1016/j.jbi.2025.104959
Xu Wang, Andrea Preston, Jonathan Aning, Shang-Ming Zhou

Objective

The rising incidence and mortality of bladder cancer (BC) underscore the importance of identifying associated features. Current reliance on haematuria as a primary indicator for BC proves inadequate. While mining electronic health records (EHRs) offers the potential to identify BC-related signals, traditional data-driven methods struggle with high-dimensional datasets. This study aims to uncover novel BC-associated clinical signals by developing the Parsimony-driven cAtegory-balaNced binary Signal extractor for Primary Care EHRs (PanSPICE), tailored to extremely high-dimensional data linked from multiple centres.

Methods

We collected BC cases and control patients (n = 64,884) linked at patient level from Welsh nationwide databases, yielding 48,261 features in primary care settings. The PanSPICE approach begins with information gain to pre-rank features, then applies Retentive Stickiness Binary Particle Swarm Optimisation (RSBPSO) combined with a C5.0 classification tree to overcome computational barriers in feature selection. A two-layer optimisation treated clinical signals in care processes (POC), diagnoses (DIAG), and medications (MED) separately to prevent feature masking. A tailored fitness function for RSBPSO simultaneously optimises model performance and feature sparsity. Associations of the selected features were interpreted using logistic regression models adjusted for deprivation indices.
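
Two of the steps above lend themselves to a small sketch: pre-ranking binary features by information gain, and scoring a candidate feature subset by trading off model performance against sparsity. The information-gain part follows the standard entropy definition. The `fitness` function is a hypothetical weighted sum with an assumed weight `alpha`; the abstract does not state the form of the tailored fitness function used with RSBPSO.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def fitness(auc, n_selected, n_total, alpha=0.9):
    """Hypothetical trade-off: reward AUC, penalise large feature subsets.

    The weighted-sum form and alpha are assumptions for illustration.
    """
    return alpha * auc + (1 - alpha) * (1 - n_selected / n_total)

labels = [1, 1, 0, 0]
perfect = information_gain([1, 1, 0, 0], labels)  # feature mirrors the label
useless = information_gain([1, 0, 1, 0], labels)  # feature carries no signal
```

With a score of this shape, two subsets that reach the same AUC are ranked by size, so the smaller subset is preferred.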

Results

PanSPICE identified 38 optimal features (area under the curve (AUC) = 0.81, 95% CI: 0.80–0.82), including urinary tract infections (OR = 2.19, 95% CI: 2.05–2.14) and inverse associations with stroke (OR = 0.64, 95% CI: 0.54–0.74) and dementia (OR = 0.25, 95% CI: 0.17–0.35). Gender stratification revealed a female-specific association with urine glucose testing (OR = 1.24, 95% CI: 1.08–1.43). Certain medications, such as trimethoprim, were positively associated with BC, while others, including ramipril and prednisolone, showed protective effects.

Conclusion

The PanSPICE enables efficient high-dimensional EHR analysis, revealing under-recognised potential BC risk profiles and protective comorbidities. Gender-specific differences in BC associations highlight the importance of gender-stratified analyses, while computational advances provide a template for EHR-based clinical discovery. Findings warrant further mechanistic research into neurological protective pathways.
Citations: 0
TransDiffECG: Semantically controllable ECG synthesis via transformer-based diffusion modeling
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-10-27 DOI: 10.1016/j.jbi.2025.104948
Yuxin Lin, Jing Ma, Suyu Dong, Chaoyu Sun, Wanting Cong, Kuanquan Wang, Gongning Luo, Wei Wang

Objective:

Existing generative models for electrocardiogram (ECG) synthesis often lack fine-grained, interpretable control, limiting their utility for addressing data scarcity and imbalance. This study aims to develop a model capable of producing diverse and semantically controllable synthetic ECGs to fill this critical gap.

Methods:

We propose TransDiffECG, a novel Transformer-based diffusion model that integrates semantic information injection and global temporal modeling to enable fine-grained control over ECG synthesis. The model allows user-controllable generation of ECG signals with customized physiological details. We establish a comprehensive evaluation protocol, including downstream segmentation and classification tasks, to rigorously assess the authenticity and utility of the generated signals. Extensive experiments are conducted on both single-lead (QTDB) and multi-lead (LUDB) ECG datasets.

Results:

TransDiffECG significantly outperforms state-of-the-art baselines. On the multi-lead LUDB dataset, it achieved superior signal quality (MMD: 3.21×10⁻²; Pearson Correlation: 0.6177). The utility of the synthetic data was confirmed in downstream tasks, where data augmentation improved atrial fibrillation classification to an AUROC of 0.9451. Moreover, a segmentation model trained solely on our synthetic data rivaled one trained on real data (e.g., ∼98% precision/recall on QTDB).
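
For readers unfamiliar with the MMD figure reported above, the following is a minimal sketch of one common way to compute maximum mean discrepancy between two sets of signals: the biased squared-MMD estimator with a Gaussian (RBF) kernel. The kernel choice, the bandwidth parameter `gamma`, and the toy data are assumptions; the abstract does not say which kernel or estimator the authors used.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian kernel between two equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        return sum(rbf(u, v, gamma) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)

# Toy "signals": three short real traces and a shifted synthetic set.
real = [(0.0, 1.0, 0.0), (0.1, 0.9, 0.0), (0.0, 1.1, 0.1)]
shifted = [(1.0, 2.0, 1.0), (1.1, 1.9, 1.0), (1.0, 2.1, 1.1)]

same = mmd_squared(real, real)
different = mmd_squared(real, shifted)
```

Lower values indicate that the generated distribution sits closer to the real one; identical samples give zero.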

Conclusion:

TransDiffECG represents a significant advancement in synthetic medical signal generation by bridging the gap between clinical interpretability and generative flexibility. Its ability to generate semantically controllable and clinically valid ECGs greatly expands the application potential of generative models in healthcare research and practice.
Citations: 0
LLM-DQR: Large language model-based automated generation of data quality rules for electronic health records
IF 4.5 Zone 2 (Medicine) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-12-01 Epub Date: 2025-11-06 DOI: 10.1016/j.jbi.2025.104951
Shuyang Xie, Hailing Cai, Yaoqin Sun, Xudong Lv

Objective

To develop and evaluate LLM-DQR, an automated approach using large language models to generate electronic health record data quality rules, addressing the limitations of current manual and automated methods that suffer from low efficiency, limited flexibility, and inadequate coverage of complex business logic.

Materials and Methods

We designed a comprehensive pipeline with three core components: (1) standardized input processing integrating database schemas, natural language requirements, and sample data; (2) Chain-of-Thought prompt engineering for guided rule generation; and (3) closed-loop validation with deduplication, sandbox execution, and iterative debugging. The approach was evaluated on two distinct, publicly available datasets: the Paediatric Intensive Care (PIC) dataset and the Medical Information Mart for Intensive Care (MIMIC-IV) dataset. Performance was compared against manual expert construction (expert-DQR) and clinical information model-based generation (CIM-DQR).
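
The closed-loop validation step (deduplication, then sandbox execution) can be pictured with a small sketch. It assumes each generated rule is a SQL query that returns violating rows, and it uses an in-memory SQLite database as the sandbox. The table `admissions`, its columns, the rule names, and both helper functions are hypothetical and are not drawn from the paper, the PIC dataset, or MIMIC-IV.

```python
import sqlite3

def dedupe(rules):
    """Drop rules whose SQL matches an earlier rule after normalisation.

    Normalisation here is crude (lower-casing and whitespace collapsing),
    so it would also fold case inside string literals.
    """
    seen, unique = set(), []
    for rule in rules:
        key = " ".join(rule["sql"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique

def run_in_sandbox(rules, setup_sql):
    """Execute each rule against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(setup_sql)
    report = {}
    for rule in rules:
        try:
            violations = conn.execute(rule["sql"]).fetchall()
            report[rule["name"]] = ("ok", len(violations))
        except sqlite3.Error as exc:
            # A failing rule would be sent back for another debugging pass.
            report[rule["name"]] = ("error", str(exc))
    conn.close()
    return report

SETUP = """
CREATE TABLE admissions (id INTEGER, admit_day INTEGER, discharge_day INTEGER);
INSERT INTO admissions VALUES (1, 10, 12), (2, 20, 18), (3, 30, 30);
"""
RULES = [
    {"name": "discharge_not_before_admit",
     "sql": "SELECT id FROM admissions WHERE discharge_day < admit_day"},
    {"name": "duplicate_of_first",
     "sql": "select id from admissions  where discharge_day < admit_day"},
    {"name": "broken_rule",
     "sql": "SELECT id FROM no_such_table"},
]

report = run_in_sandbox(dedupe(RULES), SETUP)
```

In this toy run the duplicate rule is dropped before execution, the consistency rule flags one record, and the broken rule is reported as an error rather than crashing the loop, which is the signal an iterative debugging step would act on.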

Results

LLM-DQR demonstrated higher performance across all evaluation metrics. The GPT implementation achieved overall coverage rates of 97.1% on the PIC dataset and 99.6% on the MIMIC-IV dataset, outperforming CIM-DQR. Performance was particularly strong for complex dimensions: achieving 100% coverage for Consistency rules on both datasets, whereas CIM-DQR achieved 0%. Construction time was reduced by over 10-fold compared to manual methods. Additionally, on the PIC dataset, LLM-DQR generated 89 extra, expert-validated rules.

Discussion

The stronger performance demonstrates LLMs’ capability to understand complex EHR data patterns and assessment requirements, functioning as data quality analysis assistants with domain knowledge and logical reasoning capabilities.

Conclusion

LLM-DQR provides an efficient, scalable solution for automated data quality rule generation in clinical settings, offering considerable improvements over traditional approaches.
Citations: 0