Annual Review of Biomedical Data Science: Latest Articles

Statistical Methods for Understanding Trajectories in Genetic Epidemiology.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-03-25 DOI: 10.1146/annurev-biodatasci-092724-035434
Geng Wang, Lavinia Paternoster, Nicole M Warrington

Genetic influences on how human traits change over time remain underexplored and may play an important role in disease processes. In this review, we explore emerging statistical approaches for incorporating longitudinal data on trait trajectories into genetic epidemiology studies, including longitudinal genome-wide association studies, polygenic scores, and Mendelian randomization. We discuss the caution required when analyzing longitudinal data focused on disease progression, where analyses are conducted within a group of patients rather than the general population. Finally, we outline the large longitudinal data resources that are available and discuss future directions in trajectory-based genetic epidemiological studies. Embracing time as a critical dimension of human traits offers deeper insight into disease pathways and intervention opportunities.

Citations: 0
Privacy and Security Throughout the Health Data Life Cycle: From Primary Care to Research Networks.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-03-24 DOI: 10.1146/annurev-biodatasci-092724-031932
Bradley A Malin, Chao Yan, Luca Bonomi

Health data are increasingly generated, shared, and analyzed across an ever-growing collection of settings. While these developments enable new forms of biomedical discovery and clinical decision support, they also introduce evolving privacy, security, and trust challenges that extend beyond traditional regulatory and technical frameworks. In this review, we characterize the various risks and protections throughout the health data life cycle, from data generation and primary use in healthcare to secondary use in research and artificial intelligence (AI) model development. We discuss how regulation, organizational practices, and technological choices shape data protection requirements, and we discuss and contextualize emerging threats, such as incidental disclosures through AI tools. We further review technical approaches for mitigating these risks, including access control and auditing, reidentification risk assessment and statistical mechanisms for risk mitigation (e.g., differential privacy), and synthetic data generation. We also consider how collaboration across disparate organizations may be achieved through federated learning mechanisms and cryptographic technologies, such as secure multiparty computation. Throughout, we highlight trade-offs between privacy protection and data utility, and we articulate practical challenges in deploying these methods at scale. We conclude by identifying open issues for the field, including the need for standardized metrics and greater transparency to support trust in data-driven healthcare and research.
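
As a rough illustration of one statistical risk-mitigation mechanism the abstract names (differential privacy), the minimal Python sketch below releases a noisy count with the Laplace mechanism. It is not taken from the review: the function names `laplace_noise` and `dp_count`, the toy patient records, and the chosen epsilon are all hypothetical, and a real deployment would also need privacy-budget accounting across queries.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A counting query has sensitivity 1: adding or removing one record changes
    the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy cohort: one record per patient, ages 20 through 79.
    patients = [{"age": age} for age in range(20, 80)]
    rng = random.Random(0)
    noisy = dp_count(patients, lambda p: p["age"] >= 65, epsilon=1.0, rng=rng)
    print(f"noisy count of patients aged 65+: {noisy:.2f}")
```

Smaller values of epsilon add more noise, which is one concrete form of the privacy versus utility trade-off the abstract highlights.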

Citations: 0
Electronic Health Record-Linked Biobank Expansion Reveals Global Health Inequities.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-03-09 DOI: 10.1146/annurev-biodatasci-092724-030452
Manuel Corpas, Oyesola Ojewunmi, Heinner Guio, Segun Fatumo

Electronic health record (EHR)-linked biobanks are transforming biomedical research, enabling population-scale studies that integrate genomic, clinical, and phenotypic data. Yet as these resources proliferate, it remains unclear how their research outputs reflect global health priorities. This article presents a comprehensive review of five globally established EHR-linked biobanks: UK Biobank, the Million Veteran Program, FinnGen, the All of Us Research Program, and the Estonian Biobank. Drawing on 14,142 peer-reviewed publications from 2000 to 2024, we show how each biobank displays a distinct thematic profile, shaped by institutional mandates, population focus, and methodological design. We further evaluate alignment with global disease burden by mapping biobank-linked publications to 25 high-priority disease areas using World Health Organization disability-adjusted life years data. Our burden-adjusted gap scores and opportunity indices reveal striking underrepresentation of conditions such as malaria, tuberculosis, and diarrheal diseases when comparing biobank research output against high-priority diseases.

Citations: 0
Precision Oncology: Multimodal and Multiscale Methods to Promote Mechanistic Understanding.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-03-06 DOI: 10.1146/annurev-biodatasci-092724-043507
Kivilcim Ozturk, Adam Klie, Hannah Carter

Precision medicine aims to tailor treatment to the individual, improving medical outcomes and quality of life. Realizing this vision requires understanding how disease mechanisms and drug responses vary across patients. Advances in molecular profiling have enabled detailed measurement of genetic, epigenetic, spatial, and imaging features at multiple biological scales, from single cells to tissues. These rich and complementary data promise insight into the drivers of human disease where individual data layers have often provided an incomplete picture. Increasingly, studies span multiple measurement modalities and scales, presenting both opportunities and challenges. Central among these is how to combine data types to uncover actionable biology. This review surveys computational strategies for analyzing multimodal and multiscale datasets, distinguishing between approaches that treat each modality independently and those that perform true integrative modeling. We highlight emerging methods, with a focus on oncology, where these tools are helping to reveal mechanisms and guide therapeutic decisions.

Citations: 0
Strategies for Creating Robust Patient Groups to Study Diverse Conditions with Electronic Health Records.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-04-08 DOI: 10.1146/annurev-biodatasci-020722-114525
Grace D Ramey, Hannah Takasuka, John A Capra

The growth of electronic health record (EHR) databases in size and availability has created an unprecedented opportunity to better understand human health and disease. However, conducting robust EHR studies requires careful filtering criteria and study design, as EHRs pose several challenges that can confound analyses and lead to inaccurate results. Here we review these challenges and make suggestions about how to avoid or adjust for major confounders and biases in common EHR study designs. We further highlight qualities of EHR data that make different diseases more or less feasible for study. These recommendations for conducting research using EHRs will help inform database selection, improve reproducibility of results across the field, and enhance the validity of study results.

Pages: 317-340
Citations: 0
Mapping, Modeling, and Reprogramming Cell-Fate Decision-Making Systems.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-05-01 DOI: 10.1146/annurev-biodatasci-101424-121439
Lucy Ham, Taylor E Woodward, Megan A Coomer, Michael P H Stumpf

Many cellular processes involve information processing and decision-making. We can probe these processes at increasing molecular detail. The analysis of heterogeneous data remains a challenge that requires new ways of thinking about cells in quantitative, predictive, and mechanistic ways. We discuss the role of mathematical models in the context of cell-fate decision-making systems across the tree of life. Complex multicellular organisms have been a particular focus, but single-celled organisms also have to sense and respond to their environment. We center our discussion around the idea of design principles that we can learn from observations and modeling and exploit in order to (re)-design or guide cellular behavior.

Pages: 537-562
Citations: 0
The TITAN-X Platform Integrates Big Data, Artificial Intelligence, Bioinformatics, and Advanced Computational Modeling to Understand Immune Responses and Develop the Next Wave of Precision Medicines.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-04-16 DOI: 10.1146/annurev-biodatasci-103123-094804
Ryan Baker, Josep Bassaganya-Riera, Nuria Tubau-Juni, Andrew J Leber, Raquel Hontecillas

The TITAN-X Precision Medicine Platform was engineered to rapidly, fully, and efficiently utilize large-scale immunology datasets, including public data, in drug discovery and development. TITAN-X integrates big data with artificial intelligence (AI), bioinformatics, and advanced computational modeling to seamlessly transition from early target discovery to clinical testing of new therapeutics, developing biomarker-driven precision medicines tailored to specific patient populations. We illustrate the capabilities of TITAN-X through four case studies, demonstrating its use in computationally driven target discovery; characterization of novel immunometabolic mechanisms in infectious, inflammatory, and autoimmune diseases; and identification of biomarker signatures for patient stratification in clinical trials designed to maximize therapeutic efficacy and safety. Data-driven and AI-powered approaches like TITAN-X are enhancing the pace of drug development, reducing costs, tailoring treatments, and increasing the probability of success in clinical trials.

Pages: 447-469
Citations: 0
Meta-Analysis and Federated Learning over Decentralized Distributed Research Networks.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 DOI: 10.1146/annurev-biodatasci-103123-094441
Yiwen Lu, Bingyu Zhang, Jiayi Tong, Yong Chen

Distributed research networks have transformed modern clinical research by enabling large-scale, multi-institutional collaborations while maintaining patient privacy. Two prominent methodologies within these frameworks, meta-analysis and federated learning, address the challenges of synthesizing evidence from decentralized data. Meta-analysis aggregates study-level results to provide robust, interpretable estimates, making it a cornerstone of evidence synthesis for association studies. Federated learning complements this by enabling complex downstream tasks, such as predictive modeling and counterfactual inference, while preserving data privacy through privacy-preserving distributed algorithms. Federated learning facilitates communication-efficient computation and adapts seamlessly to heterogeneous datasets across diverse institutions. This review emphasizes the complementary strengths of federated learning's scalability, flexibility, and readiness for implementation alongside meta-analysis's robust frameworks for evidence synthesis and aggregation in clinical research. Integrations of synthetic data, artificial intelligence (AI)-enhanced harmonization, and hybrid human-AI frameworks are proposed as future directions, promising to further advance both methodologies and enhance their combined impact on privacy-conscious, data-driven healthcare research.
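
To make the contrast in this abstract concrete, the Python sketch below shows two toy ways of combining evidence without pooling patient-level data. It is an illustration under stated assumptions, not the review's method: `fixed_effect_meta` is standard inverse-variance (fixed-effect) pooling of per-site estimates, while `local_summary` and `federated_slope` are hypothetical helpers for the simplest distributed-estimation case, a no-intercept linear model fitted from per-site summary statistics. Practical federated learning is usually iterative and more involved than this.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of study-level estimates.

    Each site shares only its estimate and standard error. Returns the
    pooled estimate and the pooled standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


def local_summary(xs, ys):
    """Computed at each site: (sum of x*y, sum of x*x); raw rows stay local."""
    return sum(x * y for x, y in zip(xs, ys)), sum(x * x for x in xs)


def federated_slope(site_summaries):
    """Coordinator step: add up the site summaries and solve for the slope.

    For a no-intercept linear model this reproduces the least-squares slope
    that would be obtained from the pooled patient-level data.
    """
    sum_xy = sum(summary[0] for summary in site_summaries)
    sum_xx = sum(summary[1] for summary in site_summaries)
    return sum_xy / sum_xx


if __name__ == "__main__":
    # Hypothetical per-site effect estimates and standard errors.
    pooled, pooled_se = fixed_effect_meta([0.10, 0.30], [0.1, 0.2])
    print(f"pooled estimate {pooled:.3f} (SE {pooled_se:.3f})")

    # Hypothetical per-site data that never leaves the site.
    site_a = local_summary([1.0, 2.0], [2.0, 4.1])
    site_b = local_summary([3.0], [5.9])
    print(f"federated slope {federated_slope([site_a, site_b]):.4f}")
```

In the first function the sites exchange finished estimates; in the second they exchange intermediate statistics, which is what lets distributed algorithms fit a single shared model.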

Volume: 8(1) Pages: 405-421
Citations: 0
Foundation Models for Translational Cancer Biology.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-01-29 DOI: 10.1146/annurev-biodatasci-103123-095633
Kevin K Tsang, Sophia Kivelson, Jose M Acitores Cortina, Aditi Kuchi, Jacob S Berkowitz, Hongyu Liu, Apoorva Srinivasan, Nadine A Friedrich, Yasaman Fatapour, Nicholas P Tatonetti

Cancer remains a leading cause of death globally. The complexity and diversity of cancer-related datasets across different specialties pose challenges in refining precision medicine for oncology. Foundation models offer a promising solution. Trained on vast amounts of data, these models develop a broad understanding across a wide range of tasks. We examine the role of foundation models in domains relevant to cancer research, including natural language processing, computer vision, molecular biology, and cheminformatics. Through a review of state-of-the-art methods, we explore how these models have already advanced translational cancer research goals such as precision tumor classification and artificial intelligence-assisted surgery. We also discuss prospective advances in areas like early tumor detection, personalized cancer treatment, and drug discovery. This review provides researchers with a curated set of resources and methodologies, offers practitioners a deeper understanding of how these models enhance cancer care, and points to opportunities for future applications of foundation models in cancer research.

Pages: 51-80
Citations: 0
Clinical Text Generation: Are We There Yet?
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-03-18 DOI: 10.1146/annurev-biodatasci-103123-095202
Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol

Generative artificial intelligence (AI), operationalized as large language models, is increasingly used in the biomedical field to assist with a range of text processing tasks including text classification, information extraction, and decision support. In this article, we focus on the primary purpose of generative language models, namely the production of unstructured text. We review past and current methods used to generate text as well as methods for evaluating open text generation, i.e., in contexts where no reference text is available for comparison. We discuss clinical applications that can benefit from high quality, ethically designed text generation, such as clinical note generation and synthetic text generation in support of secondary use of health data. We also raise awareness of the risks involved with generative AI such as overconfidence in outputs due to anthropomorphism and the risk of representational and allocation harms due to biases.

Annual Review of Biomedical Data Science, pages 173-198, published 2025-08-01.
Citations: 0