Annual Review of Biomedical Data Science: Latest Articles

Strategies for Creating Robust Patient Groups to Study Diverse Conditions with Electronic Health Records.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-04-08 DOI: 10.1146/annurev-biodatasci-020722-114525
Grace D Ramey, Hannah Takasuka, John A Capra

The growth of electronic health record (EHR) databases in size and availability has created an unprecedented opportunity to better understand human health and disease. However, conducting robust EHR studies requires careful filtering criteria and study design, as EHRs pose several challenges that can confound analyses and lead to inaccurate results. Here we review these challenges and make suggestions about how to avoid or adjust for major confounders and biases in common EHR study designs. We further highlight qualities of EHR data that make different diseases more or less feasible for study. These recommendations for conducting research using EHRs will help inform database selection, improve reproducibility of results across the field, and enhance the validity of study results.

Annual Review of Biomedical Data Science, pp. 317-340. Journal Article.
Citations: 0
Mapping, Modeling, and Reprogramming Cell-Fate Decision-Making Systems.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-05-01 DOI: 10.1146/annurev-biodatasci-101424-121439
Lucy Ham, Taylor E Woodward, Megan A Coomer, Michael P H Stumpf

Many cellular processes involve information processing and decision-making. We can probe these processes at increasing molecular detail. The analysis of heterogeneous data remains a challenge that requires new ways of thinking about cells in quantitative, predictive, and mechanistic ways. We discuss the role of mathematical models in the context of cell-fate decision-making systems across the tree of life. Complex multicellular organisms have been a particular focus, but single-celled organisms also have to sense and respond to their environment. We center our discussion around the idea of design principles that we can learn from observations and modeling and exploit in order to (re)-design or guide cellular behavior.

Annual Review of Biomedical Data Science, pp. 537-562. Journal Article.
Citations: 0
The TITAN-X Platform Integrates Big Data, Artificial Intelligence, Bioinformatics, and Advanced Computational Modeling to Understand Immune Responses and Develop the Next Wave of Precision Medicines.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-04-16 DOI: 10.1146/annurev-biodatasci-103123-094804
Ryan Baker, Josep Bassaganya-Riera, Nuria Tubau-Juni, Andrew J Leber, Raquel Hontecillas

The TITAN-X Precision Medicine Platform was engineered to rapidly, fully, and efficiently utilize large-scale immunology datasets, including public data, in drug discovery and development. TITAN-X integrates big data with artificial intelligence (AI), bioinformatics, and advanced computational modeling to seamlessly transition from early target discovery to clinical testing of new therapeutics, developing biomarker-driven precision medicines tailored to specific patient populations. We illustrate the capabilities of TITAN-X through four case studies, demonstrating its use in computationally driven target discovery; characterization of novel immunometabolic mechanisms in infectious, inflammatory, and autoimmune diseases; and identification of biomarker signatures for patient stratification in clinical trials designed to maximize therapeutic efficacy and safety. Data-driven and AI-powered approaches like TITAN-X are enhancing the pace of drug development, reducing costs, tailoring treatments, and increasing the probability of success in clinical trials.

Annual Review of Biomedical Data Science, pp. 447-469. Journal Article.
Citations: 0
Meta-Analysis and Federated Learning over Decentralized Distributed Research Networks.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 DOI: 10.1146/annurev-biodatasci-103123-094441
Yiwen Lu, Bingyu Zhang, Jiayi Tong, Yong Chen

Distributed research networks have transformed modern clinical research by enabling large-scale, multi-institutional collaborations while maintaining patient privacy. Two prominent methodologies within these frameworks, meta-analysis and federated learning, address the challenges of synthesizing evidence from decentralized data. Meta-analysis aggregates study-level results to provide robust, interpretable estimates, making it a cornerstone of evidence synthesis for association studies. Federated learning complements this by enabling complex downstream tasks, such as predictive modeling and counterfactual inference, while preserving data privacy through privacy-preserving distributed algorithms. Federated learning facilitates communication-efficient computation and adapts seamlessly to heterogeneous datasets across diverse institutions. This review emphasizes the complementary strengths of federated learning's scalability, flexibility, and readiness for implementation alongside meta-analysis's robust frameworks for evidence synthesis and aggregation in clinical research. Integrations of synthetic data, artificial intelligence (AI)-enhanced harmonization, and hybrid human-AI frameworks are proposed as future directions, promising to further advance both methodologies and enhance their combined impact on privacy-conscious, data-driven healthcare research.

Annual Review of Biomedical Data Science, 8(1), pp. 405-421. Journal Article.
Citations: 0
Foundation Models for Translational Cancer Biology.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-01-29 DOI: 10.1146/annurev-biodatasci-103123-095633
Kevin K Tsang, Sophia Kivelson, Jose M Acitores Cortina, Aditi Kuchi, Jacob S Berkowitz, Hongyu Liu, Apoorva Srinivasan, Nadine A Friedrich, Yasaman Fatapour, Nicholas P Tatonetti

Cancer remains a leading cause of death globally. The complexity and diversity of cancer-related datasets across different specialties pose challenges in refining precision medicine for oncology. Foundation models offer a promising solution. Trained on vast amounts of data, these models develop a broad understanding across a wide range of tasks. We examine the role of foundation models in domains relevant to cancer research, including natural language processing, computer vision, molecular biology, and cheminformatics. Through a review of state-of-the-art methods, we explore how these models have already advanced translational cancer research goals such as precision tumor classification and artificial intelligence-assisted surgery. We also discuss prospective advances in areas like early tumor detection, personalized cancer treatment, and drug discovery. This review provides researchers with a curated set of resources and methodologies, offers practitioners a deeper understanding of how these models enhance cancer care, and points to opportunities for future applications of foundation models in cancer research.

Annual Review of Biomedical Data Science, pp. 51-80. Journal Article.
Citations: 0
Clinical Text Generation: Are We There Yet?
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-03-18 DOI: 10.1146/annurev-biodatasci-103123-095202
Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol

Generative artificial intelligence (AI), operationalized as large language models, is increasingly used in the biomedical field to assist with a range of text processing tasks including text classification, information extraction, and decision support. In this article, we focus on the primary purpose of generative language models, namely the production of unstructured text. We review past and current methods used to generate text as well as methods for evaluating open text generation, i.e., in contexts where no reference text is available for comparison. We discuss clinical applications that can benefit from high quality, ethically designed text generation, such as clinical note generation and synthetic text generation in support of secondary use of health data. We also raise awareness of the risks involved with generative AI such as overconfidence in outputs due to anthropomorphism and the risk of representational and allocation harms due to biases.

Annual Review of Biomedical Data Science, pp. 173-198. Journal Article.
Citations: 0
Conditional Generative Models for Synthetic Tabular Data: Applications for Precision Medicine and Diverse Representations.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-01-14 DOI: 10.1146/annurev-biodatasci-103123-094844
Kara Liu, Russ B Altman

Tabular medical datasets, like electronic health records (EHRs), biobanks, and structured clinical trial data, are rich sources of information with the potential to advance precision medicine and optimize patient care. However, real-world medical datasets have limited patient diversity and cannot simulate hypothetical outcomes, both of which are necessary for equitable and effective medical research. Fueled by recent advancements in machine learning, generative models offer a promising solution to these data limitations by generating enhanced synthetic data. This review highlights the potential of conditional generative models (CGMs) to create patient-specific synthetic data for a variety of precision medicine applications. We survey CGM approaches that tackle two medical applications: correcting for data representation biases and simulating digital health twins. We additionally explore how the surveyed methods handle modeling tabular medical data and briefly discuss evaluation criteria. Finally, we summarize the technical, medical, and ethical challenges that must be addressed before CGMs can be effectively and safely deployed in the medical field.

Annual Review of Biomedical Data Science, pp. 21-49. Journal Article.
Citations: 0
Integrative Data Science in Drug Safety Research: Experiences, Challenges, and Perspectives.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-04-01 DOI: 10.1146/annurev-biodatasci-103123-095506
Ferran Sanz

Pharmaceutical research and development largely depend on the quantity and quality of data that are available to support projects. The secondary use of data by means of collaborative and integrative approaches is yielding promising results in drug safety research. However, there are challenges that must be overcome in these integrative approaches, such as interoperability issues, intellectual property protection, and, in the case of clinical information, personal data safeguards. The OMOP common data model and the EHDEN and DARWIN EU platforms constitute successful examples of data sharing initiatives in the clinical domain, while the eTOX, eTRANSAFE, and VICT3R international projects are examples of corporate data sharing in toxicology research. The VICT3R project is using these shared data for generating virtual control groups to be applied in nonclinical drug safety assessment. Drug-related knowledge bases that integrate information from different sources also constitute useful tools in the drug safety domain.

Annual Review of Biomedical Data Science, pp. 275-285. Journal Article.
Citations: 0
Integrating Artificial Intelligence in Dermatological Cancer Screening and Diagnosis: Efficacy, Challenges, and Future Directions.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-05-01 DOI: 10.1146/annurev-biodatasci-103123-094521
Vivian Utti, Vasiliki Bikia, Ank A Agarwal, Roxana Daneshjou

Skin cancer is the most common cancer in the United States, with incidence rates continuing to rise both nationally and globally, posing significant health and economic burdens. These challenges are compounded by shortages in dermatological care and barriers to insurance access. To address these gaps, artificial intelligence (AI) and deep learning technologies offer promising solutions, enhancing skin cancer screening and diagnosis. AI has the potential to improve diagnostic accuracy and expand access to care, but significant challenges restrict its deployment. These challenges include clinical validation, algorithmic bias, regulatory oversight, and patient acceptance. Ethical concerns, such as disparities in access and fairness of AI algorithms, also require attention. In this review, we explore these limitations and outline future directions, including advancements in teledermatology and vision-language models (VLMs). Future research should focus on improving VLM reliability and interpretability and developing systems capable of integrating clinical context with dermatological images in a way that assists, rather than replaces, clinicians in making more accurate, timely diagnoses.

皮肤癌是美国最常见的癌症,其发病率在全国和全球范围内都在持续上升,造成了重大的健康和经济负担。皮肤科护理的短缺和获得保险的障碍使这些挑战更加复杂。为了弥补这些差距,人工智能(AI)和深度学习技术提供了有前途的解决方案,加强了皮肤癌的筛查和诊断。人工智能具有提高诊断准确性和扩大医疗服务可及性的潜力,但重大挑战限制了其部署。这些挑战包括临床验证、算法偏差、监管监督和患者接受度。人工智能算法在获取和公平性方面的差异等伦理问题也需要关注。在这篇综述中,我们探讨了这些局限性并概述了未来的发展方向,包括远程皮肤病学和视觉语言模型(VLMs)的进展。未来的研究应侧重于提高VLM的可靠性和可解释性,并开发能够将临床背景与皮肤科图像相结合的系统,以帮助而不是取代临床医生做出更准确、及时的诊断。
Annual Review of Biomedical Data Science, pages 591-603.
Citations: 0
Network-Based Approaches for Drug Target Identification.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-01 Epub Date: 2025-04-16 DOI: 10.1146/annurev-biodatasci-101424-120950
Thodoris Koutsandreas, Kalliopi Tsafou, Heiko Horn, Ian Barrett, Evangelia Petsalaki

Drug target identification is the first step in drug development, and its importance is underscored by the fact that, even when using genetic evidence to improve success rates, only a small fraction of lead targets end up approved for use in the clinic. One of the reasons for this is the lack of in-depth understanding of the complexity of human diseases. In this review, we argue that network-based approaches, which are able to capture relationships between relevant genes and proteins, and diverse data modalities have high potential for improving drug target identification and drug repurposing. We present the evolution of network-based methods that have been developed for this purpose and discuss the limitations of these approaches that are holding them back from making an impact in the clinic. We finish by presenting our recommendations for overcoming these limitations, for example, by leveraging emerging technologies such as artificial intelligence and knowledge graphs.

Annual Review of Biomedical Data Science, pages 423-446.
Citations: 0