Pub Date: 2025-08-01; DOI: 10.1146/annurev-biodatasci-103023-050856
Zijun Frank Zhang, Huixin Zhan, Tinghui Wu, Robert Burns, Jasreet Hundal, Helio A Costa
Deep learning and artificial intelligence (AI) have seen explosive growth and success in biomedical applications in the last decade, largely due to the rapid development of deep neural networks and their underlying neural network (NN) architectures. Here, we explore biomedical deep learning and AI from the specific perspective of NN architectures. We discuss widely varying design principles of NN architectures, their use in particular biomedical applications, and the assumptions (often hidden) built into them. We explore neural architecture search techniques that automate the design of NN topology to optimize task performance. Advanced neural architectures are being developed for both molecular and healthcare applications, employing elements of graph networks, transformers, and interpretable NNs, and we discuss and summarize the design considerations and unique advantages of each architecture. Future advances will include the employment of multimodal language models and smaller highly focused mechanistic models that build on the success of today's large models.
Title: The Expanding Landscape of Neural Architectures and Their Impact in Biomedicine. | Journal: Annual Review of Biomedical Data Science (IF 6.0), 8(1) | Pages: 101-124 | Publication type: Journal Article
Biomedicine has rapidly digitized over recent decades, from genomic sequencing to electronic medical records. Now, the rise of large language models (LLMs) is driving a generative artificial intelligence (AI) revolution in natural language processing (NLP). Together, these trends create unprecedented possibilities to optimize patient care and accelerate biomedical discovery. Biomedical NLP already boosts productivity by automating labor-intensive tasks such as knowledge extraction and medical abstraction. Emerging approaches promise creativity gain, surpassing standard healthcare practices and uncovering emergent capabilities through Web-scale biomedical knowledge and population-level patient data. However, LLMs remain prone to hallucinations and omissions, and ensuring compliance and safety is vital in order to do no harm. Incorporating diverse modalities such as imaging and genomics is also essential for comprehensive solutions. We review these challenges and opportunities in biomedical NLP, offering historical context, surveying the current state of the art, and exploring frontiers for AI researchers and biomedical practitioners.
Title: Biomedical Natural Language Processing in the Era of Large Language Models. | Authors: Naoto Usuyama, Cliff Wong, Sheng Zhang, Tristan Naumann, Hoifung Poon | DOI: 10.1146/annurev-biodatasci-103123-095406 | Pub Date: 2025-08-01 | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 471-490 | Publication type: Journal Article
Pub Date: 2025-08-01; Epub Date: 2025-04-08; DOI: 10.1146/annurev-biodatasci-103123-094851
Yifan Yang, Qiao Jin, Qingqing Zhu, Zhizheng Wang, Francisco Erramuspe Álvarez, Nicholas Wan, Benjamin Hou, Zhiyong Lu
Large language models (LLMs) have gained significant attention in the medical domain for their human-level capabilities, leading to increased efforts to explore their potential in various healthcare applications. However, despite such a promising future, there are multiple challenges and obstacles that remain for their real-world uses in practical settings. This work discusses key challenges for LLMs in medical applications from four unique aspects: operational vulnerabilities, ethical and social considerations, performance and assessment difficulties, and legal and regulatory compliance. Addressing these challenges is crucial for leveraging LLMs to their full potential and ensuring their responsible integration into healthcare.
Title: Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 305-316 | Publication type: Journal Article
Pub Date: 2025-08-01; Epub Date: 2025-04-21; DOI: 10.1146/annurev-biodatasci-103123-094601
Anthony L Lin, Amanda B Parrish, Michael Cary, Christina Silcox, Suresh Balu, J Eric Jelovsek, Cara O'Brien, Michael Pencina, Eric Poon, Nicoleta J Economou-Zavlanos
The potential of algorithm-based clinical decision support (CDS) in healthcare continues to increase with the growing field of artificial intelligence (AI)-enabled CDS. The use of these technologies to support clinicians, patients, and health systems is still quite new, and to date, implementors and regulators are still identifying the best processes and practices to ensure the effective, safe, and equitable use of these technology solutions. To assist individuals and organizations interested in implementation of algorithm-based CDS and AI-enabled CDS in healthcare, this article reviews the important regulatory decisions that form the landscape within which algorithm-based CDS has emerged, modern governance frameworks used to oversee these CDS systems, nuances in evaluation and monitoring throughout the CDS life cycle, best practices for real-world implementation, safety and equity considerations, and avenues for future collaboration and innovation.
Title: Algorithm-Based Clinical Decision Support: Evolving Regulatory Landscape and Best Practices for Local Oversight. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 491-507 | Publication type: Journal Article
Pub Date: 2025-08-01; Epub Date: 2025-04-08; DOI: 10.1146/annurev-biodatasci-103123-095737
Abdoul Jalil Djiberou Mahamadou, Artem A Trotsyuk
Efforts to mitigate bias and enhance fairness in the artificial intelligence (AI) community have predominantly focused on technical solutions. While numerous reviews have addressed bias in AI, this review uniquely focuses on the practical limitations of technical solutions in healthcare settings, providing a structured analysis across five key dimensions affecting their real-world implementation: who defines bias and fairness, which mitigation strategy to use and prioritize among dozens that are inconsistent and incompatible, when in the AI development stages the solutions are most effective, for which populations, and the context for which the solutions are designed. We illustrate each limitation with empirical studies focusing on healthcare and biomedical applications. Moreover, we discuss how value-sensitive AI, a framework derived from technology design, can engage stakeholders and ensure that their values are embodied in bias and fairness mitigation solutions. Finally, we discuss areas that require further investigation and provide practical recommendations to address the limitations covered in the study.
Title: Revisiting Technical Bias Mitigation Strategies. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 287-303 | Publication type: Journal Article
Pub Date: 2025-08-01; Epub Date: 2025-02-20; DOI: 10.1146/annurev-biodatasci-103123-095355
Marc Subirana-Granés, Jill Hoffman, Haoyu Zhang, Christina Akirtava, Sutanu Nandi, Kevin Fotso, Milton Pividori
Understanding the genetic basis of complex traits is a longstanding challenge in the field of genomics. Genome-wide association studies have identified thousands of variant-trait associations, but most of these variants are located in noncoding regions, making the link to biological function elusive. While traditional approaches, such as transcriptome-wide association studies (TWAS), have advanced our understanding by linking genetic variants to gene expression, they often overlook gene-gene interactions. Here, we review current approaches to integrate different molecular data, leveraging machine learning methods to identify gene modules based on coexpression and functional relationships. These integrative approaches, such as PhenoPLIER, combine TWAS and drug-induced transcriptional profiles to effectively capture biologically meaningful gene networks. This integration provides a context-specific understanding of disease processes while highlighting both core and peripheral genes. These insights pave the way for novel therapeutic targets and enhance the interpretability of genetic studies in personalized medicine.
Title: Genetic Studies Through the Lens of Gene Networks. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 125-147 | Publication type: Journal Article | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12310179/pdf/
Pub Date: 2025-08-01; Epub Date: 2025-05-01; DOI: 10.1146/annurev-biodatasci-103123-094729
Justin Kauffman, Riccardo Miotto, Eyal Klang, Anthony Costa, Beau Norgeot, Marinka Zitnik, Shameer Khader, Fei Wang, Girish N Nadkarni, Benjamin S Glicksberg
This review aims to elucidate the role and impact of embedding techniques in the analysis and utilization of electronic health record data for research. By integrating multidimensional, incongruent, and often unstructured medical data for machine learning models, embeddings provide a powerful tool for enhancing data utility, especially under certain conditions and for asking certain questions. We explore a variety of embedding methods, including but not limited to word embeddings, graph embeddings, and other deep learning models. We highlight key applications of embeddings that are representative of a variety of areas of research, including predictive modeling, patient stratification, clinical decision support, and beyond. Finally, we show how to evaluate the impact and quality of embeddings in real-world clinical settings, assessing their performance against traditional models and noting areas where they deliver substantial improvements or fall short.
Title: Embedding Methods for Electronic Health Record Research. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 563-590 | Publication type: Journal Article
Pub Date: 2025-08-01; Epub Date: 2025-05-01; DOI: 10.1146/annurev-biodatasci-103123-095644
Hope Zehr, Alberto Baiardi, Francesco Tacchino, Anthony Gandon, Laurin E Fischer, Yue Xu, Frank P DiFilippo, Leonardo Guidoni, Pi A B Haase, Walter N Talarico, Martina Stella, Fabio Tarocco, Anton Nykänen, Aaron Fitzpatrick, Aaron Miller, Leander Thiessen, Stefan Knecht, Elsi-Mari Borrelli, Sabrina Maniscalco, Fabijan Pavošević, Ivano Tavernelli, Edward Maytin, Vijay Krishna
Use of light in healthcare is evolving with increasing applications of photodynamic therapy (PDT) for treating various cancers. PDT utilizes light-activated molecules called photosensitizers (PSs) that generate reactive oxygen species (ROSs) to induce tumor cell apoptosis and necrosis. However, the use of PDT is limited by the availability of PSs that can be activated by deep tissue-penetrating near-infrared light, exhibit low dark toxicity, and produce ROSs efficiently. Here we review the different categories of PS currently used in clinical or preclinical trials and highlight the significance of advanced computational methods, including density functional and wave function-based quantum chemistry, for understanding the molecular mechanisms involved in PS activation. Despite advancements in classical computational techniques, the complexities of excited state dynamics in highly correlated molecular systems demand innovative simulation approaches such as quantum computing. We propose that quantum computing holds promise for accurately modeling the excited-state properties of PSs to optimize their design and broaden clinical applications.
Title: Quantum Computing for Photosensitizer Design in Photodynamic Therapy. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 509-536 | Publication type: Journal Article
Pub Date: 2024-08-01; Epub Date: 2024-07-24; DOI: 10.1146/annurev-biodatasci-122220-115746
Ruowang Li, Joseph D Romano, Yong Chen, Jason H Moore
The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.
Title: Centralized and Federated Models for the Analysis of Clinical Data. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 179-199 | Publication type: Journal Article | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11571052/pdf/
Pub Date: 2024-08-01; Epub Date: 2024-07-24; DOI: 10.1146/annurev-biodatasci-102523-104225
Annabel C Beichman, Luke Zhu, Kelley Harris
Novel sequencing technologies are making it increasingly possible to measure the mutation rates of somatic cell lineages. Accurate germline mutation rate measurement technologies have also been available for a decade, making it possible to assess how this fundamental evolutionary parameter varies across the tree of life. Here, we review some classical theories about germline and somatic mutation rate evolution that were formulated using principles of population genetics and the biology of aging and cancer. We find that somatic mutation rate measurements, while still limited in phylogenetic diversity, seem consistent with the theory that selection to preserve the soma is proportional to life span. However, germline and somatic theories make conflicting predictions regarding which species should have the most accurate DNA repair. Resolving this conflict will require carefully measuring how mutation rates scale with time and cell division and achieving a better understanding of mutation rate pleiotropy among cell types.
Title: The Evolutionary Interplay of Somatic and Germline Mutation Rates. | Journal: Annual Review of Biomedical Data Science (IF 6.0) | Pages: 83-105 | Publication type: Journal Article | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12254932/pdf/