
Annual Review of Biomedical Data Science: Latest Articles

Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 Epub Date: 2024-07-24 DOI: 10.1146/annurev-biodatasci-102423-113534
Anthony Cesnik, Leah V Schaffer, Ishan Gaur, Mayank Jain, Trey Ideker, Emma Lundberg

While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.

Pages: 369–389. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11343683/pdf/
Citations: 0
Employing Informatics Strategies in Alzheimer's Disease Research: A Review from Genetics, Multiomics, and Biomarkers to Clinical Outcomes.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 Epub Date: 2024-07-24 DOI: 10.1146/annurev-biodatasci-102423-121021
Jingxuan Bao, Brian N Lee, Junhao Wen, Mansu Kim, Shizhuo Mu, Shu Yang, Christos Davatzikos, Qi Long, Marylyn D Ritchie, Li Shen

Alzheimer's disease (AD) is a critical national concern, affecting 5.8 million people and costing more than $250 billion annually. However, no cure is available. Effective strategies are therefore urgently needed to discover AD biomarkers for early disease detection and drug development. In this review, we study AD from the perspective of a biomedical data scientist to discuss the four fundamental components in AD research: genetics (G), molecular multiomics (M), multimodal imaging biomarkers (B), and clinical outcomes (O) (collectively referred to as the GMBO framework). We provide a comprehensive review of common statistical and informatics methodologies for each component within the GMBO framework, accompanied by the major findings from landmark AD studies. Our review highlights the potential of multimodal biobank data in addressing key challenges in AD, such as early diagnosis, disease heterogeneity, and therapeutic development. We identify major hurdles in AD research, including data scarcity and complexity, and advocate for enhanced collaboration, data harmonization, and advanced modeling techniques. This review aims to be an essential guide for understanding current biomedical data science strategies in AD research, emphasizing the need for integrated, multidisciplinary approaches to advance our understanding and management of AD.

Pages: 391–418. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525791/pdf/
Citations: 0
Spatially Resolved Single-Cell Omics: Methods, Challenges, and Future Perspectives.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 Epub Date: 2024-07-24 DOI: 10.1146/annurev-biodatasci-102523-103640
Felipe Segato Dezem, Wani Arjumand, Hannah DuBose, Natalia Silva Morosini, Jasmine Plummer

Overlaying omics data onto spatial biological dimensions has been a promising technology to provide high-resolution insights into the interactome and cellular heterogeneity relative to the organization of the molecular microenvironment of tissue samples in normal and disease states. Spatial omics can be categorized into three major modalities: (a) next-generation sequencing-based assays, (b) imaging-based spatially resolved transcriptomics approaches including in situ hybridization/in situ sequencing, and (c) imaging-based spatial proteomics. These modalities allow assessment of transcripts and proteins at a cellular level, generating large and computationally challenging datasets. The lack of standardized computational pipelines to analyze and integrate these nonuniform structured data has made it necessary to apply artificial intelligence and machine learning strategies to best visualize and translate their complexity. In this review, we summarize the currently available techniques and computational strategies, highlight their advantages and limitations, and discuss their future prospects in the scientific field.

Pages: 131–153.
Citations: 0
Privacy-Enhancing Technologies in Biomedical Data Science.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 DOI: 10.1146/annurev-biodatasci-120423-120107
Hyunghoon Cho, David Froelicher, Natnatee Dokmai, Anupama Nandi, Shuvom Sadhuka, Matthew M Hong, Bonnie Berger

The rapidly growing scale and variety of biomedical data repositories raise important privacy concerns. Conventional frameworks for collecting and sharing human subject data offer limited privacy protection, often necessitating the creation of data silos. Privacy-enhancing technologies (PETs) promise to safeguard these data and broaden their usage by providing means to share and analyze sensitive data while protecting privacy. Here, we review prominent PETs and illustrate their role in advancing biomedicine. We describe key use cases of PETs and their latest technical advances and highlight recent applications of PETs in a range of biomedical domains. We conclude by discussing outstanding challenges and social considerations that need to be addressed to facilitate a broader adoption of PETs in biomedical data science.

Pages: 317–343 (Vol. 7, No. 1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11346580/pdf/
Citations: 0
Harnessing Artificial Intelligence in Multimodal Omics Data Integration: Paving the Path for the Next Frontier in Precision Medicine.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 Epub Date: 2024-07-24 DOI: 10.1146/annurev-biodatasci-102523-103801
Yonghyun Nam, Jaesik Kim, Sang-Hyuk Jung, Jakob Woerner, Erica H Suh, Dong-Gi Lee, Manu Shivakumar, Matthew E Lee, Dokyoon Kim

The integration of multiomics data with detailed phenotypic insights from electronic health records marks a paradigm shift in biomedical research, offering unparalleled holistic views into health and disease pathways. This review delineates the current landscape of multimodal omics data integration, emphasizing its transformative potential in generating a comprehensive understanding of complex biological systems. We explore robust methodologies for data integration, ranging from concatenation-based to transformation-based and network-based strategies, designed to harness the intricate nuances of diverse data types. Our discussion extends from incorporating large-scale population biobanks to dissecting high-dimensional omics layers at the single-cell level. The review underscores the emerging role of large language models in artificial intelligence, anticipating their influence as a near-future pivot in data integration approaches. Highlighting both achievements and hurdles, we advocate for a concerted effort toward sophisticated integration models, fortifying the foundation for groundbreaking discoveries in precision medicine.

Pages: 225–250. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972123/pdf/
Citations: 0
Mapping the Human Cell Surface Interactome: A Key to Decode Cell-to-Cell Communication.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 Epub Date: 2024-07-24 DOI: 10.1146/annurev-biodatasci-102523-103821
Jarrod Shilts, Gavin J Wright

Proteins on the surfaces of cells serve as physical connection points to bridge one cell with another, enabling direct communication between cells and cohesive structure. As biomedical research makes the leap from characterizing individual cells toward understanding the multicellular organization of the human body, the binding interactions between molecules on the surfaces of cells are foundational both for computational models and for clinical efforts to exploit these influential receptor pathways. To achieve this grander vision, we must assemble the full interactome of ways surface proteins can link together. This review investigates how close we are to knowing the human cell surface protein interactome. We summarize the current state of databases and systematic technologies to assemble surface protein interactomes, while highlighting substantial gaps that remain. We aim for this to serve as a road map for eventually building a more robust picture of the human cell surface protein interactome.

Pages: 155–177.
Citations: 0
Disease Trajectories from Healthcare Data: Methodologies, Key Results, and Future Perspectives.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 DOI: 10.1146/annurev-biodatasci-110123-041001
Isabella Friis Jørgensen, Amalie Dahl Haue, Davide Placido, Jessica Xin Hjaltelin, Søren Brunak

Disease trajectories, defined as sequential, directional disease associations, have become an intense research field driven by the availability of electronic population-wide healthcare data and sufficient computational power. Here, we provide an overview of disease trajectory studies with a focus on European work, including ontologies used as well as computational methodologies for the construction of disease trajectories. We also discuss different applications of disease trajectories from descriptive risk identification to disease progression, patient stratification, and personalized predictions using machine learning. We describe challenges and opportunities in the area that eventually will benefit from initiatives such as the European Health Data Space, which, with time, will make it possible to analyze data from cohorts comprising hundreds of millions of patients.

Pages: 251–276 (Vol. 7, No. 1).
Citations: 0
Data Science Methods for Real-World Evidence Generation in Real-World Data.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 Epub Date: 2024-07-24 DOI: 10.1146/annurev-biodatasci-102423-113220
Fang Liu

In the healthcare landscape, data science (DS) methods have emerged as indispensable tools to harness real-world data (RWD) from various data sources such as electronic health records, claims and registry data, and data gathered from digital health technologies. Real-world evidence (RWE) generated from RWD empowers researchers, clinicians, and policymakers with a more comprehensive understanding of real-world patient outcomes. Nevertheless, persistent challenges in RWD (e.g., messiness, voluminousness, heterogeneity, multimodality) and a growing awareness of the need for trustworthy and reliable RWE demand innovative, robust, and valid DS methods for analyzing RWD. In this article, I review some common current DS methods for extracting RWE and valuable insights from complex and diverse RWD. This article encompasses the entire RWE-generation pipeline, from study design with RWD to data preprocessing, exploratory analysis, methods for analyzing RWD, and trustworthiness and reliability guarantees, along with data ethics considerations and open-source tools. This review, tailored for readers who may not be experts in DS, aims to offer a systematic review of DS methods and to help readers select suitable DS methods and improve the process of RWE generation to address their specific challenges.

Pages: 201–224.
Citations: 0
The Value Proposition of Coordinated Population Cohorts Across Africa.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 DOI: 10.1146/annurev-biodatasci-020722-015026
Michèle Ramsay, Amelia C Crampin, Ayaga A Bawah, Evelyn Gitau, Kobus Herbst

Building longitudinal population cohorts in Africa for coordinated research and surveillance can influence the setting of national health priorities, lead to the introduction of appropriate interventions, and provide evidence for targeted treatment, leading to better health across the continent. However, compared to cohorts from the global north, longitudinal continental African population cohorts remain scarce, are relatively small in size, and lack data complexity. As infections and noncommunicable diseases disproportionately affect Africa's approximately 1.4 billion inhabitants, African cohorts present a unique opportunity for research and surveillance. High genetic diversity in African populations and multiomic research studies, together with detailed phenotyping and clinical profiling, will be a treasure trove for discovery. The outcomes, including novel drug targets, biological pathways for disease, and gene-environment interactions, will boost precision medicine approaches, not only in Africa but across the globe.

Annual Review of Biomedical Data Science, Vol. 7(1), pp. 277-294 (2024). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7618365/pdf/
Citations: 0
Graph Artificial Intelligence in Medicine.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-08-01 Epub Date: 2024-07-24 DOI: 10.1146/annurev-biodatasci-110723-024625
Ruth Johnson, Michelle M Li, Ayush Noori, Owen Queen, Marinka Zitnik

In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks and graph transformer architectures, stands out for its capability to capture intricate relationships and structures within clinical datasets. With diverse data (from patient records to imaging), graph AI models process data holistically by viewing modalities and the entities within them as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters and with minimal to no retraining. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on relational datasets, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph AI models integrate diverse data modalities through pretraining, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way toward clinically meaningful predictions.

Annual Review of Biomedical Data Science, pp. 345-368 (2024). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11344018/pdf/
Citations: 0