
Annual Review of Biomedical Data Science: Latest Articles

Large-Scale Analysis of Genetic and Clinical Patient Data
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013508
M. Ritchie
Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available on comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.
Citations: 13
From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013452
Xi Chen, S. Teichmann, K. Meyer
With the recent transformative developments in single-cell genomics and, in particular, single-cell gene expression analysis, it is now possible to study tissues at the single-cell level, rather than having to rely on data from bulk measurements. Here we review the rapid developments in single-cell RNA sequencing (scRNA-seq) protocols that have the potential for unbiased identification and profiling of all cell types within a tissue or organism. In addition, novel approaches for spatial profiling of gene expression allow us to map individual cells and cell types back into the three-dimensional context of organs. The combination of in-depth single-cell and spatial gene expression data will reveal tissue architecture in unprecedented detail, generating a wealth of biological knowledge and a better understanding of many diseases.
Citations: 77
Deep Learning in Biomedical Data Science
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013343
P. Baldi
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Citations: 77
Network Analysis as a Grand Unifier in Biomedical Data Science
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013444
Patrick D. McGillivray, Declan Clarke, W. Meyerson, Jing Zhang, Donghoon Lee, Mengting Gu, Sushant Kumar, Holly Zhou, M. Gerstein
Biomedical data scientists study many types of networks, ranging from those formed by neurons to those created by molecular interactions. People often criticize these networks as uninterpretable diagrams termed hairballs; however, here we show that molecular biological networks can be interpreted in several straightforward ways. First, we can break down a network into smaller components, focusing on individual pathways and modules. Second, we can compute global statistics describing the network as a whole. Third, we can compare networks. These comparisons can be within the same context (e.g., between two gene regulatory networks) or cross-disciplinary (e.g., between regulatory networks and governmental hierarchies). The latter comparisons can transfer a formalism, such as that for Markov chains, from one context to another or relate our intuitions in a familiar setting (e.g., social networks) to the relatively unfamiliar molecular context. Finally, key aspects of molecular networks are dynamics and evolution, i.e., how they evolve over time and how genetic variants affect them. By studying the relationships between variants in networks, we can begin to interpret many common diseases, such as cancer and heart disease.
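The second strategy named in the abstract, computing global statistics over a whole network, can be illustrated with a small sketch. This is not taken from the review: the function names (`build_graph`, `density`, `average_clustering`) and the toy edge list are assumptions made for illustration, and the two statistics shown (edge density and average clustering coefficient) are only common examples of whole-network summaries.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is counted twice
    return 2 * m / (n * (n - 1))


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


if __name__ == "__main__":
    # Hypothetical toy network: a triangle (A, B, C) with one pendant node D.
    toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
    print("density:", density(toy))
    print("average clustering:", average_clustering(toy))
```

Summaries like these reduce a "hairball" to a few comparable numbers, which is also what makes the abstract's third strategy, comparing networks, practical.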
Citations: 28
A Census of Disease Ontologies
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013459
M. Haendel, J. McMurry, R. Relevo, C. Mungall, P. Robinson, C. Chute
For centuries, humans have sought to classify diseases based on phenotypic presentation and available treatments. Today, a wide landscape of strategies, resources, and tools exists to classify patients and diseases. Ontologies can provide a robust foundation of logic for precise stratification and classification along diverse axes such as etiology, development, treatment, and genetics. Disease and phenotype ontologies are used in four primary ways: (a) search, retrieval, and annotation of knowledge; (b) data integration and analysis; (c) clinical decision support; and (d) knowledge discovery. Computational inference can connect existing knowledge and generate new insights and hypotheses about drug targets, prognosis prediction, or diagnosis. In this review, we examine the rise of disease and phenotype ontologies and the diverse ways they are represented and applied in biomedicine.
Citations: 35
Data Science Issues in Studying Protein-RNA Interactions with CLIP Technologies
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/annurev-biodatasci-080917-013525
Anob M Chakrabarti, Nejc Haberman, Arne Praznik, Nicholas M Luscombe, Jernej Ule

An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein-RNA interactions. UV crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that copurify with a selected RNA-binding protein under stringent conditions. Here we focus on approaches for the analysis of the resulting data and appraise the methods for peak calling, visualization, analysis, and computational modeling of protein-RNA binding sites. We advocate that the sensitivity and specificity of data be assessed in combination for computational quality control. Moreover, we demonstrate the value of analyzing sequence motif enrichment in peaks assigned from CLIP data and of visualizing RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these to assess how variations in CLIP data quality and in different peak calling methods affect the insights into regulatory mechanisms. We conclude by discussing future opportunities for the computational analysis of protein-RNA interaction experiments.

Annual Review of Biomedical Data Science, vol. 1, pp. 235-261
Citations: 0
Big Data Approaches for Modeling Response and Resistance to Cancer Drugs
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013350
Peng Jiang, W. Sellers, X. S. Liu
Despite significant progress in cancer research, current standard-of-care drugs fail to cure many types of cancers. Hence, there is an urgent need to identify better predictive biomarkers and treatment regimes. Conventionally, insights from hypothesis-driven studies are the primary force for cancer biology and therapeutic discoveries. Recently, the rapid growth of big data resources, catalyzed by breakthroughs in high-throughput technologies, has resulted in a paradigm shift in cancer therapeutic research. The combination of computational methods and genomics data has led to several successful clinical applications. In this review, we focus on recent advances in data-driven methods to model anticancer drug efficacy, and we present the challenges and opportunities for data science in cancer therapeutic research.
Annual Review of Biomedical Data Science, vol. 1, pp. 1-27
Citations: 23
What is Biomedical Data Science and Do We Need an Annual Review of It?
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BD-01-041718-100001
R. Altman, M. Levitt
We are pleased to bring you the first volume of the Annual Review of Biomedical Data Science. It spans a range of biological and medical research challenges that are data intensive and focused on the creation of novel methodologies to advance biomedical science discovery. The term “data science” describes expertise associated with taking (usually large) data sets and annotating, cleaning, organizing, storing, and analyzing them for the purposes of extracting knowledge. It merges the disciplines of statistics, computer science, and computational engineering. Many are irritated by the term—all of science depends ultimately on data, and many of the activities listed above sound like engineering (which is about solving problems) and not science (which is about discovery of new knowledge). If “data science” is not about science and the adjective “data” has no particular meaning, why does this term exist? Indeed, the allied fields of informatics have existed for several decades in many forms—medical informatics, clinical informatics, health informatics, bioinformatics, and biomedical informatics—and variants all refer to the development of methods to analyze data, information, and knowledge within the space of biology and medicine. Practitioners of these fields are quick to point out that most if not all of data science falls within the purview of informatics. Informatics is a broad field that includes the social aspects of interacting with data, information, and knowledge; the challenges of human–computer interfaces; and the issues associated with introducing disruptive new computational interventions into systems (like hospitals and laboratories) with existing workflows. So why is the introduction of a new name for the field necessary? The term “data science” has gained recognition, and the widespread comfort with it suggests it serves a useful purpose. Here we offer some observations on the diverse use of the moniker for many activities:
Citations: 8
Alignment-Free Sequence Analysis and Applications
IF 7 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-01 Epub Date: 2018-04-25 DOI: 10.1146/annurev-biodatasci-080917-013431
Jie Ren, Xin Bai, Yang Young Lu, Kujin Tang, Ying Wang, Gesine Reinert, Fengzhu Sun

Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.
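To make the idea of word-count comparison concrete, here is a minimal sketch, not drawn from the review itself. It counts overlapping k-mers ("words") in two sequences and computes the classic D2 statistic, the inner product of the two count vectors. The helper names and example sequences are assumptions, and the cosine-style dissimilarity is added only as one simple way to normalize D2; it is not one of the refined statistics the review surveys.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping words of length k in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y):
    """D2 statistic: sum over words of (count in x) * (count in y)."""
    # Counter returns 0 for words absent from y, so no key check is needed.
    return sum(count * y[word] for word, count in x.items())


def cosine_dissimilarity(x, y):
    """Illustrative normalization of D2 into a value in [0, 1]."""
    norm_x = sqrt(d2(x, x))
    norm_y = sqrt(d2(y, y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # no words to compare
    return 1.0 - d2(x, y) / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical short reads; real inputs would be genomes or NGS read sets.
    a = kmer_counts("ACGTAC", 2)
    b = kmer_counts("ACGT", 2)
    print("D2:", d2(a, b))
    print("dissimilarity:", cosine_dissimilarity(a, b))
```

Neither function aligns the sequences or needs an assembled genome, which is why this family of methods scales to large read sets.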

基于大量新一代测序(NGS)数据的基因组和元基因组比较对基于比对的方法提出了巨大挑战,因为数据量巨大,读数长度相对较短。基于 NGS 数据中字模式计数的无比对方法不依赖于完整的基因组,通常计算效率较高。因此,它们对基因组和元基因组比较有很大的帮助。最近,人们开发了新的统计方法来比较长序列和霰弹枪序列。这些方法已被应用于许多问题,包括基因调控区、基因组序列、元基因组的比较,元基因组数据中等位基因的分选,病毒-宿主相互作用的鉴定,以及水平基因转移的检测。我们将对这些应用以及基于字数的无比对序列分析方法的其他相关发展进行最新综述。
{"title":"Alignment-Free Sequence Analysis and Applications.","authors":"Jie Ren, Xin Bai, Yang Young Lu, Kujin Tang, Ying Wang, Gesine Reinert, Fengzhu Sun","doi":"10.1146/annurev-biodatasci-080917-013431","DOIUrl":"10.1146/annurev-biodatasci-080917-013431","url":null,"abstract":"<p><p>Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"1 ","pages":"93-114"},"PeriodicalIF":7.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6905628/pdf/nihms-1016592.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37450115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-01 Epub Date: 2018-05-23 DOI: 10.1146/annurev-biodatasci-080917-013315
Juan M Banda, Martin Seneviratne, Tina Hernandez-Boussard, Nigam H Shah

With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.
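To make the idea of an early rule-based phenotype definition concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `PatientRecord` fields, the code prefix, the drug name, and the lab threshold are hypothetical and are not taken from the review or from any validated phenotype definition.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # Hypothetical, heavily simplified record; real EHR schemas differ.
    patient_id: str
    diagnosis_codes: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    hba1c_values: list[float] = field(default_factory=list)


def has_t2d_phenotype(rec: PatientRecord) -> bool:
    """Illustrative rule: diagnosis code AND (medication OR elevated lab value).

    The prefix "E11", the drug "metformin", and the 6.5 cutoff are
    assumptions chosen for the example, not a validated definition.
    """
    has_code = any(c.startswith("E11") for c in rec.diagnosis_codes)
    on_drug = any(m.lower() == "metformin" for m in rec.medications)
    high_lab = any(v >= 6.5 for v in rec.hba1c_values)
    return has_code and (on_drug or high_lab)


if __name__ == "__main__":
    cohort = [
        PatientRecord("p1", ["E11.9"], ["Metformin"], [7.1]),
        PatientRecord("p2", ["I10"], [], [5.4]),
        PatientRecord("p3", ["E11.65"], [], [6.0]),
    ]
    selected = [r.patient_id for r in cohort if has_t2d_phenotype(r)]
    print(selected)
```

A rule like this is transparent and easy to audit, but every criterion has to be written by hand. That manual effort is the limitation the supervised and unsupervised models mentioned in the abstract aim to reduce.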

Annual Review of Biomedical Data Science, vol. 1, pp. 53-68. Published 2018-07-01.
Citations: 125