
Annual Review of Biomedical Data Science: Latest Articles

Visualization of Biomedical Data
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013424
S. O’Donoghue, B. Baldi, S. Clark, A. Darling, J. Hogan, Sandeep Kaur, L. Maier-Hein, Davis J. McCarthy, W. Moore, Esther Stenau, J. Swedlow, Jenny Vuong, J. Procter
The rapid increase in volume and complexity of biomedical data requires changes in research, communication, and clinical practices. This includes learning how to effectively integrate automated analysis with high–data density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that help address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including three-dimensional genomics, single-cell RNA sequencing (RNA-seq), the protein structure universe, phosphoproteomics, augmented reality–assisted surgery, and metagenomics. While specific research areas need highly tailored visualizations, there are common challenges that can be addressed with general methods and strategies. Also common, however, are poor visualization practices. We outline ongoing initiatives aimed at improving visualization practices in biomedical research via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers. These changes are revolutionizing how we see and think about our data.
Citations: 63
Computational Methods for Understanding Mass Spectrometry–Based Shotgun Proteomics Data
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013516
Pavel Sinitcyn, J. Rudolph, J. Cox
Computational proteomics is the data science concerned with the identification and quantification of proteins from high-throughput data and the biological interpretation of their concentration changes, posttranslational modifications, interactions, and subcellular localizations. Today, these data most often originate from mass spectrometry–based shotgun proteomics experiments. In this review, we survey computational methods for the analysis of such proteomics data, focusing on the explanation of the key concepts. Starting with mass spectrometric feature detection, we then cover methods for the identification of peptides. Subsequently, protein inference and the control of false discovery rates are highly important topics covered. We then discuss methods for the quantification of peptides and proteins. A section on downstream data analysis covers exploratory statistics, network analysis, machine learning, and multiomics data integration. Finally, we discuss current developments and provide an outlook on what the near future of computational proteomics might bear.
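The abstract names false discovery rate control without saying how it is done. As an illustration only, here is a minimal sketch of the target-decoy approach that is widely used in shotgun proteomics; the abstract does not state which method the review covers. The function name `target_decoy_qvalues`, the `(score, is_decoy)` input layout, and the simple decoys/targets estimator are assumptions made for this example.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Estimate q-values for peptide-spectrum matches (PSMs).

    psms: hypothetical input, one (score, is_decoy) pair per PSM,
          where a higher score means a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each score threshold: decoys / targets.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, is_decoy, q in target_decoy_qvalues(example):
        print(score, "decoy" if is_decoy else "target", round(q, 3))
```

In practice one would keep only target PSMs whose q-value falls below a chosen threshold (1% is a common choice); real pipelines typically apply further corrections that this sketch omits.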
Citations: 102
Opportunities and Challenges of Whole-Cell and -Tissue Simulations of the Outer Retina in Health and Disease
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013356
P. Luthert, Luis Serrano, C. Kiel
Visual processing starts in the outer retina, where photoreceptor cells sense photons that trigger electrical responses. Retinal pigment epithelial cells are located external to the photoreceptor layer and have critical functions in supporting cell and tissue homeostasis and thus sustaining a healthy retina. The high level of specialization makes the retina vulnerable to alterations that promote retinal degeneration. In this review, we discuss opportunities and challenges in proposing whole-cell and -tissue simulations of the human outer retina. An implicit position taken throughout this review is that mapping diverse data sets onto integrative computational models is likely to be a pivotal approach to understanding complex disease and developing novel interventions.
Citations: 4
Science as a Culinary Art: How Data Science and Informatics Will Change Knowledge Discovery for Everyone
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BD-01-041718-100011
N. Tatonetti
There are 7.6 billion scientists on this planet. Every one of us uses the scientific method in our daily lives. We are continually forming new hypotheses—the fastest route for the morning commute, the best strategy for keeping an orchid healthy, or the appropriate cooking time for a bone-in ribeye. We then test these hypotheses against our observations, reevaluate and adjust our views, and then do it all over again. Granted, these are not the rigorous randomized experiments used by research laboratories, but not all knowledge comes from controlled studies. The example of cooking is especially interesting, as I personally think culinary science to be humanity’s most advanced. For 1.9 million years (1), nearly every human has come up with new ideas about how to prepare food. Today alone, billions will form hypotheses about the right combination of spices, temperatures, and wine pairings. Each of these hypotheses will be tested, evaluated for their success, and accepted or rejected, ultimately contributing to the body of human culinary knowledge. Imagine how advanced medicine would be if every human was equipped to form and test biomedical research hypotheses the way that we do for cooking! Not only would the mass of knowledge be greater, but it would arguably be more useful as well. The knowledge generated would naturally be contextual—in other words, knowledge specific to particular regions or subpopulations would emerge. Medicine as a scientific discipline will especially benefit from contextual knowledge. The needs and risks of those living in, say, sub-Saharan Africa are much different than those of Inuits living near the Arctic Circle. The push toward precision medicine is evidence that contextual knowledge is recognized as necessary to advance human health. Contextual knowledge made possible by newly available data,
Citations: 0
Large-Scale Analysis of Genetic and Clinical Patient Data
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013508
M. Ritchie
Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available on comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.
Citations: 13
From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013452
Xi Chen, S. Teichmann, K. Meyer
With the recent transformative developments in single-cell genomics and, in particular, single-cell gene expression analysis, it is now possible to study tissues at the single-cell level, rather than having to rely on data from bulk measurements. Here we review the rapid developments in single-cell RNA sequencing (scRNA-seq) protocols that have the potential for unbiased identification and profiling of all cell types within a tissue or organism. In addition, novel approaches for spatial profiling of gene expression allow us to map individual cells and cell types back into the three-dimensional context of organs. The combination of in-depth single-cell and spatial gene expression data will reveal tissue architecture in unprecedented detail, generating a wealth of biological knowledge and a better understanding of many diseases.
Citations: 77
Deep Learning in Biomedical Data Science
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013343
P. Baldi
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Citations: 77
Network Analysis as a Grand Unifier in Biomedical Data Science
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013444
Patrick D. McGillivray, Declan Clarke, W. Meyerson, Jing Zhang, Donghoon Lee, Mengting Gu, Sushant Kumar, Holly Zhou, M. Gerstein
Biomedical data scientists study many types of networks, ranging from those formed by neurons to those created by molecular interactions. People often criticize these networks as uninterpretable diagrams termed hairballs; however, here we show that molecular biological networks can be interpreted in several straightforward ways. First, we can break down a network into smaller components, focusing on individual pathways and modules. Second, we can compute global statistics describing the network as a whole. Third, we can compare networks. These comparisons can be within the same context (e.g., between two gene regulatory networks) or cross-disciplinary (e.g., between regulatory networks and governmental hierarchies). The latter comparisons can transfer a formalism, such as that for Markov chains, from one context to another or relate our intuitions in a familiar setting (e.g., social networks) to the relatively unfamiliar molecular context. Finally, key aspects of molecular networks are dynamics and evolution, i.e., how they evolve over time and how genetic variants affect them. By studying the relationships between variants in networks, we can begin to interpret many common diseases, such as cancer and heart disease.
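The second strategy in the abstract, computing global statistics that describe a network as a whole, can be illustrated with a small sketch. The abstract does not say which statistics the review uses; density, mean degree, and average clustering coefficient are chosen here as common examples. The function names and the toy edge list are hypothetical, and the graph is assumed to be undirected and unweighted.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def build_adjacency(edges: Iterable[Edge]) -> Dict[Hashable, Set[Hashable]]:
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def global_stats(edges: Iterable[Edge]) -> Dict[str, float]:
    """Return a few whole-network summary statistics for an undirected graph."""
    adj = build_adjacency(edges)
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2

    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * m / n if n else 0.0

    # Local clustering: fraction of a node's neighbor pairs that are themselves linked.
    coefficients = []
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        coefficients.append(2 * linked / (k * (k - 1)))
    avg_clustering = sum(coefficients) / n if n else 0.0

    return {
        "nodes": float(n),
        "edges": float(m),
        "density": density,
        "mean_degree": mean_degree,
        "avg_clustering": avg_clustering,
    }


if __name__ == "__main__":
    # Toy network: a triangle A-B-C with one pendant node D attached to C.
    toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
    print(global_stats(toy_edges))
```

Summary numbers like these give a compact description of a network that would otherwise be drawn as a "hairball", and they can be compared across networks of different kinds.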
Citations: 28
A Census of Disease Ontologies
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013459
M. Haendel, J. McMurry, R. Relevo, C. Mungall, P. Robinson, C. Chute
For centuries, humans have sought to classify diseases based on phenotypic presentation and available treatments. Today, a wide landscape of strategies, resources, and tools exists to classify patients and diseases. Ontologies can provide a robust foundation of logic for precise stratification and classification along diverse axes such as etiology, development, treatment, and genetics. Disease and phenotype ontologies are used in four primary ways: (a) search, retrieval, and annotation of knowledge; (b) data integration and analysis; (c) clinical decision support; and (d) knowledge discovery. Computational inference can connect existing knowledge and generate new insights and hypotheses about drug targets, prognosis prediction, or diagnosis. In this review, we examine the rise of disease and phenotype ontologies and the diverse ways they are represented and applied in biomedicine.
Citations: 35
Data Science Issues in Studying Protein-RNA Interactions with CLIP Technologies
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-07-20 DOI: 10.1146/annurev-biodatasci-080917-013525
Anob M Chakrabarti, Nejc Haberman, Arne Praznik, Nicholas M Luscombe, Jernej Ule

An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein-RNA interactions. UV crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that copurify with a selected RNA-binding protein under stringent conditions. Here we focus on approaches for the analysis of the resulting data and appraise the methods for peak calling, visualization, analysis, and computational modeling of protein-RNA binding sites. We advocate that the sensitivity and specificity of data be assessed in combination for computational quality control. Moreover, we demonstrate the value of analyzing sequence motif enrichment in peaks assigned from CLIP data and of visualizing RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these to assess how variations in CLIP data quality and in different peak calling methods affect the insights into regulatory mechanisms. We conclude by discussing future opportunities for the computational analysis of protein-RNA interaction experiments.

Citations: 0