Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013424
S. O’Donoghue, B. Baldi, S. Clark, A. Darling, J. Hogan, Sandeep Kaur, L. Maier-Hein, Davis J. McCarthy, W. Moore, Esther Stenau, J. Swedlow, Jenny Vuong, J. Procter
The rapid increase in volume and complexity of biomedical data requires changes in research, communication, and clinical practices. This includes learning how to effectively integrate automated analysis with high–data density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that help address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including three-dimensional genomics, single-cell RNA sequencing (RNA-seq), the protein structure universe, phosphoproteomics, augmented reality–assisted surgery, and metagenomics. While specific research areas need highly tailored visualizations, there are common challenges that can be addressed with general methods and strategies. Also common, however, are poor visualization practices. We outline ongoing initiatives aimed at improving visualization practices in biomedical research via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers. These changes are revolutionizing how we see and think about our data.
Title: Visualization of Biomedical Data | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013424
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013516
Pavel Sinitcyn, J. Rudolph, J. Cox
Computational proteomics is the data science concerned with the identification and quantification of proteins from high-throughput data and the biological interpretation of their concentration changes, posttranslational modifications, interactions, and subcellular localizations. Today, these data most often originate from mass spectrometry–based shotgun proteomics experiments. In this review, we survey computational methods for the analysis of such proteomics data, focusing on the explanation of the key concepts. Starting with mass spectrometric feature detection, we then cover methods for the identification of peptides. Protein inference and the control of false discovery rates, both highly important topics, are covered next. We then discuss methods for the quantification of peptides and proteins. A section on downstream data analysis covers exploratory statistics, network analysis, machine learning, and multiomics data integration. Finally, we discuss current developments and provide an outlook on what the near future of computational proteomics might bring.
Title: Computational Methods for Understanding Mass Spectrometry–Based Shotgun Proteomics Data | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013516
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013356
P. Luthert, Luis Serrano, C. Kiel
Visual processing starts in the outer retina, where photoreceptor cells sense photons that trigger electrical responses. Retinal pigment epithelial cells are located external to the photoreceptor layer and have critical functions in supporting cell and tissue homeostasis and thus sustaining a healthy retina. The high level of specialization makes the retina vulnerable to alterations that promote retinal degeneration. In this review, we discuss opportunities and challenges in proposing whole-cell and -tissue simulations of the human outer retina. An implicit position taken throughout this review is that mapping diverse data sets onto integrative computational models is likely to be a pivotal approach to understanding complex disease and developing novel interventions.
Title: Opportunities and Challenges of Whole-Cell and -Tissue Simulations of the Outer Retina in Health and Disease | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013356
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BD-01-041718-100011
N. Tatonetti
There are 7.6 billion scientists on this planet. Every one of us uses the scientific method in our daily lives. We are continually forming new hypotheses: the fastest route for the morning commute, the best strategy for keeping an orchid healthy, or the appropriate cooking time for a bone-in ribeye. We then test these hypotheses against our observations, reevaluate and adjust our views, and then do it all over again. Granted, these are not the rigorous randomized experiments used by research laboratories, but not all knowledge comes from controlled studies. The example of cooking is especially interesting, as I personally think culinary science to be humanity’s most advanced. For 1.9 million years (1), nearly every human has come up with new ideas about how to prepare food. Today alone, billions will form hypotheses about the right combination of spices, temperatures, and wine pairings. Each of these hypotheses will be tested, evaluated for their success, and accepted or rejected, ultimately contributing to the body of human culinary knowledge. Imagine how advanced medicine would be if every human was equipped to form and test biomedical research hypotheses the way that we do for cooking! Not only would the mass of knowledge be greater, but it would arguably be more useful as well. The knowledge generated would naturally be contextual; in other words, knowledge specific to particular regions or subpopulations would emerge. Medicine as a scientific discipline will especially benefit from contextual knowledge. The needs and risks of those living in, say, sub-Saharan Africa are much different than those of Inuits living near the Arctic Circle. The push toward precision medicine is evidence that contextual knowledge is recognized as necessary to advance human health. Contextual knowledge made possible by newly available data,
Title: Science as a Culinary Art: How Data Science and Informatics Will Change Knowledge Discovery for Everyone | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BD-01-041718-100011
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013508
M. Ritchie
Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.
Title: Large-Scale Analysis of Genetic and Clinical Patient Data | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013508
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013452
Xi Chen, S. Teichmann, K. Meyer
With the recent transformative developments in single-cell genomics and, in particular, single-cell gene expression analysis, it is now possible to study tissues at the single-cell level, rather than having to rely on data from bulk measurements. Here we review the rapid developments in single-cell RNA sequencing (scRNA-seq) protocols that have the potential for unbiased identification and profiling of all cell types within a tissue or organism. In addition, novel approaches for spatial profiling of gene expression allow us to map individual cells and cell types back into the three-dimensional context of organs. The combination of in-depth single-cell and spatial gene expression data will reveal tissue architecture in unprecedented detail, generating a wealth of biological knowledge and a better understanding of many diseases.
Title: From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013452
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013343
P. Baldi
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Title: Deep Learning in Biomedical Data Science | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013343
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013444
Patrick D. McGillivray, Declan Clarke, W. Meyerson, Jing Zhang, Donghoon Lee, Mengting Gu, Sushant Kumar, Holly Zhou, M. Gerstein
Biomedical data scientists study many types of networks, ranging from those formed by neurons to those created by molecular interactions. People often criticize these networks as uninterpretable diagrams termed hairballs; however, here we show that molecular biological networks can be interpreted in several straightforward ways. First, we can break down a network into smaller components, focusing on individual pathways and modules. Second, we can compute global statistics describing the network as a whole. Third, we can compare networks. These comparisons can be within the same context (e.g., between two gene regulatory networks) or cross-disciplinary (e.g., between regulatory networks and governmental hierarchies). The latter comparisons can transfer a formalism, such as that for Markov chains, from one context to another or relate our intuitions in a familiar setting (e.g., social networks) to the relatively unfamiliar molecular context. Finally, key aspects of molecular networks are dynamics and evolution, i.e., how they evolve over time and how genetic variants affect them. By studying the relationships between variants in networks, we can begin to interpret many common diseases, such as cancer and heart disease.
Title: Network Analysis as a Grand Unifier in Biomedical Data Science | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013444
Pub Date: 2018-07-20 | DOI: 10.1146/ANNUREV-BIODATASCI-080917-013459
M. Haendel, J. McMurry, R. Relevo, C. Mungall, P. Robinson, C. Chute
For centuries, humans have sought to classify diseases based on phenotypic presentation and available treatments. Today, a wide landscape of strategies, resources, and tools exists to classify patients and diseases. Ontologies can provide a robust foundation of logic for precise stratification and classification along diverse axes such as etiology, development, treatment, and genetics. Disease and phenotype ontologies are used in four primary ways: (a) search, retrieval, and annotation of knowledge; (b) data integration and analysis; (c) clinical decision support; and (d) knowledge discovery. Computational inference can connect existing knowledge and generate new insights and hypotheses about drug targets, prognosis prediction, or diagnosis. In this review, we examine the rise of disease and phenotype ontologies and the diverse ways they are represented and applied in biomedicine.
Title: A Census of Disease Ontologies | Journal: Annual Review of Biomedical Data Science | Type: Journal Article | URL: https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013459
Pub Date: 2018-07-20 | DOI: 10.1146/annurev-biodatasci-080917-013525
Anob M Chakrabarti, Nejc Haberman, Arne Praznik, Nicholas M Luscombe, Jernej Ule
An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein-RNA interactions. UV crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that copurify with a selected RNA-binding protein under stringent conditions. Here we focus on approaches for the analysis of the resulting data and appraise the methods for peak calling, visualization, analysis, and computational modeling of protein-RNA binding sites. We advocate that the sensitivity and specificity of data be assessed in combination for computational quality control. Moreover, we demonstrate the value of analyzing sequence motif enrichment in peaks assigned from CLIP data and of visualizing RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these to assess how variations in CLIP data quality and in different peak calling methods affect the insights into regulatory mechanisms. We conclude by discussing future opportunities for the computational analysis of protein-RNA interaction experiments.
Title: Data Science Issues in Studying Protein-RNA Interactions with CLIP Technologies | Journal: Annual Review of Biomedical Data Science, vol. 1, pp. 235-261 | Type: Journal Article | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614488/pdf/EMS174063.pdf