Pub Date: 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013328
J. Hériché, S. Alexander, J. Ellenberg
Fluorescence microscopy imaging has long been complementary to DNA sequencing- and mass spectrometry–based omics in biomedical research, but these approaches are now converging. On the one hand, omics methods are moving from in vitro methods that average across large cell populations to in situ molecular characterization tools with single-cell sensitivity. On the other hand, fluorescence microscopy imaging has moved from a morphological description of tissues and cells to quantitative molecular profiling with single-molecule resolution. Recent technological developments underpinned by computational methods have started to blur the lines between imaging and omics and have made their direct correlation and seamless integration an exciting possibility. As this trend continues rapidly, it will allow us to create comprehensive molecular profiles of living systems with spatial and temporal context and subcellular resolution. Key to achieving this ambitious goal will be novel computational methods and successfully dealing with the challenges of data integration and sharing as well as cloud-enabled big data analysis.
Title: Integrating Imaging and Omics: Computational Methods and Challenges
Journal: Annual Review of Biomedical Data Science
Pub Date: 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021321
J. Vamathevan, R. Apweiler, E. Birney
Technological advances have continuously driven the generation of bio-molecular data and the development of bioinformatics infrastructure, which enables data reuse for scientific discovery. Several types of data management resources have arisen, such as data deposition databases, added-value databases or knowledgebases, and biology-driven portals. In this review, we provide a unique overview of the gradual evolution of these resources and discuss the goals and features that must be considered in their development. With the increasing application of genomics in the health care context and with 60 to 500 million whole genomes estimated to be sequenced by 2022, biomedical research infrastructure is transforming, too. Systems for federated access, portable tools, provision of reference data, and interpretation tools will enable researchers to derive maximal benefits from these data. Collaboration, coordination, and sustainability of data resources are key to ensure that biomedical knowledge management can scale with technology shifts and growing data volumes.
Title: Biomolecular Data Resources: Bioinformatics Infrastructure for Biomedical Data Science
Pub Date: 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021211
A. Keenan, Megan L. Wojciechowicz, Zichen Wang, Kathleen M. Jagodnik, S. L. Jenkins, Alexander Lachmann, Avi Ma’ayan
Connectivity mapping resources consist of signatures representing changes in cellular state following systematic small-molecule, disease, gene, or other forms of perturbation. Such resources enable the characterization of signatures from novel perturbations based on similarity; provide a global view of the space of many themed perturbations; and make it possible to predict cellular, tissue, and organismal phenotypes for perturbagens. A signature search engine enables hypothesis generation by finding connections between query signatures and the database of signatures. This framework has been used to identify connections between small molecules and their targets, to discover cell-specific responses to perturbations and ways to reverse disease expression states with small molecules, and to predict small-molecule mimickers for existing drugs. This review provides a historical perspective and the current state of connectivity mapping resources with a focus on both methodology and community implementations.
Title: Connectivity Mapping: Methods and Applications
Pub Date: 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021339
C. Deng, Timothy P. Daley, G. Brandine, Andrew D. Smith
High-throughput sequencing technologies have evolved at a stellar pace for almost a decade and have greatly advanced our understanding of genome biology. In these sampling-based technologies, there is an important detail that is often overlooked in the analysis of the data and the design of the experiments, specifically that the sampled observations often do not give a representative picture of the underlying population. This has long been recognized as a problem in statistical ecology and in the broader statistics literature. In this review, we discuss the connections between these fields, methodological advances that parallel both the needs and opportunities of large-scale data analysis, and specific applications in modern biology. In the process we describe unique aspects of applying these approaches to sequencing technologies, including sequencing error, population and individual heterogeneity, and the design of experiments.
Title: Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications
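The point that a sample can under-represent the underlying population can be made concrete with a richness estimator from statistical ecology. The sketch below uses Chao1, which is an assumption chosen for illustration; the abstract does not name any particular estimator, and the function name `chao1` is hypothetical. It estimates how many distinct classes (species, molecules, reads) exist in total from how many were seen exactly once and exactly twice.

```python
from collections import Counter


def chao1(observations):
    """Estimate the total number of distinct classes with the Chao1 estimator.

    observations: iterable of hashable labels, one per sampled item.
    Returns a lower-bound style estimate of the number of classes in the
    population, including those never observed in the sample.
    """
    counts = Counter(observations)
    s_obs = len(counts)                                # classes actually seen
    f1 = sum(1 for c in counts.values() if c == 1)     # singletons
    f2 = sum(1 for c in counts.values() if c == 2)     # doubletons
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)
    # Bias-corrected form used when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    sample = ["a", "a", "b", "b", "c", "d"]
    print(chao1(sample))
```

Many singletons relative to doubletons push the estimate well above the observed count, which is the signal that the sample is far from saturating the population.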
Pub Date: 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021305
F. Cutrale, S. Fraser, Le A. Trinh
Embryonic development is highly complex and dynamic, requiring the coordination of numerous molecular and cellular events at precise times and places. Advances in imaging technology have made it possible to follow developmental processes at cellular, tissue, and organ levels over time as they take place in the intact embryo. Parallel innovations of in vivo probes permit imaging to report on molecular, physiological, and anatomical events of embryogenesis, but the resulting multidimensional data sets pose significant challenges for extracting knowledge. In this review, we discuss recent and emerging advances in imaging technologies, in vivo labeling, and data processing that offer the greatest potential for jointly deciphering the intricate cellular dynamics and the underlying molecular mechanisms. Our discussion of the emerging area of “image-omics” highlights both the challenges of data analysis and the promise of more fully embracing computation and data science for rapidly advancing our understanding of biology.
Title: Imaging, Visualization, and Computation in Developmental Biology
Pub Date: 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021348
G. Way, C. Greene
Pathway and cell type signatures are patterns present in transcriptome data that are associated with biological processes or phenotypic consequences. These signatures result from specific cell type and pathway expression but can require large transcriptomic compendia to detect. Machine learning techniques can be powerful tools for signature discovery through their ability to provide accurate and interpretable results. In this review, we discuss various machine learning applications to extract pathway and cell type signatures from transcriptomic compendia. We focus on the biological motivations and interpretation for both supervised and unsupervised learning approaches in this setting. We consider recent advances, including deep learning, and their applications to expanding bulk and single-cell RNA data. As data and computational resources increase, there will be more opportunities for machine learning to aid in revealing biological signatures.
Title: Discovering Pathway and Cell Type Signatures in Transcriptomic Compendia with Machine Learning
Pub Date: 2019-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021156
G. Marçais, Brad Solomon, Robert Patro, Carl Kingsford
Large-scale genomics demands computational methods that scale sublinearly with the growth of data. We review several data structures and sketching techniques that have been used in genomic analysis methods. Specifically, we focus on four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full-text indices, approximate membership query data structures, locality-sensitive hashing, and minimizers schemes. We describe these techniques at a high level and give several representative applications of each.
Title: Sketching and Sublinear Data Structures in Genomics
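Of the four ideas the abstract lists, minimizer schemes are the simplest to illustrate. The following is a minimal sketch, not the review's own implementation: it assumes plain lexicographic ordering of k-mers (practical tools usually order by a hash instead), and the function name `minimizers` and its parameters `k` and `w` are hypothetical. For every window of `w` consecutive k-mers it keeps only the smallest one, so adjacent windows often select the same k-mer and the sequence is represented by far fewer k-mers than it contains.

```python
def minimizers(seq, k, w):
    """Return the sorted list of (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost on ties). Duplicated picks
    from overlapping windows are reported once.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost k-mer.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(pos, kmer)
```

Two sequences that share a long enough exact substring are guaranteed to share the minimizers of that substring, which is why comparing only the selected k-mers can stand in for comparing all of them.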
Pub Date: 2019-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021229
M. Hernaez, Dmitri S. Pavlichin, T. Weissman, Idoia Ochoa
Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.
Title: Genomic Data Compression
Pub Date: 2019-07-01 DOI: 10.1146/annurev-biodatasci-072018-021139
Rhiju Das, Benjamin Keep, Peter Washington, Ingmar H Riedel-Kruse
Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.
Title: Scientific Discovery Games for Biomedical Research
Volume 2, Issue 1, pp. 253-279
Pub Date: 2018-10-17 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021255
K. Van den Berge, Katharina M. Hembach, C. Soneson, S. Tiberi, L. Clement, M. Love, Robert Patro, M. Robinson
Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of RNA-seq data sets, as well as the performance of the myriad methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
Title: RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis