
Annual Review of Biomedical Data Science: Latest Articles

Integrating Imaging and Omics: Computational Methods and Challenges
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-080917-013328
J. Hériché, S. Alexander, J. Ellenberg
Fluorescence microscopy imaging has long been complementary to DNA sequencing- and mass spectrometry–based omics in biomedical research, but these approaches are now converging. On the one hand, omics methods are moving from in vitro methods that average across large cell populations to in situ molecular characterization tools with single-cell sensitivity. On the other hand, fluorescence microscopy imaging has moved from a morphological description of tissues and cells to quantitative molecular profiling with single-molecule resolution. Recent technological developments underpinned by computational methods have started to blur the lines between imaging and omics and have made their direct correlation and seamless integration an exciting possibility. As this trend continues rapidly, it will allow us to create comprehensive molecular profiles of living systems with spatial and temporal context and subcellular resolution. Key to achieving this ambitious goal will be novel computational methods and successfully dealing with the challenges of data integration and sharing as well as cloud-enabled big data analysis.
Citations: 27
Biomolecular Data Resources: Bioinformatics Infrastructure for Biomedical Data Science
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021321
J. Vamathevan, R. Apweiler, E. Birney
Technological advances have continuously driven the generation of bio-molecular data and the development of bioinformatics infrastructure, which enables data reuse for scientific discovery. Several types of data management resources have arisen, such as data deposition databases, added-value databases or knowledgebases, and biology-driven portals. In this review, we provide a unique overview of the gradual evolution of these resources and discuss the goals and features that must be considered in their development. With the increasing application of genomics in the health care context and with 60 to 500 million whole genomes estimated to be sequenced by 2022, biomedical research infrastructure is transforming, too. Systems for federated access, portable tools, provision of reference data, and interpretation tools will enable researchers to derive maximal benefits from these data. Collaboration, coordination, and sustainability of data resources are key to ensure that biomedical knowledge management can scale with technology shifts and growing data volumes.
Citations: 4
Connectivity Mapping: Methods and Applications
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021211
A. Keenan, Megan L. Wojciechowicz, Zichen Wang, Kathleen M. Jagodnik, S. L. Jenkins, Alexander Lachmann, Avi Ma’ayan
Connectivity mapping resources consist of signatures representing changes in cellular state following systematic small-molecule, disease, gene, or other form of perturbations. Such resources enable the characterization of signatures from novel perturbations based on similarity; provide a global view of the space of many themed perturbations; and allow the ability to predict cellular, tissue, and organismal phenotypes for perturbagens. A signature search engine enables hypothesis generation by finding connections between query signatures and the database of signatures. This framework has been used to identify connections between small molecules and their targets, to discover cell-specific responses to perturbations and ways to reverse disease expression states with small molecules, and to predict small-molecule mimickers for existing drugs. This review provides a historical perspective and the current state of connectivity mapping resources with a focus on both methodology and community implementations.
Citations: 34
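The signature search described in the Connectivity Mapping abstract above can be illustrated with a minimal sketch. It assumes signatures are stored as gene-to-score mappings and ranks database entries by cosine similarity; both choices are assumptions made here for illustration, since the abstract does not name a specific metric. The function names `cosine` and `rank_signatures` and all toy values are hypothetical.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse gene -> score signatures."""
    shared = a.keys() & b.keys()
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_signatures(
    query: dict[str, float], database: dict[str, dict[str, float]]
) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most similar to most opposite."""
    scores = [(name, cosine(query, sig)) for name, sig in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    query = {"G1": 2.0, "G2": -1.0}
    database = {
        "mimic": {"G1": 4.0, "G2": -2.0},
        "reverser": {"G1": -2.0, "G2": 1.0},
        "unrelated": {"G3": 1.0},
    }
    for name, score in rank_signatures(query, database):
        print(f"{name}: {score:+.2f}")
```

In this framing, entries near the top of the ranking behave like the query perturbation, while entries near the bottom are candidates for reversing it.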
Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021339
C. Deng, Timothy P. Daley, G. Brandine, Andrew D. Smith
High-throughput sequencing technologies have evolved at a stellar pace for almost a decade and have greatly advanced our understanding of genome biology. In these sampling-based technologies, there is an important detail that is often overlooked in the analysis of the data and the design of the experiments, specifically that the sampled observations often do not give a representative picture of the underlying population. This has long been recognized as a problem in statistical ecology and in the broader statistics literature. In this review, we discuss the connections between these fields, methodological advances that parallel both the needs and opportunities of large-scale data analysis, and specific applications in modern biology. In the process we describe unique aspects of applying these approaches to sequencing technologies, including sequencing error, population and individual heterogeneity, and the design of experiments.
Citations: 5
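The point made in the Molecular Heterogeneity abstract above, that sampled observations may under-represent the underlying population, can be made concrete with a classic estimator from statistical ecology. The sketch below uses the bias-corrected Chao1 lower bound on the number of distinct classes. Choosing Chao1 is an assumption for illustration, as the abstract does not name specific estimators; `chao1` and the toy counts are hypothetical.

```python
from collections import Counter


def chao1(counts: list[int]) -> float:
    """Bias-corrected Chao1 estimate of total class richness.

    counts[i] is how many times class i was observed in the sample.
    f1 and f2 are the numbers of classes seen exactly once and twice;
    many singletons relative to doubletons suggest many unseen classes.
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Toy sample: six distinct molecules, three of them seen only once.
    print(chao1([1, 1, 1, 2, 2, 5]))
```

The estimate exceeds the observed count whenever at least two classes appear exactly once, which is the formal counterpart of the abstract's warning about unrepresentative samples.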
Imaging, Visualization, and Computation in Developmental Biology
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021305
F. Cutrale, S. Fraser, Le A. Trinh
Embryonic development is highly complex and dynamic, requiring the coordination of numerous molecular and cellular events at precise times and places. Advances in imaging technology have made it possible to follow developmental processes at cellular, tissue, and organ levels over time as they take place in the intact embryo. Parallel innovations of in vivo probes permit imaging to report on molecular, physiological, and anatomical events of embryogenesis, but the resulting multidimensional data sets pose significant challenges for extracting knowledge. In this review, we discuss recent and emerging advances in imaging technologies, in vivo labeling, and data processing that offer the greatest potential for jointly deciphering the intricate cellular dynamics and the underlying molecular mechanisms. Our discussion of the emerging area of “image-omics” highlights both the challenges of data analysis and the promise of more fully embracing computation and data science for rapidly advancing our understanding of biology.
Citations: 13
Discovering Pathway and Cell Type Signatures in Transcriptomic Compendia with Machine Learning
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-22 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021348
G. Way, C. Greene
Pathway and cell type signatures are patterns present in transcriptome data that are associated with biological processes or phenotypic consequences. These signatures result from specific cell type and pathway expression but can require large transcriptomic compendia to detect. Machine learning techniques can be powerful tools for signature discovery through their ability to provide accurate and interpretable results. In this review, we discuss various machine learning applications to extract pathway and cell type signatures from transcriptomic compendia. We focus on the biological motivations and interpretation for both supervised and unsupervised learning approaches in this setting. We consider recent advances, including deep learning, and their applications to expanding bulk and single-cell RNA data. As data and computational resources increase, there will be more opportunities for machine learning to aid in revealing biological signatures.
Citations: 10
Sketching and Sublinear Data Structures in Genomics
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021156
G. Marçais, Brad Solomon, Robert Patro, Carl Kingsford
Large-scale genomics demands computational methods that scale sublinearly with the growth of data. We review several data structures and sketching techniques that have been used in genomic analysis methods. Specifically, we focus on four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full-text indices, approximate membership query data structures, locality-sensitive hashing, and minimizers schemes. We describe these techniques at a high level and give several representative applications of each.
Citations: 35
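Of the four ideas listed in the Sketching abstract above, minimizer schemes are perhaps the simplest to illustrate. The sketch below assumes lexicographic ordering of k-mers with leftmost tie-breaking; practical tools often order k-mers by a hash value instead. The function name `minimizers` and the example sequence are hypothetical.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return the distinct (position, k-mer) minimizers of seq.

    Every window of w consecutive k-mers contributes its lexicographically
    smallest k-mer (the leftmost one on ties). Adjacent windows usually
    share a minimizer, so far fewer k-mers are kept than exist in seq.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, str]] = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost on ties.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Selected positions never decrease, so checking the last one suffices.
        if not picked or picked[-1][0] != pos:
            picked.append((pos, kmers[pos]))
    return picked


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(pos, kmer)
```

Because two sequences that share a long enough substring select the same minimizers inside it, the selected k-mers can stand in for the full k-mer set when indexing or comparing sequences.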
Genomic Data Compression
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-20 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021229
M. Hernaez, Dmitri S. Pavlichin, T. Weissman, Idoia Ochoa
Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.
Citations: 22
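As a small illustration of why representations tailored to genomic data help, a sequence over the alphabet A, C, G, T can be stored in 2 bits per base instead of the 8 bits of plain text. This is a baseline packing scheme chosen here for illustration, not one of the compressors surveyed in the review. `pack` and `unpack` are hypothetical names, and ambiguous bases such as N are deliberately not handled.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # raises KeyError on other symbols
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Invert pack(); length is needed to drop padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "ACGTAC"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack(packed, len(seq)) == seq)
```

The original length has to be stored alongside the packed bytes, which hints at why real genomic formats carry metadata and why quality scores and read identifiers need their own treatment.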
Scientific Discovery Games for Biomedical Research.
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2019-07-01 DOI: 10.1146/annurev-biodatasci-072018-021139
Rhiju Das, Benjamin Keep, Peter Washington, Ingmar H Riedel-Kruse

Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.

Volume 2, Issue 1, Pages 253-279
Citations: 17
RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis
IF 6 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2018-10-17 DOI: 10.1146/ANNUREV-BIODATASCI-072018-021255
K. Van den Berge, Katharina M. Hembach, C. Soneson, S. Tiberi, L. Clement, M. Love, Robert Patro, M. Robinson
Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
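As one concrete instance of the expression quantification mentioned in the abstract above, the sketch below converts read counts to transcripts per million (TPM). TPM is a widely used unit, but the abstract itself does not single it out, so this choice, the function name `tpm`, and the toy numbers are assumptions made for illustration.

```python
def tpm(counts: dict[str, float], lengths_kb: dict[str, float]) -> dict[str, float]:
    """Convert per-gene read counts to transcripts per million.

    Counts are first divided by gene length (in kilobases) so that long
    genes are not favoured, then rescaled so the values sum to one million.
    Real pipelines typically use effective lengths and estimated counts.
    """
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    values = tpm({"geneA": 10, "geneB": 20}, {"geneA": 1.0, "geneB": 4.0})
    for gene, value in values.items():
        print(gene, round(value, 1))
```

In the toy input, geneB has more reads but is four times longer, so its TPM ends up lower than that of geneA; this length correction is what makes values comparable across genes within a sample.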
基因表达是观察各种遗传和调控程序结果的基本水平。在短短几年内,转录组基因表达的测量已经令人信服地从微阵列转变为测序。RNA测序(RNA-seq)为大规模分析转录结果提供了一个定量和开放的系统,因此促进了广泛的应用,包括基础科学研究,但也包括农业或临床情况。在过去10年左右的时间里,人们对RNA-seq数据集的特征以及所开发的无数方法的性能有了很多了解。在这篇综述中,我们概述了RNA-seq数据分析的发展,包括实验设计,明确关注基因表达的量化和差异表达的统计方法。我们还强调了新兴的数据类型,如单细胞RNA-seq和使用长读技术的基因表达谱。
Citations: 83