Computational graph pangenomics: a tutorial on data structures and their applications.

Natural Computing. Published 2022-03-01 (Epub 2022-03-04). DOI: 10.1007/s11047-022-09882-6. Impact factor 1.7; CAS Zone 4 (Computer Science); JCR Q3 (Computer Science, Artificial Intelligence).
Jasmijn A Baaijens, Paola Bonizzoni, Christina Boucher, Gianluca Della Vedova, Yuri Pirola, Raffaella Rizzi, Jouni Sirén
Natural Computing 21(1): 81-108.

Abstract

Computational pangenomics is an emerging research field that is changing the way computer scientists tackle challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations, thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes, as well as the specificity of an individual genome in a personalized approach to medicine, is rapidly driving the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial for the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1 million individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
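To make "a graph-based representation of multiple genomes" concrete, here is a minimal sketch of a toy sequence graph. It is an illustration only and is not taken from the tutorial: nodes carry DNA segments, edges connect consecutive segments, and each haplotype is stored as a path of node IDs whose concatenated labels spell its sequence. The class `SequenceGraph`, its methods, the node labels, and the haplotype names `ref`/`alt` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph (hypothetical): nodes hold DNA segments, directed
    edges link segments, and each haplotype is a path of node IDs."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_edge(self, u: int, v: int) -> None:
        # Forward orientation only; this sketch ignores reverse complements.
        self.edges.add((u, v))

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # A haplotype must be a walk that follows existing edges.
        for u, v in zip(node_ids, node_ids[1:]):
            if (u, v) not in self.edges:
                raise ValueError(f"path {name!r} uses missing edge {u}->{v}")
        self.paths[name] = list(node_ids)

    def spell_path(self, name: str) -> str:
        # Concatenate node labels along the path to recover the haplotype.
        return "".join(self.nodes[n] for n in self.paths[name])


g = SequenceGraph()
# Shared prefix (1), a single-base variant bubble (2 = A, 3 = G), shared suffix (4).
g.add_node(1, "ACGT")
g.add_node(2, "A")
g.add_node(3, "G")
g.add_node(4, "TTC")
for u, v in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    g.add_edge(u, v)
g.add_path("ref", [1, 2, 4])
g.add_path("alt", [1, 3, 4])
print(g.spell_path("ref"))  # ACGTATTC
print(g.spell_path("alt"))  # ACGTGTTC
```

Both haplotypes share nodes 1 and 4, so their common sequence is stored once and the variant site appears as a bubble. Pangenome graph formats such as GFA also record node orientation, and practical tools use compact haplotype representations rather than plain Python lists. This sketch omits both.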

Source journal
Natural Computing
Subject area: Computer Science - Computer Science Applications
CiteScore: 4.40
Self-citation rate: 4.80%
Articles published: 49
Review time: 3 months
About the journal: The journal is soliciting papers on all aspects of natural computing. Because of the interdisciplinary character of the journal, a special effort will be made to solicit survey, review, and tutorial papers that would make research trends in a given subarea more accessible to the broad audience of the journal.