Guide to k-mer approaches for genomics across the tree of life

Katharine M. Jenike, Lucía Campos-Domínguez, Marilou Boddé, José Cerca, Christina N. Hodson, Michael C. Schatz, Kamil S. Jaron
arXiv - QuanBio - Genomics, published 2024-04-01. DOI: arxiv-2404.01519 (https://doi.org/arxiv-2404.01519)

Abstract

The wide array of currently available genomes displays a wonderful diversity in size, composition and structure, with many more to come thanks to several global biodiversity genomics initiatives launched in recent years. However, genome sequencing, even with all the recent advances, can still be challenging for both technical reasons (e.g. small physical size, contaminated samples, or limited access to appropriate sequencing platforms) and biological reasons (e.g. germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become a popular way to overcome some of these challenges. They are based on the simple process of dividing the analysed sequences (e.g. raw reads or genomes) into a set of sub-sequences of length k, called k-mers. Despite this apparent simplicity, k-mer-based analysis allows for a rapid and intuitive assessment of complex sequencing datasets. Here, we provide the first comprehensive review of the theoretical properties and practical applications of k-mers in biodiversity genomics, serving as a reference manual for this powerful approach.
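The step the abstract describes, dividing a sequence into sub-sequences of length k, can be illustrated with a minimal Python sketch. The function name `count_kmers` and the toy sequence are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch also assumes overlapping k-mers on a single strand and ignores reverse complements and sequencing errors, which real tools typically have to handle.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping sub-sequence of length k in `sequence`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n yields n - k + 1 overlapping k-mers
    # (and none at all when n < k, because the range is then empty).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy example: a 7-base sequence split into 3-mers.
counts = count_kmers("ATGATGA", k=3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

For the toy sequence, the five overlapping 3-mers are ATG, TGA, GAT, ATG and TGA, so ATG and TGA are each counted twice and GAT once. Counts of this kind, taken over raw reads or assembled genomes, are the raw material that k-mer-based analyses work from.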