Guide to k-mer approaches for genomics across the tree of life

arXiv - QuanBio - Genomics | Pub Date: 2024-04-01 | DOI: arxiv-2404.01519

Katharine M. Jenike, Lucía Campos-Domínguez, Marilou Boddé, José Cerca, Christina N. Hodson, Michael C. Schatz, Kamil S. Jaron

Abstract

The wide array of currently available genomes displays a wonderful diversity in size, composition and structure, with many more to come thanks to several global biodiversity genomics initiatives launched in recent years. However, even with all the recent advances, genome sequencing can still be challenging for both technical reasons (e.g. small physical size, contaminated samples, or limited access to appropriate sequencing platforms) and biological reasons (e.g. germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become a popular way to overcome some of these challenges. They rest on the simple process of dividing the analysed sequences (e.g. raw reads or genomes) into a set of sub-sequences of length k, called k-mers. Despite this apparent simplicity, k-mer-based analysis allows a rapid and intuitive assessment of complex sequencing datasets. Here, we provide the first comprehensive review of the theoretical properties and practical applications of k-mers in biodiversity genomics, serving as a reference manual for this powerful approach.
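The decomposition into k-mers can be illustrated with a minimal Python sketch, assuming the input is a list of plain DNA strings. The function names `kmers`, `count_kmers` and `spectrum` are hypothetical and not taken from the review or from any particular tool. A sequence of length L yields L - k + 1 overlapping k-mers; counting them across all reads and then tabulating how many distinct k-mers occur at each count gives the histogram commonly called the k-mer spectrum.

```python
from collections import Counter
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k from seq.

    A sequence of length L yields L - k + 1 k-mers (none if L < k).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count how often each k-mer occurs across a collection of reads.

    Hypothetical helper: reads are upper-cased, but reverse complements
    and ambiguous bases (e.g. N) are NOT handled.
    """
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read.upper(), k))
    return counts


def spectrum(counts: Counter) -> dict[int, int]:
    """Map each multiplicity n to the number of distinct k-mers seen n times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACG"]
    counts = count_kmers(reads, k=3)
    print(counts.most_common(3))
    print(spectrum(counts))
```

Dedicated k-mer counters typically also merge each k-mer with its reverse complement (so-called canonical k-mers), because reads are sampled from both DNA strands, and skip k-mers containing ambiguous bases; this sketch deliberately does neither, to keep the core idea visible.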
