Dylan Lebatteux, Hugo Soudeyns, I. Boucoiran, S. Gantt, Abdoulaye Baniré Diallo
KANALYZER: a method to identify variations of discriminative k-mers in genomic sequences
2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2022-12-06
DOI: https://doi.org/10.1109/BIBM55620.2022.9995370
Citations: 1
Abstract
Discriminative k-mers are unique genomic regions that characterize a given viral family, genus, species, or variant. Most existing algorithms for identifying discriminative k-mer sets are limited to returning raw subsequences. However, to explain why a given k-mer discriminates specific taxonomic groups of viruses, it is important to identify the variations of this k-mer (nucleotide sequences derived from the initial k-mer through one or more nucleotide changes) that occur in other groups of viruses. These variations, together with their frequencies of occurrence, their genomic locations, and their potential influence on biological functions, provide important insights into the classification process. In this article, we introduce KANALYZER, a novel algorithm that identifies variations of discriminative k-mers and their associated information according to viral taxonomy. The algorithm was evaluated on its ability to identify k-mer variations in both simulated and real viral sequence sets. In these evaluations, KANALYZER correctly and quickly identified over 95% of the variations and associated information. The KANALYZER algorithm is integrated directly into the pipeline of CASTOR-KRFE, a tool for identifying discriminative k-mers. The source code, detailed results, and data needed to reproduce the experiments are available at https://github.com/bioinfoUQAM/CASTOR_KRFE.
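To make the notion of a k-mer "variation" concrete, the sketch below scans labeled sequences for windows that differ from a reference k-mer by a bounded number of substitutions, and records where each variation occurs in each group. It is only an illustration of the definition given in the abstract, not the KANALYZER algorithm itself: the function names (`hamming`, `find_variations`), the `max_changes` parameter, the substitution-only model (no insertions or deletions), and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_variations(kmer, sequences, max_changes=1):
    """Locate substitution-only variations of `kmer` (hypothetical helper).

    sequences: iterable of (seq_id, group, sequence) tuples.
    Returns {group: {variation: [(seq_id, position), ...]}}.
    Exact matches (zero changes) are not counted as variations.
    """
    k = len(kmer)
    hits = defaultdict(lambda: defaultdict(list))
    for seq_id, group, seq in sequences:
        # Slide a window of length k over the sequence.
        for pos in range(len(seq) - k + 1):
            window = seq[pos:pos + k]
            if 1 <= hamming(kmer, window) <= max_changes:
                hits[group][window].append((seq_id, pos))
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(found) for group, found in hits.items()}


# Toy data (invented): the k-mer occurs exactly in groupA,
# and with one substitution (T -> A) in both groupB sequences.
sequences = [
    ("s1", "groupA", "TTACGTACGGA"),
    ("s2", "groupB", "TTACGAACGGA"),
    ("s3", "groupB", "CCACGAACGTT"),
]
result = find_variations("ACGTACG", sequences, max_changes=1)
print(result)
```

The length of each position list gives the frequency of a variation within a group, and the positions give its genomic location, which are two of the pieces of associated information mentioned above. A brute-force scan like this costs time proportional to the total sequence length times k, so it is suited to illustration rather than large genome collections.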