Michail Patsakis, Kimonas Provatas, Fotis A Baltoumas, Nikol Chantzi, Ioannis Mouratidis, Georgios A Pavlopoulos, Ilias Georgakopoulos-Soares
MAFin: Motif Detection in Multiple Alignment Files. Bioinformatics (Oxford, England), published 2025-03-19. DOI: 10.1093/bioinformatics/btaf125
Abstract
Motivation: Whole-genome and whole-proteome alignments, represented in the Multiple Alignment Format (MAF), have become a standard resource in comparative genomics and proteomics. Analyses of these alignments often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches offer no direct method for motif detection within MAF files. To address this gap, we present MAFin, a novel tool for efficient motif detection and conservation analysis in MAF files, streamlining genomic and proteomic research.
Results: We developed MAFin, the first motif detection tool for Multiple Alignment Format files. MAFin performs multithreaded searches for conserved motifs using three approaches: (1) user-specified k-mers, (2) regular expressions, searching for one or more patterns, and (3) predefined Position Weight Matrices (PWMs). MAFin then locates each motif instance and calculates its conservation across the aligned sequences, expressed as a conservation percentage based on the number of matches relative to the length of the motif. A set of statistics supports interpretation of each motif's conservation level, and the detected motifs are exported as JSON and CSV files for downstream analyses.
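The abstract does not describe MAFin's internals or interface, so the following is only a minimal, hypothetical sketch of the three search modes and of a conservation percentage computed as matches relative to motif length. It assumes that an alignment block has already been parsed into a dict of sequence name to gapped aligned row, that the first row is the reference, that motifs are searched on the ungapped reference, and that PWMs are scored as log2-odds against a uniform background with a user-chosen threshold. The function names (`kmer_hits`, `regex_hits`, `pwm_hits`, `conservation`, `reference_sequence`) are illustrative and are not MAFin's API.

```python
import math
import re
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory form of one MAF alignment block: sequence name -> gapped
# aligned row. The first entry is treated as the reference (an assumption).
Block = Dict[str, str]
Hit = Tuple[int, int]  # half-open [start, end) in the ungapped reference sequence


def reference_sequence(block: Block) -> str:
    """Ungapped, upper-cased reference sequence used for motif searching."""
    return next(iter(block.values())).replace("-", "").upper()


def kmer_hits(seq: str, kmer: str) -> Iterator[Hit]:
    """Exact k-mer search, reporting overlapping occurrences."""
    start = seq.find(kmer)
    while start != -1:
        yield start, start + len(kmer)
        start = seq.find(kmer, start + 1)


def regex_hits(seq: str, pattern: str) -> Iterator[Hit]:
    """Regular-expression search (non-overlapping matches, as re.finditer returns)."""
    for m in re.finditer(pattern, seq):
        if m.end() > m.start():  # skip zero-length matches
            yield m.start(), m.end()


def pwm_hits(seq: str, pwm: List[Dict[str, float]], threshold: float,
             background: float = 0.25) -> Iterator[Hit]:
    """Slide a PWM over seq and keep windows whose log2-odds score >= threshold."""
    width = len(pwm)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # A tiny probability floor avoids log(0) for bases absent from a column.
        score = sum(math.log2(max(pwm[i].get(base, 0.0), 1e-6) / background)
                    for i, base in enumerate(window))
        if score >= threshold:
            yield start, start + width


def conservation(block: Block, hit: Hit) -> Dict[str, float]:
    """Percent of motif positions where each non-reference row matches the reference."""
    ref_name, ref_row = next(iter(block.items()))
    # Map each ungapped reference position to its alignment column.
    columns = [col for col, ch in enumerate(ref_row) if ch != "-"]
    motif_cols = columns[hit[0]:hit[1]]
    length = hit[1] - hit[0]
    scores = {}
    for name, row in block.items():
        if name == ref_name:
            continue
        # A gap or a different base in the aligned row counts as a mismatch.
        matches = sum(row[c].upper() == ref_row[c].upper() for c in motif_cols)
        scores[name] = 100.0 * matches / length
    return scores


if __name__ == "__main__":
    block = {
        "hg38": "ACGT-TACGTT",
        "panTro6": "ACGTATACGAT",
        "mm10": "ACCT-TAC-TT",
    }
    seq = reference_sequence(block)
    pwm = [{"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
           {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
           {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05}]
    for label, hits in [("k-mer", kmer_hits(seq, "ACG")),
                        ("regex", regex_hits(seq, "AC[GT]")),
                        ("PWM", pwm_hits(seq, pwm, threshold=4.0))]:
        for hit in hits:
            print(label, hit, conservation(block, hit))
```

In this toy block, all three searches find the `ACG` instances at reference positions 0 and 5. Each instance scores 100.0 for `panTro6` and about 66.7 for `mm10`, because `mm10` carries a substitution at the first site and a gap at the second. Searching the ungapped reference and mapping back to alignment columns keeps motifs that span reference gaps intact; a real tool could make different choices here.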
Availability: MAFin is available as a multi-platform Python package under the GPL license at https://github.com/Georgakopoulos-Soares-lab/MAFin.
Supplementary information: Supplementary data are available at Bioinformatics online.