MAFin: motif detection in multiple alignment files.

Michail Patsakis, Kimonas Provatas, Fotis A Baltoumas, Nikol Chantzi, Ioannis Mouratidis, Georgios A Pavlopoulos, Ilias Georgakopoulos-Soares
Journal: Bioinformatics (Oxford, England). Published: 2025-03-29. DOI: https://doi.org/10.1093/bioinformatics/btaf125. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11978385/pdf/

Abstract

Motivation: Whole-genome and whole-proteome alignments, stored in the Multiple Alignment Format (MAF), have become a standard resource in comparative genomics and proteomics. Analyses of these alignments often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. To address this gap, we present MAFin, a tool that enables efficient motif detection and conservation analysis directly in MAF files, streamlining genomic and proteomic research.

Results: We developed MAFin, the first motif detection tool for Multiple Alignment Format files. MAFin performs a multithreaded search for conserved motifs using one of three approaches: (i) user-specified k-mers, (ii) regular expressions, in which case one or more patterns are searched, and (iii) predefined Position Weight Matrices. Once motif instances have been detected, MAFin calculates their conservation across the aligned sequences. It reports a conservation percentage for each motif, based on the number of matches relative to the length of the motif. A set of statistics supports interpretation of each motif's conservation level, and the detected motifs are exported as JSON and CSV files for downstream analyses.
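
The conservation percentage described above can be illustrated with a minimal Python sketch. This is not MAFin's code or API: the function names, the example alignment block, and the choice to compare each aligned row against a reference row column by column are all assumptions made for illustration. The abstract states only that the percentage is based on the number of matches relative to the motif length. A k-mer is treated here as a literal regular expression, and Position Weight Matrix search is left out.

```python
import re
from typing import Dict, List, Tuple

GAP = "-"


def ungapped_to_columns(aligned: str) -> List[int]:
    """Map each ungapped position of an aligned row to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != GAP]


def motif_conservation(
    block: Dict[str, str], ref_name: str, pattern: str
) -> List[Tuple[str, int, Dict[str, float]]]:
    """Search `pattern` in the ungapped reference row of one alignment block and
    report, for every other row, the percentage of motif columns that match.

    Hypothetical helper: the row-versus-reference comparison is an assumption.
    """
    ref_aligned = block[ref_name].upper()
    cols = ungapped_to_columns(ref_aligned)
    ref_seq = ref_aligned.replace(GAP, "")

    results = []
    for m in re.finditer(pattern, ref_seq):
        if m.end() == m.start():
            continue  # skip empty matches to avoid dividing by zero
        motif_cols = cols[m.start():m.end()]
        per_row = {}
        for name, row in block.items():
            if name == ref_name:
                continue
            row = row.upper()
            matches = sum(row[c] == ref_aligned[c] for c in motif_cols)
            # matches relative to motif length, expressed as a percentage
            per_row[name] = 100.0 * matches / len(motif_cols)
        results.append((m.group(), m.start(), per_row))
    return results


# Made-up alignment block: equal-length rows, "-" marks a gap.
example_block = {
    "hg38.chr1": "ACGTGA-CCT",
    "mm10.chr4": "ACGTGAGCCT",
    "canFam3.chr2": "ACTTGA-CAT",
}

if __name__ == "__main__":
    for motif, start, scores in motif_conservation(example_block, "hg38.chr1", "CGTGA"):
        print(motif, start, scores)
```

Searching the ungapped reference and mapping hits back to alignment columns keeps a motif detectable even when a gap in the reference row splits it in the alignment. Whether MAFin handles gaps this way is not stated in the abstract.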

Availability and implementation: MAFin is a multi-platform Python package released under the GPL license and is available at: https://github.com/Georgakopoulos-Soares-lab/MAFin.

Supplementary information: Supplementary data are available at Bioinformatics online.