The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes.

IF 2.9 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS BMC Bioinformatics Pub Date : 2024-12-02 DOI:10.1186/s12859-024-05995-0
Dinithi V Wanniarachchi, Sameera Viswakula, Anushka M Wickramasuriya
{"title":"The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes.","authors":"Dinithi V Wanniarachchi, Sameera Viswakula, Anushka M Wickramasuriya","doi":"10.1186/s12859-024-05995-0","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The precise prediction of transcription factor binding sites (TFBSs) is pivotal for unraveling the gene regulatory networks underlying biological processes. While numerous tools have emerged for in silico TFBS prediction in recent years, the evolving landscape of computational biology necessitates thorough assessments of tool performance to ensure accuracy and reliability. Only a limited number of studies have been conducted to evaluate the performance of TFBS prediction tools comprehensively. Thus, the present study focused on assessing twelve widely used TFBS prediction tools and four de novo motif discovery tools using a benchmark dataset comprising real, generic, Markov, and negative sequences. TFBSs of Arabidopsis thaliana and Homo sapiens genomes downloaded from the JASPAR database were implanted in these sequences and the performance of tools was evaluated using several statistical parameters at different overlap percentages between the lengths of known and predicted binding sites.</p><p><strong>Results: </strong>Overall, the Multiple Cluster Alignment and Search Tool (MCAST) emerged as the best TFBS prediction tool, followed by Find Individual Motif Occurrences (FIMO) and MOtif Occurrence Detection Suite (MOODS). In addition, MotEvo and Dinucleotide Weight Tensor Toolbox (DWT-toolbox) demonstrated the highest sensitivity in identifying TFBSs at 90% and 80% overlap. Further, MCAST and DWT-toolbox managed to demonstrate the highest sensitivity across all three data types real, generic, and Markov. Among the de novo motif discovery tools, the Multiple Em for Motif Elicitation (MEME) emerged as the best performer. An analysis of the promoter regions of genes involved in the anthocyanin biosynthesis pathway in plants and the pentose phosphate pathway in humans, using the three best-performing tools, revealed considerable variation among the top 20 motifs identified by these tools.</p><p><strong>Conclusion: </strong>The findings of this study lay a robust groundwork for selecting optimal TFBS prediction tools for future research. Given the variability observed in tool performance, employing multiple tools for identifying TFBSs in a set of sequences is highly recommended. In addition, further studies are recommended to develop an integrated toolbox that incorporates TFBS prediction or motif discovery tools, aiming to streamline result precision and accuracy.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"371"},"PeriodicalIF":2.9000,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11613939/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05995-0","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Background: The precise prediction of transcription factor binding sites (TFBSs) is pivotal for unraveling the gene regulatory networks underlying biological processes. While numerous tools have emerged for in silico TFBS prediction in recent years, the evolving landscape of computational biology necessitates thorough assessments of tool performance to ensure accuracy and reliability. Only a limited number of studies have been conducted to evaluate the performance of TFBS prediction tools comprehensively. Thus, the present study focused on assessing twelve widely used TFBS prediction tools and four de novo motif discovery tools using a benchmark dataset comprising real, generic, Markov, and negative sequences. TFBSs of Arabidopsis thaliana and Homo sapiens genomes downloaded from the JASPAR database were implanted in these sequences and the performance of tools was evaluated using several statistical parameters at different overlap percentages between the lengths of known and predicted binding sites.

Results: Overall, the Multiple Cluster Alignment and Search Tool (MCAST) emerged as the best TFBS prediction tool, followed by Find Individual Motif Occurrences (FIMO) and MOtif Occurrence Detection Suite (MOODS). In addition, MotEvo and Dinucleotide Weight Tensor Toolbox (DWT-toolbox) demonstrated the highest sensitivity in identifying TFBSs at 90% and 80% overlap. Further, MCAST and DWT-toolbox managed to demonstrate the highest sensitivity across all three data types real, generic, and Markov. Among the de novo motif discovery tools, the Multiple Em for Motif Elicitation (MEME) emerged as the best performer. An analysis of the promoter regions of genes involved in the anthocyanin biosynthesis pathway in plants and the pentose phosphate pathway in humans, using the three best-performing tools, revealed considerable variation among the top 20 motifs identified by these tools.

Conclusion: The findings of this study lay a robust groundwork for selecting optimal TFBS prediction tools for future research. Given the variability observed in tool performance, employing multiple tools for identifying TFBSs in a set of sequences is highly recommended. In addition, further studies are recommended to develop an integrated toolbox that incorporates TFBS prediction or motif discovery tools, aiming to streamline result precision and accuracy.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
人类和拟南芥基因组转录因子结合位点预测工具的评估。
背景:转录因子结合位点(TFBSs)的精确预测对于揭示生物学过程背后的基因调控网络至关重要。虽然近年来出现了许多用于计算机TFBS预测的工具,但计算生物学的不断发展需要对工具性能进行全面评估,以确保准确性和可靠性。只有有限的研究对TFBS预测工具的性能进行了全面的评估。因此,本研究的重点是评估12种广泛使用的TFBS预测工具和4种全新的motif发现工具,使用基准数据集包括真实序列、通用序列、马尔可夫序列和负序列。将从JASPAR数据库下载的拟南芥和智人基因组的TFBSs植入到这些序列中,并使用已知和预测结合位点长度之间不同重叠百分比的几个统计参数来评估工具的性能。结果:总体而言,多簇比对和搜索工具(MCAST)是最佳的TFBS预测工具,其次是查找单个Motif Occurrence (FIMO)和Motif Occurrence Detection Suite (MOODS)。此外,MotEvo和二核苷酸权重张量工具箱(dwt -工具箱)在90%和80%重叠时识别TFBSs的灵敏度最高。此外,MCAST和DWT-toolbox成功地在所有三种数据类型(真实、通用和马尔可夫)中展示了最高的灵敏度。在全新的motif发现工具中,Multiple Em for motif Elicitation (MEME)表现最好。使用三种性能最好的工具对植物花青素生物合成途径和人类戊糖磷酸途径中涉及的基因启动子区域进行了分析,揭示了这些工具确定的前20个基序之间存在相当大的差异。结论:本研究结果为未来研究选择最佳的TFBS预测工具奠定了坚实的基础。考虑到工具性能的可变性,强烈建议使用多种工具在一组序列中识别tfbs。此外,建议进一步研究开发集成TFBS预测或motif发现工具的工具箱,以提高结果的精度和准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
BMC Bioinformatics
BMC Bioinformatics 生物-生化研究方法
CiteScore
5.70
自引率
3.30%
发文量
506
审稿时长
4.3 months
期刊介绍: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.
期刊最新文献
Conditional similarity triplets enable covariate-informed representations of single-cell data. TockyPrep: data preprocessing methods for flow cytometric fluorescent timer analysis. Correction: HISS: Snakemake-based workflows for performing SMRT-RenSeq assembly, AgRenSeq and dRenSeq for the discovery of novel plant disease resistance genes. Direct coupling analysis and the attention mechanism. Instance-level semantic segmentation of nuclei based on multimodal structure encoding.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1