A protein secondary structure-based algorithm for partitioning large protein alignments

Thu Kim Le, L. Vinh
Published in: 2022 14th International Conference on Knowledge and Systems Engineering (KSE)
Publication date: 2022-10-19
DOI: 10.1109/KSE56063.2022.9953767 (https://doi.org/10.1109/KSE56063.2022.9953767)

Abstract

The evolutionary process of characters (e.g., nucleotides or amino acids) is heterogeneous among the sites of an alignment. Applying the same evolutionary model to all sites leads to unreliable results in evolutionary studies. Partitioning an alignment into sub-alignments (groups) such that the sites in each sub-alignment follow the same model of evolution is a promising approach to handling this heterogeneity. A number of computational methods have been proposed to partition alignments; however, they are unable to handle invariant sites properly. The iterative k-means algorithm has been widely used to partition large alignments, but it was recently suspended because it always places all invariant sites in a single group, which might distort phylogenetic trees reconstructed from the sub-alignments. In this paper, we improve the iterative k-means algorithm for protein alignments by combining amino acids with their secondary structures to partition invariant sites properly. The protein secondary structure information helps classify invariant sites into different groups, each of which includes both variant and invariant sites. Experiments on real large protein alignments showed that the new algorithm overcomes the pitfall of grouping all invariant sites into one group and consequently produces better partitioning schemes.
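
The abstract does not spell out the algorithm, so the following is only a toy illustration of the underlying idea, not the authors' iterative k-means procedure. It assumes each alignment column comes with a secondary-structure label ('H', 'E', or 'C', supplied by some external predictor) and shows how such labels can spread invariant sites across several groups that also contain variant sites. The function names, the gap symbol, and the example data are all hypothetical.

```python
from collections import defaultdict


def is_invariant(column):
    """A site is treated as invariant when all non-gap residues are identical."""
    residues = {c for c in column if c != "-"}
    return len(residues) <= 1


def group_sites_by_structure(alignment, structure):
    """Group column indices by secondary-structure label (illustrative only).

    alignment: list of equal-length aligned protein sequences (toy input).
    structure: one label per column, e.g. 'H' helix, 'E' strand, 'C' coil;
               assumed to be given, since the abstract does not say how
               the labels are obtained.
    Returns {label: {"variant": [indices], "invariant": [indices]}}.
    """
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment) or len(structure) != n_sites:
        raise ValueError("sequences and structure labels must have equal length")

    groups = defaultdict(lambda: {"variant": [], "invariant": []})
    for i in range(n_sites):
        column = [seq[i] for seq in alignment]
        kind = "invariant" if is_invariant(column) else "variant"
        groups[structure[i]][kind].append(i)
    return dict(groups)


# Hypothetical toy alignment: columns 0, 3 and 5 are invariant.
alignment = ["MKVLAG", "MKILSG", "MRVLAG"]
structure = "CHHHEE"
groups = group_sites_by_structure(alignment, structure)
for label, sites in sorted(groups.items()):
    print(label, sites)
```

In this toy input the three invariant columns end up in three different structure-based groups instead of one, which is the behavior the abstract attributes to the secondary-structure information. A real partitioning scheme would still need a clustering step and per-group model selection, which this sketch leaves out.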