A Unique Approach for Comparison of Protein Sequence Using PCA Analysis

J. Pal, Shinjini Ghosh, B. Maji, D. K. Bhattacharya
Published in: 2021 International Conference on Technological Advancements and Innovations (ICTAI)
Publication date: 2021-11-10
DOI: 10.1109/ICTAI53825.2021.9673245 (https://doi.org/10.1109/ICTAI53825.2021.9673245)
Citations: 0

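Abstract

The physicochemical properties of amino acids play a significant role in the comparison of protein sequences. In the literature, arbitrary combinations of these properties have been used for protein sequence comparison. In the present paper, protein sequences are compared using only five known physical properties of the amino acids. Principal component analysis (PCA) is applied to the numerical values of these physical properties for the twenty amino acids to reduce their dimensionality. As a result, 20 TP values are obtained, corresponding to the amino acids. Protein sequences are represented using these 20 TP values. Cumulative sums of the represented sequences are then taken to obtain a non-degenerate representation of each protein sequence. A new form of descriptor is then obtained using a generalized form of three moment vectors consisting of first-, second-, and third-order moments. Distance matrices are computed using Euclidean distance as the distance measure. Finally, phylogenetic trees are constructed from these distance matrices using the UPGMA algorithm. The proposed method is applied to 9 ND4, 9 ND6, 16 ND5, 12 Baculovirus, and 24 TF protein sequences. The results obtained by this new method are on par with the biological reference and comparable with results obtained earlier on the same species by other methods.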
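The pipeline described in the abstract (PCA on a property table, sequence encoding, cumulative sums, moment descriptor, Euclidean distances, UPGMA) can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: the 20 x 5 property table is filled with synthetic placeholder numbers rather than real physical properties, one first-principal-component score per amino acid is taken as its "TP value", PCA is run on centered but unscaled data, and plain raw moments stand in for the paper's "generalized" moment vectors, whose formula is not given. All function and variable names are hypothetical.

```python
import math
from itertools import accumulate

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder 20 x 5 property table: synthetic numbers, NOT real physical properties.
PROPS = [[i * (j + 1) + i % (j + 2) for j in range(5)] for i in range(20)]

def first_pc_scores(rows, iters=200):
    """Project each row onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centered columns.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]

# Assumption: one score per amino acid plays the role of its TP value.
TP = dict(zip(AMINO_ACIDS, first_pc_scores(PROPS)))

def descriptor(seq):
    """Encode seq with TP values, take cumulative sums, return 3 raw moments."""
    cums = list(accumulate(TP[aa] for aa in seq))
    n = len(cums)
    # Stand-in for the paper's moment vectors: mean of c, c^2 and c^3.
    return [sum(c ** k for c in cums) / n for k in (1, 2, 3)]

def distance_matrix(seqs):
    """Pairwise Euclidean distances between the sequence descriptors."""
    descs = [descriptor(s) for s in seqs]
    return [[math.dist(a, b) for b in descs] for a in descs]

def upgma(dist, labels):
    """Return a nested-tuple tree by repeatedly merging the closest clusters."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average equals the mean distance over all member pairs.
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]

if __name__ == "__main__":
    names = ["s1", "s2", "s3"]
    seqs = ["ACDEFGHIK", "ACDEFGHIL", "WYVTSRQPN"]  # toy sequences, not real proteins
    print(upgma(distance_matrix(seqs), names))
```

The cumulative sum is what makes the encoding position-sensitive: two sequences with the same residue composition but a different order produce different running totals. Because the raw moments here are unscaled, the third-order term tends to dominate the Euclidean distance; the paper's generalized moments may well normalize differently.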