{"title":"Protein Sequence Comparison Method Based on 3-ary Huffman Coding","authors":"Zhaohui Qi, Yingqiang Ning, Yinmei Huang","doi":"10.46793/match.90-2.357q","DOIUrl":null,"url":null,"abstract":"Based on 3-ary Huffman coding algorithm, we propose a digital mapping method of protein sequence. Firstly, a 3-ary Huffman tree is defined by the frequency characteristic of 20 amino acids in given protein sequences. The 0-2 codes of 20 amino acids constructed by the 3-ary Huffman tree can convert long protein sequences into one-to-one 0-2 digital sequences. According to the frequency characteristic and the distribution information of 0-2 codes of 20 amino acids in the 0-2 digital sequences, we design the 40-dimensional vectors to characterize the protein sequences. Next, the proposed digital mapping method is used to perform three separate applications, similarity comparison of nine ND6 proteins, evolutionary trend analysis of the 2009 pandemic Human influenza A (H1N1) viruses from January 2020 to June 2022, and the evolution analysis of 95 coronavirus genes. The results illustrate the utility of the proposed method.","PeriodicalId":51115,"journal":{"name":"Match-Communications in Mathematical and in Computer Chemistry","volume":"1 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Match-Communications in Mathematical and in Computer Chemistry","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.46793/match.90-2.357q","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Based on 3-ary Huffman coding algorithm, we propose a digital mapping method of protein sequence. Firstly, a 3-ary Huffman tree is defined by the frequency characteristic of 20 amino acids in given protein sequences. The 0-2 codes of 20 amino acids constructed by the 3-ary Huffman tree can convert long protein sequences into one-to-one 0-2 digital sequences. According to the frequency characteristic and the distribution information of 0-2 codes of 20 amino acids in the 0-2 digital sequences, we design the 40-dimensional vectors to characterize the protein sequences. Next, the proposed digital mapping method is used to perform three separate applications, similarity comparison of nine ND6 proteins, evolutionary trend analysis of the 2009 pandemic Human influenza A (H1N1) viruses from January 2020 to June 2022, and the evolution analysis of 95 coronavirus genes. The results illustrate the utility of the proposed method.
期刊介绍:
MATCH Communications in Mathematical and in Computer Chemistry publishes papers of original research as well as reviews on chemically important mathematical results and non-routine applications of mathematical techniques to chemical problems. A paper acceptable for publication must contain non-trivial mathematics or communicate non-routine computer-based procedures AND have a clear connection to chemistry. Papers are published without any processing or publication charge.