LyFor: Prediction of lysine formylation sites from sequence based features using support vector machine
Md. Sohrawordi, Md. Al Mehedi Hasan
2020 IEEE Region 10 Symposium (TENSYMP), pp. 250-253. Published 2020-06-05.
DOI: 10.1109/TENSYMP50017.2020.9230689 (https://doi.org/10.1109/TENSYMP50017.2020.9230689)
Citations: 1
Abstract
Lysine formylation is a recently discovered post-translational modification (PTM) that occurs mostly on nuclear histone proteins. It plays a role in cellular chromatin regulation mechanisms such as DNA binding, DNA repair and protein synthesis, and strongly affects other PTMs such as methylation and acetylation. Because computational methods are simpler and faster than traditional experimental methods, a computational model for identifying formylated lysine sites is valuable. In this study, a bioinformatics tool named LyFor is developed using four feature construction techniques, amino acid composition (AAC), amino acid index (AAI), binary encoding (BE) and composition of k-spaced amino acid pairs (CKSAAP), to distinguish formylated from non-formylated lysine residues. The training dataset was preprocessed with principal component analysis (PCA) for dimensionality reduction and with random oversampling, and was then used to train a support vector machine (SVM) model. LyFor achieves an accuracy of 90.02% in 10-fold cross-validation, outperforming existing models. The analysis and prediction of lysine formylation may therefore provide useful information for studying the mechanisms of chromatin regulation.
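As a rough illustration of three of the encodings named in the abstract (AAC, BE and CKSAAP) and of random oversampling, the following is a minimal Python sketch. It is not the authors' implementation: the function names, the padding symbol `X`, the `k_max` value and the normalisation by the number of valid pairs are all assumptions made for the example. AAI is left out because it needs physicochemical index tables that are not given here, and the PCA and SVM steps are not shown.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Assumption: any other symbol (e.g. "X") marks padding at protein ends.


def aac(window):
    """Amino acid composition: fraction of each standard residue."""
    counts = Counter(window)
    n = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / n for a in AMINO_ACIDS]


def binary_encoding(window):
    """One-hot encode each position; padding becomes 20 zeros."""
    vec = []
    for residue in window:
        vec.extend(1 if residue == a else 0 for a in AMINO_ACIDS)
    return vec


def cksaap(window, k_max=2):
    """Frequencies of residue pairs separated by k residues, k = 0..k_max."""
    vec = []
    for k in range(k_max + 1):
        pairs = Counter()
        total = 0
        for i in range(len(window) - k - 1):
            a, b = window[i], window[i + k + 1]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                pairs[a + b] += 1
                total += 1
        total = total or 1  # avoid division by zero for short windows
        vec.extend(pairs[a + b] / total
                   for a in AMINO_ACIDS for b in AMINO_ACIDS)
    return vec


def encode(window, k_max=2):
    """Concatenate the three encodings into one feature vector."""
    return aac(window) + binary_encoding(window) + cksaap(window, k_max)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class is equally large."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in zip(X, y):
        by_label.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_label.values())
    X_out, y_out = [], []
    for label, rows in by_label.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out
```

In a full pipeline, each lysine-centred sequence window would be passed through `encode`, the training set balanced with `random_oversample`, and the result reduced with PCA before SVM training. Oversampling should be applied only to the training portion of each cross-validation fold, so that duplicated samples do not leak into the evaluation data.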