CBDT-Oglyc: Prediction of O-glycosylation sites using ChiMIC-based balanced decision table and feature selection.

Impact factor: 0.9 · Zone 4 (Biology) · JCR Q4 (Mathematical & Computational Biology) · Journal of Bioinformatics and Computational Biology, article no. 2350024 · Pub Date: 2023-10-01 · Epub Date: 2023-10-28 · DOI: 10.1142/S0219720023500245
Ying Zeng, Zheming Yuan, Yuan Chen, Ying Hu
Citations: 0

Abstract

O-glycosylation (Oglyc) plays an important role in various biological processes. The key to understanding the mechanisms of Oglyc is identifying the corresponding glycosylation sites. Two critical steps, feature selection and classifier design, greatly affect the accuracy of computational methods for predicting Oglyc sites. Based on an efficient feature selection algorithm and a classifier capable of handling imbalanced datasets, a new computational method, ChiMIC-based balanced decision table for O-glycosylation (CBDT-Oglyc), is proposed to predict Oglyc sites in proteins. Sequence characterization is performed by combining amino acid composition (AAC), undirected composition of k-spaced amino acid pairs (undirected-CKSAAP), and pseudo-position-specific scoring matrix (PsePSSM). The Chi-MIC-share algorithm is used for feature selection, which simplifies the model and improves predictive accuracy. For imbalanced classification, a backtracking method based on a local chi-square test is designed, and cost-sensitive learning is then incorporated to construct a novel classifier named ChiMIC-based balanced decision table (CBDT). Trained on a 1:49 (positives:negatives) training set, the CBDT classifier achieves significantly better prediction performance than traditional classifiers. Moreover, independent tests on separate human and mouse glycoproteins show that CBDT-Oglyc outperforms previous methods in global accuracy. CBDT-Oglyc shows great promise in predicting Oglyc sites and is expected to facilitate further experimental studies on protein glycosylation.
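
To make the sequence characterization step more concrete, the following is a minimal Python sketch of two of the three descriptors named in the abstract, AAC and undirected-CKSAAP. It is not the authors' implementation. The abstract does not define "undirected", so the sketch assumes it means that the ordered pairs (a, b) and (b, a) are counted together, giving 210 unordered pairs instead of 400. The function names, the example window, and the choice of gaps k = 0..2 are hypothetical. PsePSSM is omitted because it requires a position-specific scoring matrix from an external alignment tool.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The 20 standard amino acids, in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All unordered residue pairs (assumed meaning of "undirected"): 20 * 21 / 2 = 210.
UNORDERED_PAIRS = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]


def aac(seq):
    """Amino acid composition: relative frequency of each standard residue."""
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def undirected_cksaap(seq, k):
    """Composition of residue pairs separated by k positions, ignoring pair order."""
    counts = Counter()
    n_pairs = 0
    for i in range(len(seq) - k - 1):
        a, b = seq[i], seq[i + k + 1]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            # Sorting the two letters merges (a, b) with (b, a).
            counts["".join(sorted((a, b)))] += 1
            n_pairs += 1
    return [counts[p] / n_pairs if n_pairs else 0.0 for p in UNORDERED_PAIRS]


def sequence_features(window, k_max=2):
    """Concatenate AAC with undirected-CKSAAP for k = 0..k_max (k_max is illustrative)."""
    features = aac(window)
    for k in range(k_max + 1):
        features.extend(undirected_cksaap(window, k))
    return features


if __name__ == "__main__":
    # Hypothetical sequence window centered on a candidate S/T site.
    window = "GAPSTPAPSTAAP"
    vec = sequence_features(window)
    print(len(vec))  # 20 + 3 * 210 features
```

In a pipeline like the one the abstract describes, a vector of this kind would be combined with PsePSSM features and then passed to feature selection before classification.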

Source journal
Journal of Bioinformatics and Computational Biology (Mathematical & Computational Biology)
CiteScore: 2.10
Self-citation rate: 0.00%
Articles published: 57
Journal description: The Journal of Bioinformatics and Computational Biology aims to publish high-quality, original research articles, expository tutorial papers, and review papers, as well as short, critical comments on technical issues associated with the analysis of cellular information. The research papers will be technical presentations of new assertions, discoveries, and tools, intended for a narrower specialist community. The tutorials, reviews, and critical commentary will be targeted at a broader readership of biologists who are interested in using computers but are not knowledgeable about scientific computing and, equally, computer scientists who have an interest in biology but are familiar with neither its current thrusts nor its language. Such carefully chosen tutorials and articles should greatly accelerate the rate of entry of these new creative scientists into the field.