Prediction of Nitration Sites Based on FCBF Method and Stacking Ensemble Model

IF 0.5; Zone 4 (Biology); JCR Q4 (BIOCHEMICAL RESEARCH METHODS); Current Proteomics; Pub Date: 2021-01-01; DOI: 10.2174/1570164618999210101222637
Min Liu, Lu Zhang, Xinyi Qin, Tao Huang, Ziwei Xu, Guangzhong Liu
Citations: 3

Abstract

Nitration is an important post-translational modification (PTM) that occurs on the tyrosine residues of proteins. The occurrence of protein tyrosine nitration under disease conditions is inevitable and represents a shift from the signal-transducing physiological actions of nitric oxide (•NO) to oxidative and potentially pathogenic pathways. Abnormal protein nitration can lead to serious human diseases, including neurodegenerative diseases, acute respiratory distress, organ transplant rejection and lung cancer. Identifying the nitration sites in protein sequences is therefore important: predicting which tyrosine residues in a protein sequence are nitrated and which are not is of great significance for the study of the nitration mechanism and related diseases. In this study, a prediction model of nitration sites was proposed that combines an over- and under-sampling strategy and the FCBF method with stacking ensemble learning and multi-feature fusion. First, each protein sequence sample was encoded with 2701-dimensional fusion features (PseAAC, PSSM, AAIndex, CKSAAP, Disorder). Second, a ranked feature set was generated by the FCBF method according to the symmetric uncertainty metric. Third, during model training, over- and under-sampling was used to tackle the imbalanced dataset. Finally, the Incremental Feature Selection (IFS) method was adopted to extract an optimal classifier based on 10-fold cross-validation. Results show that the model has significant performance advantages in indicators such as MCC, Recall and F1-score, both in comparison with other classifiers on the independent test set and in cross-validation on the training set with single-type features or fusion features.
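The symmetric uncertainty metric used for feature ranking can be illustrated with a minimal sketch. It assumes the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) on discretized features, and it covers only the ranking step: the redundancy-removal stage of the Fast Correlation-Based Filter (FCBF) is omitted. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing is shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # mutual information I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Return feature indices sorted by decreasing SU with the class label."""
    scores = [symmetric_uncertainty(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


# Toy data (hypothetical): three discretized features over six samples.
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [0, 1, 0, 1, 0, 1],  # nearly unrelated to the label
    [1, 1, 1, 0, 0, 0],  # identical to the label
    [1, 1, 0, 0, 0, 0],  # partially informative
]
ranking = rank_features(columns, labels)
print(ranking)
```

In this toy case the feature that copies the label ranks first and the nearly unrelated one ranks last. Continuous features such as PSSM scores would need to be discretized before this calculation applies.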
By integrating the FCBF feature ranking method, the over- and under-sampling technique and a stacking model composed of multiple base classifiers, an effective prediction model for nitration PTM sites was built, which can achieve a better recall rate when the ratio of positive to negative samples is highly imbalanced.
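To see why MCC, Recall and F1-score are informative when positives are rare, the following sketch computes them from confusion-matrix counts using their standard definitions. The label vectors are invented toy data (2 positive and 18 negative sites), not results from the paper, and the helper names are hypothetical.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = nitrated, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred):
    """Accuracy, recall, F1 and Matthews correlation coefficient."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "recall": safe_div(tp, tp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Toy imbalanced data (hypothetical): 2 positives, 18 negatives.
y_true = [1] * 2 + [0] * 18
all_negative = [0] * 20                    # never predicts a nitration site
sensitive = [1] * 2 + [1] * 3 + [0] * 15   # finds both sites, 3 false alarms

baseline = evaluate(y_true, all_negative)
model = evaluate(y_true, sensitive)
print(baseline)
print(model)
```

The all-negative predictor has the higher accuracy of the two, yet its recall, F1 and MCC are all zero, which is why accuracy alone says little about performance on a highly imbalanced dataset.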

Source journal
Current Proteomics (BIOCHEMICAL RESEARCH METHODS; BIOCHEMISTRY & MOLECULAR BIOLOGY)
CiteScore: 1.60
Self-citation rate: 0.00%
Articles published: 25
Review time: >0 weeks
About the journal: Research in the emerging field of proteomics is growing at an extremely rapid rate. The principal aim of Current Proteomics is to publish well-timed in-depth/mini review articles in this fast-expanding area on topics relevant and significant to the development of proteomics. Current Proteomics is an essential journal for everyone involved in proteomics and related fields in both academia and industry. Current Proteomics publishes in-depth/mini review articles in all aspects of the fast-expanding field of proteomics. All areas of proteomics are covered together with the methodology, software, databases, technological advances and applications of proteomics, including functional proteomics. Diverse technologies covered include but are not limited to: protein separation and characterization techniques; 2-D gel electrophoresis and image analysis; techniques for protein expression profiling, including mass spectrometry-based methods and algorithms for correlative database searching; determination of co-translational and post-translational modification of proteins; protein/peptide microarrays; biomolecular interaction analysis; analysis of protein complexes; yeast two-hybrid projects; protein-protein interaction (protein interactome) pathways and cell signaling networks; systems biology; proteome informatics (bioinformatics); knowledge integration and management tools; high-throughput protein structural studies (using mass spectrometry, nuclear magnetic resonance and X-ray crystallography); high-throughput computational methods for protein 3-D structure as well as function determination; robotics, nanotechnology, and microfluidics.