Prediction of influenza A virus-human protein-protein interactions using XGBoost with continuous and discontinuous amino acids information.

IF 2.4 | CAS Zone 3, Biology | JCR Q2, MULTIDISCIPLINARY SCIENCES | PeerJ | Pub Date: 2025-01-30 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj.18863
Binghua Li, Xin Li, Xiaoyu Li, Li Wang, Jun Lu, Jia Wang
PeerJ, vol. 13, e18863 (2025). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787804/pdf/
Citations: 0

Abstract

Influenza A virus (IAV) is highly infectious and highly pathogenic, which makes IAV infection a serious public health threat. Identifying protein-protein interactions (PPIs) between IAV and human proteins helps in understanding the mechanism of viral infection and in designing antiviral drugs. In this article, we developed a sequence-based machine learning method for predicting PPIs. First, we applied a new negative sample construction method to build a high-quality IAV-human PPI dataset. We then used conjoint triad (CT) and Moran autocorrelation (Moran) descriptors to encode biologically relevant features. Combining the two exploits the complementary information carried by contiguous and discontinuous amino acids, giving a more comprehensive description of each PPI. After comparing different machine learning models, we selected the eXtreme Gradient Boosting (XGBoost) model as the final predictor. The model achieved an accuracy of 96.89%, a precision of 98.79%, a recall of 94.85%, and an F1-score of 96.78%. Finally, we identified 3,269 potential target proteins. Gene Ontology (GO) and pathway analyses showed that these genes were highly associated with IAV infection. Analysis of the PPI network further revealed that the predicted proteins were core proteins within the human protein interaction network. This study may facilitate the identification of potential targets for the discovery of more effective anti-influenza drugs. The source code and datasets are available at https://github.com/HVPPIlab/IVA-Human-PPI/.
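The conjoint triad (CT) descriptor captures contiguous sequence information. The 20 amino acids are grouped into seven classes. Every window of three consecutive residues is then counted as a class triad, giving a 7^3 = 343-dimensional vector per protein.

The sketch below makes several assumptions:
- It uses the seven-class grouping commonly used for CT.
- It min-max normalises the counts.
- The paper's exact grouping and normalisation are not given in the abstract.
- `conjoint_triad` is a hypothetical helper, not the authors' code.

```python
from itertools import product

# Seven amino-acid classes often used for the conjoint triad
# (assumption: the paper's exact grouping may differ).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: k for k, group in enumerate(CT_CLASSES) for aa in group}

# Every ordered triad of class indices: 7 ** 3 = 343 features in a fixed order.
TRIAD_INDEX = {
    triad: i for i, triad in enumerate(product(range(len(CT_CLASSES)), repeat=3))
}


def conjoint_triad(sequence: str) -> list:
    """Hypothetical helper: 343-dim conjoint triad vector for one protein."""
    counts = [0] * len(TRIAD_INDEX)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window over three consecutive residues (contiguous information).
    for i in range(len(classes) - 2):
        window = classes[i:i + 3]
        if None in window:
            continue  # skip windows containing non-standard residues such as X
        counts[TRIAD_INDEX[tuple(window)]] += 1
    # Min-max normalise the counts to [0, 1] (one common choice, assumed here).
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * len(counts)
    return [(c - lo) / (hi - lo) for c in counts]
```

For a virus-human protein pair, one plausible representation is to concatenate the two vectors, giving 686 features: `conjoint_triad(viral_seq) + conjoint_triad(human_seq)`. This pair encoding is an assumption; the abstract does not state it.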
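Moran autocorrelation supplies the discontinuous side of the encoding. For a lag d, it correlates a physicochemical property P at residue i with the same property at residue i + d, so residues that are far apart in the sequence still contribute. One common form is:

Moran(d) = [1/(N-d) * sum_{i=1..N-d} (P_i - mean(P)) (P_{i+d} - mean(P))] / [1/N * sum_{i=1..N} (P_i - mean(P))^2]

This minimal sketch assumes the Kyte-Doolittle hydropathy scale as the single property and a maximum lag of 30. The abstract does not say which properties or lags the paper uses, and `moran_autocorrelation` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy values, used here as an example property
# (assumption: the paper may use several, different properties).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moran_autocorrelation(sequence: str, prop: dict = KYTE_DOOLITTLE,
                          max_lag: int = 30) -> list:
    """Hypothetical helper: Moran autocorrelation for lags d = 1..max_lag."""
    values = [prop[aa] for aa in sequence.upper() if aa in prop]
    n = len(values)
    if n == 0:
        return [0.0] * max_lag
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    features = []
    for d in range(1, max_lag + 1):
        # Undefined when the lag reaches the sequence length or P is constant.
        if d >= n or variance < 1e-12:
            features.append(0.0)
            continue
        # Pair residue i with residue i + d: positions d apart (discontinuous).
        cov = sum((values[i] - mean) * (values[i + d] - mean)
                  for i in range(n - d)) / (n - d)
        features.append(cov / variance)
    return features
```

Because the mean is subtracted and the result is a ratio, Moran(d) is unchanged by shifting or rescaling the property scale. Standardising the property values beforehand therefore does not affect a single-property feature.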
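The four metrics reported for the XGBoost model all derive from confusion-matrix counts: true and false positives and negatives. The helper below is a minimal sketch of the standard definitions. The abstract does not report the confusion matrix, so any counts passed to it are illustrative. The sketch also checks that the reported F1-score follows from the reported precision and recall: 2 × 0.9879 × 0.9485 / (0.9879 + 0.9485) ≈ 0.9678. The function names are hypothetical.

```python
def _ratio(num: float, den: float) -> float:
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)  # share of predicted interactions that are real
    recall = _ratio(tp, tp + fn)     # share of real interactions that are recovered
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# The reported F1 is consistent with the reported precision and recall.
print(round(f1_score(0.9879, 0.9485), 4))  # 0.9678
```

High precision (98.79%) paired with lower recall (94.85%) means the model rarely flags a non-interacting pair but misses a somewhat larger share of true interactions.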
