CPPCGM: A Highly Efficient Sequence-Based Tool for Simultaneously Identifying and Generating Cell-Penetrating Peptides.

IF 5.3 · Zone 2 (Chemistry) · JCR Q1 (CHEMISTRY, MEDICINAL) · Journal of Chemical Information and Modeling · Pub Date: 2025-04-14 · Epub Date: 2025-03-19 · DOI: 10.1021/acs.jcim.5c00199
Qiufen Chen, Yuewei Zhang, Jiali Gao, Jun Zhang
Pages: 3357-3369. Publication type: Journal Article.

Abstract

Cell-penetrating peptides (CPPs) are typically short oligopeptides of 5-30 amino acid residues. CPPs have proven to be important vehicles for delivering drugs into cells through different mechanisms, demonstrating their potential as therapeutic candidates. However, experimental screening and synthesis of CPPs can be time-consuming and expensive. Recently, numerous attempts have been made to develop computational methods as a cost-effective way of screening potential CPP candidates. Despite significant advances, current methods have limited feature representation capabilities, which constrains further performance gains. In this study, we developed a deep learning framework called CPPCGM, which uses protein language models (PLMs) to identify and generate novel CPPs. The framework has two separate blocks: CPPClassifier and CPPGenerator. The former combines three pretrained models by simple voting to classify peptides as CPPs or non-CPPs. The latter, which resembles a generative adversarial network in comprising a discriminator and a generator, generates peptides that are not present in the training data set. CPPCGM achieved Matthews correlation coefficient scores of 0.876, 0.923, and 0.664 on three data sets. Compared with state-of-the-art methods, the performance of our method is significantly improved. Qualitative and quantitative evaluation of the generated samples also demonstrated the generative potential of CPPCGM. Notably, PLM-based methods can optimize peptides for biochemical functions, benefiting drug delivery and biomedical applications. Related materials are publicly available at https://github.com/QiufenChen/CPPCGM.
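
The abstract mentions two ideas that are easy to illustrate in a few lines: combining three classifiers by simple voting, and scoring the result with the Matthews correlation coefficient (MCC). The sketch below is a minimal illustration, not the authors' implementation. It assumes hard majority voting over binary labels (1 = CPP, 0 = non-CPP), which the abstract does not spell out; the function names and the toy labels are hypothetical.

```python
from math import sqrt


def majority_vote(predictions):
    """Combine per-model 0/1 labels by hard majority vote (an assumed rule).

    predictions: one label list per model, all of equal length.
    With an odd number of models and binary labels there are no ties.
    """
    n_models = len(predictions)
    return [int(2 * sum(votes) > n_models) for votes in zip(*predictions)]


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for six peptides from three hypothetical models.
    model_a = [1, 1, 0, 0, 1, 0]
    model_b = [1, 0, 0, 1, 1, 0]
    model_c = [0, 1, 0, 0, 1, 1]
    y_true = [1, 1, 0, 0, 1, 1]

    ensemble = majority_vote([model_a, model_b, model_c])
    print(ensemble)                                      # [1, 1, 0, 0, 1, 0]
    print(round(matthews_corrcoef(y_true, ensemble), 3))  # 0.707
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, so it stays informative when CPPs and non-CPPs are imbalanced, which is one common reason to report it alongside accuracy.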
