Peptipedia v2.0: A peptide sequence database and user-friendly web platform. A major update

bioRxiv Pub Date : 2024-07-16 DOI:10.1101/2024.07.11.603053
Gabriel Cabas-Mora, Anamaría Daza, Nicole Soto-García, Valentina Garrido, Diego Alvarez, Marcelo A Navarrete, Lindybeth Sarmiento-Varón, J. H. Sepúlveda Yáñez, Mehdi D. Davari, Frederic Cadet, Á. Olivera-Nappa, Roberto Uribe-Paredes, David Medina-Ortiz

Abstract

In recent years, peptides have gained significant relevance due to their therapeutic properties. The surge in peptide production and synthesis has generated vast amounts of data, enabling the creation of comprehensive databases and information repositories. Advances in sequencing techniques and artificial intelligence have further accelerated the design of tailor-made peptides. However, leveraging these techniques requires versatile and continuously updated storage systems, along with tools that facilitate peptide research and the implementation of machine learning for predictive systems. This work introduces Peptipedia v2.0, one of the most comprehensive public repositories of peptides, supporting biotechnological research by simplifying peptide study and annotation. Peptipedia v2.0 has expanded its collection by over 45% with peptide sequences that have reported biological activities. The functional biological activity tree has been revised and enhanced, incorporating new categories such as cosmetic and dermatological activities, molecular binding, and anti-ageing properties. Using protein language models and machine learning, more than 90 binary classification models have been trained, validated, and incorporated into Peptipedia v2.0. These models exhibit average sensitivities and specificities of 0.877 ± 0.0530 and 0.873 ± 0.054, respectively, facilitating the annotation of more than 3.6 million peptide sequences with unknown biological activities, also registered in Peptipedia v2.0. Additionally, Peptipedia v2.0 introduces description tools based on structural and ontological properties and user-friendly machine-learning tools to facilitate the application of machine-learning strategies to study peptide sequences. Peptipedia v2.0 is accessible under the Creative Commons CC BY-NC-ND 4.0 license at https://peptipedia.cl/.
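For readers less familiar with the metrics quoted in the abstract, the following is a minimal Python sketch of how sensitivity and specificity are computed for a binary classifier and then summarized as mean ± standard deviation across several models. It is not the authors' code: the function name `sensitivity_specificity`, the model names, and the toy labels are all hypothetical, and reading the "±" as a standard deviation across models is an assumption that the abstract does not confirm.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Labels are 1 (peptide has the activity) or 0 (it does not).
    Hypothetical helper for illustration; not part of Peptipedia.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy (made-up) true/predicted labels for two hypothetical activity models.
toy_models = {
    "activity_a": ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]),
    "activity_b": ([1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1]),
}

scores = {name: sensitivity_specificity(t, p) for name, (t, p) in toy_models.items()}
sensitivities = [s for s, _ in scores.values()]
specificities = [s for _, s in scores.values()]

# Summarize across models as mean ± sample standard deviation (assumed convention).
print(f"sensitivity: {mean(sensitivities):.3f} ± {stdev(sensitivities):.3f}")
print(f"specificity: {mean(specificities):.3f} ± {stdev(specificities):.3f}")
```

Sensitivity is the fraction of truly active peptides that a model flags as active, and specificity is the fraction of inactive peptides that it correctly rejects. Reporting both guards against a model that looks accurate only because one class dominates the data.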