Graphical models for identifying pore‐forming proteins

IF 4.3 | Zone 3, Materials Science | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | ACS Applied Electronic Materials | Pub Date: 2024-04-15 | DOI: 10.1002/prot.26687
Nan Xu, Theodore W. Kahn, Theju Jacob, Yan Liu

Abstract

Pore‐forming toxins (PFTs) are proteins that form lesions in biological membranes. A better understanding of the structure and function of these proteins would benefit a number of biotechnological applications, including the development of new pest control methods in agriculture. When searching for new pore formers, existing sequence homology‐based methods fail to discover truly novel proteins with low sequence identity to known proteins. Search methodologies based on protein structure would help move beyond this limitation. Because the number of known PFT structures is very limited, it is challenging to identify new proteins with similar structures using computational approaches such as deep learning. In this article, we therefore propose a sample‐efficient graphical model, in which a protein structure graph is first constructed from consensus secondary structures. A semi‐Markov conditional random field (CRF) model is then developed to perform protein sequence segmentation. We demonstrate that our method can distinguish structurally similar proteins even in the absence of sequence similarity (pairwise sequence identity < 0.4), a feat not achievable by traditional approaches such as hidden Markov models (HMMs). To extract proteins of interest from a genome‐wide protein database for further study, we also develop an efficient framework for UniRef50, which contains 43 million proteins.
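
As a rough illustration of what segmenting a protein sequence with a semi‐Markov model involves, the sketch below runs Viterbi decoding over variable-length labeled segments. It is not the authors' implementation: the function names, the labels "H" and "E", and the toy scoring function are all hypothetical, and a real semi‐Markov CRF would use learned feature weights and, as the abstract describes, a protein structure graph rather than a hand-written score.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)
# Hypothetical scorer signature: score(seq, start, end, prev_label, label)
ScoreFn = Callable[[str, int, int, Optional[str], str], float]


def semi_markov_viterbi(seq: str, labels: Sequence[str],
                        max_len: int, score: ScoreFn) -> List[Segment]:
    """Return the highest-scoring segmentation of seq into labeled segments."""
    n = len(seq)
    neg_inf = float("-inf")
    # best[i][y]: best score for seq[:i] when the last segment has label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]

    for i in range(1, n + 1):
        for y in labels:
            # Try every allowed length d for the segment that ends at i.
            for d in range(1, min(max_len, i) + 1):
                j = i - d
                if j == 0:
                    # First segment: there is no previous label.
                    cand = score(seq, 0, i, None, y)
                    if cand > best[i][y]:
                        best[i][y], back[i][y] = cand, (0, None)
                    continue
                for prev in labels:
                    if best[j][prev] == neg_inf:
                        continue
                    cand = best[j][prev] + score(seq, j, i, prev, y)
                    if cand > best[i][y]:
                        best[i][y], back[i][y] = cand, (j, prev)

    # Trace back from the best final label to recover the segments.
    segments: List[Segment] = []
    i, y = n, max(labels, key=lambda lab: best[n][lab])
    while i > 0:
        j, prev = back[i][y]
        segments.append((j, i, y))
        i, y = j, prev
    segments.reverse()
    return segments


def toy_score(seq: str, start: int, end: int,
              prev_label: Optional[str], label: str) -> float:
    """Toy stand-in for learned features: 'H' rewards A, 'E' rewards V."""
    preferred = {"H": "A", "E": "V"}[label]
    match = sum(1.0 if c == preferred else -1.0 for c in seq[start:end])
    return match - 0.5  # per-segment penalty discourages over-fragmentation


print(semi_markov_viterbi("AAAAVVVV", ["H", "E"], 8, toy_score))
```

The key difference from an ordinary (first-order) Viterbi pass is the inner loop over segment lengths: each step scores a whole segment at once, which is what lets a semi‐Markov model use segment-level features such as length.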