Machine Learning for Classification of Protein Helix Capping Motifs

Sean Mullane, Ruoyan Chen, Sridhar Vemulapalli, Eli J. Draizen, Ke Wang, C. Mura, P. Bourne
Published in: 2019 Systems and Information Engineering Design Symposium (SIEDS)
Publication date: 2019-04-01
DOI: https://doi.org/10.1109/SIEDS.2019.8735646
Citations: 4

Abstract

The biological function of a protein stems from its three-dimensional (3D) structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, defines a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means like X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete “secondary structures”, of which the best characterized are the geometrically regular $\alpha$-helix and $\beta$-sheet; irregular structural patterns, such as ‘turns’ and ‘loops’, are less well understood. Here, we present a study of deep learning techniques to classify the loop-like end cap structures that delimit $\alpha$-helices. Previous work used highly empirical and heuristic methods to manually classify helix capping motifs. Instead, we use structural data directly, including (i) backbone torsion angles computed from 3D structures, (ii) macromolecular feature sets (e.g., physicochemical properties), and (iii) helix cap classification data (from CAPS-DB), as the ground truth to train a bidirectional long short-term memory (BiLSTM) model to classify helix cap residues. We tried different network architectures and scanned hyperparameters in order to train and assess several models; we also trained a Support Vector Classifier (SVC) to use as a baseline. Ultimately, we achieved 85% class-balanced accuracy with a deep BiLSTM model.
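The abstract reports "class-balanced accuracy" without defining it. A common definition, assumed here and not confirmed by the abstract, is the mean of per-class recall. The sketch below uses hypothetical toy labels (1 for a helix-cap residue, 0 for any other residue) to show why this metric is more informative than plain accuracy when cap residues are rare; the function names and data are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true.

    Assumed definition; the abstract does not state the exact formula used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no labels given")
    total = defaultdict(int)    # number of residues per true class
    correct = defaultdict(int)  # correctly predicted residues per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def plain_accuracy(y_true, y_pred):
    """Fraction of residues whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 18 non-cap residues followed by 2 cap residues.
y_true = [0] * 18 + [1] * 2

# A trivial predictor that never predicts a cap.
always_non_cap = [0] * 20

# A predictor that finds one of the two caps and raises one false alarm.
finds_one_cap = [0] * 17 + [1, 1, 0]

for name, y_pred in [("always_non_cap", always_non_cap),
                     ("finds_one_cap", finds_one_cap)]:
    print(name,
          "plain:", round(plain_accuracy(y_true, y_pred), 3),
          "balanced:", round(balanced_accuracy(y_true, y_pred), 3))
```

In this toy case both predictors reach the same plain accuracy, yet only the balanced score separates the predictor that never finds a cap from the one that finds some of them. That is one plausible reason to report a class-balanced figure for a task where most residues are not caps.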