Structural analysis of SARS-CoV-2 Spike protein variants through graph embedding.

Network Modeling and Analysis in Health Informatics and Bioinformatics (IF 2, JCR Q3, Mathematical & Computational Biology) Pub Date: 2023-01-01 DOI: 10.1007/s13721-022-00397-9
Pietro Hiram Guzzi, Ugo Lomoio, Barbara Puccio, Pierangelo Veltri
Citations: 1

Abstract

Since December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has affected almost all countries. The unprecedented spread of the virus has led to the emergence of many variants that alter protein sequence and structure. These variants require continuous monitoring and sequence analysis to understand the genetic evolution of the virus and to prevent possibly dangerous outcomes. Variants that modify the structure of proteins such as the Spike protein (S) particularly need to be monitored. Protein contact networks (PCNs) have recently been proposed as a modelling framework for protein structures. In this framework, the protein structure is represented as an unweighted graph whose nodes are the central atoms of the backbone (Cα) and whose edges connect pairs of atoms at a spatial distance between 4 and 7 Å. A PCN can also be a data-rich representation, since biological and topological information can be added to each node/atom. This formalism makes it possible to analyze the structure with algorithms from graph theory and, in particular, with graph embedding methods, which enable the analysis of such graphs with deep learning. In this work, we explore the possibility of embedding PCNs using graph neural networks and then analyze each residue in the embedded space to distinguish mutated residues from non-mutated ones. In particular, we analyzed the structure of the Spike protein of the coronavirus. First, we obtained the PCNs of the Spike protein for the wild type and the α, β, and δ variants. We then used the GraphSAGE embedding algorithm to obtain an unsupervised embedding and analyzed the mutation points in the embedded space. The results show the characteristics of the mutation points in the embedding space.
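
The PCN construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes Cα coordinates are already available as an array in ångströms, treats the 4 to 7 Å window as inclusive at both ends, and uses hypothetical toy coordinates rather than real Spike data. The names `build_pcn` and `mean_neighbor_features` are made up for illustration. The second function only shows the idea behind a GraphSAGE-style mean aggregator (each node's own features joined with the average of its neighbours' features); it has no learned weights and is not a trained embedding.

```python
import numpy as np


def build_pcn(ca_coords, d_min=4.0, d_max=7.0):
    """Return a boolean adjacency matrix for a protein contact network.

    ca_coords: (n_residues, 3) array of C-alpha coordinates in angstroms.
    Two residues are linked when their C-alpha distance lies in
    [d_min, d_max] (inclusive bounds are an assumption of this sketch).
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise Euclidean distances between all C-alpha atoms.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist >= d_min) & (dist <= d_max)
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def mean_neighbor_features(adj, features):
    """Concatenate each node's features with the mean of its neighbours'.

    Nodes without neighbours get a zero vector for the neighbour part.
    """
    features = np.asarray(features, dtype=float)
    degree = adj.sum(axis=1, keepdims=True)
    neigh_sum = adj.astype(float) @ features
    neigh_mean = np.divide(
        neigh_sum, degree, out=np.zeros_like(neigh_sum), where=degree > 0
    )
    return np.concatenate([features, neigh_mean], axis=1)


# Hypothetical toy "residues" placed on a line (not real protein data).
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [11.5, 0.0, 0.0],
])

toy_adj = build_pcn(toy_coords)
toy_degree = toy_adj.sum(axis=1)  # a simple topological node feature
toy_hidden = mean_neighbor_features(toy_adj, toy_degree[:, None])

print(toy_adj.astype(int))
print(toy_degree)
print(toy_hidden)
```

In the toy example, residues 2 and 3 are only 1.5 Å apart and are therefore not connected, which shows how the lower bound of the distance window excludes very close atom pairs. Node degree stands in here for the topological information that the abstract says can be attached to each node.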

Source journal
CiteScore: 5.40
Self-citation rate: 4.30%
Articles published: 43
Journal description: NetMAHIB publishes original research articles and reviews reporting how graph theory, statistics, linear algebra and machine learning techniques can be effectively used for modelling and analysis in health informatics and bioinformatics. It aims at creating a synergy between these disciplines by providing a forum for disseminating the latest developments and research findings; hence, results can be shared with readers across institutions, governments, researchers, students, and the industry. The journal emphasizes fundamental contributions on new methodologies, discoveries and techniques that have general applicability and which form the basis for network-based modelling, knowledge discovery, knowledge sharing and decision support to the benefit of patients, healthcare professionals and society in traditional and advanced emerging settings, including eHealth and mHealth.