Variant Evolution Graph: Can We Infer How SARS-CoV-2 Variants are Evolving?
Badhan Das, Lenwood S. Heath
bioRxiv - Bioinformatics, published 2024-09-15
DOI: 10.1101/2024.09.13.612805 (https://doi.org/10.1101/2024.09.13.612805)
Citation count: 0
Abstract
The SARS-CoV-2 virus has mutated continually, producing genetic diversity among circulating viral strains. This diversity can affect the characteristics of the virus, including its transmissibility and the severity of symptoms in infected individuals. Frequent mutation during the pandemic has created an enormous cloud of variants, known as a viral quasispecies, although most variation is lost to the tight bottlenecks imposed by transmission and survival. Advances in next-generation sequencing have made the production of complete viral genomes rapid and cost-effective, enabling ongoing monitoring of the evolution of the SARS-CoV-2 genome. However, inferring a reliable phylogeny from the sequences in GISAID (the Global Initiative on Sharing All Influenza Data) is daunting because of their vast number. In response to this complexity, this research proposes a new method, inspired by quasispecies theory, for representing the evolutionary and epidemiological relationships among SARS-CoV-2 variants. We aim to build a Variant Evolution Graph (VEG), which models viral evolution in a local pandemic region based on the mutational distance between the genotypes of the variants. A VEG is a directed acyclic graph and not necessarily a tree, because a variant can evolve from more than one variant; its vertices represent the genotypes of the variants associated with their human hosts, and its edges represent the evolutionary relationships among these variants. We also propose a disease transmission network (DTN), derived from the VEG, which represents the transmission relationships among the hosts. From GISAID we downloaded the variant genotypes that are complete, have high coverage, and have a complete collection date for five countries: Somalia (22), Bhutan (102), Hungary (581), Iran (1334), and Nepal (1719). We ran our algorithm on these datasets to obtain the evolution history of the variants, build the VEG represented as an adjacency matrix, and infer the DTN. This work contributes a new approach to the study of viral evolution, offering insights and methods not explored in prior studies.
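As a rough illustration of the objects named in the abstract, the Python sketch below builds a small VEG as an adjacency matrix and reads a host-level DTN off it. The abstract does not specify the authors' algorithm, so everything concrete here is an assumption made for illustration: the parent rule (strictly earlier collection date, minimum Hamming distance), the helper names `hamming`, `build_veg`, and `derive_dtn`, and the toy host ids and sequences.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical record layout: (host id, collection date, aligned genotype).
Sample = Tuple[str, date, str]


def hamming(a: str, b: str) -> int:
    """Count differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def build_veg(samples: List[Sample]) -> List[List[int]]:
    """Return an adjacency matrix where adj[i][j] == 1 means i -> j.

    Assumed rule (not taken from the paper): the parents of a sample are the
    strictly earlier samples at minimum Hamming distance. Ties yield several
    parents, so the graph is a DAG and not necessarily a tree. Edges always
    point forward in time, which rules out cycles.
    """
    n = len(samples)
    adj = [[0] * n for _ in range(n)]
    for j, (_, date_j, seq_j) in enumerate(samples):
        earlier = [i for i in range(n) if samples[i][1] < date_j]
        if not earlier:
            continue  # earliest samples have no parent
        dists = {i: hamming(samples[i][2], seq_j) for i in earlier}
        best = min(dists.values())
        for i, d in dists.items():
            if d == best:
                adj[i][j] = 1
    return adj


def derive_dtn(samples: List[Sample], adj: List[List[int]]) -> List[Tuple[str, str]]:
    """Collapse variant-level edges to host-level edges (assumed derivation)."""
    n = len(samples)
    edges = {
        (samples[i][0], samples[j][0])
        for i in range(n)
        for j in range(n)
        if adj[i][j] and samples[i][0] != samples[j][0]
    }
    return sorted(edges)


# Toy data, invented for this example only.
samples = [
    ("host_A", date(2021, 1, 1), "AAAA"),
    ("host_B", date(2021, 1, 5), "AAAT"),
    ("host_C", date(2021, 1, 5), "AACA"),
    ("host_D", date(2021, 1, 9), "AACT"),
]
veg = build_veg(samples)
dtn = derive_dtn(samples, veg)
for row in veg:
    print(row)
print(dtn)
```

In this toy input, host_D's genotype is one mutation away from both host_B's and host_C's, so it receives two parents. That is the situation in which a VEG is a DAG rather than a tree.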