SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine learning method
Adriano de Bernardi Schneider, Michelle Su, Angie S Hinrichs, Jade Wang, Helly Amin, John Bell, Debra A Wadford, Áine O’Toole, Emily Scher, Marc D Perry, Yatish Turakhia, Nicola De Maio, Scott Hughes, Russ Corbett-Detig
Virus Evolution, published 2024-01-13. DOI: 10.1093/ve/vead085 (https://doi.org/10.1093/ve/vead085)
Abstract
With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.
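The notion of consistency across versions mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function name `pairwise_consistency`, the version labels, and the toy lineage calls are all hypothetical, and the metric (fraction of shared sequences receiving the same lineage in two versions) is only one plausible way to quantify stability.

```python
from itertools import combinations


def pairwise_consistency(calls_by_version):
    """Return, for each pair of versions, the fraction of shared
    sequences that were assigned the same lineage by both.

    calls_by_version maps a version label to a dict of
    {sequence_id: lineage}. All names here are illustrative.
    """
    result = {}
    for (va, a), (vb, b) in combinations(calls_by_version.items(), 2):
        shared = a.keys() & b.keys()  # compare only sequences present in both
        if not shared:
            continue  # no overlap, so no meaningful agreement score
        same = sum(1 for seq in shared if a[seq] == b[seq])
        result[(va, vb)] = same / len(shared)
    return result


# Toy, made-up assignments for three hypothetical software versions.
calls = {
    "version_a": {"seq1": "B.1.1.7", "seq2": "B.1.617.2", "seq3": "B.1", "seq4": "AY.4"},
    "version_b": {"seq1": "B.1.1.7", "seq2": "AY.4", "seq3": "B.1", "seq4": "AY.4"},
    "version_c": {"seq1": "B.1.1.7", "seq2": "AY.4", "seq3": "B.1.1", "seq4": "AY.4"},
}

consistency = pairwise_consistency(calls)
for pair, score in sorted(consistency.items()):
    print(pair, score)
```

A score near 1.0 for every pair would indicate that assignments barely change between versions. Comparing such scores for the two assignment methods on the same set of genomes is one way to make the stability claim concrete.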
About the journal:
Virus Evolution is a new Open Access journal focusing on the long-term evolution of viruses, viruses as a model system for studying evolutionary processes, viral molecular epidemiology and environmental virology.
The aim of the journal is to provide a forum for original research papers, reviews, commentaries and a venue for in-depth discussion on the topics relevant to virus evolution.