Predictions of COVID-19 Infection Severity Based on Co-associations between the SNPs of Co-morbid Diseases and COVID-19 through Machine Learning of Genetic Data
R-Y Wang, Tim Qinsong Guo, L. Li, Julia Yutian Jiao, Lena Yiqi Wang
Published in: 2020 IEEE 8th International Conference on Computer Science and Network Technology (ICCSNT), pp. 92-96. Publication date: 2020-11-20. DOI: 10.1109/ICCSNT50940.2020.9304990
Citations: 16
Abstract
This study builds a quantitative model that predicts a person's susceptibility to COVID-19 from their genome. Identifying people vulnerable to COVID-19 infection is crucial to stopping the spread of the virus. Previous studies have found that individuals with comorbid diseases are more likely to be infected and to develop more severe COVID-19. However, these patterns have been observed only through correlational analyses between patient phenotypes and the severity of their COVID-19 infection. Here, the genetic variants underlying the observed comorbidity patterns are analyzed through machine learning on COVID-19 data from genome-wide association studies (GWAS), which may reveal biological pathways underlying COVID-19 contraction that are essential to developing effective, targeted therapeutics. Furthermore, by combining genetic variants with an individual's phenotypes, this study builds a neural network model and a random forest (RF) classifier to predict an individual's likelihood of COVID-19 infection. The RF classifier shows that ongoing symptoms are generally better predictors of COVID-19 condition (higher impurity-based feature importance) than diseases or medical histories. In addition, when the RF model is trained on genomic data, the comorbid disease impact ranking it yields is highly consistent with the phenotypic comorbidity patterns observed in past studies.
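The "impurity-based feature importance" mentioned in the abstract can be illustrated with a minimal sketch. A random forest typically scores a feature by the weighted decrease in node impurity that its splits achieve, accumulated over the trees. The Python below computes that decrease for a single split on a binary feature, using Gini impurity. It is not the authors' pipeline: the data are invented toy values, and the feature names (`symptom_x`, `history_y`, `variant_z`) are hypothetical placeholders.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(feature, labels):
    """Weighted Gini decrease obtained by splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted_child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_child


# Toy data invented for illustration only (1 = positive case, 0 = negative case).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "symptom_x": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "history_y": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "variant_z": [1, 0, 1, 0, 1, 0, 1, 0],  # uninformative
}

# Score each feature, then rank from most to least informative.
scores = {name: impurity_decrease(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In a full random forest these per-split decreases are summed over every node that uses a feature and averaged over the trees, so a feature that ranks higher is one whose splits separate the classes more cleanly, which is the sense in which the abstract compares symptoms with disease histories.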