Application of Generative Adversarial Networks on RNASeq data to uncover COVID-19 severity biomarkers

Yvette K. Kalimumbalo, Rosaline W. Macharia, Peter W. Wagacha
Journal: Advances in biomarker sciences and technology, volume 7, pages 44-58
Publication date: 2025-01-01
Publication type: Journal Article
DOI: 10.1016/j.abst.2025.01.002
URL: https://www.sciencedirect.com/science/article/pii/S254310642500002X

Abstract

Background

The COVID-19 pandemic has highlighted the need for reliable biomarkers to predict disease severity and guide treatment strategies. However, machine learning analysis of RNASeq data for biomarker discovery is constrained by limited sample sizes, primarily because of cost and privacy considerations. In this study, we applied Generative Adversarial Networks (GANs) to RNASeq data to identify biomarkers associated with COVID-19 severity.

Methods

RNASeq data from COVID-19 patients, along with severity metadata, were collected from the GEO database. Differential expression analysis was conducted, and GAN models were trained to augment the original dataset, which improved the robustness and accuracy of the downstream machine learning models used for biomarker discovery. Feature selection using Recursive Feature Elimination with Cross-Validation (RFECV) identified key biomarkers on cGAN- and cWGAN-augmented datasets.
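
As a rough illustration of the differential expression step, the sketch below ranks genes by log2 fold change between severe and mild samples and reports a Welch t-statistic for each. The abstract does not say which method or tool the authors used, so this is a simple stand-in rather than their pipeline. The gene names, counts, pseudocount, and function names are all hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(severe, mild, pseudocount=1.0):
    """log2 ratio of mean expression (severe over mild).

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    return math.log2((mean(severe) + pseudocount) / (mean(mild) + pseudocount))


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical toy counts; not data from the study.
toy_counts = {
    "GENE_UP": {"severe": [120, 135, 150, 142], "mild": [30, 28, 35, 33]},
    "GENE_DOWN": {"severe": [12, 9, 15, 11], "mild": [60, 72, 65, 58]},
    "GENE_FLAT": {"severe": [50, 47, 53, 49], "mild": [48, 52, 50, 51]},
}


def rank_genes(counts):
    """Return (gene, log2FC, t) tuples sorted by absolute log2 fold change."""
    rows = []
    for gene, groups in counts.items():
        lfc = log2_fold_change(groups["severe"], groups["mild"])
        t_stat = welch_t(groups["severe"], groups["mild"])
        rows.append((gene, lfc, t_stat))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


for gene, lfc, t_stat in rank_genes(toy_counts):
    print(f"{gene}: log2FC={lfc:+.2f}, t={t_stat:+.2f}")
```

In practice, count-based RNASeq analyses usually rely on dedicated statistical packages with normalization and multiple-testing correction; the toy ranking only shows the shape of the computation.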

Results

Several key biomarkers significantly associated with disease severity were identified. Gene Ontology enrichment analysis revealed upregulation of neutrophil degranulation and downregulation of T-cell activity, consistent with previous findings. In ROC analysis, a Random Forest model using the five most important biomarkers (CCDC65, ZNF239, OTUD7A, CEP126, and TCTN2) predicted disease severity with high accuracy (AUC: 0.98, Acc: 0.94). These genes are associated with processes such as cilium assembly, IFN activation, and NF-κB pathway suppression.
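
For readers unfamiliar with the two reported metrics, the following minimal sketch computes AUC and accuracy from predicted severity scores. The labels and scores are hypothetical toy values, not the study's predictions, and the 0.5 decision threshold is an assumption. AUC is computed here as the fraction of (severe, non-severe) pairs in which the severe sample receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: 1 = severe, 0 = non-severe.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.92, 0.81, 0.67, 0.45, 0.55, 0.30, 0.22, 0.10]

print("AUC:", roc_auc(toy_labels, toy_scores))
print("Accuracy:", accuracy(toy_labels, toy_scores))
```

AUC is threshold-independent while accuracy depends on the chosen cut-off, which is why the two numbers can differ for the same model.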

Conclusions

Our results demonstrate that GANs can effectively augment RNASeq data, leading to consistent findings that align with known mechanisms and providing new insights into severe COVID-19 transcriptional responses. Further experimental validation is needed to confirm the applicability of these biomarkers in diverse populations.