Using artificial intelligence to document the hidden RNA virosphere

Xin Hou, Yong He, Pan Fang, Shi-Qiang Mei, Zan Xu, Wei-Chen Wu, Jun-Hua Tian, Shun Zhang, Zhen-Yu Zeng, Qin-Yu Gou, Gen-Yang Xin, Shi-Jia Le, Yin-Yue Xia, Yu-Lan Zhou, Feng-Ming Hui, Yuan-Fei Pan, John-Sebastian Eden, Zhao-Hui Yang, Chong Han, Yue-Long Shu, Mang Shi

Cell (Journal Article), published 2024-10-09
DOI: 10.1016/j.cell.2024.09.027 (https://doi.org/10.1016/j.cell.2024.09.027)
Abstract
Current metagenomic tools can fail to identify highly divergent RNA viruses. We developed a deep learning algorithm, termed LucaProt, to discover highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 metatranscriptomes generated from diverse global ecosystems. LucaProt integrates both sequence and predicted structural information, enabling the accurate detection of RdRP sequences. Using this approach, we identified 161,979 potential RNA virus species and 180 RNA virus supergroups, including many previously poorly studied groups, as well as RNA virus genomes of exceptional length (up to 47,250 nucleotides) and genomic complexity. A subset of these novel RNA viruses was confirmed by RT-PCR and RNA/DNA sequencing. Newly discovered RNA viruses were present in diverse environments, including air, hot springs, and hydrothermal vents, with virus diversity and abundance varying substantially among ecosystems. This study advances virus discovery, highlights the scale of the virosphere, and provides computational tools to better document the global RNA virome.
About the journal:
The journal publishes findings of significant importance in any area of experimental biology, including cell biology, molecular biology, neuroscience, immunology, virology, microbiology, cancer, human genetics, systems biology, signaling, and disease mechanisms and therapeutics. The main criterion for considering a paper is whether its results provide a significant conceptual advance or raise thought-provoking questions and hypotheses about an important biological problem.
In addition to primary research articles in four formats, Cell features review and opinion articles in its Leading Edge section, covering recent research advances and topics of interest to its broad readership.