Vulture: cloud-enabled scalable mining of microbial reads in public scRNA-seq data.

GigaScience (IF 11.8; Zone 2, Biology; JCR Q1, Multidisciplinary Sciences) Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad117
Junyi Chen, Danqing Yin, Harris Y H Wong, Xin Duan, Ken H O Yu, Joshua W K Ho

Abstract

The rapidly growing collection of public single-cell sequencing data has become a valuable resource for molecular, cellular, and microbial discovery. Previous studies mostly overlooked detecting pathogens in human single-cell sequencing data. Moreover, existing bioinformatics tools lack the scalability to deal with big public data. We introduce Vulture, a scalable cloud-based pipeline that performs microbial calling for single-cell RNA sequencing (scRNA-seq) data, enabling meta-analysis of host-microbial studies from the public domain. In our benchmarking experiments, Vulture is 66% to 88% faster than local tools (PathogenTrack and Venus) and 41% faster than the state-of-the-art cloud-based tool Cumulus, while achieving comparable microbial read identification. In terms of the cost on cloud computing systems, Vulture also shows a cost reduction of 83% ($12 vs. $70). We applied Vulture to 2 coronavirus disease 2019, 3 hepatocellular carcinoma (HCC), and 2 gastric cancer human patient cohorts with public sequencing reads data from scRNA-seq experiments and discovered cell type-specific enrichment of severe acute respiratory syndrome coronavirus 2, hepatitis B virus (HBV), and Helicobacter pylori-positive cells, respectively. In the HCC analysis, all cohorts showed hepatocyte-only enrichment of HBV, with cell subtype-associated HBV enrichment based on inferred copy number variations. In summary, Vulture presents a scalable and economical framework to mine unknown host-microbial interactions from large-scale public scRNA-seq data. Vulture is available via an open-source license at https://github.com/holab-hku/Vulture.
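
The 83% cost figure is consistent with the two dollar amounts quoted in the abstract, assuming it denotes the relative reduction against the $70 baseline. A minimal Python check of that arithmetic follows; the function and variable names are illustrative only and are not part of Vulture.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


# Dollar amounts as quoted in the abstract; treating $70 as the
# baseline is an assumption about how the 83% was derived.
baseline_cost_usd = 70.0
vulture_cost_usd = 12.0

cost_reduction = percent_reduction(baseline_cost_usd, vulture_cost_usd)
print(f"Cost reduction: {cost_reduction:.1f}%")
```

Rounded to the nearest whole percent, the result matches the reported 83%.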
