nf-rnaSeqMetagen: A nextflow metagenomics pipeline for identifying and characterizing microbial sequences from RNA-seq data

Medicine in Microecology (JCR Q2, Medicine). Pub Date: 2020-06-01. Epub Date: 2020-05-13. DOI: 10.1016/j.medmic.2020.100011
Phelelani T. Mpangase, Jacqueline Frost, Michèle Ramsay, Scott Hazelhurst
Volume 4, Article 100011
Citations: 2

Abstract

Metagenomics is a rapidly growing field aimed at identifying and characterizing the microbial genomes within diverse environmental samples. A key research area in metagenomics is the identification of non-host sequences within a host genomic background, which may represent microorganisms associated with the host. The aim of this study was to develop an efficient, portable and reproducible metagenomics pipeline for identifying and characterizing microbial reads from high-throughput RNA sequencing (RNA-seq) data. The nf-rnaSeqMetagen pipeline presented in this study was developed using Nextflow as a workflow management system to orchestrate the applications used in the pipeline and to handle input/output data between processes. All applications were containerized using Singularity to facilitate parallelization, portability and reproducibility. The pipeline takes RNA-seq reads as input and filters out reads belonging to the host organism. The remaining exogenous reads are then characterized with kraken2 against a database constructed from bacterial, archaeal, and viral genomes. RNA-seq data from skin samples of patients with systemic sclerosis (SSc) were used to test the pipeline and to identify possible pathogens, so as to better understand the onset and progression of the disease. A number of bacterial species belonging to Arthrobacter, Bacillus, Brachybacterium, Dietzia and Pseudarthrobacter were found to be of clinical relevance and highly common in the SSc patients. nf-rnaSeqMetagen was also extended to work with other metagenomics studies using RNA-seq data and adapted to run on different computational platforms. The nf-rnaSeqMetagen pipeline is freely available on GitHub (https://github.com/phelelani/nf-rnaSeqMetagen).
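
The two core steps described in the abstract (discarding host reads, then tallying taxonomic assignments for the remaining reads) can be illustrated with a minimal Python sketch. This is not the pipeline's own code, which is written in Nextflow and calls external tools. The function names `drop_host_reads` and `parse_kraken_style_report`, the `Read` type, and all toy data below are hypothetical. The report parser assumes the standard six-column, tab-separated kraken2 report layout (percentage, clade reads, direct reads, rank code, taxonomy ID, indented name).

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Read(NamedTuple):
    """A minimal stand-in for one sequencing read (hypothetical type)."""
    read_id: str
    sequence: str


def drop_host_reads(reads: Iterable[Read], host_read_ids: Set[str]) -> List[Read]:
    """Keep only reads whose IDs were NOT assigned to the host.

    Assumes an upstream aligner has already produced the set of host read IDs.
    """
    return [r for r in reads if r.read_id not in host_read_ids]


def parse_kraken_style_report(lines: Iterable[str], rank: str = "G") -> Dict[str, int]:
    """Return {taxon name: clade read count} for one rank code (default: genus).

    Assumes six tab-separated columns per line; malformed lines are skipped.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        _pct, clade_reads, _direct, rank_code, _taxid, name = fields[:6]
        if rank_code.strip() == rank:
            # Names are indented by taxonomic depth, so strip the padding.
            counts[name.strip()] = int(clade_reads)
    return counts


# Toy inputs, invented purely for illustration (not results from the study).
toy_reads = [Read("r1", "ACGT"), Read("r2", "GGCC"), Read("r3", "TTAA")]
toy_host_ids = {"r2"}
exogenous = drop_host_reads(toy_reads, toy_host_ids)

toy_report = [
    "60.00\t3\t0\tD\t2\tBacteria",
    "40.00\t2\t0\tG\t1001\t      Genus_A",            # placeholder taxon and ID
    "20.00\t1\t1\tS\t1002\t        Genus_A species_1",  # placeholder taxon and ID
    "20.00\t1\t0\tG\t1003\t      Genus_B",            # placeholder taxon and ID
]
genus_counts = parse_kraken_style_report(toy_report)
```

In practice the host filter would work from alignment output rather than an in-memory set, and the genus-level counts would be compared across samples to find taxa that recur in many patients.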

Source journal: Medicine in Microecology (Medicine-Gastroenterology)
CiteScore: 5.60
Self-citation rate: 0.00%
Articles published: 16
Review time: 76 days