LSTrAP-denovo: Automated Generation of Transcriptome Atlases for Eukaryotic Species Without Genomes.

Physiologia Plantarum | IF 5.4 | Zone 2 (Biology) | JCR Q1 (Plant Sciences) | Pub Date: 2024-07-01 | DOI: 10.1111/ppl.14407
Peng Ken Lim, Ruoxi Wang, Marek Mutwil

Abstract

Despite the abundance of species with transcriptomic data, a significant number of species still lack sequenced genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein-coding transcripts from RNA-sequencing (RNA-seq) data, the datasets used often only feature samples of arbitrarily selected or similar experimental conditions, which might fail to capture condition-specific transcripts. We developed the Large-Scale Transcriptome Assembly Pipeline for de novo assembled transcripts (LSTrAP-denovo) to automatically generate transcriptome atlases of eukaryotic species. Specifically, given an NCBI TaxID, LSTrAP-denovo can (1) filter undesirable RNA-seq accessions based on read data, (2) select RNA-seq accessions via unsupervised machine learning to construct a sample-balanced dataset for download, (3) assemble transcripts via over-assembly, (4) functionally annotate coding sequences (CDS) from assembled transcripts and (5) generate transcriptome atlases in the form of expression matrices for downstream transcriptomic analyses. LSTrAP-denovo is easy to implement, written in Python, and is freely available at https://github.com/pengkenlim/LSTrAP-denovo/.
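
Step (2) above, building a sample-balanced dataset through unsupervised learning, can be illustrated with a small sketch. The abstract does not say which algorithm or features the pipeline uses, so everything here is an assumption made for illustration: plain k-means as the clustering method, two-number expression summaries as features, made-up accession IDs, and the helper names `init_centroids`, `kmeans` and `select_balanced`. The idea is to cluster accessions by similarity and then draw from the clusters in turn, so that a heavily sampled condition does not crowd out a rare one.

```python
import math
from collections import defaultdict


def init_centroids(points, k):
    """Deterministic farthest-point start: begin at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_balanced(accessions, labels, total):
    """Pick up to `total` accessions, taking one per cluster in turn."""
    by_cluster = defaultdict(list)
    for acc, lab in zip(accessions, labels):
        by_cluster[lab].append(acc)
    queues = [by_cluster[lab] for lab in sorted(by_cluster)]
    chosen, depth = [], 0
    while len(chosen) < total and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and len(chosen) < total:
                chosen.append(q[depth])
        depth += 1
    return chosen


# Hypothetical accession IDs with made-up two-feature expression summaries:
# six samples from one over-represented condition, two from a rare one.
accessions = ["SRR_A1", "SRR_A2", "SRR_A3", "SRR_A4", "SRR_A5", "SRR_A6", "SRR_B1", "SRR_B2"]
profiles = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0], [0.2, 0.2],
            [5.0, 5.1], [5.1, 5.0]]

labels = kmeans(profiles, k=2)
chosen = select_balanced(accessions, labels, total=4)
print(chosen)
```

With a budget of four downloads, taking the first four accessions in list order would return only the over-represented condition. The round-robin draw instead takes two accessions from each cluster, which is the sense of "sample-balanced" assumed in this sketch.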
