Title: LSTrAP-denovo: Automated Generation of Transcriptome Atlases for Eukaryotic Species Without Genomes
Authors: Peng Ken Lim, Ruoxi Wang, Marek Mutwil
Journal: Physiologia Plantarum (impact factor 5.4; JCR Q1, Plant Sciences)
Publication date: 2024-07-01
Publication type: Journal Article
DOI: 10.1111/ppl.14407 (https://doi.org/10.1111/ppl.14407)
Citations: 0
Abstract
Despite the abundance of species with transcriptomic data, a significant number of species still lack sequenced genomes, making it difficult to study gene function and expression in these organisms. While de novo transcriptome assembly can be used to assemble protein-coding transcripts from RNA-sequencing (RNA-seq) data, the datasets used often only feature samples of arbitrarily selected or similar experimental conditions, which might fail to capture condition-specific transcripts. We developed the Large-Scale Transcriptome Assembly Pipeline for de novo assembled transcripts (LSTrAP-denovo) to automatically generate transcriptome atlases of eukaryotic species. Specifically, given an NCBI TaxID, LSTrAP-denovo can (1) filter undesirable RNA-seq accessions based on read data, (2) select RNA-seq accessions via unsupervised machine learning to construct a sample-balanced dataset for download, (3) assemble transcripts via over-assembly, (4) functionally annotate coding sequences (CDS) from assembled transcripts and (5) generate transcriptome atlases in the form of expression matrices for downstream transcriptomic analyses. LSTrAP-denovo is easy to implement, written in Python, and is freely available at https://github.com/pengkenlim/LSTrAP-denovo/.
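Step (2) above, selecting accessions to build a sample-balanced dataset, can be illustrated with a small sketch. The abstract says only that unsupervised machine learning is used. The k-means clustering, the farthest-first initialisation, the per-cluster cap, and every name below (`kmeans_labels`, `select_balanced`, the toy accession IDs) are assumptions made for illustration, not the pipeline's actual code or API.

```python
from collections import defaultdict


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Uses deterministic farthest-first initialisation (an illustrative
    choice, not taken from the paper). Requires k <= len(points).
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_balanced(profiles, k, per_cluster):
    """Pick at most `per_cluster` accessions from each of `k` clusters.

    `profiles` maps a (hypothetical) accession ID to a numeric expression
    profile. Capping each cluster keeps over-represented conditions from
    dominating the selected dataset.
    """
    accessions = sorted(profiles)
    labels = kmeans_labels([profiles[a] for a in accessions], k)
    clusters = defaultdict(list)
    for acc, lab in zip(accessions, labels):
        clusters[lab].append(acc)
    selected = []
    for lab in sorted(clusters):
        selected.extend(clusters[lab][:per_cluster])
    return selected


if __name__ == "__main__":
    # Toy data: six samples from one condition, two from another.
    toy = {
        "A1": [0.0, 0.0], "A2": [0.0, 1.0], "A3": [1.0, 0.0],
        "A4": [1.0, 1.0], "A5": [0.5, 0.5], "A6": [0.2, 0.8],
        "B1": [10.0, 10.0], "B2": [10.0, 11.0],
    }
    print(select_balanced(toy, k=2, per_cluster=2))
```

In this toy input the over-sampled condition contributes as many accessions as the rare one, which is the balancing idea in miniature. A real run would also need a principled way to choose the number of clusters and the representatives within each cluster, which this sketch does not attempt.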
About the journal:
Physiologia Plantarum is an international journal committed to publishing the best full-length original research papers that advance our understanding of primary mechanisms of plant development, growth and productivity as well as plant interactions with the biotic and abiotic environment. All organisational levels of experimental plant biology – from molecular and cell biology, biochemistry and biophysics to ecophysiology and global change biology – fall within the scope of the journal. The content is distributed between 5 main subject areas supervised by Subject Editors specialised in the respective domain: (1) biochemistry and metabolism, (2) ecophysiology, stress and adaptation, (3) uptake, transport and assimilation, (4) development, growth and differentiation, (5) photobiology and photosynthesis.