
Genome Biology: Latest Articles

seqQscorer: automated quality control of next-generation sequencing data using machine learning.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-03-05; DOI: 10.1186/s13059-021-02294-2; Genome Biology 22(1):75
Steffen Albrecht, Maximilian Sprang, Miguel A Andrade-Navarro, Jean-Fred Fontaine

Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer .

Citations: 8
Single cell eQTL analysis identifies cell type-specific genetic control of gene expression in fibroblasts and reprogrammed induced pluripotent stem cells.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-03-05; DOI: 10.1186/s13059-021-02293-3; Genome Biology 22(1):76
Drew Neavin, Quan Nguyen, Maciej S Daniszewski, Helena H Liang, Han Sheng Chiu, Yong Kiat Wee, Anne Senabouth, Samuel W Lukowski, Duncan E Crombie, Grace E Lidgerwood, Damián Hernández, James C Vickers, Anthony L Cook, Nathan J Palpant, Alice Pébay, Alex W Hewitt, Joseph E Powell

Background: The discovery that somatic cells can be reprogrammed to induced pluripotent stem cells (iPSCs) has provided a foundation for in vitro human disease modelling, drug development and population genetics studies. Gene expression plays a critical role in complex disease risk and therapeutic response. However, while the genetic background of reprogrammed cell lines has been shown to strongly influence gene expression, the effect has not been evaluated at the level of individual cells which would provide significant resolution. By integrating single cell RNA-sequencing (scRNA-seq) and population genetics, we apply a framework in which to evaluate cell type-specific effects of genetic variation on gene expression.

Results: Here, we perform scRNA-seq on 64,018 fibroblasts from 79 donors and map expression quantitative trait loci (eQTLs) at the level of individual cell types. We demonstrate that the majority of eQTLs detected in fibroblasts are specific to an individual cell subtype. To address if the allelic effects on gene expression are maintained following cell reprogramming, we generate scRNA-seq data in 19,967 iPSCs from 31 reprogrammed donor lines. We again identify highly cell type-specific eQTLs in iPSCs and show that the eQTLs in fibroblasts almost entirely disappear during reprogramming.

Conclusions: This work provides an atlas of how genetic variation influences gene expression across cell subtypes and provides evidence for patterns of genetic architecture that lead to cell type-specific eQTL effects.

Citations: 48
simATAC: a single-cell ATAC-seq simulation framework.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-03-04; DOI: 10.1186/s13059-021-02270-w; Genome Biology 22(1):74
Zeinab Navidi, Lin Zhang, Bo Wang

Single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) identifies regulated chromatin accessibility modules at the single-cell resolution. Robust evaluation is critical to the development of scATAC-seq pipelines, which calls for reproducible datasets for benchmarking. We hereby present the simATAC framework, an R package that generates scATAC-seq count matrices that highly resemble real scATAC-seq datasets in library size, sparsity, and chromatin accessibility signals. simATAC deploys statistical models derived from analyzing 90 real scATAC-seq cell groups. simATAC provides a robust and systematic approach to generate in silico scATAC-seq samples with known cell labels for assessing analytical pipelines.

Citations: 8
2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-03-01; DOI: 10.1186/s13059-021-02296-0; Genome Biology 22(1):72
Matthew T Parker, Katarzyna Knop, Geoffrey J Barton, Gordon G Simpson

Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools ( https://github.com/bartongroup/2passtools ), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.

Citations: 13
Re-evaluating experimental validation in the Big Data Era: a conceptual argument.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-02-24; DOI: 10.1186/s13059-021-02292-4; Genome Biology 22(1):71
Mohieddin Jafari, Yuanfang Guan, David C Wedge, Naser Ansari-Pour
Citations: 8
MEDALT: single-cell copy number lineage tracing enabling gene discovery.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-02-23; DOI: 10.1186/s13059-021-02291-5; Genome Biology 22(1):70
Fang Wang, Qihan Wang, Vakul Mohanty, Shaoheng Liang, Jinzhuang Dou, Jincheng Han, Darlan Conterno Minussi, Ruli Gao, Li Ding, Nicholas Navin, Ken Chen

We present a Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm that infers the evolution history of a cell population based on single-cell copy number (SCCN) profiles, and a statistical routine named lineage speciation analysis (LSA), which facilitates discovery of fitness-associated alterations and genes from SCCN lineage trees. MEDALT appears more accurate than phylogenetics approaches in reconstructing copy number lineage. From data from 20 triple-negative breast cancer patients, our approaches effectively prioritize genes that are essential for breast cancer cell fitness and predict patient survival, including those implicating convergent evolution. The source code of our study is available at https://github.com/KChen-lab/MEDALT .

Citations: 0
Megabase-scale methylation phasing using nanopore long reads and NanoMethPhase.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-02-22; DOI: 10.1186/s13059-021-02283-5; Genome Biology 22(1):68
Vahid Akbari, Jean-Michel Garant, Kieran O'Neill, Pawan Pandoh, Richard Moore, Marco A Marra, Martin Hirst, Steven J M Jones

The ability of nanopore sequencing to simultaneously detect modified nucleotides while producing long reads makes it ideal for detecting and phasing allele-specific methylation. However, there is currently no complete software for detecting SNPs, phasing haplotypes, and mapping methylation to these from nanopore sequence data. Here, we present NanoMethPhase, a software tool to phase 5-methylcytosine from nanopore sequencing. We also present SNVoter, which can post-process nanopore SNV calls to improve accuracy in low coverage regions. Together, these tools can accurately detect allele-specific methylation genome-wide using nanopore sequence data with low coverage of about ten-fold redundancy.

Citations: 25
scSorter: assigning cells to known cell types according to marker genes.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-02-22; DOI: 10.1186/s13059-021-02281-7; Genome Biology 22(1):69
Hongyu Guo, Jun Li

On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on an observation that the expected over-expression of marker genes is often absent in a nonnegligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to express at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power compared to existing methods.

Citations: 0
FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-02-19; DOI: 10.1186/s13059-021-02288-0; Genome Biology 22(1):66
Yanping Long, Zhijian Liu, Jinbu Jia, Weipeng Mo, Liang Fang, Dongdong Lu, Bo Liu, Hong Zhang, Wei Chen, Jixian Zhai

The broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting that requires digesting the cell walls from different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at a single-nucleus level in plants using isolated nuclei. Combined with 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it allows for uncovering alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.

Citations: 47
ReSeq simulates realistic Illumina high-throughput sequencing data.
IF 12.3; Zone 1 (Biology); Q1 Agricultural and Biological Sciences; Pub Date: 2021-02-19; DOI: 10.1186/s13059-021-02265-7; Genome Biology 22(1):67
Stephan Schmeing, Mark D Robinson

In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq .

Citations: 5