
NAR Genomics and Bioinformatics: Latest Articles

Tumor purity estimated from bulk DNA methylation can be used for adjusting beta values of individual samples to better reflect tumor biology.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-11-04 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae146
Iñaki Sasiain, Deborah F Nacer, Mattias Aine, Srinivas Veerla, Johan Staaf

Epigenetic deregulation through altered DNA methylation is a fundamental feature of tumorigenesis, but tumor data from bulk tissue samples contain different proportions of malignant and non-malignant cells that may confound the interpretation of DNA methylation values. The adjustment of DNA methylation data based on tumor purity has been proposed to render both genome-wide and gene-specific analyses more precise, but it requires sample purity estimates. Here we present PureBeta, a single-sample statistical framework that uses genome-wide DNA methylation data to first estimate sample purity and then adjust methylation values of individual CpGs to correct for sample impurity. Purity values estimated with the algorithm have high correlation (>0.8) to reference values obtained from DNA sequencing when applied to samples from breast carcinoma, lung adenocarcinoma, and lung squamous cell carcinoma. Methylation beta values adjusted based on purity estimates have a more binary distribution that better reflects theoretical methylation states, thus facilitating improved biological inference as shown for BRCA1 in breast cancer. PureBeta is a versatile tool that can be used for different Illumina DNA methylation arrays and can be applied to individual samples of different cancer types to enhance biological interpretability of methylation data.
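
As a rough illustration of why purity matters for beta values, the sketch below assumes a simple two-component linear mixture (observed beta = purity * tumor beta + (1 - purity) * non-malignant beta) and solves it for the tumor component. This is not PureBeta's actual procedure or API; the function name `adjust_beta` and the `normal_beta` input are hypothetical and chosen only for this example.

```python
def adjust_beta(observed_beta: float, purity: float, normal_beta: float) -> float:
    """Estimate the tumor-only beta value under an assumed linear mixture.

    Assumption (not taken from the paper):
        observed = purity * tumor + (1 - purity) * normal
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in the interval (0, 1]")
    tumor_beta = (observed_beta - (1.0 - purity) * normal_beta) / purity
    # Beta values are proportions, so clip the estimate to [0, 1].
    return min(1.0, max(0.0, tumor_beta))


# A CpG that is fully methylated in tumor cells but unmethylated in the
# non-malignant background looks intermediate (0.5) at 50% purity.
print(adjust_beta(observed_beta=0.5, purity=0.5, normal_beta=0.0))
```

Under these assumptions an intermediate observed value is pushed back toward a methylated or unmethylated state, which is the kind of shift toward a more binary distribution that the abstract describes.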

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae146. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11532792/pdf/
Citations: 0
Intronic RNA secondary structural information captured for the human MYC pre-mRNA.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-24 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae143
Taylor O Eich, Collin A O'Leary, Walter N Moss

To address the lack of intronic reads in secondary structure probing data for the human MYC pre-mRNA, we developed a method that combines spliceosomal inhibition with RNA probing and sequencing. Here, the SIRP-seq method was applied to study the secondary structure of human MYC RNAs by chemically probing HeLa cells with dimethyl sulfate in the presence of the small molecule spliceosome inhibitor pladienolide B. Pladienolide B binds to the SF3B complex of the spliceosome to inhibit intron removal during splicing, resulting in retained intronic sequences. This method was used to increase the read coverage over intronic regions of MYC. The purpose of increasing coverage across introns was to generate complete reactivity profiles for intronic sequences via the DMS-MaPseq approach. Notably, depth was sufficient for analysis by the program DRACO, which was able to deduce distinct reactivity profiles and predict multiple secondary structural conformations as well as their suggested stoichiometric abundances. The results presented here provide a new method for intronic RNA secondary structural analyses, as well as specific structural insights relevant to MYC RNA splicing regulation and therapeutic targeting.

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae143. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500451/pdf/
Citations: 0
The next-generation sequencing-chess problem.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-24 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae144
Leo Zeitler, Arach Goldar, Cyril Denby Wilkes, Julie Soutourina

The development of next-generation sequencing (NGS) technologies paved the way for studying the spatiotemporal coordination of cellular processes along the genome. However, data sets are commonly limited to a few time points, and missing information needs to be interpolated. Most models assume that the studied dynamics are similar between individual cells, so that a homogeneous cell culture can be represented by a population-wide average. Here, we demonstrate that this understanding can be inappropriate. We developed a thought experiment, which we call the NGS chess problem, in which we compare temporal sequencing data analysis to observing a superimposed picture of many independent games of chess at a time. The analysis of the spatiotemporal kinetics advocates for a new methodology that considers DNA-particle interactions in each cell independently even for a homogeneous cell population.
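
The point about population-wide averages can be made concrete with a toy example. The sketch below is not the authors' model; it assumes each cell is a two-state system that flips from 0 to 1 at its own switch time, and the names `cell_state` and `population_average` are hypothetical.

```python
def cell_state(switch_time: float, t: float) -> int:
    """State of one cell at time t: 0 before its switch time, 1 from then on."""
    return 1 if t >= switch_time else 0


def population_average(switch_times: list, t: float) -> float:
    """Bulk signal at time t: the mean state over all cells."""
    return sum(cell_state(s, t) for s in switch_times) / len(switch_times)


# Four cells that switch at different times (illustrative values).
switch_times = [1, 2, 3, 4]
for t in range(5):
    states = [cell_state(s, t) for s in switch_times]
    print(t, states, population_average(switch_times, t))
```

The averaged signal rises gradually through 0.25, 0.5 and 0.75, although no individual cell is ever in an intermediate state. Interpolating the average as if it described a typical cell would therefore misrepresent the per-cell dynamics in this toy setting.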

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae144. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500447/pdf/
Citations: 0
Graphite: painting genomes using a colored de Bruijn graph.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-23 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae142
Rick Beeloo, Aldert L Zomer, Sebastian Deorowicz, Bas E Dutilh

The recent growth of microbial sequence data allows comparisons at unprecedented scales, enabling the tracking of strains, mobile genetic elements, or genes. Querying a genome against a large reference database can easily yield thousands of matches that are tedious to interpret and pose computational challenges. We developed Graphite, which uses a colored de Bruijn graph (cDBG) to paint query genomes, selecting the local best matches along the full query length. By focusing on the best genomic match of each query region, Graphite reduces the number of matches while providing the most promising leads for sequence tracking or genomic forensics. When applied to hundreds of Campylobacter genomes, we found extensive gene sharing, including a previously undetected C. coli plasmid that matched a C. jejuni chromosome. Together, genome painting using cDBGs, as enabled by Graphite, can reveal new biological phenomena by mitigating computational hurdles.
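
The idea of coloring k-mers by the genomes that contain them can be sketched in a few lines. This is a simplified stand-in, not Graphite's implementation: it uses a plain k-mer-to-colors dictionary instead of a compacted graph, it does not perform the local best-match selection described above, and the function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def build_colored_kmer_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of reference names ("colors") that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def paint_query(query: str, index: dict, k: int) -> list:
    """For every k-mer position in the query, list the colors sharing that k-mer."""
    return [
        sorted(index.get(query[i:i + k], ()))
        for i in range(len(query) - k + 1)
    ]


# Toy references; real inputs would be whole genomes and a much larger k.
references = {"refA": "ACGTAC", "refB": "GTACGG"}
index = build_colored_kmer_index(references, k=3)
painted = paint_query("CGTACGG", index, k=3)
for position, colors in enumerate(painted):
    print(position, colors)
```

In this toy case the first query k-mer is found only in `refA`, the last only in `refB`, and the middle ones in both. A painting step would then have to choose, region by region, which reference is the best local match.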

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae142. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11497850/pdf/
Citations: 0
MoNETA: MultiOmics Network Embedding for SubType Analysis.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-16 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae141
Giovanni Scala, Luigi Ferraro, Aurora Brandi, Yan Guo, Barbara Majello, Michele Ceccarelli

Cells are complex systems whose behavior emerges from a huge number of reactions taking place within and among different molecular districts. The availability of bulk and single-cell omics data fueled the creation of multi-omics systems biology models capturing the dynamics within and between omics layers. Powerful modeling strategies are needed to cope with the increased amount of data to be interrogated and the related research questions. Here, we present MultiOmics Network Embedding for SubType Analysis (MoNETA) for fast and scalable identification of relevant multi-omics relationships between biological entities at the bulk and single-cell level. We apply MoNETA to show how previously described glioma subtypes emerge naturally with our approach. We also show how MoNETA can be used to identify cell types in five multi-omic single-cell datasets.

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae141. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11482636/pdf/
Citations: 0
bcRflow: a Nextflow pipeline for characterizing B cell receptor repertoires from non-targeted transcriptomic data.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-15 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae137
Brent T Schlegel, Michael Morikone, Fangping Mu, Wan-Yee Tang, Gary Kohanbash, Dhivyaa Rajasundaram

B cells play a critical role in the adaptive recognition of foreign antigens through diverse receptor generation. While targeted immune sequencing methods are commonly used to profile B cell receptors (BCRs), they have limitations in cost and tissue availability. Profiling B cell receptors from non-targeted transcriptomics data is a promising alternative, but a systematic pipeline integrating tools for accurate immune repertoire extraction is lacking. Here, we present bcRflow, a Nextflow pipeline designed to characterize BCR repertoires from non-targeted transcriptomics data, with functional modules for alignment, processing, and visualization. bcRflow is a comprehensive, reproducible, and scalable pipeline that can run on high-performance computing clusters, cloud-based computing resources like Amazon Web Services (AWS), the Open OnDemand framework, or even local desktops. bcRflow utilizes institutional configurations provided by nf-core to ensure maximum portability and accessibility. To demonstrate the functionality of the bcRflow pipeline, we analyzed a public dataset of bulk transcriptomic samples from COVID-19 patients and healthy controls. We have shown that bcRflow streamlines the analysis of BCR repertoires from non-targeted transcriptomics data, providing valuable insights into the B cell immune response for biological and clinical research. bcRflow is available at https://github.com/Bioinformatics-Core-at-Childrens/bcRflow.

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae137. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11474772/pdf/
Citations: 0
Dynamic configuration and data security for bioinformatics cloud services with the Laniakea Dashboard.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-10 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae140
Marco Antonio Tangaro, Marica Antonacci, Giacinto Donvito, Nadina Foggetti, Pietro Mandreoli, Daniele Colombo, Graziano Pesole, Federico Zambelli

Technological advances in high-throughput technologies improve our ability to explore the molecular mechanisms of life. Computational infrastructures for scientific applications fulfil a critical role in harnessing this potential. However, there is an ongoing need to improve accessibility and implement robust data security technologies to allow the processing of sensitive data, particularly human genetic data. Scientific clouds have emerged as a promising solution to meet these needs. We present three components of the Laniakea software stack, initially developed to support the provision of private on-demand Galaxy instances. These components can be adopted by providers of scientific cloud services built on the INDIGO PaaS layer. The Dashboard translates configuration template files into user-friendly web interfaces, enabling the easy configuration and launch of on-demand applications. The secret management and the encryption components, integrated within the Dashboard, support the secure handling of passphrases and credentials and the deployment of block-level encrypted storage volumes for managing sensitive data in the cloud environment. By adopting these software components, scientific cloud providers can develop convenient, secure and efficient on-demand services for their users.

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae140. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464921/pdf/
Citations: 0
scATAcat: cell-type annotation for scATAC-seq data.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-08 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae135
Aybuge Altay, Martin Vingron

Cells whose accessibility landscape has been profiled with scATAC-seq cannot readily be annotated to a particular cell type. In fact, annotating cell types in scATAC-seq data is a challenging task since, unlike in scRNA-seq data, we lack knowledge of 'marker regions' which could be used for cell-type annotation. Current annotation methods typically translate accessibility to expression space and rely on gene expression patterns. We propose a novel approach, scATAcat, that leverages characterized bulk ATAC-seq data as prototypes to annotate scATAC-seq data. To mitigate the inherent sparsity of single-cell data, we aggregate cells that belong to the same cluster and create pseudobulk profiles. To demonstrate the feasibility of our approach, we collected a number of datasets with respective annotations to quantify the results and evaluate performance for scATAcat. scATAcat is available as a Python package at https://github.com/aybugealtay/scATAcat.
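
The pseudobulk step mentioned above, aggregating cells that share a cluster, can be sketched as follows. This is a generic illustration under the assumption that aggregation means summing per-region counts; it does not use the scATAcat package, and the `pseudobulk` function and toy matrix are hypothetical.

```python
def pseudobulk(counts: list, clusters: list) -> dict:
    """Sum per-cell accessibility counts within each cluster.

    counts   : one list of per-region counts for each cell
    clusters : the cluster label of each cell, in the same order
    """
    if len(counts) != len(clusters):
        raise ValueError("counts and clusters must have the same length")
    totals = {}
    for cell_counts, cluster in zip(counts, clusters):
        if cluster not in totals:
            totals[cluster] = [0] * len(cell_counts)
        totals[cluster] = [a + b for a, b in zip(totals[cluster], cell_counts)]
    return totals


# Four sparse cells over three regions, assigned to two clusters.
counts = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]]
clusters = ["c0", "c0", "c1", "c1"]
profiles = pseudobulk(counts, clusters)
print(profiles)
```

Each cluster profile is denser than any of its member cells, which is what makes a comparison against bulk ATAC-seq prototypes feasible in principle.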

Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae135. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459382/pdf/
Citations: 0
TrajectoryGeometry suggests cell fate decisions can involve branches rather than bifurcations.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-08 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae139
Anna Laddach, Vassilis Pachnis, Michael Shapiro

Differentiation of multipotential progenitor cells is a key process in the development of any multi-cellular organism and often continues throughout its life. It is often assumed that a bi-potential progenitor develops along a (relatively) straight trajectory until it reaches a decision point where the trajectory bifurcates. At this point one of two directions is chosen, each direction representing the unfolding of a new transcriptional programme. However, we have lacked quantitative means for testing this model. Accordingly, we have developed the R package TrajectoryGeometry. Applying this to published data we find several examples where, rather than bifurcate, developmental pathways branch. That is, the bipotential progenitor develops along a relatively straight trajectory leading to one of its potential fates. A second relatively straight trajectory branches off from this towards the other potential fate. In this sense only cells that branch off to follow the second trajectory make a 'decision'. Our methods give precise descriptions of the genes and cellular pathways involved in these trajectories. We speculate that branching may be the more common behaviour and may have advantages from a control-theoretic viewpoint.
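
One simple way to quantify how straight a trajectory is, offered here only as an illustration and not as the TrajectoryGeometry API or its statistical test, is to take the unit vectors pointing from the starting point to each later point and measure how tightly they agree. The function name `mean_resultant_length` and the toy paths are assumptions made for this sketch.

```python
import math


def mean_resultant_length(path: list) -> float:
    """Length of the mean unit direction from the first point to every later point.

    A value of 1.0 means all later points lie in exactly the same direction
    from the start (a straight trajectory); lower values mean a change of direction.
    """
    start = path[0]
    units = []
    for point in path[1:]:
        diff = [p - s for p, s in zip(point, start)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0:
            continue  # skip points that coincide with the start
        units.append([d / norm for d in diff])
    if not units:
        raise ValueError("path needs at least one point distinct from the start")
    mean = [sum(u[i] for u in units) / len(units) for i in range(len(start))]
    return math.sqrt(sum(m * m for m in mean))


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
bent = [(0, 0), (1, 0), (0, 1)]
print(mean_resultant_length(straight))
print(mean_resultant_length(bent))
```

Under the branching picture described above, one fate would keep a value near 1 along its whole path, while the path to the other fate would score lower because it turns away at the branch point.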

Citations: 0
Genome wide clustering on integrated chromatin states and Micro-C contacts reveals chromatin interaction signatures.
IF 4 Q1 GENETICS & HEREDITY Pub Date : 2024-10-03 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae136
Corinne E Sexton, Sylvia Victor Paul, Dylan Barth, Mira V Han

We can now analyze 3D physical interactions of chromatin regions with chromatin conformation capture technologies, in addition to the 1D chromatin state annotations, but methods to integrate this information are lacking. We propose a method to integrate the chromatin state of interacting regions into a vector representation through the contact-weighted sum of chromatin states. Unsupervised clustering on integrated chromatin states and Micro-C contacts reveals common patterns of chromatin interaction signatures. This provides an integrated view of the complex dynamics of concurrent change occurring in chromatin state and in chromatin interaction, adding another layer of annotation beyond chromatin state or Hi-C contact separately.
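
The contact-weighted sum mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `contact_weighted_states`, the integer encoding of chromatin states, and the toy contact matrix are all assumptions made for illustration. For each region, the sketch adds up the contact counts to every partner region, grouped by the partner's chromatin state, giving one vector per region.

```python
def contact_weighted_states(contacts, states, n_states):
    """Hypothetical helper: for each region i, sum contacts[i][j] into the
    slot of the chromatin state of partner region j (a contact-weighted sum
    of one-hot encoded states)."""
    n = len(contacts)
    if len(states) != n or any(len(row) != n for row in contacts):
        raise ValueError("contacts must be n x n and states must have length n")
    signatures = []
    for i in range(n):
        vec = [0.0] * n_states
        for j in range(n):
            # Weight the one-hot state of region j by its contact with region i.
            vec[states[j]] += contacts[i][j]
        signatures.append(vec)
    return signatures


# Toy data (assumed): 3 regions, 2 chromatin states labelled 0 and 1.
contacts = [
    [0, 2, 1],
    [2, 0, 0],
    [1, 0, 0],
]
states = [0, 1, 1]

signatures = contact_weighted_states(contacts, states, n_states=2)
for region, vec in enumerate(signatures):
    print(region, vec)
```

Vectors of this kind could then be passed to an unsupervised clustering method; the abstract does not say which one the authors use, so none is shown here.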

Citations: 0