Pub Date: 2024-11-04; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae146
Iñaki Sasiain, Deborah F Nacer, Mattias Aine, Srinivas Veerla, Johan Staaf
Epigenetic deregulation through altered DNA methylation is a fundamental feature of tumorigenesis, but tumor data from bulk tissue samples contain different proportions of malignant and non-malignant cells that may confound the interpretation of DNA methylation values. The adjustment of DNA methylation data based on tumor purity has been proposed to render both genome-wide and gene-specific analyses more precise, but it requires sample purity estimates. Here we present PureBeta, a single-sample statistical framework that uses genome-wide DNA methylation data to first estimate sample purity and then adjust methylation values of individual CpGs to correct for sample impurity. Purity values estimated with the algorithm have high correlation (>0.8) to reference values obtained from DNA sequencing when applied to samples from breast carcinoma, lung adenocarcinoma, and lung squamous cell carcinoma. Methylation beta values adjusted based on purity estimates have a more binary distribution that better reflects theoretical methylation states, thus facilitating improved biological inference as shown for BRCA1 in breast cancer. PureBeta is a versatile tool that can be used for different Illumina DNA methylation arrays and can be applied to individual samples of different cancer types to enhance biological interpretability of methylation data.
Title: Tumor purity estimated from bulk DNA methylation can be used for adjusting beta values of individual samples to better reflect tumor biology. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae146. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11532792/pdf/
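The abstract above does not spell out how PureBeta adjusts beta values, so the following is only a minimal sketch of the general idea, assuming a linear two-component mixture in which the observed beta value equals purity times the tumor beta plus (1 - purity) times the non-malignant beta. The function name `adjust_beta` and the `normal_beta` input are hypothetical illustrations and are not part of PureBeta.

```python
def adjust_beta(observed_beta, purity, normal_beta):
    """Recover a tumor-only beta value under an ASSUMED linear mixture model.

    Assumption (not taken from the paper):
        observed = purity * tumor + (1 - purity) * normal
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    tumor_beta = (observed_beta - (1.0 - purity) * normal_beta) / purity
    # Beta values are proportions, so clip the estimate to [0, 1].
    return min(1.0, max(0.0, tumor_beta))


# A CpG fully methylated in tumor cells and unmethylated in non-malignant
# cells looks half-methylated in a sample that is 50% pure.
print(adjust_beta(observed_beta=0.5, purity=0.5, normal_beta=0.0))  # 1.0
```

Under this assumption, the correction pushes intermediate values back towards 0 or 1, which is consistent with the more binary distribution the abstract reports.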
Pub Date: 2024-10-24; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae143
Taylor O Eich, Collin A O'Leary, Walter N Moss
To address the lack of intronic reads in secondary structure probing data for the human MYC pre-mRNA, we developed a method, SIRP-seq, that combines spliceosomal inhibition with RNA probing and sequencing. We applied SIRP-seq to study the secondary structure of human MYC RNAs by chemically probing HeLa cells with dimethyl sulfate (DMS) in the presence of the small-molecule spliceosome inhibitor pladienolide B. Pladienolide B binds to the SF3B complex of the spliceosome and inhibits intron removal during splicing, so intronic sequences are retained. This increased read coverage over the intronic regions of MYC, with the aim of generating complete reactivity profiles for intronic sequences via the DMS-MaPseq approach. Notably, depth was sufficient for analysis by the program DRACO, which was able to deduce distinct reactivity profiles and predict multiple secondary structural conformations as well as their suggested stoichiometric abundances. The results presented here provide a new method for intronic RNA secondary structural analyses, as well as specific structural insights relevant to MYC RNA splicing regulation and therapeutic targeting.
Title: Intronic RNA secondary structural information captured for the human MYC pre-mRNA. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae143. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500451/pdf/
Pub Date: 2024-10-24; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae144
Leo Zeitler, Arach Goldar, Cyril Denby Wilkes, Julie Soutourina
The development of next-generation sequencing (NGS) technologies paved the way for studying the spatiotemporal coordination of cellular processes along the genome. However, data sets are commonly limited to a few time points, and missing information needs to be interpolated. Most models assume that the studied dynamics are similar between individual cells, so that a homogeneous cell culture can be represented by a population-wide average. Here, we demonstrate that this assumption can be inappropriate. We developed a thought experiment, which we call the NGS chess problem, in which we compare temporal sequencing data analysis to observing a superimposed picture of many independent games of chess at once. The analysis of the spatiotemporal kinetics argues for a new methodology that considers DNA-particle interactions in each cell independently, even for a homogeneous cell population.
Title: The next-generation sequencing-chess problem. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae144. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500447/pdf/
Pub Date: 2024-10-23; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae142
Rick Beeloo, Aldert L Zomer, Sebastian Deorowicz, Bas E Dutilh
The recent growth of microbial sequence data allows comparisons at unprecedented scales, enabling the tracking of strains, mobile genetic elements, or genes. Querying a genome against a large reference database can easily yield thousands of matches that are tedious to interpret and pose computational challenges. We developed Graphite, which uses a colored de Bruijn graph (cDBG) to paint query genomes, selecting the local best matches along the full query length. By focusing on the best genomic match of each query region, Graphite reduces the number of matches while providing the most promising leads for sequence tracking or genomic forensics. When applied to hundreds of Campylobacter genomes, Graphite revealed extensive gene sharing, including a previously undetected C. coli plasmid that matched a C. jejuni chromosome. Together, these results show that genome painting using cDBGs, as enabled by Graphite, can reveal new biological phenomena by mitigating computational hurdles.
Title: Graphite: painting genomes using a colored de Bruijn graph. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae142. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11497850/pdf/
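To make the idea of painting a query with a colored k-mer index concrete, here is a toy sketch. It is not Graphite's algorithm: it assumes a plain hash map from k-mers to the set of reference names ("colors") that contain them, rather than a real de Bruijn graph, and it greedily takes the longest run of consecutive query k-mers sharing one color. The function names `build_color_index` and `paint` are hypothetical.

```python
from collections import defaultdict


def build_color_index(references, k):
    """Map every k-mer to the set of reference names (colors) containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def paint(query, index, k):
    """Greedily paint the query; returns half-open (start, end, color) segments.

    Simplification: consecutive shared k-mers are treated as one match, without
    checking that they are also adjacent in the reference.
    """
    n_kmers = len(query) - k + 1
    segments = []
    i = 0
    while i < n_kmers:
        candidates = index.get(query[i:i + k], set())
        if not candidates:
            i += 1  # unpainted position: no reference contains this k-mer
            continue
        best_color, best_len = None, 0
        for color in sorted(candidates):  # sorted for deterministic ties
            j = i
            while j < n_kmers and color in index.get(query[j:j + k], ()):
                j += 1
            if j - i > best_len:
                best_color, best_len = color, j - i
        # A run of best_len k-mers spans best_len + k - 1 bases.
        segments.append((i, i + best_len + k - 1, best_color))
        i += best_len
    return segments


refs = {"A": "AAACCCGGG", "B": "TTTGGGCAT"}
print(paint("AAACCCTTGGGC", build_color_index(refs, 4), 4))
# [(0, 6, 'A'), (6, 12, 'B')]
```

Keeping only one best color per query region is what reduces thousands of raw matches to a short list of leads, which is the point the abstract makes.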
Pub Date: 2024-10-16; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae141
Giovanni Scala, Luigi Ferraro, Aurora Brandi, Yan Guo, Barbara Majello, Michele Ceccarelli
Cells are complex systems whose behavior emerges from a huge number of reactions taking place within and among different molecular districts. The availability of bulk and single-cell omics data fueled the creation of multi-omics systems biology models capturing the dynamics within and between omics layers. Powerful modeling strategies are needed to cope with the increased amount of data to be interrogated and the related research questions. Here, we present MultiOmics Network Embedding for SubType Analysis (MoNETA) for fast and scalable identification of relevant multi-omics relationships between biological entities at the bulk and single-cell levels. We apply MoNETA to show how previously described glioma subtypes emerge naturally with our approach. We also show how MoNETA can be used to identify cell types in five multi-omic single-cell datasets.
Title: MoNETA: MultiOmics Network Embedding for SubType Analysis. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae141. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11482636/pdf/
Pub Date: 2024-10-15; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae137
Brent T Schlegel, Michael Morikone, Fangping Mu, Wan-Yee Tang, Gary Kohanbash, Dhivyaa Rajasundaram
B cells play a critical role in the adaptive recognition of foreign antigens through diverse receptor generation. While targeted immune sequencing methods are commonly used to profile B cell receptors (BCRs), they have limitations in cost and tissue availability. Profiling B cell receptors from non-targeted transcriptomics data is a promising alternative, but a systematic pipeline integrating tools for accurate immune repertoire extraction is lacking. Here, we present bcRflow, a Nextflow pipeline designed to characterize BCR repertoires from non-targeted transcriptomics data, with functional modules for alignment, processing, and visualization. bcRflow is a comprehensive, reproducible, and scalable pipeline that can run on high-performance computing clusters, cloud-based computing resources like Amazon Web Services (AWS), the Open OnDemand framework, or even local desktops. bcRflow utilizes institutional configurations provided by nf-core to ensure maximum portability and accessibility. To demonstrate the functionality of the bcRflow pipeline, we analyzed a public dataset of bulk transcriptomic samples from COVID-19 patients and healthy controls. We show that bcRflow streamlines the analysis of BCR repertoires from non-targeted transcriptomics data, providing valuable insights into the B cell immune response for biological and clinical research. bcRflow is available at https://github.com/Bioinformatics-Core-at-Childrens/bcRflow.
Title: bcRflow: a Nextflow pipeline for characterizing B cell receptor repertoires from non-targeted transcriptomic data. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae137. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11474772/pdf/
Pub Date: 2024-10-10; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae140
Marco Antonio Tangaro, Marica Antonacci, Giacinto Donvito, Nadina Foggetti, Pietro Mandreoli, Daniele Colombo, Graziano Pesole, Federico Zambelli
Advances in high-throughput technologies improve our ability to explore the molecular mechanisms of life. Computational infrastructures for scientific applications fulfil a critical role in harnessing this potential. However, there is an ongoing need to improve accessibility and implement robust data security technologies to allow the processing of sensitive data, particularly human genetic data. Scientific clouds have emerged as a promising solution to meet these needs. We present three components of the Laniakea software stack, initially developed to support the provision of private on-demand Galaxy instances. These components can be adopted by providers of scientific cloud services built on the INDIGO PaaS layer. The Dashboard translates configuration template files into user-friendly web interfaces, enabling the easy configuration and launch of on-demand applications. The secret management and encryption components, integrated within the Dashboard, support the secure handling of passphrases and credentials and the deployment of block-level encrypted storage volumes for managing sensitive data in the cloud environment. By adopting these software components, scientific cloud providers can develop convenient, secure and efficient on-demand services for their users.
Title: Dynamic configuration and data security for bioinformatics cloud services with the Laniakea Dashboard. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae140. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464921/pdf/
Pub Date: 2024-10-08; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae135
Aybuge Altay, Martin Vingron
Cells whose accessibility landscape has been profiled with scATAC-seq cannot readily be annotated to a particular cell type. In fact, annotating cell types in scATAC-seq data is a challenging task since, unlike in scRNA-seq data, we lack knowledge of 'marker regions' which could be used for cell-type annotation. Current annotation methods typically translate accessibility to expression space and rely on gene expression patterns. We propose a novel approach, scATAcat, that leverages characterized bulk ATAC-seq data as prototypes to annotate scATAC-seq data. To mitigate the inherent sparsity of single-cell data, we aggregate cells that belong to the same cluster to create pseudobulk profiles. To demonstrate the feasibility of our approach, we collected a number of annotated datasets to quantify the results and evaluate the performance of scATAcat. scATAcat is available as a Python package at https://github.com/aybugealtay/scATAcat.
Title: scATAcat: cell-type annotation for scATAC-seq data. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae135. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459382/pdf/
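The pseudobulk step described in the scATAcat abstract (aggregating cells that belong to the same cluster) can be illustrated with a short sketch. This is not the scATAcat API: the function name `pseudobulk` and the plain-list inputs are assumptions made for illustration, and any normalisation the package applies afterwards is omitted.

```python
def pseudobulk(counts, clusters):
    """Sum per-cell peak counts within each cluster.

    counts:   one list of peak counts per cell (all of the same length)
    clusters: one cluster label per cell, in the same order as counts
    Returns a dict mapping cluster label -> summed count profile.
    """
    if len(counts) != len(clusters):
        raise ValueError("need exactly one cluster label per cell")
    totals = {}
    for cell_counts, label in zip(counts, clusters):
        profile = totals.setdefault(label, [0] * len(cell_counts))
        for peak, value in enumerate(cell_counts):
            profile[peak] += value
    return totals


cells = [[1, 0, 0], [0, 1, 0], [0, 0, 2], [1, 0, 1]]
labels = ["c1", "c1", "c2", "c2"]
print(pseudobulk(cells, labels))  # {'c1': [1, 1, 0], 'c2': [1, 0, 3]}
```

Each individual cell has mostly zero counts, while the summed profile of a cluster is denser and can be compared more directly with a bulk ATAC-seq prototype.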
Pub Date: 2024-10-08; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae139
Anna Laddach, Vassilis Pachnis, Michael Shapiro
Differentiation of multipotential progenitor cells is a key process in the development of any multi-cellular organism and often continues throughout its life. It is often assumed that a bi-potential progenitor develops along a (relatively) straight trajectory until it reaches a decision point where the trajectory bifurcates. At this point one of two directions is chosen, each direction representing the unfolding of a new transcriptional programme. However, we have lacked quantitative means for testing this model. Accordingly, we have developed the R package TrajectoryGeometry. Applying this to published data we find several examples where, rather than bifurcate, developmental pathways branch. That is, the bipotential progenitor develops along a relatively straight trajectory leading to one of its potential fates. A second relatively straight trajectory branches off from this towards the other potential fate. In this sense only cells that branch off to follow the second trajectory make a 'decision'. Our methods give precise descriptions of the genes and cellular pathways involved in these trajectories. We speculate that branching may be the more common behaviour and may have advantages from a control-theoretic viewpoint.
Title: TrajectoryGeometry suggests cell fate decisions can involve branches rather than bifurcations. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae139. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459380/pdf/
Pub Date: 2024-10-03; eCollection Date: 2024-09-01; DOI: 10.1093/nargab/lqae136
Corinne E Sexton, Sylvia Victor Paul, Dylan Barth, Mira V Han
We can now analyze 3D physical interactions of chromatin regions with chromatin conformation capture technologies, in addition to the 1D chromatin state annotations, but methods to integrate this information are lacking. We propose a method to integrate the chromatin state of interacting regions into a vector representation through the contact-weighted sum of chromatin states. Unsupervised clustering on integrated chromatin states and Micro-C contacts reveals common patterns of chromatin interaction signatures. This provides an integrated view of the complex dynamics of concurrent change occurring in chromatin state and in chromatin interaction, adding another layer of annotation beyond chromatin state or Hi-C contact separately.
Title: Genome wide clustering on integrated chromatin states and Micro-C contacts reveals chromatin interaction signatures. Journal: NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae136. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447530/pdf/
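The contact-weighted sum of chromatin states mentioned in the abstract can be written, for region i, as v_i = sum over j of C[i][j] * s_j, where C is the contact matrix and s_j is the state vector of region j. The sketch below is one reading of that sentence, not the authors' code: the one-hot state encoding, the handling of self-contacts, and the absence of normalisation are all assumptions, and the function name `contact_weighted_states` is hypothetical.

```python
def contact_weighted_states(contacts, states):
    """Compute v_i = sum_j contacts[i][j] * states[j] for every region i.

    contacts: square matrix of contact weights between regions
    states:   one state vector per region (for example, one-hot state labels)
    """
    n_regions = len(contacts)
    n_states = len(states[0])
    vectors = []
    for i in range(n_regions):
        vec = [0.0] * n_states
        for j in range(n_regions):
            weight = contacts[i][j]
            for s in range(n_states):
                vec[s] += weight * states[j][s]
        vectors.append(vec)
    return vectors


# Three regions, two states (say, active and repressed), one-hot encoded.
contacts = [[0, 2, 1],
            [2, 0, 0],
            [1, 0, 0]]
states = [[1, 0], [0, 1], [1, 0]]
print(contact_weighted_states(contacts, states))
# [[1.0, 2.0], [2.0, 0.0], [1.0, 0.0]]
```

The resulting vectors, one per region, are what an unsupervised clustering step would take as input to look for shared interaction signatures.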