
Bioinformatics Advances: latest articles

scVIC: Deep generative modeling of heterogeneity for scRNA-seq data
Pub Date : 2024-06-13 DOI: 10.1093/bioadv/vbae086
Jiankang Xiong, Fuzhou Gong, Liang Ma, Lin Wan
Single-cell RNA sequencing (scRNA-seq) has become a valuable tool for studying cellular heterogeneity. However, the analysis of scRNA-seq data is challenging because of inherent noise and technical variability. Existing methods often struggle to simultaneously explore heterogeneity across cells, handle dropout events, and account for batch effects. These drawbacks call for a robust and comprehensive method that can address these challenges and provide accurate insights into heterogeneity at the single-cell level. In this study, we introduce scVIC, an algorithm designed to account for variational inference, while simultaneously handling biological heterogeneity and batch effects at the single-cell level. scVIC explicitly models both biological heterogeneity and technical variability to learn cellular heterogeneity in a manner free from dropout events and the bias of batch effects. By leveraging variational inference, we provide a robust framework for inferring the parameters of scVIC. To test the performance of scVIC, we employed both simulated and biological scRNA-seq datasets, either including, or not, batch effects. scVIC was found to outperform other approaches because of its superior clustering ability and circumvention of the batch effects problem. The code of scVIC and replication for this study are available at https://github.com/HiBearME/scVIC/tree/v1.0. Supplementary data are available at Bioinformatics Advances online.
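For readers unfamiliar with variational inference, the generic evidence lower bound (ELBO) that such methods maximize is shown below. This is the textbook form for a latent-variable model with observation x and latent variable z; the symbols θ and φ (generative and inference parameters) are notation assumed here. The abstract does not state scVIC's exact objective, which presumably contains further terms for cluster structure and batch covariates.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

The first term rewards accurate reconstruction of the observed counts; the second keeps the approximate posterior close to the prior.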
Citations: 0
Demultiplexing of Single-Cell RNA sequencing data using interindividual variation in gene expression
Pub Date : 2024-06-08 DOI: 10.1093/bioadv/vbae085
I. Nassiri, Andrew J Kwok, Aneesha Bhandari, Katherine R. Bull, Lucy C. Garner, Paul Klenerman, Caleb Webber, Laura Parkkinen, Angela W Lee, Yanxia Wu, Benjamin Fairfax, Julian C. Knight, David Buck, Paolo Piazza
Pooled designs for single-cell RNA sequencing, where many cells from distinct samples are processed jointly, offer increased throughput and reduced batch variation. This study describes expression-aware demultiplexing (EAD), a computational method that employs differential co-expression patterns between individuals to demultiplex pooled samples without any extra experimental steps. We use synthetic sample pools and show that the top interindividual differentially co-expressed genes provide a distinct cluster of cells per individual, significantly enriching the regulation of metabolism. Our application of EAD to samples of 6 isogenic inbred mice demonstrated that controlling genetic and environmental effects can solve inter-individual variations related to metabolic pathways. We utilized 30 samples from both sepsis and healthy individuals in six batches to assess the performance of classification approaches. The results indicate that combining genetic and EAD results can enhance the accuracy of assignments (Min 0.94, Mean 0.98, Max 1). The results were enhanced by an average of 1.4% when EAD and barcoding techniques were combined (Min. 1.25%, Median 1.33%, Max. 1.74%). Furthermore, we demonstrate that interindividual differential co-expression analysis within the same cell type can be used to identify cells from the same donor in different activation states. By analyzing single-nuclei transcriptome profiles from the brain, we demonstrate that our method can be applied to non-immune cells. Expression-aware demultiplexing workflow is available at https://isarnassiri.github.io/scDIV/ as an R package called scDIV (acronym for Single Cell RNA sequencing data Demultiplexing using Interindividual Variations). Supplementary data are available at Bioinformatics Advances online.
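To make the idea of interindividual differential co-expression concrete, here is a minimal Python sketch. It is not the scDIV implementation (scDIV is an R package); the function names, the toy expression values, and the choice of Pearson correlation as the co-expression measure are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def differential_coexpression(cells_a, cells_b, gene1, gene2):
    """Difference in the gene1/gene2 correlation between two individuals.

    cells_a and cells_b are lists of dicts mapping gene name -> expression,
    one dict per cell (a hypothetical input layout).
    """
    def corr(cells):
        return pearson([c[gene1] for c in cells], [c[gene2] for c in cells])
    return corr(cells_a) - corr(cells_b)

# Toy data: G1 and G2 rise together in donor A but move oppositely in donor B.
donor_a = [{"G1": 1, "G2": 2}, {"G1": 2, "G2": 4}, {"G1": 3, "G2": 6}]
donor_b = [{"G1": 1, "G2": 6}, {"G1": 2, "G2": 4}, {"G1": 3, "G2": 2}]
score = differential_coexpression(donor_a, donor_b, "G1", "G2")
```

A gene pair with a large absolute score behaves differently in the two donors, which is the kind of signal that could help assign a pooled cell to its individual of origin.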
Citations: 0
OM2Seq: learning retrieval embeddings for optical genome mapping.
IF 2.4 Pub Date : 2024-06-05 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae079
Yevgeni Nogin, Danielle Sapir, Tahir Detinis Zur, Nir Weinberger, Yonatan Belinkov, Yuval Ebenstein, Yoav Shechtman

Motivation: Genomics-based diagnostic methods that are quick, precise, and economical are essential for the advancement of precision medicine, with applications spanning the diagnosis of infectious diseases, cancer, and rare diseases. One technology that holds potential in this field is optical genome mapping (OGM), which is capable of detecting structural variations, epigenomic profiling, and microbial species identification. It is based on imaging of linearized DNA molecules that are stained with fluorescent labels, that are then aligned to a reference genome. However, the computational methods currently available for OGM fall short in terms of accuracy and computational speed.

Results: This work introduces OM2Seq, a new approach for the rapid and accurate mapping of DNA fragment images to a reference genome. Based on a Transformer-encoder architecture, OM2Seq is trained on acquired OGM data to efficiently encode DNA fragment images and reference genome segments to a common embedding space, which can be indexed and efficiently queried using a vector database. We show that OM2Seq significantly outperforms the baseline methods in both computational speed (by 2 orders of magnitude) and accuracy.

Availability and implementation: https://github.com/yevgenin/om2seq.
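The retrieval step described in the Results, where a fragment embedding is matched against indexed reference-segment embeddings, can be sketched as a brute-force cosine-similarity search. This is only an illustration: in OM2Seq the embeddings come from trained Transformer encoders and the search runs in a vector database, whereas the vectors, segment labels, and function names below are hypothetical.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_index(reference_embeddings):
    """Map each reference segment id to its unit-length embedding."""
    return {seg: normalize(vec) for seg, vec in reference_embeddings.items()}

def query(index, embedding, top_k=1):
    """Return the ids of the top_k reference segments most similar to embedding."""
    q = normalize(embedding)
    scored = [(sum(a * b for a, b in zip(q, vec)), seg) for seg, vec in index.items()]
    scored.sort(reverse=True)  # highest cosine similarity first
    return [seg for _, seg in scored[:top_k]]

# Hypothetical 3-dimensional embeddings of three reference segments.
reference = {
    "chr1:0-100kb": [1.0, 0.0, 0.0],
    "chr1:100-200kb": [0.0, 1.0, 0.0],
    "chr2:0-100kb": [0.0, 0.0, 1.0],
}
index = build_index(reference)
best = query(index, [0.1, 0.9, 0.2])  # embedding of a DNA fragment image
```

A real vector database replaces the linear scan with an approximate nearest-neighbor index, which is presumably where much of the reported speed-up over alignment-based baselines comes from.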

Citations: 0
Hapsolutely: a user-friendly tool integrating haplotype phasing, network construction and haploweb calculation
Pub Date : 2024-06-05 DOI: 10.1093/bioadv/vbae083
Miguel Vences, Stefanos Patmanidis, Jan-Christopher Schmidt, Michael Matschiner, Aurélien Miralles, Susanne S Renner
Haplotype networks are a routine approach to visualize relationships among alleles. Such visual analysis of single-locus data is still of importance, especially in species diagnosis and delimitation, where a limited amount of sequence data usually are available and sufficient, along with other data sets in the framework of integrative taxonomy. In diploid organisms, this often requires separating (‘phasing’) sequences with heterozygotic positions, and typically separate programs are required for phasing, reformatting of input files, and haplotype network construction. We therefore developed Hapsolutely, a user-friendly program with an ergonomic graphical user interface (GUI) that integrates haplotype phasing from single-locus sequences with five approaches for network/genealogy reconstruction. Among the novel options implemented, Hapsolutely integrates phasing and graphical reconstruction steps of haplotype networks, supports input of species partition data in the common SPART and SPART-XML formats, and calculates and visualizes haplowebs and fields for recombination, thus allowing graphical comparison of allele distribution and allele sharing among subsets for the purpose of species delimitation. The new tool has been specifically developed with a focus on the workflow in alpha-taxonomy, where exploring fields for recombination across alternative species partitions may help species delimitation. Hapsolutely is written in Python, and integrates code from Phase, SeqPHASE and PopART in C++ and Haxe. Compiled stand-alone executables for MS Windows and Mac OS along with a detailed manual can be downloaded from https://www.itaxotools.org; the source code is openly available on GitHub (https://github.com/iTaxoTools/Hapsolutely).
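As a rough illustration of the haploweb idea, the sketch below groups haplotypes into fields for recombination: two haplotypes are linked whenever they co-occur in one diploid individual, and each connected group forms one field. It does not use Hapsolutely's code or API; the function name, the input layout, and the toy genotypes are assumptions.

```python
def fields_for_recombination(individuals):
    """Group haplotypes linked by co-occurrence within individuals.

    individuals: dict mapping individual id -> (haplotype1, haplotype2),
    i.e. already phased diploid genotypes (hypothetical input layout).
    Returns a list of sets of haplotype names, one set per field.
    """
    parent = {}

    def find(h):
        # Union-find lookup with path halving.
        parent.setdefault(h, h)
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for hap1, hap2 in individuals.values():
        union(hap1, hap2)  # homozygotes simply register their haplotype

    groups = {}
    for h in list(parent):
        groups.setdefault(find(h), set()).add(h)
    return sorted(groups.values(), key=lambda s: sorted(s))

# Toy genotypes: A-B and B-C are linked through B; D and E are linked through ind4.
genotypes = {
    "ind1": ("A", "B"),
    "ind2": ("B", "C"),
    "ind3": ("D", "D"),
    "ind4": ("E", "D"),
}
ffr = fields_for_recombination(genotypes)
```

In a species-delimitation workflow, each resulting group would be a candidate set of alleles exchanged within one gene pool, to be compared against alternative species partitions.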
Citations: 0
CellsFromSpace: a fast, accurate, and reference-free tool to deconvolve and annotate spatially distributed omics data.
IF 2.4 Pub Date : 2024-05-30 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae081
Corentin Thuilliez, Gaël Moquin-Beaudry, Pierre Khneisser, Maria Eugenia Marques Da Costa, Slim Karkar, Hanane Boudhouche, Damien Drubay, Baptiste Audinot, Birgit Geoerger, Jean-Yves Scoazec, Nathalie Gaspar, Antonin Marchais

Motivation: Spatial transcriptomics enables the analysis of cell crosstalk in healthy and diseased organs by capturing the transcriptomic profiles of millions of cells within their spatial contexts. However, spatial transcriptomics approaches also raise new computational challenges for the multidimensional data analysis associated with spatial coordinates.

Results: In this context, we introduce a novel analytical framework called CellsFromSpace based on independent component analysis (ICA), which allows users to analyze various commercially available technologies without relying on a single-cell reference dataset. The ICA approach deployed in CellsFromSpace decomposes spatial transcriptomics data into interpretable components associated with distinct cell types or activities. ICA also enables noise or artifact reduction and subset analysis of cell types of interest through component selection. We demonstrate the flexibility and performance of CellsFromSpace using real-world samples to demonstrate ICA's ability to successfully identify spatially distributed cells as well as rare diffuse cells, and quantitatively deconvolute datasets from the Visium, Slide-seq, MERSCOPE, and CosMX technologies. Comparative analysis with a current alternative reference-free deconvolution tool also highlights CellsFromSpace's speed, scalability and accuracy in processing complex, even multisample datasets. CellsFromSpace also offers a user-friendly graphical interface enabling non-bioinformaticians to annotate and interpret components based on spatial distribution and contributor genes, and perform full downstream analysis.

Availability and implementation: CellsFromSpace (CFS) is distributed as an R package available from github at https://github.com/gustaveroussy/CFS along with tutorials, examples, and detailed documentation.
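The component-selection step mentioned in the Results can be illustrated with a small Python sketch. It assumes the ICA decomposition has already been computed by some ICA routine and only shows how dropping components removes their contribution from the reconstructed expression matrix. CellsFromSpace itself is an R package; the function name, matrix layout, and toy numbers here are hypothetical.

```python
def reconstruct(scores, loadings, keep):
    """Rebuild a spots x genes matrix from selected components only.

    scores:   spots x components matrix (component activity per spot)
    loadings: components x genes matrix (gene weights per component)
    keep:     indices of the components to retain
    """
    n_genes = len(loadings[0])
    return [
        [sum(row[k] * loadings[k][g] for k in keep) for g in range(n_genes)]
        for row in scores
    ]

# Toy decomposition: component 0 stands for a cell type, component 1 for an artifact.
scores = [[1.0, 0.5], [0.0, 2.0]]
loadings = [[2.0, 0.0], [0.0, 1.0]]

full = reconstruct(scores, loadings, keep=[0, 1])  # all components
cleaned = reconstruct(scores, loadings, keep=[0])  # artifact component dropped
```

The same selection mechanism supports subset analysis: keeping only the components annotated as a cell type of interest restricts downstream analysis to that signal.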

Citations: 0
Prediction of tumor-specific splicing from somatic mutations as a source of neoantigen candidates.
Pub Date : 2024-05-29 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae080
Franziska Lang, Patrick Sorn, Martin Suchan, Alina Henrich, Christian Albrecht, Nina Köhl, Aline Beicht, Pablo Riesgo-Ferreiro, Christoph Holtsträter, Barbara Schrörs, David Weber, Martin Löwer, Ugur Sahin, Jonas Ibn-Salem

Motivation: Neoantigens are promising targets for cancer immunotherapies and might arise from alternative splicing. However, detecting tumor-specific splicing is challenging because many non-canonical splice junctions identified in tumors also appear in healthy tissues. To increase tumor-specificity, we focused on splicing caused by somatic mutations as a source for neoantigen candidates in individual patients.

Results: We developed the tool splice2neo with multiple functionalities to integrate predicted splice effects from somatic mutations with splice junctions detected in tumor RNA-seq and to annotate the resulting transcript and peptide sequences. Additionally, we provide the tool EasyQuant for targeted RNA-seq read mapping to candidate splice junctions. Using a stringent detection rule, we predicted 1.7 splice junctions per patient as splice targets with a false discovery rate below 5% in a melanoma cohort. We confirmed tumor-specificity using independent, healthy tissue samples. Furthermore, using tumor-derived RNA, we confirmed individual exon-skipping events experimentally. Most target splice junctions encoded neoepitope candidates with predicted major histocompatibility complex (MHC)-I or MHC-II binding. Compared to neoepitope candidates from non-synonymous point mutations, the splicing-derived MHC-I neoepitope candidates had lower self-similarity to corresponding wild-type peptides. In conclusion, we demonstrate that identifying mutation-derived, tumor-specific splice junctions can lead to additional neoantigen candidates to expand the target repertoire for cancer immunotherapies.

Availability and implementation: The R package splice2neo and the python package EasyQuant are available at https://github.com/TRON-Bioinformatics/splice2neo and https://github.com/TRON-Bioinformatics/easyquant, respectively.
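The integration step described in the Results, matching predicted splice junctions against junctions detected in tumor RNA-seq, can be sketched in a few lines of Python. This is not the splice2neo or EasyQuant API; the function names, the coordinate convention (plus strand, exons as (start, end) pairs), and the toy coordinates are assumptions for illustration.

```python
def exon_skipping_junctions(exons):
    """Junctions created by skipping each internal exon of a transcript.

    exons: list of (start, end) tuples sorted by position on the plus strand.
    Returns tuples (donor_end, acceptor_start, skipped_exon_index).
    """
    junctions = []
    for i in range(1, len(exons) - 1):
        donor = exons[i - 1][1]      # end of the upstream exon
        acceptor = exons[i + 1][0]   # start of the downstream exon
        junctions.append((donor, acceptor, i))
    return junctions

def rna_supported(predicted, observed):
    """Keep predicted junctions whose (donor, acceptor) pair was seen in RNA-seq."""
    return [j for j in predicted if (j[0], j[1]) in observed]

# Toy transcript with four exons and a set of junctions observed in tumor RNA-seq.
exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
observed = {(200, 300), (200, 500), (600, 700)}

predicted = exon_skipping_junctions(exons)
confirmed = rna_supported(predicted, observed)
```

Requiring RNA-seq support for a mutation-predicted junction is one simple form of the stringent detection rule the abstract refers to; the actual rule used in the study may involve additional criteria such as read-count thresholds.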

Citations: 0
Uncover spatially informed variations for single-cell spatial transcriptomics with STew.
Pub Date : 2024-05-29 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae064
Nanxi Guo, Juan Vargas, Samantha Reynoso, Douglas Fritz, Revanth Krishna, Chuangqi Wang, Fan Zhang

Motivation: The recent spatial transcriptomics (ST) technologies have enabled characterization of gene expression patterns and spatial information, advancing our understanding of cell lineages within diseased tissues. Several analytical approaches have been proposed for ST data, but effectively utilizing spatial information to unveil the shared variation with gene expression remains a challenge.

Results: We introduce STew, a Spatial Transcriptomic multi-viEW representation learning method, to jointly analyze spatial information and gene expression in a scalable manner, followed by a data-driven statistical framework to measure the goodness of model fit. Through benchmarking using human dorsolateral prefrontal cortex and mouse main olfactory bulb data with true manual annotations, STew achieved superior performance in both clustering accuracy and continuity of identified spatial domains compared with other methods. STew also robustly generates consistent results that are insensitive to model parameters, including sparsity constraints. We next applied STew to various ST data acquired from 10× Visium, Slide-seqV2, and 10× Xenium, encompassing single-cell and multi-cellular resolution ST technologies, which revealed spatially informed cell type clusters and biologically meaningful axes. In particular, we identified a proinflammatory fibroblast spatial niche using ST data from psoriatic skin. Moreover, STew scales almost linearly with the number of spatial locations, which supports its applicability to datasets with thousands of spatial locations to capture disease-relevant niches in complex tissues.

Availability and implementation: Source code and the R software tool STew are available from github.com/fanzhanglab/STew.

Citations: 0
Improve-RRBS: a novel tool to correct the 3’ trimming of reduced representation sequencing reads
Pub Date : 2024-05-24 DOI: 10.1093/bioadv/vbae076
Abel Fothi, Hongbo Liu, Katalin Susztak, Tamás Arányi

Motivation: Reduced Representation Bisulfite Sequencing (RRBS) is a popular approach to determine DNA methylation of the CpG-rich regions of the genome. However, we observed that the standard computational analysis also identifies false positive differentially methylated sites (DMS).

Results: During RRBS library preparation, the MspI-digested DNA undergoes end-repair by a cytosine at the 3’ end of the fragments. After sequencing, Trim Galore cuts these end-repaired nucleotides. However, Trim Galore fails to detect end-repair when it overlaps with the 3’ end of the sequencing reads. We found that these non-trimmed cytosines bias methylation calling and can therefore cause DMS to be identified erroneously. To circumvent this problem, we developed improve-RRBS, which efficiently identifies and hides these cytosines from methylation calling with a false positive rate of at most 0.5%. To test improve-RRBS, we investigated four datasets from four laboratories and two different species. We found non-trimmed 3’ cytosines in all datasets analyzed and, under certain conditions, more than 50% of DMS were false positives. By applying improve-RRBS, these DMS completely disappeared from all comparisons.

Availability and implementation: improve-RRBS is a freely available Python package (https://pypi.org/project/iRRBS/ or https://github.com/fothia/improve-RRBS) that can be implemented in RRBS pipelines.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.
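
The idea of spotting a non-trimmed, end-repaired cytosine can be illustrated with a toy check. This is only a sketch under simplifying assumptions, not the improve-RRBS implementation: it considers a forward-strand read aligned without gaps, assumes MspI cuts C^CGG so that the fill-in adds "CG" to the 3' end of the fragment, and flags a read whose last aligned bases sit inside a CCGG site of the reference. The function name `end_repaired_cytosine` and its parameters are hypothetical.

```python
MSPI_SITE = "CCGG"  # MspI recognition site (assumed cut: C^CGG)


def end_repaired_cytosine(reference, read_start, read_length):
    """Return the 0-based reference position of a presumed end-repaired
    cytosine at the 3' end of a forward-strand read, or None.

    Assumption: after the fill-in, the fragment ends with ...C + "CG",
    so a read that stops exactly at the fragment end covers the first
    three bases of a CCGG site in the reference.
    """
    read_end = read_start + read_length  # exclusive end coordinate
    if read_length < 3:
        return None
    # Last three read bases plus the next reference base
    window = reference[read_end - 3:read_end + 1].upper()
    if window == MSPI_SITE:
        return read_end - 2  # the filled-in C, to be hidden from calling
    return None


reference = "AAACCGGTTT"  # CCGG occupies positions 3..6
flagged = end_repaired_cytosine(reference, read_start=0, read_length=6)
not_flagged = end_repaired_cytosine(reference, read_start=0, read_length=5)
```

Here `flagged` is the position of the cytosine that a methylation caller would need to ignore, while the shorter read does not end at the fragment boundary and is left alone. A real pipeline would also have to handle reverse-strand reads, soft clipping, and paired-end data.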
Citations: 0
Predmoter-cross-species prediction of plant promoter and enhancer regions.
Pub Date : 2024-05-24 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae074
Felicitas Kindel, Sebastian Triesch, Urte Schlüter, Laura Alexandra Randarevitch, Vanessa Reichel-Deland, Andreas P M Weber, Alisandra K Denton

Motivation: Identifying cis-regulatory elements (CREs) is crucial for analyzing gene regulatory networks. Next generation sequencing methods were developed to identify CREs but represent a considerable expenditure for targeted analysis of few genomic loci. Thus, predicting the outputs of these methods would significantly cut costs and time investment.

Results: We present Predmoter, a deep neural network that predicts base-wise Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq) and histone Chromatin immunoprecipitation DNA-sequencing (ChIP-seq) read coverage for plant genomes. Predmoter uses only the DNA sequence as input. We trained our final model on 21 species; ATAC-seq data were publicly available for 13 of them and ChIP-seq data for 17. We evaluated our models on Arabidopsis thaliana and Oryza sativa. Our best models showed accurate predictions in peak position and pattern for ATAC- and histone ChIP-seq. Annotating putatively accessible chromatin regions provides valuable input for the identification of CREs. In conjunction with other in silico data, this can significantly reduce the search space for experimentally verifiable DNA-protein interaction pairs.

Availability and implementation: The source code for Predmoter is available at: https://github.com/weberlab-hhu/Predmoter. Predmoter takes a fasta file as input and outputs h5, and optionally bigWig and bedGraph files.
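
To make "only the DNA sequence as input" concrete, the following is a minimal sketch of how a FASTA file might be turned into a numeric encoding for a sequence model. It is not taken from Predmoter; the abstract does not describe the actual preprocessing, and the helper names `read_fasta` and `one_hot` as well as the choice to encode unknown bases as all zeros are assumptions made for illustration.

```python
from io import StringIO

# One row per base in the order A, C, G, T; other symbols (e.g. N) map to zeros
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {name: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def one_hot(sequence):
    """Encode a DNA string as a list of 4-tuples, one per base."""
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in sequence]


genome = read_fasta(StringIO(">chr1 toy example\nACG\nTN\n"))
encoded = one_hot(genome["chr1"])
```

A model of this kind would then map the encoded sequence to one predicted coverage value per base; that part depends on the trained network and is not sketched here.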

Citations: 0
WGCCRR: a web-based tool for genome-wide screening of convergent indels and substitutions of amino-acids
Pub Date : 2024-05-24 DOI: 10.1093/bioadv/vbae070
Zheng Dong, Chen Wang, Qingming Qu
Genome-wide analyses of protein-coding gene sequences are being employed to examine the genetic basis of adaptive evolution in many organismal groups. Previous studies have revealed that convergent/parallel adaptive evolution may be caused by convergent/parallel amino acid changes. Similarly, detailed analysis of lineage-specific amino acid changes has shown correlations with certain lineage-specific traits. However, experimental validation remains the ultimate measure of causality. With the increasing availability of genomic data, a streamlined tool for such analyses would facilitate and expedite the screening of genetic loci that hold potential for adaptive evolution, while alleviating the bioinformatic burden for experimental biologists. In this study, we present a user-friendly web-based tool called WGCCRR (Whole Genome Comparative Coding Region Read) designed to screen both convergent/parallel and lineage-specific amino acid changes on a genome-wide scale. Our tool allows users to replicate previous analyses with just a few clicks, and the exported results are straightforward to interpret. In addition, we have also included amino acid indels that are usually neglected in previous work. Our website provides an efficient platform for screening candidate loci for downstream experimental tests. It is available at: https://fishevo.xmu.edu.cn/.
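
The kind of screen described above can be sketched on a toy protein alignment. This is a deliberately simplified illustration and not the WGCCRR algorithm: it reports alignment columns where all chosen foreground species share one state (an amino acid, or "-" for an indel) that no background species has, and it ignores phylogeny and ancestral states. The function name `shared_foreground_sites` and the example species and sequences are made up.

```python
def shared_foreground_sites(alignment, foreground):
    """Return (column, state) pairs where every foreground species shares
    a state that is absent from all background species.

    alignment:  dict mapping species name -> aligned protein sequence
    foreground: set of species names expected to show convergence
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    fg = [alignment[name] for name in foreground]
    bg = [seq for name, seq in alignment.items() if name not in foreground]
    if not fg or not bg:
        raise ValueError("need at least one foreground and one background species")

    hits = []
    for column in range(lengths.pop()):
        fg_states = {seq[column] for seq in fg}
        bg_states = {seq[column] for seq in bg}
        # One shared foreground state that never occurs in the background
        if len(fg_states) == 1 and not (fg_states & bg_states):
            hits.append((column, next(iter(fg_states))))
    return hits


toy_alignment = {
    "bat": "MKTA-L",
    "dolphin": "MKTA-L",
    "mouse": "MRTAGL",
    "cow": "MRTAGL",
}
candidates = shared_foreground_sites(toy_alignment, {"bat", "dolphin"})
```

In this toy case the screen picks up one substitution (column 1) and one shared deletion (column 4). Treating the gap character as an ordinary state is what lets indels be screened alongside substitutions; a genome-wide run would repeat this per gene and would need proper ancestral reconstruction to separate convergent from inherited changes.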
Citations: 0