Bioinformatics: latest articles

MoleculeExperiment enables consistent infrastructure for molecule-resolved spatial omics data in Bioconductor.
IF 4.4, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2023-09-02, DOI: 10.1093/bioinformatics/btad550
Bárbara Zita Peters Couto, Nicholas Robertson, Ellis Patrick, Shila Ghazanfar

Motivation: Imaging-based spatial transcriptomics (ST) technologies have achieved subcellular resolution, enabling detection of individual molecules in their native tissue context. Data associated with these technologies promise unprecedented opportunity toward understanding cellular and subcellular biology. However, in R/Bioconductor, there is a scarcity of existing computational infrastructure to represent such data, and particularly to summarize and transform it for existing widely adopted computational tools in single-cell transcriptomics analysis, including SingleCellExperiment and SpatialExperiment (SPE) classes. With the emergence of several commercial offerings of imaging-based ST, there is a pressing need to develop consistent data structure standards for these technologies at the individual molecule-level.

Results: To this end, we have developed MoleculeExperiment, an R/Bioconductor package, which (i) stores molecule and cell segmentation boundary information at the molecule-level, (ii) standardizes this molecule-level information across different imaging-based ST technologies, including 10× Genomics' Xenium, and (iii) streamlines transition from a MoleculeExperiment object to a SpatialExperiment object. Overall, MoleculeExperiment is generally applicable as a data infrastructure class for consistent analysis of molecule-resolved spatial omics data.
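The transition in (iii) from molecule-level data to a cell-level summary can be pictured as counting, for each gene, how many detected molecules fall inside each cell's segmentation boundary. The following is a minimal sketch of that idea in plain Python. It is not the MoleculeExperiment API (which is an R package); the data layout and the function names `point_in_polygon` and `count_molecules_per_cell` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order (hypothetical layout).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_molecules_per_cell(molecules, boundaries):
    """Summarise (gene, x, y) molecules into gene -> cell -> count."""
    counts = defaultdict(lambda: defaultdict(int))
    for gene, x, y in molecules:
        for cell_id, polygon in boundaries.items():
            if point_in_polygon(x, y, polygon):
                counts[gene][cell_id] += 1
                break  # assume boundaries do not overlap
    return {gene: dict(cells) for gene, cells in counts.items()}


# Toy data: four molecules and two square "cells".
molecules = [
    ("GeneA", 1.0, 1.0),
    ("GeneA", 5.5, 5.5),
    ("GeneB", 1.5, 0.5),
    ("GeneB", 9.0, 9.0),  # falls outside every boundary
]
boundaries = {
    "cell_1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "cell_2": [(5, 5), (6, 5), (6, 6), (5, 6)],
}
counts = count_molecules_per_cell(molecules, boundaries)
print(counts)
```

The brute-force loop over every boundary is only meant to show the data flow. A real implementation on millions of molecules would presumably rely on spatial indexing.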

Availability and implementation: The MoleculeExperiment package is publicly available on Bioconductor at https://bioconductor.org/packages/release/bioc/html/MoleculeExperiment.html. Source code is available on GitHub at: https://github.com/SydneyBioX/MoleculeExperiment. The vignette for MoleculeExperiment can be found at https://bioconductor.org/packages/release/bioc/html/MoleculeExperiment.html.

Citations: 0
AliSim-HPC: parallel sequence simulator for phylogenetics.
IF 4.4, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2023-09-02, DOI: 10.1093/bioinformatics/btad540
Nhan Ly-Trong, Giuseppe M J Barca, Bui Quang Minh

Motivation: Sequence simulation plays a vital role in phylogenetics with many applications, such as evaluating phylogenetic methods, testing hypotheses, and generating training data for machine-learning applications. We recently introduced a new simulator for multiple sequence alignments called AliSim, which outperformed existing tools. However, with the increasing demands of simulating large data sets, AliSim is still slow due to its sequential implementation; for example, to simulate millions of sequence alignments, AliSim took several days or weeks. Parallelization has been used for many phylogenetic inference methods but not yet for sequence simulation.

Results: This paper introduces AliSim-HPC, which, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and message passing interface (MPI) libraries, respectively. AliSim-HPC is highly efficient and scalable, which reduces the runtime to simulate 100 large gap-free alignments (30 000 sequences of one million sites) from over one day to 11 min using 256 CPU cores from a cluster with six computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim.
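The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above (11 min, 153-fold, 256 cores); the helper names are hypothetical. Because the 11 min figure is rounded, the derived values are approximate.

```python
def implied_sequential_minutes(parallel_minutes, speedup):
    """Sequential runtime implied by a parallel runtime and a speedup factor."""
    return parallel_minutes * speedup


def parallel_efficiency(speedup, n_cores):
    """Fraction of ideal linear scaling achieved (1.0 would be perfect)."""
    return speedup / n_cores


seq_minutes = implied_sequential_minutes(11, 153)  # minutes of sequential work
seq_hours = seq_minutes / 60                       # a little over 28 hours
efficiency = parallel_efficiency(153, 256)         # just under 0.6

print(f"implied sequential runtime: {seq_hours:.2f} h")
print(f"parallel efficiency on 256 cores: {efficiency:.1%}")
```

An implied sequential runtime of about 28 hours is consistent with the "over one day" stated in the abstract, and the speedup corresponds to roughly 60% of ideal linear scaling.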

Availability and implementation: AliSim-HPC is open-source and available as part of the new IQ-TREE version v2.2.3 at https://github.com/iqtree/iqtree2/releases with a user manual at http://www.iqtree.org/doc/AliSim.

Citations: 0
Optimal selection of suitable templates in protein interface prediction.
IF 4.4, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2023-09-02, DOI: 10.1093/bioinformatics/btad510
Steven Grudman, J Eduardo Fajardo, Andras Fiser

Motivation: Molecular-level classification of protein-protein interfaces can greatly assist in functional characterization and rational drug design. The most accurate protein interface predictions rely on finding homologous proteins with known interfaces since most interfaces are conserved within the same protein family. The accuracy of these template-based prediction approaches depends on the correct choice of suitable templates. Choosing the right templates in the immunoglobulin superfamily (IgSF) is challenging because its members share low sequence identity and display a wide range of alternative binding sites despite structural homology.

Results: We present a new approach to predict protein interfaces. First, template-specific, informative evolutionary profiles are established using a mutual information-based approach. Next, based on the similarity of residue level conservation scores derived from the evolutionary profiles, a query protein is hierarchically clustered with all available template proteins in its superfamily with known interface definitions. Once clustered, a subset of the most closely related templates is selected, and an interface prediction is made. These initial interface predictions are subsequently refined by extensive docking. This method was benchmarked on 51 IgSF proteins and can predict nontrivial interfaces of IgSF proteins with an average and median F-score of 0.64 and 0.78, respectively. We also provide a way to assess the confidence of the results. The average and median F-scores increase to 0.8 and 0.81, respectively, if 27% of low confidence cases and 17% of medium confidence cases are removed. Lastly, we provide residue level interface predictions, protein complexes, and confidence measurements for singletons in the IgSF.
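For readers unfamiliar with the metric, a residue-level F-score can be computed from the predicted and true sets of interface residues. This is a minimal sketch assuming the balanced F1 variant (the abstract does not state which variant is used); the function name and the residue numbers are hypothetical.

```python
def interface_f_score(predicted, actual):
    """F1 score between predicted and true interface residue sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # correct share of predictions
    recall = true_positives / len(actual)        # recovered share of the interface
    return 2 * precision * recall / (precision + recall)


# Hypothetical residue indices for one protein.
predicted = {12, 14, 15, 40, 41}
actual = {14, 15, 16, 41}
score = interface_f_score(predicted, actual)
print(f"{score:.3f}")
```

Here three of five predicted residues are correct (precision 0.6) and three of four true residues are recovered (recall 0.75), giving an F-score of about 0.667.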

Availability and implementation: Source code is freely available at: https://gitlab.com/fiserlab.org/interdct_with_refinement.

Citations: 0
MULGA, a unified multi-view graph autoencoder-based approach for identifying drug-protein interaction and drug repositioning.
IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2023-09-02, DOI: 10.1093/bioinformatics/btad524
Jiani Ma, Chen Li, Yiwen Zhang, Zhikang Wang, Shanshan Li, Yuming Guo, Lin Zhang, Hui Liu, Xin Gao, Jiangning Song

Motivation: Identifying drug-protein interactions (DPIs) is a critical step in drug repositioning, which allows reuse of approved drugs that may be effective for treating a different disease and thereby alleviates the challenges of new drug development. Despite the fact that a great variety of computational approaches for DPI prediction have been proposed, key challenges, such as extendable and unbiased similarity calculation, heterogeneous information utilization, and reliable negative sample selection, remain to be addressed.

Results: To address these issues, we propose a novel, unified multi-view graph autoencoder framework, termed MULGA, for both DPI and drug repositioning predictions. MULGA features: (i) a multi-view learning technique to effectively learn authentic drug affinity and target affinity matrices; (ii) a graph autoencoder to infer missing DPI interactions; and (iii) a new "guilty-by-association"-based negative sampling approach for selecting highly reliable non-DPIs. Benchmark experiments demonstrate that MULGA outperforms state-of-the-art methods in DPI prediction, and the ablation studies verify the effectiveness of each proposed component. Importantly, we highlight the top drugs shortlisted by MULGA that target the spike glycoprotein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), offering additional insights into, and potentially useful treatment options for, COVID-19. Together with the availability of datasets and source codes, we envision that MULGA can be explored as a useful tool for DPI prediction and drug repositioning.
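The abstract does not spell out how the association-based negative sampling in (iii) works. One plausible reading, shown here purely as an illustration and not as MULGA's actual procedure, is to keep an unlabelled drug-protein pair as a negative only when none of the drug's most similar drugs is known to bind that protein. The function `reliable_negatives`, the neighbour map, and the toy identifiers are all hypothetical.

```python
def reliable_negatives(interactions, drug_neighbors, drugs, proteins):
    """Select unlabelled (drug, protein) pairs unlikely to be hidden positives.

    interactions: set of known (drug, protein) pairs.
    drug_neighbors: drug -> list of similar drugs (assumed precomputed).
    """
    negatives = []
    for drug in drugs:
        for protein in proteins:
            if (drug, protein) in interactions:
                continue  # known positive
            # Association check: a similar drug binding this protein makes
            # the pair a suspected positive, so it is not used as a negative.
            neighbors = drug_neighbors.get(drug, ())
            if any((n, protein) in interactions for n in neighbors):
                continue
            negatives.append((drug, protein))
    return negatives


interactions = {("d1", "p1"), ("d2", "p2")}
drug_neighbors = {"d1": ["d2"], "d2": ["d1"], "d3": []}
negatives = reliable_negatives(
    interactions, drug_neighbors, drugs=["d1", "d2", "d3"], proteins=["p1", "p2"]
)
print(negatives)
```

In this toy case, (d1, p2) and (d2, p1) are excluded because a similar drug is known to bind the protein, leaving only the pairs involving d3 as negatives.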

Availability and implementation: MULGA is publicly available for academic purposes at https://github.com/jianiM/MULGA/.

Citations: 0
iDeLUCS: a deep learning interactive tool for alignment-free clustering of DNA sequences.
IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2023-09-02, DOI: 10.1093/bioinformatics/btad508
Pablo Millan Arias, Kathleen A Hill, Lila Kari

Summary: We present an interactive Deep Learning-based software tool for Unsupervised Clustering of DNA Sequences (iDeLUCS), that detects genomic signatures and uses them to cluster DNA sequences, without the need for sequence alignment or taxonomic identifiers. iDeLUCS is scalable and user-friendly: its graphical user interface, with support for hardware acceleration, allows the practitioner to fine-tune the different hyper-parameters involved in the training process without requiring extensive knowledge of deep learning. The performance of iDeLUCS was evaluated on a diverse set of datasets: several real genomic datasets from organisms in kingdoms Animalia, Protista, Fungi, Bacteria, and Archaea, three datasets of viral genomes, a dataset of simulated metagenomic reads from microbial genomes, and multiple datasets of synthetic DNA sequences. The performance of iDeLUCS was compared to that of two classical clustering algorithms (k-means++ and GMM) and two clustering algorithms specialized in DNA sequences (MeShClust v3.0 and DeLUCS), using both intrinsic cluster evaluation metrics and external evaluation metrics. In terms of unsupervised clustering accuracy, iDeLUCS outperforms the two classical algorithms by an average of ∼20%, and the two specialized algorithms by an average of ∼12%, on the datasets of real DNA sequences analyzed. Overall, our results indicate that iDeLUCS is a robust clustering method suitable for the clustering of large and diverse datasets of unlabeled DNA sequences.
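Alignment-free methods typically compare sequences through compositional summaries rather than residue-by-residue alignment. The sketch below assumes that a genomic signature can be approximated by a normalised k-mer frequency vector, which is one common choice and not necessarily the exact representation iDeLUCS learns from. The function names and toy sequences are hypothetical.

```python
from itertools import product


def kmer_frequencies(sequence, k=2):
    """Normalised counts of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    sequence = sequence.upper()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skips k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def euclidean_distance(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


sig_a = kmer_frequencies("ACGTACGTACGT")
sig_b = kmer_frequencies("ACGTACGTACGA")  # one substitution away from a
sig_c = kmer_frequencies("GGGGCCCCGGGG")  # very different composition

d_ab = euclidean_distance(sig_a, sig_b)
d_ac = euclidean_distance(sig_a, sig_c)
print(d_ab < d_ac)
```

Vectors like these can be fed to any clustering method without aligning the sequences or knowing their taxonomy, which is the property the summary emphasises.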

Availability and implementation: iDeLUCS is available at https://github.com/Kari-Genomics-Lab/iDeLUCS under the terms of the MIT licence.

Citations: 0
Cell-connectivity-guided trajectory inference from single-cell data.
IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2023-09-02, DOI: 10.1093/bioinformatics/btad515
Johannes Smolander, Sini Junttila, Laura L Elo

Motivation: Single-cell RNA-sequencing enables cell-level investigation of cell differentiation, which can be modelled using trajectory inference methods. While tremendous effort has been put into designing these methods, inferring accurate trajectories automatically remains difficult. Therefore, the standard approach involves testing different trajectory inference methods and picking the trajectory giving the most biologically sensible model. As the default parameters are often suboptimal, their tuning requires methodological expertise.

Results: We introduce Totem, an open-source, easy-to-use R package designed to facilitate inference of tree-shaped trajectories from single-cell data. Totem generates a large number of clustering results, estimates their topologies as minimum spanning trees, and uses them to measure the connectivity of the cells. Besides automatic selection of an appropriate trajectory, cell connectivity makes it possible to visually pinpoint branching points and milestones relevant to the trajectory. Furthermore, testing different trajectories with Totem is fast, easy, and does not require in-depth methodological knowledge.
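The minimum-spanning-tree step can be illustrated on its own. The sketch below runs Prim's algorithm on a handful of cluster centroids and reads off branching points as nodes with three or more tree edges. It is written in Python for illustration and is not Totem's R implementation; the centroid coordinates and function names are hypothetical, and Totem's connectivity measure over many clusterings is not reproduced here.

```python
import math
from collections import Counter


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j) index pairs forming the tree edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the shortest edge leaving the current tree.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        in_tree.add(j)
        edges.append((i, j))
    return edges


def branching_points(edges):
    """Nodes touched by three or more tree edges."""
    degree = Counter(node for edge in edges for node in edge)
    return sorted(node for node, deg in degree.items() if deg >= 3)


# Toy centroids: a line 0-1-2 that forks into 3 and 4 at node 2.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.5), (2.0, -1.5)]
edges = minimum_spanning_tree(centroids)
branches = branching_points(edges)
print(edges, branches)
```

On this toy input the tree recovers the linear segment and marks centroid 2 as the single branching point.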

Availability and implementation: Totem is available as an R package at https://github.com/elolab/Totem.

Citations: 0
PyDESeq2: a Python package for bulk RNA-seq differential expression analysis.
IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2023-09-02, DOI: 10.1093/bioinformatics/btad547
Boris Muzellec, Maria Teleńczuk, Vincent Cabeli, Mathieu Andreux

Summary: We present PyDESeq2, a Python implementation of the DESeq2 workflow for differential expression analysis on bulk RNA-seq data. This re-implementation yields similar, but not identical, results: it achieves higher model likelihood, allows speed improvements on large datasets, as shown in experiments on TCGA data, and can be more easily interfaced with modern Python-based data science tools.
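One early step of the DESeq2 workflow is median-of-ratios normalisation, which estimates a size factor per sample. The sketch below implements that single step with the standard library to show the idea. It does not use or reproduce the PyDESeq2 API; the function name `size_factors` and the toy count matrix are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of samples, each a list of raw gene counts (same gene order).
    Genes with a zero count in any sample are excluded, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in counts)]
    # Log of the per-gene geometric mean across samples.
    log_geo_mean = {
        g: sum(math.log(s[g]) for s in counts) / n_samples for g in usable
    }
    # Per sample: median log-ratio to the geometric mean, back-transformed.
    return [
        math.exp(median(math.log(s[g]) - log_geo_mean[g] for g in usable))
        for s in counts
    ]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20, 30, 0],
    [20, 40, 60, 5],
]
factors = size_factors(counts)
print(factors)
```

Dividing each sample's counts by its size factor puts the two toy samples on the same scale; the second factor is exactly twice the first.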

Availability and implementation: PyDESeq2 is released as open-source software under the MIT license. The source code is available on GitHub at https://github.com/owkin/PyDESeq2 and documented at https://pydeseq2.readthedocs.io. PyDESeq2 is part of the scverse ecosystem.

Citations: 6
Multimodal learning of noncoding variant effects using genome sequence and chromatin structure.
IF 5.8 Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad541
Wuwei Tan, Yang Shen

Motivation: A growing number of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation is relatively limited and can benefit from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events.

Results: We find that unexpectedly high similarity in epigenetic profiles among noncoding variants, given their relatively low similarity in local sequence, can be largely attributed to their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both 1D genome sequence and 3D chromatin structure data for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding, despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles, and that their use of long-range interactions complements sequence-only models in extracting regulatory motifs. They prove to be excellent predictors of noncoding variant effects on gene expression and pathogenicity, whether in unsupervised "zero-shot" learning or supervised "few-shot" learning.

Availability and implementation: Code and data can be accessed at https://github.com/Shen-Lab/ncVarPred-1D3D and https://zenodo.org/record/7975777.

Citations: 0
The Deep Generative Decoder: MAP estimation of representations improves modelling of single-cell RNA data.
IF 4.4 Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad497
Viktoria Schuster, Anders Krogh

Motivation: Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models, such as variational autoencoders, which use a variational approximation of the likelihood for inference.

Results: Here we present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori estimation. Unlike variational autoencoders, which typically use a fixed Gaussian distribution because of the complexity of adding other types, the DGD handles complex parameterized latent distributions naturally. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Secondly, we apply the model to multiple single-cell datasets. Here, the DGD learns low-dimensional, meaningful, and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable variational autoencoder.

Availability and implementation: scDGD is available as a Python package at https://github.com/Center-for-Health-Data-Science/scDGD. The remaining code is made available here: https://github.com/Center-for-Health-Data-Science/dgd.
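
To make the idea of estimating representations directly by maximum a posteriori (MAP) concrete, the following is a minimal toy sketch and not the authors' DGD or the scDGD API. It assumes a linear decoder and a fixed standard Gaussian prior on the representations purely for simplicity, whereas the abstract describes richer parameterized latent distributions. The function name `fit_map_decoder` and all of its parameters are hypothetical. The point it illustrates is that there is no encoder: each sample's representation is a free parameter optimized jointly with the decoder weights.

```python
import numpy as np

def fit_map_decoder(X, latent_dim=2, steps=1000, lr=0.005, prior_weight=0.01, seed=0):
    """Toy MAP fit: jointly optimize per-sample representations Z and decoder W.

    Hypothetical illustration only. Minimizes the negative log posterior
        0.5 * ||Z @ W - X||^2 + 0.5 * prior_weight * ||Z||^2
    i.e. a Gaussian likelihood plus a Gaussian prior on Z (an assumption
    made here for simplicity).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Z = 0.1 * rng.standard_normal((n, latent_dim))  # one free representation per sample
    W = 0.1 * rng.standard_normal((latent_dim, d))  # linear decoder weights
    losses = []
    for _ in range(steps):
        R = Z @ W - X  # reconstruction residual
        losses.append(0.5 * np.sum(R**2) + 0.5 * prior_weight * np.sum(Z**2))
        grad_Z = R @ W.T + prior_weight * Z  # gradient w.r.t. representations
        grad_W = Z.T @ R                     # gradient w.r.t. decoder weights
        Z -= lr * grad_Z
        W -= lr * grad_W
    return Z, W, losses

# Synthetic data with a 2-dimensional latent structure plus small noise.
rng = np.random.default_rng(0)
Z_true = rng.standard_normal((50, 2))
W_true = rng.standard_normal((2, 5))
X = Z_true @ W_true + 0.01 * rng.standard_normal((50, 5))

Z, W, losses = fit_map_decoder(X)
print(f"initial loss {losses[0]:.2f}, final loss {losses[-1]:.4f}")
```

In this sketch the learned `Z` plays the role of the low-dimensional representation; swapping the linear map for a neural network decoder and the Gaussian prior for a learned latent distribution would move it closer to the setting the abstract describes.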

Citations: 0
INTEGRATE-Circ and INTEGRATE-Vis: unbiased detection and visualization of fusion-derived circular RNA.
IF 5.8 Zone 3 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-09-02 DOI: 10.1093/bioinformatics/btad569
Jace Webster, Hung Mai, Amy Ly, Christopher Maher

Motivation: Backsplicing of RNA results in circularized rather than linear transcripts, known as circular RNA (circRNA). A recently discovered and poorly understood subset of circRNAs that are composed of multiple genes, termed fusion-derived circular RNAs (fcircRNAs), represents a class of potential biomarkers shown to have oncogenic potential. Detection of fcircRNAs eludes existing analytical tools, making it difficult to assess their prevalence and function more comprehensively. Improved detection methods may lead to additional biological and clinical insights related to fcircRNAs.

Results: We developed the first unbiased tools for detecting fcircRNAs (INTEGRATE-Circ) and visualizing fcircRNAs (INTEGRATE-Vis) from RNA-Seq data. In our analysis of simulated RNA-Seq data, INTEGRATE-Circ was more sensitive, precise and accurate than other tools, and it also outperformed other tools in an analysis of public lymphoblast cell line data. Finally, we were able to validate in vitro three novel fcircRNAs detected by INTEGRATE-Circ in a well-characterized breast cancer cell line.

Availability and implementation: Open source code for INTEGRATE-Circ and INTEGRATE-Vis is available at https://www.github.com/ChrisMaherLab/INTEGRATE-CIRC and https://www.github.com/ChrisMaherLab/INTEGRATE-Vis.

Citations: 0