Bioinformatics (Oxford, England): latest articles

FoldX Force Field revisited, an improved version.
Pub Date : 2025-02-06 DOI: 10.1093/bioinformatics/btaf064
Javier Delgado, Raul Reche, Damiano Cianferoni, Gabriele Orlando, Rob van der Kant, Frederic Rousseau, Joost Schymkowitz, Luis Serrano

Motivation: The FoldX force field was originally validated with a database of 1000 mutants at a time when there were few high-resolution structures. Here we have manually curated a database of 5556 mutants affecting protein stability, resulting in 2484 highly confident mutations denominated FoldX Stability Dataset (FSD), represented in non-redundant X-ray structures with less than 2.5 Å resolution, not involving duplicates, metals or prosthetic groups. Using this database, we have created a new version of the FoldX force field by introducing Pi stacking, pH dependency for all charged residues, improving aromatic-aromatic interactions, modifying the Ncap contribution and α-helix dipole, recalibrating the side chain entropy of Methionine, adjusting the H-bond parameters, and modifying the solvation contribution of Tryptophan and others.

Results: These changes have led to significant improvements in the prediction of specific mutants involving the above residues/interactions and a statistically significant improvement in FoldX predictions overall, as well as for the majority of the 20 amino acids. After removing all training-set data from FSD (yielding the VFSD dataset), predictions improved from R = 0.693 (RMSE = 1.277 kcal/mol) for the previously released version to R = 0.706 (RMSE = 1.252 kcal/mol). For the VFSD, FoldX achieves 95% accuracy considering an error of ± 0.85 kcal/mol in prediction, and an AUC = 0.78 when predicting the sign of the energy change upon mutation.
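For readers unfamiliar with the metrics quoted in the Results, the following is a minimal sketch of how a Pearson correlation (R), an RMSE, and an AUC for sign prediction could be computed from paired experimental and predicted energy changes. The function names and the example values are illustrative assumptions; they are not FSD/VFSD data and not part of FoldX. The AUC here assumes that the label is the sign of the experimental value and the score is the predicted value, which is one plausible reading of the abstract.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def rmse(xs, ys):
    """Root-mean-square error between paired values (same unit as the inputs)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def sign_auc(experimental, predicted):
    """AUC for predicting the sign of the experimental value.

    Assumption: positives are mutants with experimental value > 0, and the
    predicted value is used as the ranking score. Computed as the fraction of
    (positive, negative) pairs ranked correctly, counting ties as one half.
    """
    pos = [p for e, p in zip(experimental, predicted) if e > 0]
    neg = [p for e, p in zip(experimental, predicted) if e <= 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical values in kcal/mol, for illustration only.
experimental = [1.0, 2.0, 3.0, -1.0]
predicted = [1.5, 1.5, 3.5, -0.5]
print(pearson_r(experimental, predicted))
print(rmse(experimental, predicted))
print(sign_auc(experimental, predicted))
```

The pairwise AUC is quadratic in the number of mutants, which is acceptable for a dataset of a few thousand entries; a rank-based formulation would scale better.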

Availability: FoldX versions 4.1 & 5.1 are freely available for academics at https://foldxsuite.crg.eu/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
NetworkCommons: bridging data, knowledge and methods to build and evaluate context-specific biological networks.
Pub Date : 2025-02-05 DOI: 10.1093/bioinformatics/btaf048
Victor Paton, Denes Türei, Olga Ivanova, Sophia Müller-Dott, Pablo Rodriguez-Mier, Veronica Venafra, Livia Perfetto, Martin Garrido-Rodriguez, Julio Saez-Rodriguez

Summary: We present NetworkCommons, a platform for integrating prior knowledge, omics data, and network inference methods, facilitating their usage and evaluation. NetworkCommons aims to be an infrastructure for the network biology community that supports the development of better methods and benchmarks, by enhancing interoperability and integration.

Availability and implementation: NetworkCommons is implemented in Python and offers programmatic access to multiple omics datasets, network inference methods, and benchmarking setups. It is free software, available at https://github.com/saezlab/networkcommons, and deposited in Zenodo at https://doi.org/10.5281/zenodo.14719118.

Supplementary data: Contribution guidelines, additional figures, and descriptions for data, knowledge, methods, evaluation strategies and their implementation are available in the Supplementary Data and in the NetworkCommons documentation at https://networkcommons.readthedocs.io/.

Citations: 0
GenomeDecoder: Inferring Segmental Duplications in Highly-Repetitive Genomic Regions.
Pub Date : 2025-02-05 DOI: 10.1093/bioinformatics/btaf058
Zhenmiao Zhang, Ishaan Gupta, Pavel A Pevzner

Motivation: The emergence of "telomere-to-telomere" genomics brought the challenge of identifying segmental duplications (SDs) in complete genomes. It further opened the possibility of identifying differences in SDs across individual human genomes and studying SD evolution. These newly emerged challenges require algorithms for reconstructing SDs in the most complex genomic regions that evaded all previous attempts to analyze their architecture, such as rapidly-evolving immunoglobulin loci.

Results: We describe the GenomeDecoder algorithm for inferring SDs and apply it to analyzing genomic architectures of various loci in primate genomes. Our analysis revealed that multiple duplications/deletions led to a rapid birth/death of immunoglobulin genes within the human population and large changes in genomic architecture of immunoglobulin loci across primate genomes. Comparison of immunoglobulin loci across primate genomes suggests that they are subjected to diversifying selection.

Availability and implementation: GenomeDecoder is available at https://github.com/ZhangZhenmiao/GenomeDecoder. The software version and test data used in this paper are uploaded to https://doi.org/10.5281/zenodo.14753844.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
A comprehensive graph neural network method for predicting triplet motifs in disease-drug-gene interactions.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf023
Chuanze Kang, Zonghuan Liu, Han Zhang

Motivation: The drug-disease, gene-disease, and drug-gene relationships, as high-frequency edge types, describe complex biological processes within the biomedical knowledge graph. The structural patterns formed by these three edges are the graph motifs of (disease, drug, gene) triplets. Among them, the triangle is a stable and important motif structure in the network, and various other motifs also indicate rich semantic relationships. However, existing methods focus only on triangle representation learning for classification and fail to discriminate further among the various motifs of triplets. A comprehensive method is needed to predict the various motifs within triplets, which will uncover new pharmacological mechanisms and improve our understanding of disease-gene-drug interactions. Identifying complex motif structures within triplets can also help us to study the structural properties of triangles.

Results: We consider the seven typical motifs within the triplets and propose a novel graph contrastive learning-based method for triplet motif prediction (TriMoGCL). TriMoGCL utilizes a graph convolutional encoder to extract node features from the global network topology. Next, node pooling and edge pooling extract context information as the triplet features from global and local views. To avoid the redundant context information and motif imbalance problem caused by dense edges, we use node and class-prototype contrastive learning to denoise triplet features and enhance discrimination between motifs. The experiments on two different-scale knowledge graphs demonstrate the effectiveness and reliability of TriMoGCL in identifying various motif types. In addition, our model reveals new pharmacological mechanisms, providing a comprehensive analysis of triplet motifs.
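The abstract does not enumerate the seven motifs. One natural reading, stated here as an assumption, is that they are the seven non-empty combinations of the three edge types (drug-disease, gene-disease, drug-gene) that can be present within a triplet, with the triangle being the case where all three are present. The sketch below enumerates those combinations and labels a given triplet; the names `EDGE_TYPES`, `enumerate_motifs`, and `motif_of` are hypothetical and unrelated to the TriMoGCL code base.

```python
from itertools import combinations

# The three edge types named in the abstract.
EDGE_TYPES = ("drug-disease", "gene-disease", "drug-gene")

def enumerate_motifs():
    """Return every non-empty subset of edge types (assumed motif definition)."""
    motifs = []
    for k in range(1, len(EDGE_TYPES) + 1):
        for present in combinations(EDGE_TYPES, k):
            motifs.append(frozenset(present))
    return motifs

def motif_of(triplet, edges):
    """Return the set of edge types present within a (disease, drug, gene) triplet.

    `edges` is a set of frozenset({node_a, node_b}) pairs (undirected edges).
    """
    disease, drug, gene = triplet
    candidates = {
        "drug-disease": frozenset((drug, disease)),
        "gene-disease": frozenset((gene, disease)),
        "drug-gene": frozenset((drug, gene)),
    }
    return frozenset(name for name, pair in candidates.items() if pair in edges)

# Hypothetical toy graph: all three edges exist, so the triplet is a triangle.
toy_edges = {
    frozenset(("drugA", "diseaseX")),
    frozenset(("geneG", "diseaseX")),
    frozenset(("drugA", "geneG")),
}
print(len(enumerate_motifs()))
print(sorted(motif_of(("diseaseX", "drugA", "geneG"), toy_edges)))
```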

Availability and implementation: Codes and datasets are available at https://github.com/zhanglabNKU/TriMoGCL and https://doi.org/10.5281/zenodo.14633572.

Citations: 0
Vistla: identifying influence paths with information theory.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf036
Miron B Kursa

Motivation: It is a challenging task to decipher the mechanisms of a complex system from observational data, especially in biology, where systems are sophisticated, measurements coarse, and multi-modality common. The typical approaches of inferring a network of relationships between a system's components struggle with the quality and feasibility of estimation, as well as with the interpretability of the results they yield. Said issues can be avoided, however, when dealing with a simpler problem of tracking only the influence paths, defined as circuits relying on the information of an experimental perturbation as it spreads through the system. Such an approach can be formalized with information theory and leads to a relatively streamlined, interpretable output, in contrast to the incomprehensibly dense 'haystack' networks produced by typical tools.

Results: Following this idea, the paper introduces Vistla, a novel method built around tri-variate mutual information and data processing inequality, combined with a higher-order generalization of the widest path problem. Vistla can be used standalone, in a machine learning pipeline to aid interpretability, or as a tool for mediation analysis; the paper demonstrates its efficiency both in synthetic and real-world problems.
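As background for the Results, the classical widest path problem asks for the path between two nodes that maximizes the smallest edge weight along it. The sketch below solves that classical version with a Dijkstra-style search. It is not Vistla's higher-order generalization, and the graph, the weights, and the function name `widest_path` are illustrative assumptions; edge weights are assumed to be positive.

```python
import heapq

def widest_path(graph, source, target):
    """Return (width, path) for the path maximizing the minimum edge weight.

    `graph` maps each node to a dict of {neighbor: positive weight}.
    Returns (0.0, []) when the target is unreachable.
    """
    best = {source: float("inf")}   # best bottleneck width found per node
    prev = {}                       # predecessor on the best path
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == target:
            break
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = min(width, weight)
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]

# Hypothetical weights standing in for information-based edge scores.
toy_graph = {
    "X": {"A": 0.9, "B": 0.4},
    "A": {"Y": 0.3},
    "B": {"Y": 0.4},
}
print(widest_path(toy_graph, "X", "Y"))
```

In the toy graph the route through A starts with a strong edge but is limited by its weakest link (0.3), so the route through B, whose weakest link is 0.4, is preferred.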

Availability and implementation: The R package implementing the method is available at https://gitlab.com/mbq/vistla, as well as on CRAN.

Citations: 0
doubletrouble: an R/Bioconductor package for the identification, classification, and analysis of gene and genome duplications.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf043
Fabricio Almeida-Silva, Yves Van de Peer

Summary: Gene and genome duplications are major evolutionary forces that shape the diversity and complexity of life. However, different duplication modes have distinct impacts on gene function, expression, and regulation. Existing tools for identifying and classifying duplicated genes are either outdated or not user-friendly. Here, we present doubletrouble, an R/Bioconductor package that provides a comprehensive and robust framework for analyzing duplicated genes from genomic data. doubletrouble can detect and classify gene pairs as derived from six duplication modes (segmental, tandem, proximal, retrotransposon-derived, DNA transposon-derived, and dispersed duplications), calculate substitution rates, detect signatures of putative whole-genome duplication events, and visualize results as publication-ready figures. We applied doubletrouble to classify the duplicated gene repertoire in 822 eukaryotic genomes, and results were made available through a user-friendly web interface.

Availability and implementation: doubletrouble is available on Bioconductor (https://bioconductor.org/packages/doubletrouble), and the source code is available in a GitHub repository (https://github.com/almeidasilvaf/doubletrouble). doubletroubledb is available online at https://almeidasilvaf.github.io/doubletroubledb/.

Citations: 0
tagtango: an application to compare single-cell annotations.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf012
Bernat Bramon Mora, Helen Lindsay, Antonin Thiébaut, Kenneth D Stuart, Raphael Gottardo

Summary: In this article, we present tagtango, an innovative R package and web application designed for robust and intuitive comparison of single-cell clusters and annotations. It offers an interactive platform that simplifies the exploration of differences and similarities among different clustering and annotation methods. Leveraging single-cell data analysis and different visualizations, it allows researchers to dissect the underlying biological differences across groups. tagtango is a user-friendly application that is portable and works seamlessly across multiple operating systems.
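A common starting point for comparing two sets of cell labels is a cross-tabulation that counts how many cells share each pair of labels. The sketch below illustrates that idea only; tagtango itself is an R package, and the function name `cross_tabulate` and the example labels are hypothetical rather than part of its interface.

```python
from collections import Counter

def cross_tabulate(labels_a, labels_b):
    """Count cells for every (label in annotation A, label in annotation B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotations must label the same cells")
    return Counter(zip(labels_a, labels_b))

# Hypothetical annotations for five cells.
annotation = ["T cell", "T cell", "B cell", "B cell", "NK cell"]
clustering = ["cluster1", "cluster1", "cluster2", "cluster1", "cluster3"]

for (label, cluster), count in sorted(cross_tabulate(annotation, clustering).items()):
    print(label, cluster, count)
```

Pairs with large counts indicate agreement between the two annotations, while a label that spreads across several clusters points to a disagreement worth inspecting.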

Availability and implementation: tagtango is freely available at https://github.com/bernibra/tagtango as an R package as well as an online web service at https://tagtango.unil.ch.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Structural basis of differential gene expression at eQTLs loci from high-resolution ensemble models of 3D single-cell chromatin conformations.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf050
Lin Du, Hammad Farooq, Pourya Delafrouz, Jie Liang

Motivation: Techniques such as high-throughput chromosome conformation capture (Hi-C) have provided a wealth of information on nucleus and genome organization that is important for understanding gene expression regulation. Genome-Wide Association Studies have identified numerous loci associated with complex traits. Expression quantitative trait loci (eQTL) studies have further linked the genetic variants to alteration in expression levels of associated target genes across individuals. However, the functional roles of many eQTLs in noncoding regions remain unclear. Current joint analyses of Hi-C and eQTL data lack advanced computational tools, limiting what can be learned from these data.

Results: We developed a computational method for simultaneous analysis of Hi-C and eQTL data, capable of identifying a small set of nonrandom interactions from all Hi-C interactions. Using these nonrandom interactions, we reconstructed large ensembles (×10^5) of high-resolution single-cell 3D chromatin conformations with thorough sampling, accurately replicating Hi-C measurements. Our results revealed many-body interactions in chromatin conformation at the single-cell level within eQTL loci, providing a detailed view of how 3D chromatin structures form the physical foundation for gene regulation, including how genetic variants of eQTLs affect the expression of associated eGenes. Furthermore, our method can deconvolve chromatin heterogeneity and investigate the spatial associations of eQTLs and eGenes at subpopulation level, revealing their regulatory impacts on gene expression. Together, ensemble modeling of thoroughly sampled single-cell chromatin conformations, combined with eQTL data, helps decipher how 3D chromatin structures provide the physical basis for gene regulation and expression control, and aids in understanding the overall structure-function relationships of genome organization.

Availability and implementation: It is available at https://github.com/uic-liang-lab/3DChromFolding-eQTL-Loci.

Citations: 0
DNAdesign: feature-aware in silico design of synthetic DNA through mutation.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf052
Yingfei Wang, Jinsen Li, Tsu-Pei Chiu, Nicolas Gompel, Remo Rohs

Motivation: DNA sequence and shape readout represent different modes of protein-DNA recognition. Current tools lack the functionality to simultaneously consider alterations in different readout modes caused by sequence mutations. DNAdesign is a web-based tool to compare and design mutations based on both DNA sequence and shape characteristics. Users input a wild-type sequence, select sites to introduce mutations and choose a set of DNA shape parameters for mutation design.

Results: DNAdesign uses Deep DNAshape to provide ultra-fast predictions of DNA shape based on extended k-mers and offers multiple encoding methods for nucleotide sequences, including a physicochemical encoding of DNA through the functional groups in its major and minor grooves. DNAdesign provides all mutation candidates along the sequence and shape dimensions, with interactive visualization comparing each candidate with the wild-type DNA molecule. DNAdesign provides an approach to studying gene regulation and applications in synthetic biology, such as the design of synthetic enhancers and transcription factor binding sites.

Availability and implementation: The DNAdesign webserver and documentation are freely accessible at https://dnadesign.usc.edu.
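
The workflow described in this abstract (input a wild-type sequence, select sites, list the mutation candidates) can be illustrated with a minimal sketch. This is not the DNAdesign implementation: the function names `mutation_candidates` and `one_hot` are hypothetical, the exhaustive enumeration over all selected sites is an assumption, and the shape-prediction step (Deep DNAshape) is left out because its interface is not described here. Plain one-hot encoding stands in for the sequence encodings the abstract mentions.

```python
from itertools import product

BASES = "ACGT"


def mutation_candidates(wild_type, sites):
    """Return every sequence differing from wild_type only at the given 0-based sites.

    Hypothetical helper for illustration; assumes all base combinations
    at the selected sites are wanted, excluding the wild type itself.
    """
    wild_type = wild_type.upper()
    if any(base not in BASES for base in wild_type):
        raise ValueError("sequence must contain only A, C, G, T")
    sites = sorted(set(sites))
    if any(pos < 0 or pos >= len(wild_type) for pos in sites):
        raise ValueError("site index out of range")

    candidates = []
    for combo in product(BASES, repeat=len(sites)):
        seq = list(wild_type)
        for pos, base in zip(sites, combo):
            seq[pos] = base
        mutant = "".join(seq)
        if mutant != wild_type:  # skip the unmutated sequence
            candidates.append(mutant)
    return candidates


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element 0/1 rows in A, C, G, T order."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    mutants = mutation_candidates("ACGT", [0, 2])
    print(len(mutants), mutants[:3])
    print(one_hot("AC"))
```

With k selected sites the enumeration yields 4^k - 1 candidates, so it is only practical for a small number of sites; a real tool would presumably rank or filter candidates by predicted shape change, which this sketch does not attempt.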

Citations: 0
Functional profiling of the sequence stockpile: a protein pair-based assessment of in silico prediction tools.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf035
R Prabakaran, Yana Bromberg

Motivation: In silico functional annotation of proteins is crucial to narrowing the sequencing-accelerated gap in our understanding of protein activities. Numerous function annotation methods exist, and their ranks have been growing, particularly so with the recent deep learning-based developments. However, it is unclear if these tools are truly predictive. As we are not aware of any methods that can identify new terms in functional ontologies, we ask if they can, at least, identify molecular functions of proteins that are non-homologous to or far-removed from known protein families.

Results: Here, we explore the potential and limitations of the existing methods in predicting the molecular functions of thousands of such proteins. Lacking the "ground truth" functional annotations, we transformed the assessment of function prediction into evaluation of functional similarity of protein pairs that likely share function but are unlike any of the currently functionally annotated sequences. Notably, our approach transcends the limitations of functional annotation vocabularies, providing a means to assess different-ontology annotation methods. We find that most existing methods are limited to identifying functional similarity of homologous sequences and fail to predict the function of proteins lacking reference. Curiously, despite their seemingly unlimited by-homology scope, deep learning methods also have trouble capturing the functional signal encoded in protein sequence. We believe that our work will inspire the development of a new generation of methods that push boundaries and promote exploration and discovery in the molecular function domain.

Availability and implementation: The data underlying this article are available at https://doi.org/10.6084/m9.figshare.c.6737127.v3. The code used to compute siblings is available openly at https://bitbucket.org/bromberglab/siblings-detector/.

Citations: 0