
Bioinformatics (Oxford, England): latest articles

VisionMol: a novel virtual reality tool for protein molecular structure visualization and manipulation.
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btaf118
Xin Wang, Yicheng Zhuang, Wenrui Liang, Haoyang Wen, Zhencong Cai, Yujia He, Yuxi Su, Wei Qin, Yuanzhe Cai, Lixin Liang, Bingding Huang

Motivation & results: Virtual reality (VR) technology holds significant potential for applications in biomedicine, particularly for visualizing and manipulating protein molecular structures. To facilitate the study of protein molecules and take advantage of state-of-the-art VR hardware, we developed VisionMol, a novel VR application that lets users explore and analyze 3D molecular structures immersively on a range of VR platforms (e.g. Rhino X Pro and Meta's Oculus Quest Pro/3) as well as on personal computers. Built on the Unity engine and written in C#, VisionMol uses custom scripts to support a variety of molecular operations. Users can rotate, scale, and translate molecular models using gestures, controllers, or other input devices. VisionMol also offers rich visualization and interaction features, including multi-model molecular display, distance measurement between molecular components, and molecular alignment and docking.
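To make the geometric operations above concrete, here is a minimal Python sketch. It is not taken from VisionMol (which is written in C# on Unity): it rotates, scales, and translates a set of atom coordinates about their centroid and measures the distance between two atoms. The function names, the z-axis-only rotation, and the toy coordinates are illustrative assumptions.

```python
import math

Vec3 = tuple[float, float, float]


def rotate_z(p: Vec3, degrees: float) -> Vec3:
    """Rotate a point counter-clockwise about the z axis by `degrees`."""
    t = math.radians(degrees)
    x, y, z = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t), z)


def transform(points: list[Vec3], degrees: float = 0.0, scale: float = 1.0,
              shift: Vec3 = (0.0, 0.0, 0.0)) -> list[Vec3]:
    """Rotate and scale a model about its centroid, then translate it by `shift`."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    moved = []
    for p in points:
        local = tuple(p[i] - centroid[i] for i in range(3))  # centre the model
        r = rotate_z(local, degrees)
        moved.append(tuple(centroid[i] + scale * r[i] + shift[i] for i in range(3)))
    return moved


def distance(a: Vec3, b: Vec3) -> float:
    """Euclidean distance between two atom positions."""
    return math.dist(a, b)


# Toy example: two atoms 5 units apart.
atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(distance(*transform(atoms, degrees=90, shift=(1.0, 2.0, 3.0))))  # rigid motion keeps ~5
print(distance(*transform(atoms, scale=2.0)))  # scaling by 2 doubles the scene-space distance
```

The sketch surfaces one design point: scaling a model for display changes scene-space distances, so a measurement tool would presumably report distances from the unscaled model coordinates (Å for PDB data).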

Summary: These capabilities support a more intuitive understanding of molecular interactions and chemical properties. Real-time interaction and clear visual representations allow users to explore the relationships between molecular structures and their properties in greater depth, thereby accelerating research progress and promoting scientific discovery. We believe that VR-based protein analysis of this kind has significant application value in several fields, including biomedicine, life science education, drug design and optimization, biotechnology, and engineering.

Availability and implementation: The code is available at https://github.com/WangLabforComputationalBiology/VisionMol. The v1.1 code (for Oculus Quest) is also archived at https://doi.org/10.5281/zenodo.14705790, and the v1.0 code (for Rhino X Pro) at https://doi.org/10.5281/zenodo.14865216. Detailed documentation is available at https://visionmol.surge.sh/#/en-us/README.

Citations: 0
EPIPDLF: a pre-trained deep learning framework for predicting enhancer-promoter interactions.
Pub Date: 2025-03-01 DOI: 10.1093/bioinformatics/btae716
Zhichao Xiao, Yan Li, Yijie Ding, Liang Yu

Motivation: Enhancers and promoters, as regulatory DNA elements, play pivotal roles in gene expression, homeostasis, and disease development across various biological processes. Research has shown that distal enhancers may interact with promoters to modulate the expression of target genes, a finding with significant implications for understanding many biological mechanisms. In recent years, numerous high-throughput wet-lab techniques have been developed to detect possible interactions between enhancers and promoters. However, these experimental methods are often time-consuming and costly.

Results: To tackle this issue, we developed EPIPDLF, a deep learning framework that predicts enhancer-promoter interactions (EPIs) from genomic sequences alone, in an interpretable manner. Comparative evaluations on six benchmark datasets show that EPIPDLF consistently outperforms the compared methods in EPI prediction. In addition, its interpretability analysis exposes the features the model has learned, aiding the identification and biological analysis of important sequences.
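Because the model works from sequence alone, a typical first step for a pre-trained sequence model is to tokenise each enhancer and promoter into overlapping k-mers. The Python sketch below shows that preprocessing step under stated assumptions: the abstract does not say how EPIPDLF tokenises its input, and the choice k = 6, the id scheme (0 reserved for unknown k-mers), the function names, and the toy sequences are all hypothetical.

```python
from itertools import product


def kmer_tokens(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1); empty if shorter than k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int = 6) -> dict[str, int]:
    """Map each of the 4**k k-mers over ACGT to an id; id 0 is reserved for unknown k-mers."""
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def encode(seq: str, vocab: dict[str, int], k: int = 6) -> list[int]:
    """Convert a sequence to k-mer ids; k-mers containing N or other symbols map to 0."""
    return [vocab.get(t, 0) for t in kmer_tokens(seq, k)]


def encode_pair(enhancer: str, promoter: str, vocab: dict[str, int], k: int = 6):
    """Encode a candidate enhancer-promoter pair as two id sequences for a downstream model."""
    return encode(enhancer, vocab, k), encode(promoter, vocab, k)


# Toy sequences, for illustration only.
vocab = build_vocab(k=6)
enhancer_ids, promoter_ids = encode_pair("ACGTTGCAACGTNNACGT", "TTGACAGCTAGCTCAGTCCTAGG", vocab)
print(len(vocab), len(enhancer_ids), len(promoter_ids))  # 4096 13 18
```

Reserving id 0 for ambiguous bases keeps the vocabulary fixed at 4^k entries while still preserving sequence length in tokens.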

Availability: The source code and data are available at: https://github.com/xzc196/EPIPDLF.

Citations: 0
Evaluating changes in attractor sets under small network perturbations to infer reliable microbial interaction networks from abundance patterns.
Pub Date: 2025-03-01 DOI: 10.1093/bioinformatics/btaf095
Jyoti Jyoti, Marc-Thorsten Hütt

Motivation: Inferring microbial interaction networks from microbiome data is a core task of computational ecology. One avenue toward reliable inference methods rests on a stylized view of microbiome data: it assumes that the presence and absence of microbial taxa, rather than their quantitative abundances, are informative about the underlying interaction network. From this starting point, inference algorithms can be based on the notion of attractors (asymptotic states) in Boolean networks, and the Boolean network framework offers a computationally efficient way to tackle this problem. However, existing algorithms that operate under a Boolean network assumption often fail to produce networks that reproduce the complete set of initial attractors (abundance patterns). There is therefore a need for network inference algorithms capable of reproducing the initial stable states of the system.

Results: We study how the attractors of Boolean threshold dynamics on signed undirected graphs change under small changes in network architecture, and show how to leverage these relationships to enhance network inference algorithms. As an illustration of this algorithmic approach, we analyze microbial abundance patterns from stool samples of individuals with inflammatory bowel disease (IBD), individuals with colorectal cancer, and healthy individuals, and study differences between the interaction networks of the three conditions. The method reveals strong diversity among IBD interaction networks. The networks are first partially inferred with an earlier inference method, ESABO; we then apply the new algorithm developed here, EDAME, to this result to generate the network that comes closest to reproducing the original attractors.
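The kind of attractor comparison described above can be illustrated with a small, self-contained Python sketch. It is not EDAME: it assumes one common form of Boolean threshold dynamics (a node switches on when its signed input is positive, off when negative, and keeps its state on a tie), enumerates all attractors of a tiny signed undirected graph by exhaustive synchronous simulation, and shows how flipping the sign of one edge changes the attractor set. The function names are hypothetical, and exhaustive enumeration over 2^n states is only feasible for very small networks.

```python
from itertools import product

State = tuple[int, ...]


def step(state: State, adj: list[list[int]]) -> State:
    """One synchronous update of Boolean threshold dynamics.

    adj[i][j] is +1 (positive edge), -1 (negative edge) or 0 (no edge). A node turns
    on if its signed input is positive, off if negative, and keeps its state on a tie.
    """
    nxt = []
    for i, row in enumerate(adj):
        signal = sum(w * x for w, x in zip(row, state))
        nxt.append(1 if signal > 0 else 0 if signal < 0 else state[i])
    return tuple(nxt)


def attractors(adj: list[list[int]]) -> set[frozenset[State]]:
    """Enumerate all attractors (fixed points and cycles) reached from every initial state."""
    found = set()
    for start in product((0, 1), repeat=len(adj)):
        first_seen: dict[State, int] = {}
        state, t = start, 0
        while state not in first_seen:  # iterate until some state repeats
            first_seen[state] = t
            state, t = step(state, adj), t + 1
        # States visited since the repeated state's first visit form the attractor.
        found.add(frozenset(s for s, ts in first_seen.items() if ts >= first_seen[state]))
    return found


def flip_edge(adj: list[list[int]], i: int, j: int) -> list[list[int]]:
    """Copy of `adj` with the sign of the undirected edge (i, j) reversed."""
    new = [row[:] for row in adj]
    new[i][j], new[j][i] = -new[i][j], -new[j][i]
    return new


pos = [[0, 1], [1, 0]]                    # two taxa joined by one positive edge
print(attractors(pos))                    # fixed points (0, 0) and (1, 1)
print(attractors(flip_edge(pos, 0, 1)))   # negative edge: fixed points (0, 0), (1, 0), (0, 1)
```

Even in this two-node case a single sign change adds and removes attractors, which is why small architectural edits can be scored by how closely the resulting attractor set matches the observed presence/absence patterns.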

Availability: Implementation code is freely available at https://github.com/Jojo6297/edame.git.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
Associations on the Fly, a new feature aiming to facilitate exploration of the Open Targets Platform evidence.
Pub Date: 2025-02-12 DOI: 10.1093/bioinformatics/btaf070
C Cruz-Castillo, L Fumis, C Mehta, R E Martinez-Osorio, J M Roldan-Romero, H Cornu, P Uniyal, A Solano-Roman, M Carmona, D Ochoa, E M McDonagh, A Buniello

Motivation: The Open Targets Platform (https://platform.opentargets.org) is a unique, comprehensive, open-source resource supporting systematic identification and prioritisation of targets for drug discovery. The Platform combines, harmonises and integrates data from >20 diverse sources to provide target-disease associations, covering evidence derived from genetic associations, somatic mutations, known drugs, differential expression, animal models, pathways and systems biology. An in-house target identification scoring framework weighs the evidence from each data source and type, contributing to an overall score for each of the 7.8M target-disease associations. However, the old infrastructure did not allow users to dynamically adjust the contribution of different evidence types to target prioritisation, a limitation frequently raised by our user community. Furthermore, the previous Platform user interface did not support navigation and exploration of the underlying target-disease evidence on the same page, occasionally making the user journey counterintuitive.

Results: Here, we describe "Associations on the Fly" (AOTF), a new Platform feature, developed with a user-centred vision, that lets users formulate more flexible therapeutic hypotheses by dynamically adjusting the weight of the evidence contributed by each source, thereby altering the prioritisation of targets.
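The effect of re-weighting evidence sources on target ranking can be sketched in a few lines of Python. This assumes a weighted harmonic-sum aggregation of per-source scores and normalises by π²/6, the upper bound of Σ 1/i², so that scores and weights in [0, 1] give a final score below 1. Open Targets has documented harmonic-sum scoring, but the exact Platform formula may differ; the source names (`gwas`, `animal_model`), weights, targets, and scores here are hypothetical.

```python
import math

# Upper bound of sum(1 / i**2) for i >= 1; dividing by it keeps scores in [0, 1)
# when source scores and weights are in [0, 1].
MAX_HARMONIC = math.pi ** 2 / 6


def harmonic_sum(scores: list[float]) -> float:
    """Sort scores in descending order and return sum(s_i / i**2), with i starting at 1."""
    ranked = sorted(scores, reverse=True)
    return sum(s / i ** 2 for i, s in enumerate(ranked, start=1))


def association_score(source_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weight each data-source score (default weight 1.0) and aggregate by harmonic sum."""
    weighted = [s * weights.get(src, 1.0) for src, s in source_scores.items()]
    return harmonic_sum(weighted) / MAX_HARMONIC


def rank_targets(evidence: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Order the targets for one disease by their re-weighted association score."""
    return sorted(evidence, key=lambda t: association_score(evidence[t], weights), reverse=True)


evidence = {"TARGET_A": {"gwas": 0.9}, "TARGET_B": {"gwas": 0.3, "animal_model": 0.8}}
print(rank_targets(evidence, {}))             # ['TARGET_A', 'TARGET_B']
print(rank_targets(evidence, {"gwas": 0.2}))  # ['TARGET_B', 'TARGET_A']
```

The harmonic sum rewards a strong top source but gives diminishing credit to additional sources, so down-weighting a single source can reorder targets, as in the toy example.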

Availability and implementation: The codebases that power the Platform, including our pipelines, GraphQL API, and React UI, are all open source and licensed under the Apache License, Version 2.0. You can find all of our code repositories on GitHub at https://github.com/opentargets and on Zenodo at https://zenodo.org/records/14392214. This tool was implemented using React v18, and its code is accessible at https://github.com/opentargets/ot-ui-apps. The tools are accessible through the Open Targets Platform web interface (https://platform.opentargets.org/) and GraphQL API (https://platform-docs.opentargets.org/data-access/graphql-api). Data are available for download at https://platform.opentargets.org/downloads and from the EMBL-EBI FTP site (https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/).

Contact: Annalisa Buniello, European Molecular Biology Laboratory (EMBL-EBI), buniello@ebi.ac.uk.

Supplementary information: Feature walkthrough videos: https://youtu.be/2A9bksboAag, https://www.youtube.com/watch?v=WQwQn6I4jkw. Extensive documentation: https://platform-docs.opentargets.org/web-interface/associations-on-the-fly, https://platform-docs.opentargets.org/target-prioritisation.

Citations: 0
A comprehensive graph neural network method for predicting triplet motifs in disease-drug-gene interactions.
Pub Date: 2025-02-04 DOI: 10.1093/bioinformatics/btaf023
Chuanze Kang, Zonghuan Liu, Han Zhang

Motivation: Drug-disease, gene-disease, and drug-gene relationships are high-frequency edge types that describe complex biological processes in biomedical knowledge graphs. The structural patterns formed by these three edge types are the graph motifs of (disease, drug, gene) triplets. Among them, the triangle is a stable and important motif in the network, and the other, non-triangle motifs also indicate rich semantic relationships. However, existing methods focus only on learning triangle representations for classification and fail to further discriminate among the various triplet motifs. A comprehensive method for predicting the various motifs within triplets is needed; it would uncover new pharmacological mechanisms and improve our understanding of disease-gene-drug interactions. Identifying complex motif structures within triplets can also help us study the structural properties of triangles.

Results: We consider seven typical motifs within triplets and propose TriMoGCL, a novel graph contrastive learning-based method for triplet motif prediction. TriMoGCL uses a graph convolutional encoder to extract node features from the global network topology. Node pooling and edge pooling then extract context information from global and local views as triplet features. To avoid redundant context information and the motif imbalance caused by dense edges, we use node and class-prototype contrastive learning to denoise triplet features and enhance discrimination between motifs. Experiments on two knowledge graphs of different scales demonstrate the effectiveness and reliability of TriMoGCL in identifying various motif types. In addition, our model reveals new pharmacological mechanisms, providing a comprehensive analysis of triplet motifs.
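The idea of a triplet motif as a classification label can be illustrated with a short Python sketch, assuming, purely for illustration, that a triplet's motif is determined by which of its three possible edges (drug-disease, gene-disease, drug-gene) are present. That gives 2³ = 8 configurations, 7 of them with at least one edge; whether these coincide with the paper's seven typical motifs is an assumption. The sketch labels triplets and counts labels, which is where motif imbalance becomes visible. It does not implement TriMoGCL's encoder, pooling, or contrastive learning, and the node names are hypothetical.

```python
from collections import Counter
from itertools import product

EDGE_TYPES = ("drug-disease", "gene-disease", "drug-gene")


def motif_label(edges: set[frozenset[str]], disease: str, drug: str, gene: str) -> int:
    """Encode which of the three triplet edges exist as a 3-bit integer (7 = triangle)."""
    pairs = [(drug, disease), (gene, disease), (drug, gene)]  # bit order matches EDGE_TYPES
    label = 0
    for bit, (u, v) in enumerate(pairs):
        if frozenset((u, v)) in edges:
            label |= 1 << bit
    return label


def describe(label: int) -> str:
    """Human-readable name for a motif label."""
    if label == 7:
        return "triangle"
    present = [name for bit, name in enumerate(EDGE_TYPES) if (label >> bit) & 1]
    return "+".join(present) or "no edges"


def motif_counts(edges, diseases, drugs, genes) -> Counter:
    """Count motif labels over every (disease, drug, gene) combination."""
    return Counter(describe(motif_label(edges, d, r, g))
                   for d, r, g in product(diseases, drugs, genes))


# Toy undirected knowledge graph with one complete triangle.
kg = {frozenset(("drug_1", "disease_1")), frozenset(("gene_1", "disease_1")),
      frozenset(("drug_1", "gene_1"))}
print(motif_counts(kg, ["disease_1"], ["drug_1", "drug_2"], ["gene_1"]))
# one triangle and one gene-disease-only triplet
```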

Availability and implementation: Code and datasets are available at https://github.com/zhanglabNKU/TriMoGCL and https://doi.org/10.5281/zenodo.14633572.

Citations: 0
doubletrouble: an R/Bioconductor package for the identification, classification, and analysis of gene and genome duplications.
Pub Date: 2025-02-04 DOI: 10.1093/bioinformatics/btaf043
Fabricio Almeida-Silva, Yves Van de Peer

Summary: Gene and genome duplications are major evolutionary forces that shape the diversity and complexity of life, and different duplication modes have distinct impacts on gene function, expression, and regulation. However, existing tools for identifying and classifying duplicated genes are either outdated or not user-friendly. Here, we present doubletrouble, an R/Bioconductor package that provides a comprehensive and robust framework for analyzing duplicated genes from genomic data. doubletrouble can detect gene pairs and classify them as derived from six duplication modes (segmental, tandem, proximal, retrotransposon-derived, DNA transposon-derived, and dispersed duplications), calculate substitution rates, detect signatures of putative whole-genome duplication events, and visualize results as publication-ready figures. We applied doubletrouble to classify the duplicated gene repertoires of 822 eukaryotic genomes and made the results available through a user-friendly web interface (doubletroubledb).
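The mode-classification step can be illustrated with a simplified Python sketch. It is not doubletrouble's R API: it covers only four of the six modes (segmental, tandem, proximal, dispersed), assumes that syntenic anchor pairs from a collinearity analysis are already available, assumes the priority order segmental > tandem > proximal > dispersed, and uses a hypothetical `proximal_max` cut-off on the difference in gene rank along a chromosome. The transposon-derived modes, which need additional evidence, are omitted, and the gene names and positions are toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    index: int  # rank of the gene along its chromosome (1st, 2nd, ... gene)


def classify_pair(a: Gene, b: Gene, anchors: set[frozenset[str]], proximal_max: int = 10) -> str:
    """Assign a simplified duplication mode to a paralogous gene pair."""
    if frozenset((a.name, b.name)) in anchors:  # pair lies in a syntenic block
        return "segmental"
    if a.chrom == b.chrom:
        gap = abs(a.index - b.index)
        if gap == 1:  # adjacent genes
            return "tandem"
        if gap <= proximal_max:  # separated by only a few genes
            return "proximal"
    return "dispersed"


g1, g2, g3 = Gene("g1", "chr1", 1), Gene("g2", "chr1", 2), Gene("g3", "chr1", 8)
g4 = Gene("g4", "chr2", 5)
anchors = {frozenset(("g1", "g4"))}
for a, b in [(g1, g4), (g1, g2), (g1, g3), (g2, g4)]:
    print(a.name, b.name, classify_pair(a, b, anchors))
# segmental, tandem, proximal, dispersed
```

Checking modes in a fixed priority order guarantees each pair receives exactly one label even when it satisfies several criteria.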

Availability and implementation: doubletrouble is available on Bioconductor (https://bioconductor.org/packages/doubletrouble), and the source code is available in a GitHub repository (https://github.com/almeidasilvaf/doubletrouble). doubletroubledb is available online at https://almeidasilvaf.github.io/doubletroubledb/.

Citations: 0
A conditional denoising VAE-based framework for antimicrobial peptides generation with preserving desirable properties.
Pub Date: 2025-02-04 DOI: 10.1093/bioinformatics/btaf069
Weizhong Zhao, Kaijieyi Hou, Yiting Shen, Xiaohua Hu

Motivation: The widespread use of antibiotics has led to the emergence of resistant pathogens. Antimicrobial peptides (AMPs) combat bacterial infections by disrupting the integrity of cell membranes, which makes it difficult for bacteria to develop resistance. AMPs therefore offer a promising way to address antibiotic resistance. However, the limited availability of natural AMPs cannot meet the growing demand. While deep learning has advanced AMP generation, conventional models often lack stability and may introduce unforeseen side effects.

Results: This study presents a novel denoising variational autoencoder (VAE)-based model for AMP generation, guided by desirable physicochemical properties. The model integrates key features (e.g. molecular weight, isoelectric point, and hydrophobicity) and employs positional encoding together with a Transformer architecture to improve generation accuracy. A customized loss function that combines reconstruction loss, KL divergence, and a property-preserving loss ensures effective model training. The model also incorporates a denoising mechanism that lets it learn from perturbed inputs, maintaining performance when training data are limited. Experimental results demonstrate that the proposed model can generate AMPs with desirable functional properties, offering a viable approach to AMP design and analysis that ultimately contributes to the fight against antibiotic resistance.
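The three-part training objective and the denoising corruption described above can be written down as a minimal, framework-free Python sketch. It assumes a per-position categorical decoder over the 20 standard amino acids, a diagonal-Gaussian latent with a standard-normal prior (so the KL term has the usual closed form, −½ Σ (1 + log σ² − μ² − σ²)), a mean-squared-error property term on normalised property values, random residue substitution as the corruption, and hypothetical weights `beta` and `gamma`. None of these choices is confirmed by the abstract, and a real model would compute these terms on tensors inside an autodiff framework.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def perturb(seq: str, rate: float, rng: random.Random) -> str:
    """Denoising corruption: replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def reconstruction_loss(probs: list[dict[str, float]], target: str) -> float:
    """Mean negative log-likelihood of the clean target under per-position decoder probabilities."""
    return -sum(math.log(p[aa]) for p, aa in zip(probs, target)) / len(target)


def kl_divergence(mu: list[float], logvar: list[float]) -> float:
    """KL(N(mu, exp(logvar)) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def property_loss(pred: list[float], target: list[float]) -> float:
    """Mean squared error between predicted and desired (normalised) property values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def total_loss(probs, target, mu, logvar, pred_props, target_props, beta=1.0, gamma=1.0):
    """Reconstruction + beta * KL + gamma * property-preserving loss."""
    return (reconstruction_loss(probs, target)
            + beta * kl_divergence(mu, logvar)
            + gamma * property_loss(pred_props, target_props))


rng = random.Random(0)
clean = "GLFKKLLK"                           # toy peptide, not from the paper's data
noisy = perturb(clean, rate=0.1, rng=rng)    # the encoder would see this corrupted input
# Toy decoder output: 0.9 on the clean residue at each position, the rest spread evenly.
probs = [{aa: 0.9 if aa == c else 0.1 / 19 for aa in AMINO_ACIDS} for c in clean]
loss = total_loss(probs, clean, mu=[0.1, -0.2], logvar=[0.0, 0.0],
                  pred_props=[0.5, 0.4], target_props=[0.5, 0.6])
print(noisy, round(loss, 4))
```

Note that the reconstruction term scores the decoder against the clean sequence, not the corrupted one; that asymmetry is what makes the autoencoder denoising.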

Availability and implementation: The data and source code are available on both GitHub (https://github.com/David-WZhao/PPGC-DVAE) and Zenodo (DOI 10.5281/zenodo.14730711).

Citations: 0
Vistla: identifying influence paths with information theory.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf036
Miron B Kursa

Motivation: Deciphering the mechanisms of a complex system from observational data is challenging, especially in biology, where systems are sophisticated, measurements are coarse, and multi-modality is common. Typical approaches that infer a network of relationships between a system's components struggle with the quality and feasibility of estimation, as well as with the interpretability of their results. These issues can be avoided, however, by solving the simpler problem of tracking only influence paths, defined as circuits that rely on the information from an experimental perturbation as it spreads through the system. Such an approach can be formalized with information theory and yields relatively streamlined, interpretable output, in contrast to the incomprehensibly dense 'haystack' networks produced by typical tools.

Results: Following this idea, the paper introduces Vistla, a novel method built around trivariate mutual information and the data processing inequality, combined with a higher-order generalization of the widest path problem. Vistla can be used standalone, within a machine learning pipeline to aid interpretability, or as a tool for mediation analysis; the paper demonstrates its efficiency on both synthetic and real-world problems.
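
Vistla's own search uses trivariate mutual information and a higher-order generalization of the widest path problem, which this sketch does not attempt to reproduce. To illustrate only the two ingredients it builds on, here is a plain-Python plug-in estimator of pairwise mutual information and the classic widest (maximum-bottleneck) path search, a Dijkstra variant. The bottleneck objective matches the intuition behind the data processing inequality: along a Markov chain X → Y → Z, I(X; Z) cannot exceed I(X; Y) or I(Y; Z), so the information a path carries is limited by its weakest link. Using pairwise MI estimates as edge weights is an assumption for illustration, and none of these functions belong to the vistla R package.

```python
import heapq
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y)).
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def widest_paths(graph, source):
    """Classic widest (maximum-bottleneck) path search.

    graph maps node -> {neighbour: edge weight}. width[v] is the largest
    achievable minimum edge weight over all paths source -> v; pred holds
    the predecessor of each node on one such path.
    """
    width = {source: math.inf}
    pred = {source: None}
    heap = [(-math.inf, source)]  # max-heap via negated widths
    done = set()
    while heap:
        neg_w, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, {}).items():
            candidate = min(-neg_w, w)  # bottleneck of the extended path
            if candidate > width.get(v, -math.inf):
                width[v] = candidate
                pred[v] = u
                heapq.heappush(heap, (-candidate, v))
    return width, pred


def path_to(pred, target):
    """Follow predecessors back from target to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical edge weights, e.g. pairwise MI estimates between variables.
    g = {"P": {"A": 0.9, "B": 0.3}, "A": {"C": 0.4}, "B": {"C": 0.8}}
    width, pred = widest_paths(g, "P")
    print(path_to(pred, "C"), width["C"])  # ['P', 'A', 'C'] 0.4
```

The route through A wins because its weakest edge (0.4) is stronger than the weakest edge through B (0.3), even though B → C is the single strongest edge into C.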

Availability and implementation: The R package implementing the method is available at https://gitlab.com/mbq/vistla, as well as on CRAN.

Citations: 0
scHNTL: single-cell RNA-seq data clustering augmented by high-order neighbors and triplet loss.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf044
Hua Meng, Chuan Qin, Zhiguo Long

Motivation: The rapid development of single-cell RNA sequencing (scRNA-seq) has significantly advanced biomedical research. Clustering analysis, which is crucial for scRNA-seq data, faces challenges including data sparsity, high dimensionality, and variable gene expression. Good low-dimensional embeddings of these complex data should preserve intrinsic information while placing similar cells close together and dissimilar cells far apart. However, existing neural-network-based methods typically focus on minimizing reconstruction loss and keeping the embeddings of directly related cells similar, but they do not consider dissimilarity; the resulting embeddings lack separability, which limits clustering performance.

Results: We propose a novel clustering algorithm, scHNTL (scRNA-seq data clustering augmented by high-order neighbors and triplet loss). It first constructs an auxiliary similarity graph and uses a Graph Attentional Autoencoder to learn initial cell embeddings. It then identifies similar and dissimilar cells by exploring high-order structures of the similarity graph and applies a contrastive-learning triplet loss that separates dissimilar pairs, improving how well the embeddings preserve structural information. Finally, this embedding refinement and the clustering objective are fused in a self-optimizing clustering framework to obtain the clusters. Experimental evaluations on 16 real-world datasets demonstrate that scHNTL outperforms state-of-the-art single-cell clustering algorithms.
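
The abstract does not specify how scHNTL selects positives and negatives or which distance it uses. As a sketch under those assumptions, one could treat cells reachable within k hops of the similarity graph as positives and the remaining cells as negatives, then score each (anchor, positive, negative) triple with the standard margin-based triplet loss on Euclidean distances. The helper names, the k-hop rule, and the default margin are hypothetical and are not taken from the scHNTL code.

```python
import math


def k_hop_neighbours(adj, node, k):
    """Cells reachable from `node` in 1..k hops of a similarity graph.

    adj maps cell -> iterable of directly similar cells. Treating this set as
    the positives is a hypothetical selection rule.
    """
    frontier, seen = {node}, {node}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj.get(u, ()) if v not in seen}
        seen |= frontier
    return seen - {node}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    It is zero once the negative is at least `margin` farther from the anchor
    than the positive, so only insufficiently separated triples contribute.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # toy chain-shaped similarity graph
    emb = {0: [0.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 1.5], 3: [1.0, 0.0]}
    positives = k_hop_neighbours(adj, 0, 2)
    negatives = set(adj) - positives - {0}
    total = sum(triplet_loss(emb[0], emb[p], emb[n]) for p in positives for n in negatives)
    print(sorted(positives), sorted(negatives), total)
```

Looking beyond direct edges (k > 1) is what lets the positive set capture high-order structure; in a trained model the loss would be backpropagated into the encoder that produces `emb`.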

Availability and implementation: A Python implementation of scHNTL is available on Figshare (https://doi.org/10.6084/m9.figshare.27001090) and GitHub (https://github.com/SWJTU-ML/scHNTL-code).

Citations: 0
Prediction of molecular subtypes for endometrial cancer based on hierarchical foundation model.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf059
Haoyu Cui, Qinhao Guo, Jun Xu, Xiaohua Wu, Chengfei Cai, Yiping Jiao, Wenlong Ming, Hao Wen, Xiangxue Wang

Motivation: Endometrial cancer is a prevalent gynecological malignancy whose effective diagnosis and treatment require accurate identification of its molecular subtype. Four molecular subtypes with different clinical outcomes have been identified: POLE mutation, mismatch repair deficient, p53 abnormal, and no specific molecular profile. However, determining these subtypes typically relies on expensive gene sequencing. To overcome this limitation, we propose a novel method that uses hematoxylin and eosin (H&E)-stained whole slide images to predict endometrial cancer molecular subtypes.

Results: Our approach uses a hierarchical foundation model, fine-tuned from the UNI computational pathology foundation model, as a backbone to extract tissue embeddings at different scales. Extensive experiments on the Fudan University Shanghai Cancer Center cohort (N = 364) gave promising results: our model achieves a macro-average AUROC of 0.879 (95% CI, 0.853-0.904) in five-fold cross-validation. Our method outperforms the current state-of-the-art molecular subtype prediction for endometrial cancer in both predictive accuracy and computational efficiency. Moreover, the method is highly reproducible, which eases implementation and widespread adoption. This study aims to address the cost and time constraints of traditional gene sequencing. By providing a reliable and accessible alternative to gene sequencing, our method has the potential to revolutionize endometrial cancer diagnosis and improve patient outcomes.
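
The reported figure is a macro-average AUROC over the four molecular subtypes. As a sketch, assuming the usual one-vs-rest definition (each subtype's AUROC computed against the other three, then averaged without class weighting), the metric can be computed in plain Python from the rank-probability interpretation of AUROC. The short subtype labels are hypothetical abbreviations, and the abstract does not say how the 95% CI was obtained, so that step is not reproduced here.

```python
SUBTYPES = ["POLE", "MMRd", "p53abn", "NSMP"]  # hypothetical label strings


def binary_auroc(scores, labels):
    """AUROC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(prob_rows, true_labels, classes=SUBTYPES):
    """Unweighted mean of one-vs-rest AUROCs; prob_rows[i][k] is P(classes[k]) for slide i."""
    per_class = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        labels = [t == cls for t in true_labels]
        per_class.append(binary_auroc(scores, labels))
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    probs = [[0.7, 0.1, 0.1, 0.1],
             [0.2, 0.5, 0.2, 0.1],
             [0.1, 0.2, 0.6, 0.1],
             [0.1, 0.3, 0.1, 0.5],
             [0.4, 0.3, 0.2, 0.1]]
    truth = ["POLE", "MMRd", "p53abn", "NSMP", "MMRd"]
    print(round(macro_auroc(probs, truth), 3))
```

The pairwise comparison is quadratic in the number of slides, which is fine for a cohort of a few hundred. Under five-fold cross-validation the metric could be computed per held-out fold or on pooled out-of-fold predictions; the abstract does not state which was used.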

Availability and implementation: The code and data used to generate the results in this study are available on GitHub (https://github.com/HaoyuCui/hi-UNI) and Zenodo (https://doi.org/10.5281/zenodo.14627478).

Citations: 0