Pub Date: 2025-03-04
DOI: 10.1093/bioinformatics/btaf074
Weiming Yu, Zerun Lin, Miaofang Lan, Le Ou-Yang
Motivation: Gene regulatory networks (GRNs) unveil the intricate interactions among genes and are pivotal in elucidating the complex biological processes within cells. The advent of single-cell RNA-sequencing (scRNA-seq) enables the inference of GRNs at single-cell resolution. However, most current supervised network inference methods concentrate on predicting pairwise gene regulatory interactions, thus failing to fully exploit correlations among all genes and exhibiting limited generalization performance.
Results: To address these issues, we propose a graph contrastive link prediction (GCLink) model to infer potential gene regulatory interactions from scRNA-seq data. Based on known gene regulatory interactions and scRNA-seq data, GCLink introduces a graph contrastive learning strategy to aggregate the feature and neighborhood information of genes to learn their representations. This approach reduces the dependence of our model on sample size and enhances its ability to predict potential gene regulatory interactions. Extensive experiments on real scRNA-seq datasets demonstrate that GCLink outperforms other state-of-the-art methods in most cases. Furthermore, by pretraining GCLink on a source cell line with abundant known regulatory interactions and fine-tuning it on a target cell line with a limited number of known interactions, our GCLink model exhibits good performance in GRN inference, demonstrating its effectiveness in inferring GRNs from datasets with limited known interactions.
Availability and implementation: The source code and data are available at https://github.com/Yoyiming/GCLink.
Title: GCLink: a graph contrastive link prediction framework for gene regulatory network inference.
Journal: Bioinformatics (Oxford, England)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11881698/pdf/
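The GCLink abstract above mentions a graph contrastive learning strategy followed by link prediction but does not specify the objective. As a rough illustration only, the sketch below shows one widely used contrastive objective (InfoNCE over two views of the same nodes) and a dot-product link score. The function names, the temperature value, and the toy embeddings are assumptions made for this example; they are not taken from the GCLink code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: gene i in view A should match gene i in view B.

    view_a and view_b are lists of embedding vectors for the same genes,
    produced from two augmented views of the graph (assumed, not GCLink's).
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n


def link_score(z_u, z_v):
    """Hypothetical link predictor: sigmoid of the embedding dot product."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


# Toy embeddings for two genes (illustrative values only).
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is lower when the two views agree on each gene than when they are swapped, which is the behaviour a contrastive objective is meant to reward.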
Pub Date: 2025-03-04
DOI: 10.1093/bioinformatics/btaf100
Avigail Taylor, Valentine M Macaulay, Matthieu J Miossec, Anand K Maurya, Francesca M Buffa
Summary: GeneFEAST, implemented in Python, is a gene-centric functional enrichment analysis summarization and visualization tool that can be applied to large functional enrichment analysis (FEA) results arising from upstream FEA pipelines. It produces a systematic, navigable HTML report, making it easy to identify sets of genes putatively driving multiple enrichments and to explore gene-level quantitative data first used to identify input genes. Further, GeneFEAST can juxtapose FEA results from multiple studies, making it possible to highlight patterns of gene expression amongst genes that are differentially expressed in at least one of multiple conditions, and which give rise to shared enrichments under those conditions. Thus, GeneFEAST offers a novel, effective way to address the complexities of linking up many overlapping FEA results to their underlying genes and data, advancing gene-centric hypotheses, and providing pivotal information for downstream validation experiments.
Availability and implementation: GeneFEAST GitHub repository: https://github.com/avigailtaylor/GeneFEAST; Zenodo record: 10.5281/zenodo.14753734; Python Package Index: https://pypi.org/project/genefeast; Docker container: ghcr.io/avigailtaylor/genefeast.
Title: GeneFEAST: the pivotal, gene-centric step in functional enrichment analysis interpretation.
Journal: Bioinformatics (Oxford, England)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11919446/pdf/
Pub Date: 2025-03-04
DOI: 10.1093/bioinformatics/btaf106
Guoyi Zhang, Pekka Ristola, Han Su, Bipin Kumar, Boyu Zhang, Yujin Hu, Michael G Elliot, Viktor Drobot, Jie Zhu, Jens Staal, Martin Larralde, Shun Wang, Yun Yi, Haoran Yu
Motivation: The BioArchLinux project was initiated to address challenges in bioinformatics software reproducibility and freshness. Relying on Arch Linux's user-driven ecosystem, we aim to create a comprehensive and continuously updated repository for life sciences research.
Results: BioArchLinux provides a PKGBUILD-based system for seamless software packaging and maintenance, enabling users to access the latest bioinformatics tools across multiple programming languages. The repository includes Docker images, Windows Subsystem for Linux (WSL) support, and Junest for nonroot environments, enhancing accessibility across platforms. Although developed and maintained by a small core team, BioArchLinux is a fast-growing bioinformatics repository that offers a participatory and community-driven environment.
Availability and implementation: The repository, documentation, and tools are freely available at https://bioarchlinux.org and https://github.com/BioArchLinux. Users and developers are encouraged to contribute and expand this open-source initiative.
Title: BioArchLinux: community-driven fresh reproducible software repository for life sciences.
Journal: Bioinformatics (Oxford, England)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11925497/pdf/
Pub Date: 2025-03-01
DOI: 10.1093/bioinformatics/btae716
Zhichao Xiao, Yan Li, Yijie Ding, Liang Yu
Motivation: Enhancers and promoters, as regulatory DNA elements, play pivotal roles in gene expression, homeostasis, and disease development across various biological processes. Research has shown that distal enhancers may engage with nearby promoters to modulate the expression of target genes. This discovery holds significant implications for deepening our comprehension of various biological mechanisms. In recent years, numerous high-throughput wet-lab techniques have been developed to detect possible interactions between enhancers and promoters. However, these experimental methods are often time-intensive and costly.
Results: To tackle this issue, we have created a deep learning approach, EPIPDLF, which predicts enhancer-promoter interactions (EPIs) based solely on genomic sequences in an interpretable manner. Comparative evaluations across six benchmark datasets demonstrate that EPIPDLF consistently exhibits superior performance in EPI prediction. Additionally, by incorporating interpretable analysis mechanisms, our model enables the elucidation of learned features, aiding in the identification and biological analysis of important sequences.
Availability: The source code and data are available at: https://github.com/xzc196/EPIPDLF.
Title: EPIPDLF: a pre-trained deep learning framework for predicting enhancer-promoter interactions.
Journal: Bioinformatics (Oxford, England)
Pub Date: 2025-03-01
DOI: 10.1093/bioinformatics/btaf095
Jyoti Jyoti, Marc-Thorsten Hütt
Motivation: Inferring microbial interaction networks from microbiome data is a core task of computational ecology. An avenue of research to create reliable inference methods is based on a stylized view of microbiome data, starting from the assumption that the presences and absences of microbiomes, rather than the quantitative abundances, are informative about the underlying interaction network. With this starting point, inference algorithms can be based on the notion of attractors (asymptotic states) in Boolean networks. The Boolean network framework offers a computationally efficient method to tackle this problem. However, existing algorithms operating under a Boolean network assumption often fail to provide networks that can reproduce the complete set of initial attractors (abundance patterns). Therefore, there is a need for network inference algorithms capable of reproducing the initial stable states of the system.
Results: We study the change of attractors in Boolean threshold dynamics on signed undirected graphs under small changes in network architecture and show how to leverage these relationships to enhance network inference algorithms. As an illustration of this algorithmic approach, we analyze microbial abundance patterns from stool samples of humans with inflammatory bowel disease (IBD), with colorectal cancer, and from healthy individuals to study differences between the interaction networks of the three conditions. The method reveals strong diversity in IBD interaction networks. The networks are first partially deduced by an earlier inference method called ESABO; we then apply the new algorithm developed here, EDAME, to this result to generate a network that comes nearest to satisfying the original attractors.
Availability: Implementation code is freely available at https://github.com/Jojo6297/edame.git.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Evaluating changes in attractor sets under small network perturbations to infer reliable microbial interaction networks from abundance patterns.
Journal: Bioinformatics (Oxford, England)
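To make the notion of attractors under Boolean threshold dynamics on a signed undirected graph more concrete, here is a minimal sketch that enumerates the attractors of a tiny network and shows how they change when one edge sign is flipped. The synchronous update scheme, the tie rule (a node keeps its state when its inputs sum to zero), and the three-node example are assumptions chosen for illustration. This is not the EDAME algorithm, and the paper's exact update rule may differ.

```python
from itertools import product


def step(state, adj):
    """One synchronous threshold update on a signed adjacency matrix.

    A node switches on if its summed input is positive, off if negative,
    and keeps its current value on a tie (assumed convention).
    """
    new_state = []
    for i, row in enumerate(adj):
        total = sum(w * x for w, x in zip(row, state))
        if total > 0:
            new_state.append(1)
        elif total < 0:
            new_state.append(0)
        else:
            new_state.append(state[i])
    return tuple(new_state)


def attractors(adj):
    """Enumerate all attractors (fixed points and cycles) by brute force."""
    found = set()
    for start in product((0, 1), repeat=len(adj)):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state, adj)
        # The trajectory re-entered `state`: everything from there on cycles.
        found.add(frozenset(seen[seen.index(state):]))
    return found


# Hypothetical 3-taxon network: 0-1 cooperative (+1), 1-2 competitive (-1).
original = [[0, 1, 0],
            [1, 0, -1],
            [0, -1, 0]]
# Small perturbation: flip the sign of the 1-2 edge.
perturbed = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]

before = attractors(original)
after = attractors(perturbed)
lost = before - after  # attractors destroyed by the perturbation
```

Brute-force enumeration visits all 2^n states, so it is only practical for very small networks; it is meant to show the objects being compared, not an efficient method.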
Pub Date: 2025-02-21
DOI: 10.1093/bioinformatics/btaf082
Ziqi Kang, Angela Szabo, Teodora Farago, Fernando Perez-Villatoro, Ada Junquera, Shah Saundarya, Inga-Maria Launonen, Ella Anttila, Julia Casado, Kevin Elias, Anni Virtanen, Ulla-Maija Haltia, Anniina Färkkilä
Motivation: Multiplexed imaging and single-cell analysis are increasingly applied to investigate the tissue spatial ecosystems in cancer and other complex diseases. Accurate single-cell phenotyping based on marker combinations is a critical but challenging task due to (i) low reproducibility across experiments with manual thresholding and (ii) the labor-intensive ground-truth expert annotation required for learning-based methods.
Results: We developed Tribus, an interactive knowledge-based classifier for multiplexed images and proteomic datasets that avoids hard-set thresholds and manual labeling. We demonstrated that Tribus recovers fine-grained cell types, matching the gold standard annotations by human experts. Additionally, Tribus can target ambiguous populations and discover phenotypically distinct cell subtypes. Through benchmarking against three similar methods in four public datasets with ground truth labels, we show that Tribus outperforms other methods in accuracy and computational efficiency, reducing runtime by an order of magnitude. Finally, we demonstrate the performance of Tribus in rapid and precise cell phenotyping with two large in-house whole-slide imaging datasets.
Availability: Tribus is available at https://github.com/farkkilab/tribus as an open-source Python package.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Tribus: Semi-automated discovery of cell identities and phenotypes from multiplexed imaging and proteomic data.
Journal: Bioinformatics (Oxford, England)
Pub Date: 2025-02-12
DOI: 10.1093/bioinformatics/btaf070
C Cruz-Castillo, L Fumis, C Mehta, R E Martinez-Osorio, J M Roldan-Romero, H Cornu, P Uniyal, A Solano-Roman, M Carmona, D Ochoa, E M McDonagh, A Buniello
Motivation: The Open Targets Platform (https://platform.opentargets.org) is a unique, comprehensive, open-source resource supporting systematic identification and prioritisation of targets for drug discovery. The Platform combines, harmonises and integrates data from >20 diverse sources to provide target-disease associations, covering evidence derived from genetic associations, somatic mutations, known drugs, differential expression, animal models, pathways and systems biology. An in-house target identification scoring framework weighs the evidence from each data source and type, contributing to an overall score for each of the 7.8M target-disease associations. However, the old infrastructure did not allow user-led dynamic adjustments in the contribution of different evidence types for target prioritisation, a limitation frequently raised by our user community. Furthermore, the previous Platform user interface did not support navigation and exploration of the underlying target-disease evidence on the same page, occasionally making the user journey counterintuitive.
Results: Here, we describe "Associations on the Fly" (AOTF), a new Platform feature, developed with a user-centred vision, that enables the user to formulate more flexible therapeutic hypotheses through dynamic adjustment of the weight of contributing evidence from each source, altering the prioritisation of targets.
Availability and implementation: The codebases that power the Platform (including our pipelines, GraphQL API, and React UI) are all open source and licensed under the Apache License, Version 2.0. You can find all of our code repositories on GitHub at https://github.com/opentargets and on Zenodo at https://zenodo.org/records/14392214. This tool was implemented using React v18 and its code is accessible at https://github.com/opentargets/ot-ui-apps. The tools are accessible through the Open Targets Platform web interface (https://platform.opentargets.org/) and GraphQL API (https://platform-docs.opentargets.org/data-access/graphql-api). Data are available for download at https://platform.opentargets.org/downloads and from the EMBL-EBI FTP: https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/.
Contact: Annalisa Buniello, European Molecular Biology Laboratory (EMBL-EBI), buniello@ebi.ac.uk.
Supplementary information: Features walkthrough video: https://youtu.be/2A9bksboAag, https://www.youtube.com/watch?v=WQwQn6I4jkw. Extensive documentation: https://platform-docs.opentargets.org/web-interface/associations-on-the-fly and https://platform-docs.opentargets.org/target-prioritisation.
Title: Associations on the Fly, a new feature aiming to facilitate exploration of the Open Targets Platform evidence.
Journal: Bioinformatics (Oxford, England)
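The core idea in the Associations on the Fly entry, that changing the weight given to each evidence source can change which targets rank highest, can be illustrated with a small sketch. The weighted-average formula, the source names, the target labels, and all scores below are hypothetical and chosen only for this example; the Platform's own scoring framework is described in its documentation and is not reproduced here.

```python
def rank_targets(evidence, weights):
    """Rank targets by a weighted average of their per-source scores.

    evidence: {target: {source: score in [0, 1]}}
    weights:  {source: non-negative weight}; missing sources default to 1.0.
    Note: a plain weighted average is an assumption made for illustration.
    """
    ranked = []
    for target, scores in evidence.items():
        total_weight = sum(weights.get(src, 1.0) for src in scores)
        if total_weight == 0:
            overall = 0.0
        else:
            overall = sum(weights.get(src, 1.0) * s
                          for src, s in scores.items()) / total_weight
        ranked.append((target, overall))
    # Highest overall score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical evidence for two targets from two evidence sources.
evidence = {
    "TARGET_A": {"genetic_association": 0.9, "known_drug": 0.2},
    "TARGET_B": {"genetic_association": 0.3, "known_drug": 0.7},
}

default_order = [t for t, _ in rank_targets(evidence, {})]
drug_focused = [t for t, _ in rank_targets(
    evidence, {"genetic_association": 0.2, "known_drug": 1.0})]
```

With equal weights TARGET_A ranks first; down-weighting genetic evidence moves TARGET_B to the top, which is the kind of user-led re-prioritisation the abstract describes.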
Pub Date: 2025-02-04
DOI: 10.1093/bioinformatics/btaf036
Miron B Kursa
Motivation: It is a challenging task to decipher the mechanisms of a complex system from observational data, especially in biology, where systems are sophisticated, measurements coarse, and multi-modality common. The typical approaches of inferring a network of relationships between a system's components struggle with the quality and feasibility of estimation, as well as with the interpretability of the results they yield. Said issues can be avoided, however, when dealing with a simpler problem of tracking only the influence paths, defined as circuits relying on the information of an experimental perturbation as it spreads through the system. Such an approach can be formalized with information theory and leads to a relatively streamlined, interpretable output, in contrast to the incomprehensibly dense 'haystack' networks produced by typical tools.
Results: Following this idea, the paper introduces Vistla, a novel method built around tri-variate mutual information and data processing inequality, combined with a higher-order generalization of the widest path problem. Vistla can be used standalone, in a machine learning pipeline to aid interpretability, or as a tool for mediation analysis; the paper demonstrates its efficiency both in synthetic and real-world problems.
Availability and implementation: The R package implementing the method is available at https://gitlab.com/mbq/vistla, as well as on CRAN.
{"title":"Vistla: identifying influence paths with information theory.","authors":"Miron B Kursa","doi":"10.1093/bioinformatics/btaf036","DOIUrl":"10.1093/bioinformatics/btaf036","url":null,"PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11806950/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf023
Chuanze Kang, Zonghuan Liu, Han Zhang
Motivation: The drug-disease, gene-disease, and drug-gene relationships, as high-frequency edge types, describe complex biological processes within the biomedical knowledge graph. The structural patterns formed by these three edge types are the graph motifs of (disease, drug, gene) triplets. Among them, the triangle is a stable and important motif structure in the network, and the various other motifs also indicate rich semantic relationships. However, existing methods focus only on triangle representation learning for classification and fail to discriminate among the various motifs of triplets. A comprehensive method is needed to predict the various motifs within triplets, which would help uncover new pharmacological mechanisms and improve our understanding of disease-gene-drug interactions. Identifying complex motif structures within triplets can also help us study the structural properties of triangles.
Results: We consider the seven typical motifs within the triplets and propose a novel graph contrastive learning-based method for triplet motif prediction (TriMoGCL). TriMoGCL utilizes a graph convolutional encoder to extract node features from the global network topology. Next, node pooling and edge pooling extract context information as the triplet features from global and local views. To mitigate the redundant context information and the motif imbalance caused by dense edges, we use node and class-prototype contrastive learning to denoise triplet features and enhance discrimination between motifs. Experiments on two knowledge graphs of different scales demonstrate the effectiveness and reliability of TriMoGCL in identifying various motif types. In addition, our model reveals new pharmacological mechanisms, providing a comprehensive analysis of triplet motifs.
Availability and implementation: Code and datasets are available at https://github.com/zhanglabNKU/TriMoGCL and https://doi.org/10.5281/zenodo.14633572.
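The abstract does not spell out what the seven motifs are. One plausible reading, offered here purely as an assumption, is that they are the non-empty subsets of the three possible edges in a (disease, drug, gene) triplet, since 2^3 - 1 = 7. The Python sketch below enumerates those subsets and labels a triplet by the edges it contains. The names `enumerate_motifs` and `motif_of` and the placeholder node identifiers are hypothetical and are not taken from the TriMoGCL code.

```python
from itertools import combinations

# The three edge types named in the abstract.
EDGE_TYPES = ("drug-disease", "gene-disease", "drug-gene")


def enumerate_motifs():
    """List every non-empty subset of the three edge types.

    ASSUMPTION: treating these subsets as the "seven typical motifs"
    is an interpretation, not something the abstract states.
    """
    return [
        frozenset(subset)
        for size in range(1, len(EDGE_TYPES) + 1)
        for subset in combinations(EDGE_TYPES, size)
    ]


def motif_of(disease, drug, gene, edges):
    """Return the set of edge types present among the triplet's nodes.

    `edges` is a set of frozenset({node_a, node_b}) pairs (undirected).
    An empty result means the triplet has no edges at all.
    """
    candidates = {
        "drug-disease": frozenset((drug, disease)),
        "gene-disease": frozenset((gene, disease)),
        "drug-gene": frozenset((drug, gene)),
    }
    return frozenset(name for name, pair in candidates.items() if pair in edges)


# Placeholder identifiers, not real entities.
toy_edges = {
    frozenset(("drug_1", "disease_1")),
    frozenset(("drug_1", "gene_1")),
}
print(len(enumerate_motifs()))
print(sorted(motif_of("disease_1", "drug_1", "gene_1", toy_edges)))
```

Under this reading, the triangle is the single motif containing all three edge types, and the other six are the one-edge and two-edge patterns.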
{"title":"A comprehensive graph neural network method for predicting triplet motifs in disease-drug-gene interactions.","authors":"Chuanze Kang, Zonghuan Liu, Han Zhang","doi":"10.1093/bioinformatics/btaf023","DOIUrl":"10.1093/bioinformatics/btaf023","url":null,"PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11796092/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143018194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf043
Fabricio Almeida-Silva, Yves Van de Peer
Summary: Gene and genome duplications are major evolutionary forces that shape the diversity and complexity of life. However, different duplication modes have distinct impacts on gene function, expression, and regulation. Existing tools for identifying and classifying duplicated genes are either outdated or not user-friendly. Here, we present doubletrouble, an R/Bioconductor package that provides a comprehensive and robust framework for analyzing duplicated genes from genomic data. doubletrouble can detect and classify gene pairs as derived from six duplication modes (segmental, tandem, proximal, retrotransposon-derived, DNA transposon-derived, and dispersed duplications), calculate substitution rates, detect signatures of putative whole-genome duplication events, and visualize results as publication-ready figures. We applied doubletrouble to classify the duplicated gene repertoire in 822 eukaryotic genomes, and results were made available through a user-friendly web interface.
Availability and implementation: doubletrouble is available on Bioconductor (https://bioconductor.org/packages/doubletrouble), and the source code is available in a GitHub repository (https://github.com/almeidasilvaf/doubletrouble). doubletroubledb is available online at https://almeidasilvaf.github.io/doubletroubledb/.
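To make the idea of duplication modes concrete, here is a rough Python sketch of a position-based classifier covering four of the six modes listed above (segmental, tandem, proximal, dispersed). It is not the doubletrouble API, which is an R package, and it does not reproduce the package's actual rules. The function `classify_pair`, the `proximal_max` threshold of 10 intervening genes, and the toy gene coordinates are assumptions made for illustration; the transposon-derived modes are left out because they need information beyond gene positions.

```python
def classify_pair(gene_a, gene_b, positions, anchor_pairs, proximal_max=10):
    """Assign a duplicated gene pair to a coarse duplication mode.

    positions:    {gene: (chromosome, rank of the gene along the chromosome)}
    anchor_pairs: set of frozenset({gene_a, gene_b}) found in syntenic blocks
    proximal_max: hypothetical cutoff on the number of intervening genes
    """
    # Pairs anchored in syntenic blocks are treated as segmental duplicates.
    if frozenset((gene_a, gene_b)) in anchor_pairs:
        return "segmental"
    chrom_a, rank_a = positions[gene_a]
    chrom_b, rank_b = positions[gene_b]
    if chrom_a == chrom_b:
        intervening = abs(rank_a - rank_b) - 1  # genes lying between the pair
        if intervening == 0:
            return "tandem"      # directly adjacent on the chromosome
        if intervening <= proximal_max:
            return "proximal"    # close, but separated by a few genes
    return "dispersed"           # everything else in this simplified scheme


# Toy coordinates, invented for illustration only.
toy_positions = {
    "g1": ("chr1", 1),
    "g2": ("chr1", 2),
    "g3": ("chr1", 8),
    "g4": ("chr2", 3),
    "g5": ("chr1", 40),
}
toy_anchors = {frozenset(("g1", "g4"))}

for partner in ("g2", "g3", "g4", "g5"):
    print(partner, classify_pair("g1", partner, toy_positions, toy_anchors))
```

The order of the checks matters in this sketch: a pair is tested for synteny first, then for adjacency, then for proximity, so each pair receives exactly one label.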
{"title":"doubletrouble: an R/Bioconductor package for the identification, classification, and analysis of gene and genome duplications.","authors":"Fabricio Almeida-Silva, Yves Van de Peer","doi":"10.1093/bioinformatics/btaf043","DOIUrl":"10.1093/bioinformatics/btaf043","url":null,"PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11810640/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}