Sizhe Qiu, Haris Saeed, Will Leonard, Feiran Li, Aidong Yang
Enzyme catalysis, with its advantages in environmental sustainability and efficiency, is gaining traction across diverse industrial applications, such as waste utilization and pharmaceutical biomanufacturing. However, optimizing enzyme catalytic activity remains a significant challenge. To facilitate enzyme mining and engineering, machine learning (ML) models have emerged to predict enzyme substrate specificity, enzyme turnover number, and enzyme catalytic optimum. This review endeavored to assist researchers in effectively utilizing predictive models for enzyme catalytic activity through presenting recent advancements and analyzing different approaches. We also pointed out existing limitations (e.g. dataset imbalance) and offered suggestions on potential enhancements to address them. We identified that the attention mechanism, inclusion of new features such as product information and temperature, and using transfer learning to leverage different datasets were three main useful modeling strategies. Furthermore, we envisaged that accurate predictors of enzyme catalytic activity would potentially transform enzyme and metabolic engineering, and the optimization of biocatalysis.
Machine learning for enzyme catalytic activity: current progress and future horizons. Sizhe Qiu, Haris Saeed, Will Leonard, Feiran Li, Aidong Yang. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbag002. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832030/pdf/
Federico Julian Camerota Verdù, Andrea Gasparin, Luca Bortolussi, Lorenzo Castelli
Phylogenetic inference is a key challenge in computational biology, with applications ranging from evolutionary analysis to comparative genomics. The balanced minimum evolution problem (BMEP) offers a well-established formulation of this problem, but remains computationally intractable for large instances. In this work, we propose a reinforcement learning (RL) framework to tackle the BMEP through local search in the space of phylogenetic trees. Our contributions are three-fold: (i) we introduce an improved RL formulation tailored to the structure of phylogenetic inference in the context of the BMEP; (ii) we train an RL agent capable of solving instances with up to 100 taxa; and (iii) we investigate the generalization capabilities of the learned policy across different substitution models, instance sizes, and datasets. To address the limitations of relying solely on the learned policy at inference time, we integrate it into a novel search-based framework that enables effective adaptation during evaluation. Experimental results show that our method outperforms greedy heuristics and matches the performance of state-of-the-art algorithms for the BMEP. When tested under significant distributional shifts, we greatly reduce the gap with state-of-the-art algorithms. This demonstrates the potential of RL applications to phylogenetic inference.
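To make the objective concrete: balanced minimum evolution is commonly scored with Pauplin's formula, L(T) = sum over leaf pairs i<j of 2^(1 - tau_ij) * d_ij, where tau_ij is the number of edges between leaves i and j in the tree T. The sketch below only evaluates that score for two four-taxon topologies. It is not the authors' RL framework, and the helper names, node labels, and toy distances are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def path_edge_counts(adj, source):
    """Breadth-first search: number of edges from `source` to every node."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


def bme_length(adj, leaves, dist):
    """Pauplin's formula: sum of 2^(1 - tau_ij) * d_ij over all leaf pairs."""
    total = 0.0
    for i, j in combinations(leaves, 2):
        tau = path_edge_counts(adj, i)[j]
        total += 2.0 ** (1 - tau) * dist[frozenset((i, j))]
    return total


def quartet(pair1, pair2):
    """Unrooted 4-leaf tree; 'u' and 'v' are hypothetical internal node labels."""
    (a, b), (c, d) = pair1, pair2
    return {a: ["u"], b: ["u"], c: ["v"], d: ["v"],
            "u": [a, b, "v"], "v": [c, d, "u"]}


leaves = ["A", "B", "C", "D"]
# Toy additive distances generated by the tree ((A,B),(C,D)) with
# edge lengths A=1, B=2, internal=3, C=4, D=5 (total length 15).
raw = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
       ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}
dist = {frozenset(pair): d for pair, d in raw.items()}

true_score = bme_length(quartet(("A", "B"), ("C", "D")), leaves, dist)
alt_score = bme_length(quartet(("A", "C"), ("B", "D")), leaves, dist)
print(true_score, alt_score)
```

On these toy distances the generating topology scores 15.0, which equals its total edge length, while the alternative topology scores 16.5. A local search over trees, learned or greedy, would be looking for moves that lower this score.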
Learning to explore tree neighbourhoods for phylogenetic inference. Federico Julian Camerota Verdù, Andrea Gasparin, Luca Bortolussi, Lorenzo Castelli. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbaf732. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814993/pdf/
Dysregulation of microRNA (miRNA) expression may lead to a wide spectrum of diseases. Because miRNAs play pivotal roles in disease pathogenesis, diagnosis, and therapy, identifying potential miRNA-disease associations (MDAs) is essential for discovering new diagnostic biomarkers, developing targeted therapeutic strategies, and advancing personalized medicine. Traditional wet-lab experiments are time-consuming, expensive, and resource-intensive. Hence, computational approaches are needed as auxiliary a priori tools. In the following text, we compile different methods proposed for MDA prediction over the past decade. First, we analyze the data resources supporting MDA studies and introduce approaches for quantifying similarities among MDAs. Second, we comprehensively review 66 computational methods, classify them into five categories, and present comparative experimental analyses on selected methods to identify future research directions. To enhance accessibility, we upload a summary of discussed methods to a GitHub repository (https://github.com/xiesiya/miRNA-disease-association-methods). This review provides comprehensive background knowledge on computational methods for future MDA prediction research.
From biogenesis to deep modeling: a holistic review of miRNA-disease prediction computational methods with experimental comparison. Siya Xie, K L Eddie Law. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbaf736. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814990/pdf/
Shida He, Zixu Wang, Jing Li, Quan Zou, Feng Zhang
Drug-gene interactions (DGIs) influence the toxicity or ineffectiveness of drug therapy and play an important role in elucidating drug mechanisms, predicting potential adverse effects, and facilitating precision medicine. Existing computational methods typically rely on chemical or genetic sequence features of drugs and genes, limiting their effectiveness for novel entities lacking explicit annotations. To address this, we propose BiGvCL, a framework that predicts DGIs exclusively based on network topology, requiring no explicit feature information for drugs or genes. BiGvCL introduces a lightweight graph attention mechanism (GATLite) to efficiently aggregate local neighborhood information. Additionally, we develop a gated graph convolutional network (GatedGCN) to explicitly learn high-order interactions between drugs and genes, further integrating contrastive learning to enhance the model's generalizability. Comprehensive experiments on DrugBank and DGIdb datasets show that BiGvCL achieves competitive performance across all metrics compared with representative baselines. Cross-domain evaluations on OGB datasets further confirm its adaptability to heterogeneous biomedical networks. Ablation and hyperparameter analyses highlight the key contributions of contrastive and gated mechanisms, while case studies and molecular docking provide supporting evidence for the biological relevance of predictions. Collectively, while BiGvCL is constrained by its reliance on network topology and transductive learning paradigm, it demonstrates the potential of topology-based approaches for discovering novel drug-gene interactions, which may inform drug repurposing and precision medicine efforts.
BiGvCL: bipartite graph-based cross-domain contrastive learning model for the predicting drug-gene interactions. Shida He, Zixu Wang, Jing Li, Quan Zou, Feng Zhang. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbaf710. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12848949/pdf/
Tianyi Chen, Wen Xue, Yunfei Zhang, Yongcan Luo, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong
Spatial transcriptomics (ST) technologies have significantly advanced our ability to discern gene expression patterns within intact tissue structures, enabling unprecedented insights into cellular heterogeneity and tissue architecture. However, accurately determining cell-type proportions within spatially aggregated transcriptomic spots remains challenging due to inherent granularity discrepancies, batch effects, and spatial heterogeneity. To address these challenges, we introduce S$^{2}$potAE, a novel spatial spot autoencoder framework that integrates gene expression data, spatial coordinates, and morphological features from histology images for precise spot-level deconvolution. S$^{2}$potAE employs a multilevel feature aggregation strategy, systematically extracting and fusing spatially-aware features through a graph-based spatial encoder and perceptual image embeddings from histological patches. Furthermore, an auxiliary pathological classification task enhances biological relevance and model interpretability. Comprehensive benchmarking across multiple simulated and real datasets (including human breast cancer, mouse brain anterior, and human dorsolateral prefrontal cortex) demonstrates that S$^{2}$potAE consistently surpasses state-of-the-art methods in accuracy, robustness, and biological interpretability. Our approach effectively resolves complex cellular compositions, accurately identifies tumor boundaries, and captures nuanced cell-type distributions, significantly enhancing the utility of ST in biological research and clinical applications.
S2potAE: multimodal spatial spot autoencoder integrating image and transcriptomic features for deconvolution. Tianyi Chen, Wen Xue, Yunfei Zhang, Yongcan Luo, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbag020. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12860387/pdf/
Minho Noh, Sungkyung Lee, Sunghyun Kim, Sangsoo Lim
Deciphering how molecular programs are spatially organized within tissues is pivotal for understanding tumor evolution and microenvironmental interactions. Existing spatial transcriptomics tools either rely on gene-level features, ignoring the rich topology of biological pathways, or deliver black-box clusters with little mechanistic insight, which limits their translational impact. A method that simultaneously leverages pathway structures and spatially matched histopathology could produce domain delineations that are both accurate and biologically interpretable. We introduce PathCLAST (Pathway-augmented Contrastive Learning with Attention for interpretable Spatial Transcriptomics), a framework that integrates gene expression, histopathological images, and curated pathway graphs via bi-modal contrastive learning. By embedding expression profiles into biologically structured graphs and aligning them with local image features, PathCLAST achieves state-of-the-art spatial domain identification on multiple public datasets, while offering pathway-level attention scores for mechanistic interpretation. The pathway embedding also serves as an explicit, biology-informed dimensionality reduction scheme. PathCLAST not only uncovers domain-specific pathways and spatially organized signaling activities, but also quantifies intra-domain heterogeneity, spatial autocorrelation, and inter-domain crosstalk, providing fine-grained insights into tumor progression and tissue architecture. PathCLAST is available at https://github.com/sslim-aidrug/PathCLAST.
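The abstract does not spell out the contrastive objective. A common choice for aligning two modalities is a symmetric InfoNCE loss, in which each spot's expression embedding should be most similar to the image embedding of the same spot. The sketch below assumes that loss. The function names, the temperature value, and the toy embeddings are hypothetical and are not taken from the PathCLAST repository.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to targets[i] among all targets."""
    anchors = [normalize(a) for a in anchors]
    targets = [normalize(t) for t in targets]
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, t) / temperature for t in targets]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        loss += log_norm - logits[i]
    return loss / len(anchors)


def bimodal_contrastive_loss(expr_emb, img_emb, temperature=0.1):
    """Symmetric loss: expression-to-image plus image-to-expression."""
    return 0.5 * (info_nce(expr_emb, img_emb, temperature)
                  + info_nce(img_emb, expr_emb, temperature))


# Toy 2-D embeddings for three spots (made-up numbers).
expr = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
img_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
img_shuffled = [img_aligned[1], img_aligned[2], img_aligned[0]]

aligned_loss = bimodal_contrastive_loss(expr, img_aligned)
shuffled_loss = bimodal_contrastive_loss(expr, img_shuffled)
print(aligned_loss < shuffled_loss)
```

When the image embeddings are paired with the correct spots, the loss is close to zero. Shuffling the pairing raises it sharply, and that difference is the signal such an objective uses to pull matched modalities together.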
PathCLAST: pathway-augmented contrastive learning with attention for interpretable spatial transcriptomics. Minho Noh, Sungkyung Lee, Sunghyun Kim, Sangsoo Lim. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbag029. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12862980/pdf/
Computational models integrating large-scale gene expression profiles provide a powerful approach for predicting multi-target drug interactions (DTIs). Unlike traditional experimental and computational methods that often require detailed structural or target-specific information, gene expression-based models leverage reference transcriptional signatures. This enables functional inference of interactions without explicit structural data, offering a valuable strategy in data-limited scenarios. By incorporating phenotypic information, these models bridge phenotype screening and target prediction, establishing a novel paradigm for target identification. This review introduces and compares current target identification methods, emphasizing the unique advantages of gene expression profiling in DTI prediction. We also outline major public databases and their applications. As effective hypothesis-generation tools, computational DTI models reduce experimental costs, enhance understanding of multi-target mechanisms, and accelerate drug discovery. We categorize and analyze three primary model types utilizing large-scale gene expression data: biological network-based, association-based, and multimodal integration approaches, discussing their respective strengths and limitations. Key challenges and future directions are also addressed, including data integration, algorithm optimization, and multi-omics fusion, to fully realize the potential of gene expression data in multi-target drug prediction. This review offers comprehensive guidance on advanced tools, databases, and methodologies, enabling novel research paths for unbiased multi-target exploration. By linking phenotype screening with computational analysis, this integrative approach is expected to advance precision medicine, especially in uncovering drug mechanisms in complex diseases, offering promising prospects.
Harnessing AI to fuse phenotypic signatures for drug target identification: progress in computational modeling. Fengming Chen, Ranran Zhao, Xingxing Han, Huan Li, Zhishu Tang. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbag045.
Dong Chen, Yuquan Wang, Dapeng Shi, Yunlong Cao, Yue-Qing Hu
Causal inference is an essential approach for understanding biological processes. Traditional causal inference methods assume a linear relationship between different biological traits, whereas their true causal relationship may be nonlinear, such as U-shaped. Moreover, when the instrument set includes weak and pleiotropic genetic instruments, accurately capturing the shape of these relationships becomes challenging. To address these issues, we propose model-averaged control function-based instrumental variable regression, a two-stage framework based on a model-averaged control function approach to estimate the marginal effect function, which represents the derivative of the causal relationship. In the first stage, a model averaging technique is employed to estimate the control function, thereby reducing weak genetic instrument bias. In the second stage, B-spline approximation is applied to estimate the marginal effect function, while SCAD penalization is used to minimize pleiotropic instrument bias. We establish the asymptotic properties of the proposed estimator and demonstrate its robust performance through simulations. Application to the Atherosclerosis Risk in Communities dataset highlights a nonlinear causal relationship between body mass index and hypertension, with the proposed method effectively estimating the specific shape and trend of the relationship.
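The two-stage idea can be illustrated with the most basic control-function estimator. Stage 1 regresses the exposure on the instrument and keeps the residual, and stage 2 adds that residual as a covariate so that it absorbs the confounding. The sketch below is deliberately simplified and is not MACFIV: it uses one strong instrument, a linear causal effect (so the marginal effect is a constant), and plain least squares in place of model averaging, B-splines, and SCAD. The simulated data and all names are assumptions.

```python
import random


def centered(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def inner(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def control_function_slope(z, x, y):
    """Two-stage control-function estimate of the effect of x on y."""
    z, x, y = centered(z), centered(x), centered(y)
    # Stage 1: regress exposure on the instrument; the residual v is the
    # control function that carries the confounded part of x.
    gamma = inner(z, x) / inner(z, z)
    v = [xi - gamma * zi for xi, zi in zip(x, z)]
    # Stage 2: regress outcome on x and v (2x2 normal equations, Cramer's rule).
    sxx, sxv, svv = inner(x, x), inner(x, v), inner(v, v)
    sxy, svy = inner(x, y), inner(v, y)
    det = sxx * svv - sxv * sxv
    return (sxy * svv - sxv * svy) / det


rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # true effect = 2

xc, yc = centered(x), centered(y)
naive = inner(xc, yc) / inner(xc, xc)     # ordinary regression, confounded
cf = control_function_slope(z, x, y)
print(round(naive, 2), round(cf, 2))
```

In this simulation the naive regression slope is pulled toward 3 by the confounder, while the control-function estimate stays near the true value of 2. Estimating a nonlinear marginal effect would replace the single slope on `x` with a spline basis, which is where the B-spline step described in the abstract comes in.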
MACFIV: a novel framework for nonlinear causal inference in the body mass index-hypertension relationship with many weak and pleiotropic genetic instruments. Dong Chen, Yuquan Wang, Dapeng Shi, Yunlong Cao, Yue-Qing Hu. Briefings in bioinformatics 27(1), published 2026-01-07. doi:10.1093/bib/bbaf714. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790626/pdf/
Gene regulatory networks (GRNs) inform analyses of cellular state transitions, regulatory mechanisms, and disease processes. With the rapid development of single-cell sequencing technologies, accurate inference of GRNs from complex and high-dimensional single-cell transcriptomic data remains a core challenge. However, the effective use of multi-level structural and expression features among genes remains a major obstacle to improving inference accuracy. This study presents ATFGRN, an adaptive topology-feature fusion graph neural framework that integrates features from three complementary perspectives for accurate prediction of gene regulatory relationships. The subgraph structure encoding module focuses on local subgraphs of regulatory relationships and identifies structural patterns and topological dependencies. The expression-guided module integrates the gene expression matrix with the original regulatory network and employs a graph convolutional network with a self-attention mechanism to examine interactions between expression dynamics and network topology. The similarity structure module derives similarity information between genes through a KNN graph combined with a graph attention mechanism, which helps detect regulatory pairs with similar expression patterns that lack explicit structural links. Features from these three branches are fused through an attention-based weighting mechanism. This fusion achieves complementary integration of structural, expression, and similarity perspectives and produces more informative regulatory features for prediction. Evaluations on single-cell transcriptomic datasets across four types of networks show that ATFGRN improves AUROC performance by 5.09% over existing approaches, which confirms the effectiveness and applicability of its multi-perspective fusion strategy in GRN inference tasks.
{"title":"Revealing hidden regulatory dependencies: multi-perspective graph learning for single-cell gene regulatory network inference.","authors":"Wenying He, Rentao Zhang, Yaowei Zhu, Haolu Zhou, Yun Zuo, Yude Bai, Liang Yang, Fei Guo","doi":"10.1093/bib/bbaf733","DOIUrl":"10.1093/bib/bbaf733","url":null,"abstract":"<p><p>Gene regulatory networks (GRNs) inform analyses of cellular state transitions, regulatory mechanisms, and disease processes. With the rapid development of single-cell sequencing technologies, accurate inference of GRNs from complex and high-dimensional single-cell transcriptomic data remains a core challenge. However, the effective use of multi-level structural and expression features among genes remains a major obstacle to improving inference accuracy. This study presents ATFGRN, an adaptive topology-feature fusion graph neural framework that integrates features from three complementary perspectives for accurate prediction of gene regulatory relationships. The subgraph structure encoding module focuses on local subgraphs of regulatory relationships and identifies structural patterns and topological dependencies. The expression-guided module integrates the gene expression matrix with the original regulatory network and employs a graph convolutional network with a self-attention mechanism to examine interactions between expression dynamics and network topology. The similarity structure module derives similarity information between genes through a KNN graph combined with a graph attention mechanism, which helps detect regulatory pairs with similar expression patterns that lack explicit structural links. Features from these three branches are fused through an attention-based weighting mechanism. This fusion achieves complementary integration of structural, expression, and similarity perspectives and produces more informative regulatory features for prediction. 
Evaluations on single-cell transcriptomic datasets across four types of networks show that ATFGRN improves AUROC performance by 5.09% over existing approaches, which confirms the effectiveness and applicability of its multi-perspective fusion strategy in GRN inference tasks.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814975/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146003042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
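The attention-based weighting of the three branches mentioned in the abstract above can be sketched generically as follows. The abstract does not give the ATFGRN fusion formula, so the scoring function (a tanh projection followed by a query vector), the parameter names `W` and `q`, and the random inputs are assumptions for illustration only.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_fuse(branches, W, q):
    """Fuse k branch embeddings of shape (n, d) into one (n, d) embedding.

    W has shape (d, h) and q has shape (h,); in a real model both would be learned.
    """
    H = np.stack(branches, axis=1)                  # (n, k, d)
    scores = np.tanh(H @ W) @ q                     # (n, k): one score per branch
    weights = softmax(scores, axis=1)               # weights sum to 1 across branches
    fused = (weights[..., None] * H).sum(axis=1)    # weighted sum over branches
    return fused, weights


rng = np.random.default_rng(0)
n_genes, dim, hidden = 4, 8, 16
structure = rng.normal(size=(n_genes, dim))    # stand-in for the subgraph branch
expression = rng.normal(size=(n_genes, dim))   # stand-in for the expression-guided branch
similarity = rng.normal(size=(n_genes, dim))   # stand-in for the similarity branch
W = rng.normal(size=(dim, hidden))
q = rng.normal(size=hidden)

fused, weights = attention_fuse([structure, expression, similarity], W, q)
print(fused.shape, weights.sum(axis=1))
```

Because the weights are computed per gene, each gene can lean on a different branch, which is one common motivation for attention-based fusion over simple concatenation or averaging.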
Guangsheng Huang, Yali Peng, Shuai Wu, Hang Wei, Shigang Liu
Aberrant expression of microRNAs (miRNAs) is closely associated with the pathogenesis and progression of various diseases, particularly cancer, as well as therapeutic responses. Identification of miRNA-drug resistance associations is critical for drug screening and precision medicine. However, conventional experimental approaches remain time-consuming and labor-intensive, while existing computational methods often face challenges in capturing higher-order semantic inference from sparse prior bipartite association networks. To address this, we propose MPHGNN, a heterogeneous graph convolutional network (GCN) architecture for predicting miRNA-drug resistance associations. MPHGNN constructs a miRNA-gene-drug heterogeneous network with multimodal biological features, including miRNA expression profiles, drug structural descriptors, and gene functional similarities, and leverages dual learning modules at both metapath and global levels to capture localized patterns and global representations simultaneously. Experimental results demonstrate that MPHGNN outperforms state-of-the-art methods and enhances the discriminative ability of association representations. Interpretability analyses further reveal that metapaths effectively capture underlying biological mechanisms, while the constructed heterogeneous biological network makes important contributions to prediction.
{"title":"MPHGNN: metapath-guided heterogeneous graph neural network for miRNA-drug resistance association prediction.","authors":"Guangsheng Huang, Yali Peng, Shuai Wu, Hang Wei, Shigang Liu","doi":"10.1093/bib/bbag013","DOIUrl":"10.1093/bib/bbag013","url":null,"abstract":"<p><p>Aberrant expression of microRNAs (miRNAs) is closely associated with the pathogenesis and progression of various diseases, particularly cancer, as well as therapeutic responses. Identification of miRNA-drug resistance associations is critical for drug screening and precision medicine. However, conventional experimental approaches remain time-consuming and labor-intensive, while existing computational methods often face challenges in capturing higher-order semantic inference from sparse prior bipartite association networks. To address this, we propose MPHGNN, a heterogeneous graph convolutional network (GCN) architecture for predicting miRNA-drug resistance associations. MPHGNN constructs a miRNA-gene-drug heterogeneous network with multimodal biological features, including miRNA expression profiles, drug structural descriptors, and gene functional similarities, and leverages dual learning modules at both metapath and global levels to capture localized patterns and global representations simultaneously. Experimental results demonstrate that MPHGNN outperforms state-of-the-art methods and enhances the discriminative ability of association representations. 
Interpretability analyses further reveal that metapaths effectively capture underlying biological mechanisms, while the constructed heterogeneous biological network makes important contributions to prediction.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853127/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146084292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
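As a small illustration of what a miRNA-gene-drug metapath means in a heterogeneous network like the one described above, the number of metapath instances linking each miRNA to each drug can be obtained by multiplying the two bipartite adjacency matrices. This is a generic metapath computation, not the MPHGNN architecture; the toy matrices and the function name `metapath_counts` are hypothetical.

```python
import numpy as np


def metapath_counts(mirna_gene, gene_drug):
    """Count miRNA -> gene -> drug paths for every (miRNA, drug) pair."""
    return mirna_gene @ gene_drug


# Toy adjacency matrices (hypothetical): 2 miRNAs, 3 genes, 2 drugs.
mirna_gene = np.array([[1, 1, 0],
                       [0, 1, 1]])
gene_drug = np.array([[1, 0],
                      [1, 1],
                      [0, 1]])

counts = metapath_counts(mirna_gene, gene_drug)
print(counts)  # [[2 1]
               #  [1 2]]
```

A nonzero entry marks a miRNA-drug pair connected through at least one shared gene, which is the kind of higher-order link that a sparse direct miRNA-drug association matrix does not expose on its own.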