Tianyi Chen, Wen Xue, Yunfei Zhang, Yongcan Luo, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong
Spatial transcriptomics (ST) technologies have significantly advanced our ability to discern gene expression patterns within intact tissue structures, enabling unprecedented insights into cellular heterogeneity and tissue architecture. However, accurately determining cell-type proportions within spatially aggregated transcriptomic spots remains challenging due to inherent granularity discrepancies, batch effects, and spatial heterogeneity. To address these challenges, we introduce S$^{2}$potAE, a novel spatial spot autoencoder framework that integrates gene expression data, spatial coordinates, and morphological features from histology images for precise spot-level deconvolution. S$^{2}$potAE employs a multilevel feature aggregation strategy, systematically extracting and fusing spatially aware features through a graph-based spatial encoder and perceptual image embeddings from histological patches. Furthermore, an auxiliary pathological classification task enhances biological relevance and model interpretability. Comprehensive benchmarking across multiple simulated and real datasets (including human breast cancer, mouse brain anterior, and human dorsolateral prefrontal cortex) demonstrates that S$^{2}$potAE consistently surpasses state-of-the-art methods in accuracy, robustness, and biological interpretability. Our approach effectively resolves complex cellular compositions, accurately identifies tumor boundaries, and captures nuanced cell-type distributions, significantly enhancing the utility of ST in biological research and clinical applications.
S2potAE: multimodal spatial spot autoencoder integrating image and transcriptomic features for deconvolution. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbag020. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12860387/pdf/
Minho Noh, Sungkyung Lee, Sunghyun Kim, Sangsoo Lim
Deciphering how molecular programs are spatially organized within tissues is pivotal for understanding tumor evolution and microenvironmental interactions. Existing spatial transcriptomics tools either rely on gene-level features, ignoring the rich topology of biological pathways, or deliver black-box clusters with little mechanistic insight; thus, they limit their translational impact. A method that simultaneously leverages pathway structures and spatially matched histopathology could produce domain delineations that are both accurate and biologically interpretable. We introduce PathCLAST (Pathway-augmented Contrastive Learning with Attention for interpretable Spatial Transcriptomics), which is a framework that integrates gene expression, histopathological images, and curated pathway graphs via bi-modal contrastive learning. By embedding expression profiles into biologically structured graphs, and aligning them with local image features, PathCLAST achieves state-of-the-art spatial domain identification on multiple public datasets, while offering pathway-level attention scores for mechanistic interpretation. The pathway embedding also serves as an explicit, biology-informed dimensionality reduction scheme. PathCLAST not only uncovers domain-specific pathways and spatially organized signaling activities, but also quantifies intra-domain heterogeneity, spatial autocorrelation, and inter-domain crosstalk, providing fine-grained insights into tumor progression and tissue architecture. PathCLAST is available at https://github.com/sslim-aidrug/PathCLAST.
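The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice for bi-modal alignment, a symmetric InfoNCE loss over matched (expression embedding, image embedding) pairs. The function names, the cosine similarity, and the `temperature` value are assumptions made for illustration, not details taken from PathCLAST.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(expr_emb, img_emb, temperature=0.1):
    """Symmetric InfoNCE loss; row i of each input belongs to the same spot.

    Hypothetical illustration: the real model may use a different loss.
    """
    n = len(expr_emb)
    # logits[i][j]: scaled similarity of spot i's expression vs spot j's image
    logits = [[cosine(e, m) / temperature for m in img_emb] for e in expr_emb]

    def cross_entropy(rows):
        # Each row should concentrate its probability on the matched index i.
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)  # subtract the max for numerical stability
            log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_norm - row[i]
        return total / n

    columns = [list(col) for col in zip(*logits)]  # image-to-expression view
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


expr = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(expr, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(expr, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # matched pairs yield the lower loss
```

The loss is small when each spot's expression embedding is closest to its own image embedding and large when the pairing is scrambled, which is the property a contrastive alignment step relies on.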
PathCLAST: pathway-augmented contrastive learning with attention for interpretable spatial transcriptomics. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbag029. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12862980/pdf/
Syed Raza Abbas, Zeeshan Abbas, Arifa Zahir, Mobeen Ur Rehman, Seung Won Lee
TP53 encodes a master tumor suppressor, and understanding its evolutionary constraints is critical for interpreting pathogenic variation. We developed a fully reproducible computational pipeline integrating evolutionary genomics, structural biology, and clinical variant analysis to systematically prioritize functionally critical residues in TP53. We used fixed effects likelihood and fast unconstrained Bayesian approximation to perform genome-wide alignment, maximum-likelihood phylogenetic estimation, and site-specific selection testing over 19 vertebrate orthologs. We mapped these evolutionary signals onto the AlphaFold-predicted structure and integrated 3936 human variants from ClinVar and UniProt. Selection analysis identified five sites under positive or diversifying selection, with a single consensus position detected by both methods: multiple-sequence-alignment position 606 (human codon 129) in the DNA-binding domain. Structural mapping revealed that pathogenic variants concentrate at the DNA-contacting interface, with residues 239-248 emerging as the highest-priority targets based on our composite scoring system that integrates evolutionary constraint, pathogenic burden, hotspot density, and domain importance. Machine learning validation under leave-one-out cross-validation (LOOCV) demonstrated robust predictive performance. A Ridge-ExtraTrees ensemble achieved $\textrm{MAE (mean absolute error)}=2.84$, $\textrm{RMSE (root mean squared error)}=3.72$, $R^{2}=0.91$ for phylogenetic-distance regression and 89.5% accuracy (17/19) for clade classification. A multi-branch deep neural network attained comparable results ($\textrm{MAE}=2.33$, $\textrm{RMSE}=2.56$, $R^{2}=0.86$), while Random Forest substantially underperformed ($\textrm{MAE}\approx 7.19$, $\textrm{RMSE}\approx 8.82$, $R^{2}\approx 0.47$, accuracy $\approx 63\%$) due to shrinkage and class-imbalance bias. Our findings show that evolutionary signals and clinical variants converge within the structurally constrained DNA-binding core of TP53, with codon 129 representing a robust positive-selection site and residues 239-248 constituting the primary pathogenic hotspot. This AlphaFold-anchored, LOOCV-validated framework offers a systematic, generalizable approach for residue-level prioritization to guide mechanistic studies and potentially inform precision oncology applications pending experimental validation.
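To make the reported metrics concrete, here is a minimal sketch of how MAE, RMSE, and $R^{2}$ can be computed under leave-one-out cross-validation. With 19 orthologs, LOOCV means 19 fits, each predicting the single held-out sample. The one-feature least-squares model and the function names below are stand-ins chosen for brevity. They are assumptions, not the Ridge-ExtraTrees ensemble or the features used in the study, and pooling the held-out predictions into one $R^{2}$ is one possible convention.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def loocv_metrics(xs, ys, fit=fit_line):
    """Return (MAE, RMSE, R^2) computed from pooled held-out predictions."""
    preds = []
    for i in range(len(xs)):
        # Train on every sample except i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(fit(train_x, train_y)(xs[i]))
    n = len(ys)
    errors = [p - y for p, y in zip(preds, ys)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return mae, rmse, r2


# Toy data (hypothetical values, not from the paper).
mae, rmse, r2 = loocv_metrics([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 5.0, 7.0, 10.0])
print(mae, rmse, r2)
```

RMSE is never smaller than MAE, and the gap between them widens when a few held-out samples are predicted much worse than the rest.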
Phylogenomics to structure: evolutionary and clinical signals in the TP53 DNA-binding core through LOOCV-validated ensemble learning. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbag087. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12936793/pdf/
Sizhe Qiu, Haris Saeed, Will Leonard, Feiran Li, Aidong Yang
Enzyme catalysis, with its advantages in environmental sustainability and efficiency, is gaining traction across diverse industrial applications, such as waste utilization and pharmaceutical biomanufacturing. However, optimizing enzyme catalytic activity remains a significant challenge. To facilitate enzyme mining and engineering, machine learning (ML) models have emerged to predict enzyme substrate specificity, enzyme turnover number, and enzyme catalytic optimum. This review endeavored to assist researchers in effectively utilizing predictive models for enzyme catalytic activity through presenting recent advancements and analyzing different approaches. We also pointed out existing limitations (e.g. dataset imbalance) and offered suggestions on potential enhancements to address them. We identified that the attention mechanism, inclusion of new features such as product information and temperature, and using transfer learning to leverage different datasets were three main useful modeling strategies. Furthermore, we envisaged that accurate predictors of enzyme catalytic activity would potentially transform enzyme and metabolic engineering, and the optimization of biocatalysis.
Machine learning for enzyme catalytic activity: current progress and future horizons. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbag002. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832030/pdf/
Federico Julian Camerota Verdù, Andrea Gasparin, Luca Bortolussi, Lorenzo Castelli
Phylogenetic inference is a key challenge in computational biology, with applications ranging from evolutionary analysis to comparative genomics. The balanced minimum evolution problem (BMEP) offers a well-established formulation of this problem, but remains computationally intractable for large instances. In this work, we propose a reinforcement learning (RL) framework to tackle the BMEP through local search in the space of phylogenetic trees. Our contributions are three-fold: (i) we introduce an improved RL formulation tailored to the structure of phylogenetic inference in the context of the BMEP; (ii) we train an RL agent capable of solving instances with up to 100 taxa; and (iii) we investigate the generalization capabilities of the learned policy across different substitution models, instance sizes, and datasets. To address the limitations of relying solely on the learned policy at inference time, we integrate it into a novel search-based framework that enables effective adaptation during evaluation. Experimental results show that our method outperforms greedy heuristics and matches the performance of state-of-the-art algorithms for the BMEP. When tested under significant distributional shifts, we greatly reduce the gap with state-of-the-art algorithms. This demonstrates the potential of RL applications to phylogenetic inference.
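For readers unfamiliar with the BMEP objective, the sketch below scores a tree topology with the balanced minimum evolution length in Pauplin's form, $\sum_{i<j} d_{ij}\, 2^{1-\tau_{ij}}$, where $\tau_{ij}$ is the number of edges on the path between leaves $i$ and $j$ in an unrooted binary tree. It only shows the quantity a local search would try to reduce; the adjacency-dict representation and the function names are assumptions for illustration, and nothing here reproduces the RL agent or search framework of the paper.

```python
from collections import deque
from itertools import combinations


def path_edges(tree, source):
    """Breadth-first search: number of edges from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in tree[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def bme_length(tree, leaves, d):
    """Balanced minimum evolution length of an unrooted binary tree.

    `tree` maps each node to its neighbours; `d` maps frozenset({i, j})
    to the input distance between leaves i and j.
    """
    hops = {leaf: path_edges(tree, leaf) for leaf in leaves}
    return sum(
        d[frozenset((i, j))] * 2.0 ** (1 - hops[i][j])
        for i, j in combinations(leaves, 2)
    )


leaves = ["a", "b", "c", "d"]
# Distances generated by the tree ab|cd with all five edge lengths equal to 1.
d = {frozenset(p): 3.0 for p in combinations(leaves, 2)}
d[frozenset(("a", "b"))] = 2.0
d[frozenset(("c", "d"))] = 2.0

true_tree = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
             "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
other_tree = {"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
              "u": ["a", "c", "v"], "v": ["b", "d", "u"]}

print(bme_length(true_tree, leaves, d))   # 5.0, the sum of the five unit edges
print(bme_length(other_tree, leaves, d))  # 5.5, a worse topology
```

On this four-taxon example the generating topology attains the smaller length, which is the behaviour the minimization exploits. The number of topologies grows super-exponentially with the number of taxa, which is why heuristics and learned search policies are of interest.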
Learning to explore tree neighbourhoods for phylogenetic inference. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbaf732. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814993/pdf/
Dysregulated microRNA (miRNA) expression may lead to a wide spectrum of diseases. Because miRNAs play pivotal roles in disease pathogenesis, diagnosis, and therapy, identifying potential miRNA-disease associations (MDAs) is essential for discovering new diagnostic biomarkers, developing targeted therapeutic strategies, and advancing personalized medicine. Traditional wet-lab experiments are time-consuming, expensive, and resource-intensive. Computational approaches are therefore needed as auxiliary tools for prior screening. In this review, we compile the methods proposed for MDA prediction over the past decade. First, we analyze the data resources supporting MDA studies and introduce approaches for quantifying similarities among MDAs. Second, we comprehensively review 66 computational methods, classify them into five categories, and present comparative experimental analyses on selected methods to identify future research directions. To enhance accessibility, we have uploaded a summary of the discussed methods to a GitHub repository (https://github.com/xiesiya/miRNA-disease-association-methods). This review provides comprehensive background knowledge on computational methods for future MDA prediction research.
Siya Xie, K L Eddie Law. From biogenesis to deep modeling: a holistic review of miRNA-disease prediction computational methods with experimental comparison. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbaf736. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814990/pdf/
Correction to: MetImputBERT: a pretrained BERT framework for missing value imputation in NMR metabolomics data. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbag113. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12935011/pdf/
Charting the eukaryotic epitranscriptome by direct RNA sequencing is promising but still very challenging, as current bioinformatics tools are based on modification-unaware software and require multiple modification-specific learning steps. Here, we introduce NanoSpeech, a modification-aware basecaller for the ab initio simultaneous detection of multiple modified bases using a transformer model, and NanoListener, which implements a simulated randomers strategy for robust training datasets and a new generation of ONT basecallers. NanoListener and NanoSpeech are independent of the specific ONT chemistry. Once a training dataset has been created, a single model with an expanded vocabulary can accurately basecall both unmodified and modified bases.
Adriano Fonzino, Bruno Fosso, Grazia Visci, Carmela Gissi, Graziano Pesole, Ernesto Picardi. Ab initio detection of multiple epitranscriptomic modifications from Oxford nanopore technology direct RNA sequencing data. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbaf709. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12848937/pdf/
Jinlu Zhang, Xuting Zhang, Yizheng Dai, Xin Shao, Xiaohui Fan
Identifying drug-drug interactions (DDIs) is a critical task in pharmaceutical research and clinical applications, as these interactions can pose serious medical risks. Deep learning models, known for their ability to accurately predict DDIs, have become powerful tools for enhancing prediction accuracy and efficiency. However, many existing approaches fail to fully incorporate chemical information and lack interpretability when exploring DDI mechanisms. In this work, we propose TRACE, a transformer-based graph representation learning framework that integrates chemical knowledge into DDI prediction. Extensive experiments demonstrate that TRACE outperforms state-of-the-art baseline models under both in-distribution and out-of-distribution settings, highlighting its strong predictive performance and generalization ability. In terms of interpretability, TRACE leverages its attention mechanism to effectively identify high-risk substructures that may trigger DDIs. In summary, TRACE not only provides new perspectives for elucidating the underlying causes of DDIs through interpretable substructure analysis but also offers robust predictive performance to support drug development and combination therapy.
Transformer-based graphs for drug-drug interaction with chemical knowledge embedding. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbag039. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12908692/pdf/
Guangsheng Huang, Yali Peng, Shuai Wu, Hang Wei, Shigang Liu
Aberrant expression of microRNAs (miRNAs) is closely associated with the pathogenesis and progression of various diseases, particularly cancer, as well as with therapeutic responses. Identifying miRNA-drug resistance associations is critical for drug screening and precision medicine. However, conventional experimental approaches remain time-consuming and labor-intensive, while existing computational methods often face challenges in capturing higher-order semantic information from sparse prior bipartite association networks. To address this, we propose MPHGNN, a heterogeneous graph convolutional network (GCN) architecture for predicting miRNA-drug resistance associations. MPHGNN constructs a miRNA-gene-drug heterogeneous network with multimodal biological features, including miRNA expression profiles, drug structural descriptors, and gene functional similarities, and leverages dual learning modules at both metapath and global levels to capture localized patterns and global representations simultaneously. Experimental results demonstrate that MPHGNN outperforms state-of-the-art methods and enhances the discriminative ability of association representations. Interpretability analyses further reveal that metapaths effectively capture underlying biological mechanisms, while the constructed heterogeneous biological network makes important contributions to prediction.
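As a small illustration of what a metapath is in a miRNA-gene-drug network, the sketch below counts the miRNA-gene-drug path instances linking each miRNA to each drug by multiplying two binary adjacency matrices. This shows only the graph concept: the function name and the toy matrices are hypothetical, and the learned metapath-level and global modules of MPHGNN are not reproduced here.

```python
def metapath_counts(mirna_gene, gene_drug):
    """Count miRNA-gene-drug metapath instances for every (miRNA, drug) pair.

    mirna_gene[m][g] is 1 if miRNA m is linked to gene g, else 0.
    gene_drug[g][k] is 1 if gene g is linked to drug k, else 0.
    The result is the matrix product of the two adjacency matrices.
    """
    n_genes = len(gene_drug)
    n_drugs = len(gene_drug[0])
    return [
        [sum(row[g] * gene_drug[g][k] for g in range(n_genes))
         for k in range(n_drugs)]
        for row in mirna_gene
    ]


# Toy network (hypothetical): 2 miRNAs, 3 genes, 2 drugs.
mirna_gene = [[1, 1, 0],
              [0, 0, 1]]
gene_drug = [[1, 0],
             [1, 1],
             [0, 1]]

print(metapath_counts(mirna_gene, gene_drug))  # [[2, 1], [0, 1]]
```

A nonzero count means the miRNA and the drug are connected through at least one shared gene even when no direct association is recorded. This is the kind of higher-order signal that a sparse bipartite association network alone cannot provide.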
MPHGNN: metapath-guided heterogeneous graph neural network for miRNA-drug resistance association prediction. Briefings in Bioinformatics, vol. 27, no. 1. Published 2026-01-07. doi:10.1093/bib/bbag013. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853127/pdf/