
Briefings in Bioinformatics: Latest Articles

S2potAE: multimodal spatial spot autoencoder integrating image and transcriptomic features for deconvolution.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbag020
Tianyi Chen, Wen Xue, Yunfei Zhang, Yongcan Luo, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong

Spatial transcriptomics (ST) technologies have significantly advanced our ability to discern gene expression patterns within intact tissue structures, enabling unprecedented insights into cellular heterogeneity and tissue architecture. However, accurately determining cell-type proportions within spatially aggregated transcriptomic spots remains challenging due to inherent granularity discrepancies, batch effects, and spatial heterogeneity. To address these challenges, we introduce S²potAE, a novel spatial spot autoencoder framework that integrates gene expression data, spatial coordinates, and morphological features from histology images for precise spot-level deconvolution. S²potAE employs a multilevel feature aggregation strategy, systematically extracting spatially aware features with a graph-based spatial encoder and fusing them with perceptual image embeddings of histological patches. Furthermore, an auxiliary pathological classification task enhances biological relevance and model interpretability. Comprehensive benchmarking across multiple simulated and real datasets, including human breast cancer, mouse brain anterior, and human dorsolateral prefrontal cortex, demonstrates that S²potAE consistently surpasses state-of-the-art methods in accuracy, robustness, and biological interpretability. Our approach effectively resolves complex cellular compositions, accurately identifies tumor boundaries, and captures nuanced cell-type distributions, significantly enhancing the utility of ST in biological research and clinical applications.
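The abstract does not spell out how S²potAE decodes proportions, so the sketch below only makes the underlying deconvolution problem concrete in its classic reference-based form (an assumption, not S²potAE's method): a spot's expression vector y is modeled as a non-negative mixture S·p of cell-type signature profiles S, the proportions p are recovered by non-negative least squares, and p is rescaled to sum to 1. The function name `deconvolve_spot` and the toy signature matrix are hypothetical; multiplicative updates are used only to keep the example dependency-free.

```python
def deconvolve_spot(signatures, spot, n_iter=500, eps=1e-12):
    """Estimate cell-type proportions for a single spot.

    signatures: genes x cell-types matrix (list of rows) of non-negative
        reference profiles, e.g. averaged from annotated scRNA-seq (hypothetical input).
    spot: non-negative expression values of the spot, one per gene.

    Minimizes ||S p - y||^2 subject to p >= 0 with multiplicative updates,
    then rescales p so the proportions sum to 1.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    # Precompute S^T y and S^T S, the only quantities the update needs.
    sty = [sum(signatures[g][k] * spot[g] for g in range(n_genes)) for k in range(n_types)]
    sts = [[sum(signatures[g][k] * signatures[g][l] for g in range(n_genes))
            for l in range(n_types)] for k in range(n_types)]
    p = [1.0 / n_types] * n_types  # strictly positive, uniform start
    for _ in range(n_iter):
        # p_k <- p_k * (S^T y)_k / (S^T S p)_k keeps every entry non-negative.
        denom = [sum(sts[k][l] * p[l] for l in range(n_types)) for k in range(n_types)]
        p = [p[k] * sty[k] / (denom[k] + eps) for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


if __name__ == "__main__":
    # Toy signatures for two hypothetical cell types over four genes.
    S = [[10.0, 0.0], [0.0, 10.0], [5.0, 1.0], [1.0, 5.0]]
    y = [7.0, 3.0, 3.8, 2.2]  # generated as 0.7 * type A + 0.3 * type B
    print([round(x, 3) for x in deconvolve_spot(S, y)])  # → [0.7, 0.3]
```

In practice one would call an established solver such as SciPy's `nnls` and add the spatial-graph and image terms that S²potAE contributes; this loop only illustrates the objective those terms refine.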

Citations: 0
PathCLAST: pathway-augmented contrastive learning with attention for interpretable spatial transcriptomics.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbag029
Minho Noh, Sungkyung Lee, Sunghyun Kim, Sangsoo Lim

Deciphering how molecular programs are spatially organized within tissues is pivotal for understanding tumor evolution and microenvironmental interactions. Existing spatial transcriptomics tools either rely on gene-level features, ignoring the rich topology of biological pathways, or deliver black-box clusters with little mechanistic insight, which limits their translational impact. A method that simultaneously leverages pathway structures and spatially matched histopathology could produce domain delineations that are both accurate and biologically interpretable. We introduce PathCLAST (Pathway-augmented Contrastive Learning with Attention for interpretable Spatial Transcriptomics), a framework that integrates gene expression, histopathological images, and curated pathway graphs via bimodal contrastive learning. By embedding expression profiles into biologically structured graphs and aligning them with local image features, PathCLAST achieves state-of-the-art spatial domain identification on multiple public datasets while offering pathway-level attention scores for mechanistic interpretation. The pathway embedding also serves as an explicit, biology-informed dimensionality reduction scheme. PathCLAST not only uncovers domain-specific pathways and spatially organized signaling activities, but also quantifies intra-domain heterogeneity, spatial autocorrelation, and inter-domain crosstalk, providing fine-grained insights into tumor progression and tissue architecture. PathCLAST is available at https://github.com/sslim-aidrug/PathCLAST.
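The abstract does not give PathCLAST's loss function. Bimodal contrastive alignment is commonly implemented with a symmetric InfoNCE (CLIP-style) objective, sketched below under that assumption: each spot's expression embedding and image embedding form a positive pair, the other spots in the batch act as negatives, and `temperature` (0.1 here, an arbitrary choice) sharpens the softmax over cosine similarities. The embeddings themselves, which PathCLAST derives from pathway graphs and histology patches, are taken as given, and all function names are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (cosine similarity becomes a dot product)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_at(logits, idx):
    """log(softmax(logits)[idx]) computed with the max-shift trick for stability."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return logits[idx] - log_sum_exp


def symmetric_info_nce(expr_emb, img_emb, temperature=0.1):
    """Row i of expr_emb and row i of img_emb describe the same spot (positive pair);
    every other row in the batch acts as a negative."""
    e = [_normalize(v) for v in expr_emb]
    m = [_normalize(v) for v in img_emb]
    n = len(e)
    # logits[i][j]: cosine similarity of spot i's expression and spot j's image, / temperature.
    logits = [[sum(a * b for a, b in zip(e[i], m[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Expression -> image: pick the matching image among all images (rows).
    loss_e2i = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Image -> expression: pick the matching expression profile (columns).
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_i2e = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_e2i + loss_i2e)


if __name__ == "__main__":
    expr = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_info_nce(expr, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: close to 0
    print(symmetric_info_nce(expr, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: about 10
```

Averaging both directions keeps either modality from collapsing onto the other; lowering the temperature penalizes near-miss negatives more heavily.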

Citations: 0
Phylogenomics to structure: evolutionary and clinical signals in the TP53 DNA-binding core through LOOCV-validated ensemble learning.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbag087
Syed Raza Abbas, Zeeshan Abbas, Arifa Zahir, Mobeen Ur Rehman, Seung Won Lee

TP53 encodes a master tumor suppressor, and understanding its evolutionary constraints is critical for interpreting pathogenic variation. We developed a fully reproducible computational pipeline integrating evolutionary genomics, structural biology, and clinical variant analysis to systematically prioritize functionally critical residues in TP53. Across 19 vertebrate orthologs, we performed genome-wide alignment and maximum-likelihood phylogenetic estimation, and tested for site-specific selection using fixed effects likelihood (FEL) and fast unconstrained Bayesian approximation (FUBAR). We mapped these evolutionary signals onto the AlphaFold-predicted structure and integrated 3936 human variants from ClinVar and UniProt. Selection analysis identified five sites under positive or diversifying selection, with a single consensus position detected by both methods: multiple-sequence-alignment position 606 (human codon 129) in the DNA-binding domain. Structural mapping revealed that pathogenic variants concentrate at the DNA-contacting interface, with residues 239-248 emerging as the highest-priority targets under our composite scoring system, which integrates evolutionary constraint, pathogenic burden, hotspot density, and domain importance. Machine learning validation under leave-one-out cross-validation (LOOCV) demonstrated robust predictive performance. A Ridge-ExtraTrees ensemble achieved $\textrm{MAE}=2.84$ (mean absolute error), $\textrm{RMSE}=3.72$ (root mean squared error), and $R^{2}=0.91$ for phylogenetic-distance regression, and 89.5% accuracy (17/19) for clade classification. A multi-branch deep neural network attained comparable results ($\textrm{MAE}=2.33$, $\textrm{RMSE}=2.56$, $R^{2}=0.86$), whereas Random Forest substantially underperformed ($\textrm{MAE}\approx 7.19$, $\textrm{RMSE}\approx 8.82$, $R^{2}\approx 0.47$, accuracy $\approx 63\%$) due to shrinkage and class-imbalance bias. Our findings show that evolutionary signals and clinical variants converge within the structurally constrained DNA-binding core of TP53, with codon 129 representing a robust positive-selection site and residues 239-248 constituting the primary pathogenic hotspot. This AlphaFold-anchored, LOOCV-validated framework offers a systematic, generalizable approach to residue-level prioritization that can guide mechanistic studies and, pending experimental validation, potentially inform precision oncology applications.
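The LOOCV protocol and the three regression metrics quoted above can be made concrete with a small sketch. It uses a one-feature ordinary-least-squares model as a hypothetical stand-in for the paper's Ridge-ExtraTrees ensemble and neural network, whose input features are not described in the abstract: each of the n samples is held out once, the model is refit on the remaining n - 1, and MAE, RMSE and R² are computed over the n held-out predictions. (For the classification figure, 17 correct held-out calls out of 19 gives 17/19 ≈ 89.5%.)

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (hypothetical stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def loocv_predictions(xs, ys, fit=fit_line):
    """Hold out each sample once, refit on the other n - 1, predict the held-out one."""
    preds = []
    for i in range(len(xs)):
        a, b = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a * xs[i] + b)
    return preds


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) over paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return mae, rmse, r2
```

Because every prediction comes from a model that never saw that sample, LOOCV suits very small datasets such as 19 orthologs, at the cost of n separate refits.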

Citations: 0
Machine learning for enzyme catalytic activity: current progress and future horizons.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbag002
Sizhe Qiu, Haris Saeed, Will Leonard, Feiran Li, Aidong Yang

Enzyme catalysis, with its advantages in environmental sustainability and efficiency, is gaining traction across diverse industrial applications, such as waste utilization and pharmaceutical biomanufacturing. However, optimizing enzyme catalytic activity remains a significant challenge. To facilitate enzyme mining and engineering, machine learning (ML) models have emerged to predict enzyme substrate specificity, turnover number, and catalytic optimum. This review aims to help researchers use predictive models of enzyme catalytic activity effectively by presenting recent advances and analyzing different approaches. We also point out existing limitations (e.g. dataset imbalance) and suggest potential enhancements to address them. We identify three particularly useful modeling strategies: the attention mechanism, the inclusion of new features such as product information and temperature, and transfer learning to leverage different datasets. Finally, we envisage that accurate predictors of enzyme catalytic activity could transform enzyme and metabolic engineering, as well as the optimization of biocatalysis.
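The review names dataset imbalance as a limitation, but the abstract does not say which remedy it recommends. One common option, shown here purely as an illustration, is to weight each training example by the inverse frequency of its class, w_c = N / (K·n_c), which is the heuristic scikit-learn calls `class_weight="balanced"`; every class then contributes the same total weight to the loss. The function names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights w_c = N / (K * n_c); each class then carries
    the same total weight N / K in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Binary cross-entropy with each example scaled by its class weight,
    normalized by the total weight."""
    total, norm = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        norm += w
    return total / norm
```

With labels `[0, 0, 0, 1]`, the single minority example receives weight 2.0 and each majority example 2/3, so a model can no longer minimize the loss by predicting the majority class everywhere.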

Citations: 0
Learning to explore tree neighbourhoods for phylogenetic inference.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf732
Federico Julian Camerota Verdù, Andrea Gasparin, Luca Bortolussi, Lorenzo Castelli

Phylogenetic inference is a key challenge in computational biology, with applications ranging from evolutionary analysis to comparative genomics. The balanced minimum evolution problem (BMEP) offers a well-established formulation of this problem but remains computationally intractable for large instances. In this work, we propose a reinforcement learning (RL) framework that tackles the BMEP through local search in the space of phylogenetic trees. Our contributions are threefold: (i) we introduce an improved RL formulation tailored to the structure of phylogenetic inference in the context of the BMEP; (ii) we train an RL agent capable of solving instances with up to 100 taxa; and (iii) we investigate the generalization capabilities of the learned policy across different substitution models, instance sizes, and datasets. To address the limitations of relying solely on the learned policy at inference time, we integrate it into a novel search-based framework that enables effective adaptation during evaluation. Experimental results show that our method outperforms greedy heuristics and matches the performance of state-of-the-art algorithms for the BMEP. Under significant distributional shifts, our method greatly narrows the gap with state-of-the-art algorithms. These results demonstrate the potential of RL for phylogenetic inference.
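For context on what the agent's local search is trying to minimize: in the BMEP, an unrooted binary topology T over n taxa with distance matrix d is scored by Pauplin's tree length L(T) = Σ_{i<j} 2^(1 - τ_ij) d_ij, where τ_ij is the number of edges on the path between leaves i and j, and the optimum is the topology with the smallest L(T). The sketch evaluates this objective for a tree given as an edge list, a hypothetical representation; how the paper encodes trees, RL states and neighbourhood moves is not described in the abstract.

```python
from collections import deque


def leaf_path_lengths(edges, leaves):
    """tau[(i, j)]: number of edges between leaves i and j (BFS from every leaf)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    tau = {}
    for src in leaves:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst in leaves:
            tau[(src, dst)] = dist[dst]
    return tau


def bme_length(edges, D):
    """Pauplin's balanced minimum evolution length; leaves are 0..n-1,
    any other node label is treated as an internal node."""
    n = len(D)
    tau = leaf_path_lengths(edges, list(range(n)))
    return sum(D[i][j] * 2.0 ** (1 - tau[(i, j)])
               for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    # Additive distances from ((0,1),(2,3)) with pendant edges 1, 2, 3, 4 and internal edge 5.
    D = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
    t1 = [(0, 4), (1, 4), (4, 5), (2, 5), (3, 5)]  # ((0,1),(2,3))
    t2 = [(0, 4), (2, 4), (4, 5), (1, 5), (3, 5)]  # ((0,2),(1,3))
    print(bme_length(t1, D), bme_length(t2, D))  # → 15.0 17.5
```

For the generating topology the length equals the sum of its edge lengths (1 + 2 + 3 + 4 + 5 = 15), and the alternative topology scores worse, which is the comparison a local-search move between neighbouring trees relies on.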

Citations: 0
From biogenesis to deep modeling: a holistic review of miRNA-disease prediction computational methods with experimental comparison.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf736
Siya Xie, K L Eddie Law

Dysregulated expression of microRNAs (miRNAs) may lead to a wide spectrum of diseases. Because miRNAs play pivotal roles in disease pathogenesis, diagnosis, and therapy, identifying potential miRNA-disease associations (MDAs) is essential for discovering new diagnostic biomarkers, developing targeted therapeutic strategies, and advancing personalized medicine. Traditional wet-lab experiments are time-consuming, expensive, and resource-intensive, so computational approaches are needed as auxiliary tools that prioritize candidate associations before experimental validation. Here, we compile the methods proposed for MDA prediction over the past decade. First, we analyze the data resources supporting MDA studies and introduce approaches for quantifying similarities among MDAs. Second, we comprehensively review 66 computational methods, classify them into five categories, and present comparative experimental analyses of selected methods to identify future research directions. To enhance accessibility, we have uploaded a summary of the discussed methods to a GitHub repository (https://github.com/xiesiya/miRNA-disease-association-methods). This review provides comprehensive background knowledge on computational methods for future MDA prediction research.
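As one example of the similarity measures used in this area, the Gaussian interaction profile (GIP) kernel is widely applied in MDA prediction: two miRNAs count as similar when their rows of the known miRNA-disease association matrix are close, GIP(i, j) = exp(-γ‖a_i - a_j‖²), with γ normalized by the mean squared row norm. The sketch below fixes the base bandwidth to 1 (an assumption; individual methods tune it) and is not taken from the review or its repository; `gip_similarity` is a hypothetical name.

```python
import math


def gip_similarity(assoc):
    """Gaussian interaction profile (GIP) kernel similarity between the rows of a
    binary association matrix (rows = miRNAs, columns = diseases)."""
    n = len(assoc)
    sq_norms = [sum(x * x for x in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    # Bandwidth: base value 1 (assumed) divided by the mean squared profile norm.
    gamma = 1.0 / mean_sq if mean_sq else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * d2)
    return sim
```

Applying the same function to the transposed association matrix yields disease-disease similarities; identical association profiles always score exactly 1.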

Citations: 0
Correction to: MetImputBERT: a pretrained BERT framework for missing value imputation in NMR metabolomics data.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbag113
Citations: 0
Ab initio detection of multiple epitranscriptomic modifications from Oxford nanopore technology direct RNA sequencing data.
IF 7.7 | CAS Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | Vol. 27, Issue 1 | Pub Date: 2026-01-07 | DOI: 10.1093/bib/bbaf709
Adriano Fonzino, Bruno Fosso, Grazia Visci, Carmela Gissi, Graziano Pesole, Ernesto Picardi

Charting the eukaryotic epitranscriptome by direct RNA sequencing is promising but still very challenging, as current bioinformatics tools rely on modification-unaware software and require multiple modification-specific learning steps. Here, we introduce NanoSpeech, a modification-aware basecaller that uses a transformer model for the ab initio, simultaneous detection of multiple modified bases, and NanoListener, which implements a simulated-randomers strategy for generating robust training datasets and training a new generation of Oxford Nanopore Technologies (ONT) basecallers. NanoListener and NanoSpeech are independent of the specific ONT chemistry. Once a training dataset has been created, a single model with an expanded vocabulary can accurately basecall both unmodified and modified bases.
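To illustrate what an "expanded vocabulary" means for a basecaller, the sketch below assumes a CTC-style output layer, which the abstract does not confirm for NanoSpeech: for every signal frame the network scores the canonical bases, a blank, and extra symbols for modified bases, and greedy decoding takes the best symbol per frame, collapses repeats and drops blanks. The symbol `m` for a modified adenosine and the vocabulary order are hypothetical.

```python
# Hypothetical expanded vocabulary: index 0 is the CTC blank and "m" stands in
# for a modified adenosine (e.g. m6A); NanoSpeech's real alphabet may differ.
VOCAB = ["-", "A", "C", "G", "U", "m"]


def greedy_ctc_decode(frame_scores, vocab=VOCAB, blank=0):
    """frame_scores: one list of per-symbol scores for each signal frame."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    out, prev = [], None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


def one_hot_frames(indices, size=len(VOCAB)):
    """Toy helper: turn a list of symbol indices into one-hot frame scores."""
    return [[1.0 if k == i else 0.0 for k in range(size)] for i in indices]
```

Because modified bases are just additional output symbols, a read such as `greedy_ctc_decode(one_hot_frames([1, 1, 0, 5, 5, 0, 3, 4, 4]))` decodes to `"AmGU"` in a single pass, with no separate modification-specific model.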

Citations: 0
Transformer-based graphs for drug-drug interaction with chemical knowledge embedding.
IF 7.7 · Zone 2 (Biology) · Q1 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2026-01-07 · DOI: 10.1093/bib/bbag039
Jinlu Zhang, Xuting Zhang, Yizheng Dai, Xin Shao, Xiaohui Fan

Identifying drug-drug interactions (DDIs) is a critical task in pharmaceutical research and clinical applications, as these interactions can pose serious medical risks. Deep learning models have become powerful tools for improving the accuracy and efficiency of DDI prediction. However, many existing approaches fail to fully incorporate chemical information and lack interpretability when exploring DDI mechanisms. In this work, we propose TRACE, a transformer-based graph representation learning framework that integrates chemical knowledge into DDI prediction. Extensive experiments demonstrate that TRACE outperforms state-of-the-art baseline models under both in-distribution and out-of-distribution settings, highlighting its strong predictive performance and generalization ability. In terms of interpretability, TRACE leverages its attention mechanism to identify high-risk substructures that may trigger DDIs. In summary, TRACE not only provides new perspectives for elucidating the underlying causes of DDIs through interpretable substructure analysis but also offers robust predictive performance to support drug development and combination therapy.
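The abstract states that TRACE uses attention to highlight high-risk substructures but does not give the formulation. As a hedged illustration only, the sketch below assumes standard scaled dot-product attention, where a query vector (hypothetically, an embedding of the partner drug) scores a set of substructure embeddings and the softmax weights are read as relevance scores. The fragment names and vectors are toy values, not TRACE outputs, and `rank_substructures` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float],
                      keys: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights: softmax(q . k_i / sqrt(d))."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def rank_substructures(query: Sequence[float],
                       substructures: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Hypothetical helper: sort substructures by their attention weight."""
    names = list(substructures)
    weights = attention_weights(query, [substructures[n] for n in names])
    return sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)


# Toy 2-d embeddings; names and values are made up for illustration.
query = [1.0, 0.0]
fragments = {"frag_a": [2.0, 0.0], "frag_b": [0.0, 1.0], "frag_c": [1.0, 1.0]}
ranking = rank_substructures(query, fragments)
print([name for name, _ in ranking])  # ['frag_a', 'frag_c', 'frag_b']
```

Attention weights indicate what the model attended to, not a causal mechanism, so in practice such rankings are usually checked against known interaction chemistry.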

Briefings in bioinformatics 27(1). Full text (PDF, PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12908692/pdf/
Citations: 0
MPHGNN: metapath-guided heterogeneous graph neural network for miRNA-drug resistance association prediction.
IF 7.7 · Zone 2 (Biology) · Q1 BIOCHEMICAL RESEARCH METHODS · Pub Date: 2026-01-07 · DOI: 10.1093/bib/bbag013
Guangsheng Huang, Yali Peng, Shuai Wu, Hang Wei, Shigang Liu

Aberrant expression of microRNAs (miRNAs) is closely associated with the pathogenesis and progression of various diseases, particularly cancer, as well as with therapeutic responses. Identification of miRNA-drug resistance associations is critical for drug screening and precision medicine. However, conventional experimental approaches remain time-consuming and labor-intensive, while existing computational methods often struggle to perform higher-order semantic inference over sparse prior bipartite association networks. To address this, we propose MPHGNN, a heterogeneous graph convolutional network (GCN) architecture for predicting miRNA-drug resistance associations. MPHGNN constructs a miRNA-gene-drug heterogeneous network with multimodal biological features, including miRNA expression profiles, drug structural descriptors, and gene functional similarities, and leverages dual learning modules at both the metapath and global levels to capture localized patterns and global representations simultaneously. Experimental results demonstrate that MPHGNN outperforms state-of-the-art methods and enhances the discriminative ability of association representations. Interpretability analyses further reveal that metapaths effectively capture underlying biological mechanisms, while the constructed heterogeneous biological network contributes substantially to prediction.
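To make the idea of a metapath concrete, the sketch below assumes binary miRNA-gene and gene-drug association matrices and derives a miRNA-drug relation from the miRNA-gene-drug metapath by matrix multiplication, so each entry counts the path instances linking a miRNA to a drug. It then mean-aggregates drug features along those paths. This is a generic metapath-based aggregation step written for illustration, not MPHGNN's published layers, and it omits the global-level module; all matrices and function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def row_normalize(m: Matrix) -> Matrix:
    """Divide each row by its sum; rows with no paths stay all-zero."""
    out: Matrix = []
    for row in m:
        s = sum(row)
        out.append([v / s for v in row] if s else [0.0] * len(row))
    return out


def metapath_aggregate(a_mg: Matrix, a_gd: Matrix,
                       drug_features: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical miRNA-gene-drug metapath step.

    Returns (path_counts, miRNA embeddings), where path_counts[i][j] is the
    number of miRNA i -> gene -> drug j paths and each miRNA embedding is the
    path-weighted mean of the features of drugs it reaches.
    """
    path_counts = matmul(a_mg, a_gd)
    weights = row_normalize(path_counts)
    return path_counts, matmul(weights, drug_features)


# Toy network: 2 miRNAs, 3 genes, 2 drugs (made-up values).
a_mg = [[1, 1, 0], [0, 0, 1]]
a_gd = [[1, 0], [1, 1], [0, 1]]
drug_x = [[1.0, 0.0], [0.0, 1.0]]
counts, mirna_h = metapath_aggregate(a_mg, a_gd, drug_x)
print(counts)  # [[2, 1], [0, 1]]
```

Counting path instances (rather than only recording whether any path exists) lets a miRNA that reaches a drug through several genes weight that drug more heavily; a GCN-style layer would typically follow this with a learned linear transform and nonlinearity.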

Briefings in bioinformatics 27(1). Full text (PDF, PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853127/pdf/
Citations: 0
Journal: Briefings in bioinformatics