
Bioinformatics (Oxford, England): Latest Articles

Uniform Design-Embedded Predictions of (Tetra-)Peptide Physicochemical Properties.
IF 5.4 Pub Date : 2026-01-19 DOI: 10.1093/bioinformatics/btag036
Zhihui Zhu, Huapeng Liu, Xuechen Li, Haojin Zhou, Jiaqi Wang

Motivation: Short peptides hold significant promise in drug discovery and materials science because of their biocompatibility, multifunctionality, and ease of synthesis. However, accurately predicting their physicochemical properties, a prerequisite for application development, remains a grand challenge due to the sheer quantity of peptides.

Results: This study presents an approach that combines uniform design (UD), used to sample the whole sequence space, with artificial intelligence (AI) models trained on the sampled data, to improve prediction of key physicochemical properties, including aggregation propensity (AP), hydrophilicity (logP), and isoelectric point (pI), within the complete sequence space of tetrapeptides (160,000 sequences). Using UD, we generate 31 distinct peptide datasets, each with a consistent amino acid occupation fraction of 5% at every position, thereby creating unbiased training data with no amino acid preference for training AI models. This work provides comprehensive datasets on the physicochemical properties of all tetrapeptides, develops robust AI-based predictive models, and quantitatively elucidates the relationships between key physicochemical attributes and self-assembly behaviors of short peptides by Shapley Additive Explanations (SHAP) analysis. By integrating strategic experimental design (i.e., UD), AI modeling, and peptide domain knowledge, our approach facilitates the discovery and optimization of functional peptides, offering new opportunities for peptide-based therapeutic applications.

Availability: The complete datasets, source code, and pre-trained models are available from the GitHub repository (https://github.com/JiaqiBenWang/UD-AI-Peptide) and Zenodo (https://doi.org/10.5281/zenodo.17984124).

Supplementary information: Supplementary data are available at Bioinformatics online.
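
The 5% occupation property described in the abstract above can be illustrated with a short sketch. This is not the authors' uniform design construction (true UD tables are chosen to minimize a discrepancy measure); it is a simpler balanced random design, assumed here only to show what "each amino acid fills 5% of every position" means in the 20^4 = 160,000 tetrapeptide space. The names `balanced_design` and `occupation_fractions` are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
PEPTIDE_LENGTH = 4


def balanced_design(n_runs, seed=0):
    """Return n_runs tetrapeptides in which every residue occupies exactly
    1/20 (5%) of each position. Illustrative stand-in for a UD table."""
    if n_runs % len(AMINO_ACIDS) != 0:
        raise ValueError("n_runs must be a multiple of 20")
    rng = random.Random(seed)
    repeats = n_runs // len(AMINO_ACIDS)
    columns = []
    for _ in range(PEPTIDE_LENGTH):
        # Each column holds every residue the same number of times,
        # shuffled independently of the other columns.
        column = list(AMINO_ACIDS) * repeats
        rng.shuffle(column)
        columns.append(column)
    return ["".join(residues) for residues in zip(*columns)]


def occupation_fractions(peptides, position):
    """Fraction of peptides carrying each residue at the given position."""
    counts = Counter(peptide[position] for peptide in peptides)
    return {aa: counts[aa] / len(peptides) for aa in AMINO_ACIDS}


# Size of the complete tetrapeptide sequence space: 20 ** 4.
total_sequences = len(AMINO_ACIDS) ** PEPTIDE_LENGTH

# A 400-peptide design (hypothetical size chosen for the example).
design = balanced_design(400)
```

Because the columns are shuffled independently, this sketch can produce duplicate peptides and makes no claim about space-filling quality; it only reproduces the per-position balance.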

Citations: 0
iModMix: Integrative Module Analysis for Multi-omics Data.
IF 5.4 Pub Date : 2026-01-19 DOI: 10.1093/bioinformatics/btag030
Isis Narváez-Bandera, Ashley Lui, Yonatan Ayalew Mekonnen, Vanessa Rubio, Augustine Takyi, Noah Sulman, Christopher Wilson, Hayley D Ackerman, Oscar E Ospina, Guillermo Gonzalez-Calderon, Elsa Flores, Qian Li, Ann Chen, Brooke Fridley, Paul Stewart

Summary: Integrative Module Analysis for Multi-omics Data (iModMix) is a biology-agnostic framework that enables the discovery of novel associations across any type of quantitative abundance data, including but not limited to transcriptomics, proteomics, and metabolomics. Instead of relying on pathway annotations or prior biological knowledge, iModMix constructs data-driven modules using graphical lasso to estimate sparse networks from omics features. These modules are summarized into eigenfeatures and correlated across datasets for horizontal integration, while preserving the distinct feature sets and interpretability of each omics type. iModMix operates directly on matrices containing expression or abundances for a wide range of features, including but not limited to genes, proteins, and metabolites. Because it does not rely on annotations (e.g., KEGG identifiers), it can seamlessly incorporate both identified and unidentified metabolites, addressing a key limitation of many existing metabolomics tools. iModMix is available as a user-friendly R Shiny application requiring no programming expertise (https://imodmix.moffitt.org), and as a Bioconductor R package for advanced users (https://bioconductor.org/packages/release/bioc/html/iModMix.html). The tool includes several public and in-house datasets to illustrate its utility in identifying novel multi-omics relationships in diverse biological contexts.

Availability and implementation: iModMix is freely available from Bioconductor (https://bioconductor.org/packages/release/bioc/html/iModMix.html), and the example dataset package (iModMixData) is also available from Bioconductor (https://bioconductor.org/packages/release/data/experiment/html/iModMixData.html). The R package source code and Docker image are available from GitHub: https://github.com/biodatalab/iModMix. The Shiny application can be accessed at https://imodmix.moffitt.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
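
The step in which modules are "summarized into eigenfeatures and correlated across datasets" can be sketched in a few lines. The example assumes that an eigenfeature is the first principal component score of a module (a samples-by-features matrix), computed here by power iteration. It does not use the iModMix package or its API, it skips the graphical lasso module construction entirely, and all function names and toy values are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract the column mean from every entry of a samples x features matrix."""
    n_samples = len(matrix)
    means = [sum(column) / n_samples for column in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def eigenfeature(matrix, iterations=200):
    """First principal component scores of one module, via power iteration
    on X^T X (assumed definition of an eigenfeature for this sketch)."""
    centered = center_columns(matrix)
    n_features = len(centered[0])
    loading = [1.0 / math.sqrt(n_features)] * n_features
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, loading)) for row in centered]
        updated = [sum(row[j] * s for row, s in zip(centered, scores))
                   for j in range(n_features)]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0:
            break
        loading = [v / norm for v in updated]
    return [sum(x * w for x, w in zip(row, loading)) for row in centered]


def pearson(a, b):
    """Pearson correlation between two equally long score vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy data: the same 5 samples measured in two omics layers (made-up values).
gene_module = [[1.0, 2.1, 0.9], [2.0, 3.9, 2.2], [3.0, 6.2, 2.8],
               [4.0, 8.0, 4.1], [5.0, 9.8, 5.0]]
metabolite_module = [[10.2, 5.1], [8.1, 3.9], [5.9, 3.2],
                     [4.0, 2.0], [2.1, 0.9]]

gene_eigen = eigenfeature(gene_module)
metabolite_eigen = eigenfeature(metabolite_module)
correlation = pearson(gene_eigen, metabolite_eigen)
```

The sign of a principal component is arbitrary, so the magnitude of `correlation` is the informative quantity when ranking module pairs.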

Citations: 0
CADS: A Causal Inference Framework for Identifying Essential Genes to Enhance Drug Synergy Prediction.
IF 5.4 Pub Date : 2026-01-14 DOI: 10.1093/bioinformatics/btag010
Huaiwu Zhang, Xinliang Sun, Jianxin Wang, Min Li, Jing Tang

Motivation: Drug synergy is crucial for developing effective combination therapies, but traditional screening methods suffer from inefficiency and high costs. While deep learning shows promise for predicting drug synergy, current approaches using Transformers and graph neural networks focus on combining drug and cell line features without modelling how genes causally influence drug responses.

Results: To address this limitation, we propose CADS (Causal Adjustment for Drug Synergy), a deep learning framework that integrates causal relationships between genes and drug responses. Leveraging multi-omics data, CADS uses a learnable mask mechanism to identify key causal genes while filtering out irrelevant genetic factors through backdoor adjustment. Our model achieves two key objectives simultaneously: accurate prediction of drug synergy and interpretable causal gene discovery. Experiments on multiple datasets show that CADS consistently outperforms state-of-the-art methods across multiple metrics. Case studies demonstrate that CADS can reduce unnecessary complexity while providing more biological insights through its gene importance scores, which help identify clinically validated cancer-related genes that mediate drug interactions.

Availability and implementation: Taken together, CADS advances combination therapy prediction by explicitly modelling drug synergy causal genes, offering enhanced interpretability for AI-based drug development. The source code can be found at https://github.com/HuaiwuZhang/causalDC.

Supplementary information: Supplementary data are available at Bioinformatics online.
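
For readers unfamiliar with the backdoor adjustment mentioned in the abstract above, the standard textbook form is shown below. The abstract does not say how CADS instantiates it, so the symbols are assumptions for illustration: $X$ stands for the causal input (for example, the selected gene features), $Y$ for the synergy outcome, and $Z$ for a set of confounders satisfying the backdoor criterion.

```latex
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  \;=\; \sum_{z} P\bigl(Y \mid X = x,\, Z = z\bigr)\, P(Z = z)
```

The adjustment replaces the observational conditional $P(Y \mid X = x)$ with an average over the confounder distribution, which is what allows irrelevant or confounding factors to be filtered out of the estimated effect.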

Citations: 0
Best practices when benchmarking CATCH for the design of genome enrichment probes.
IF 5.4 Pub Date : 2026-01-13 DOI: 10.1093/bioinformatics/btag002
Hayden C Metsky, Katherine J Siddle, Christian B Matranga, Pardis C Sabeti
Citations: 0
PLXFPred: Interpretable cross-attention networks with hierarchical fusion of multi-modal features for predicting protein-ligand interactions and affinities.
IF 5.4 Pub Date : 2026-01-09 DOI: 10.1093/bioinformatics/btaf662
Jixiang Li, Ruilin Cai, Ziteng Wang, Ye Sun, Wenge Yang, Yonghong Hu

Accurately predicting protein-ligand interactions and binding affinities is essential for advancing structural biology. Despite recent advances in deep learning, achieving rapid and precise predictions remains challenging. Our approach, PLXFPred (Protein-Ligand Cross-Modal Fusion Predictor), extracts physicochemical properties from amino acid sequences and SMILES strings. It also leverages pre-trained models to derive high-dimensional features. GATv2 and BiLSTM are used to process the structural and sequence features, respectively. The core of the model fuses sequence and graph features via a cross-modal cross-attention mechanism, followed by a multi-modal hierarchical fusion strategy that integrates high-level graph, early fusion, and cross-fusion features. Residual connections and conditional domain adversarial learning improve generalization to previously unseen protein-ligand pairs. Compared to state-of-the-art models, PLXFPred demonstrates superior performance, reducing errors (RMSD, MAE, SD) by over 50%, while providing interpretable biological insights through attention weight visualization and SHAP analysis.

Availability: The source code is available at https://github.com/xiyuyangtuo/PLXFPred/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
MMPCS: multi-view molecular pretraining based on consistency information and specific information.
IF 5.4 Pub Date : 2026-01-03 DOI: 10.1093/bioinformatics/btag028
Chenyang Xie, Yingying Song, Song He, Xiaochen Bo, Zhongnan Zhang

Motivation: The goal of molecular representation learning is to automate the extraction of molecular features, a critical task in cheminformatics and drug discovery. While pretraining models using multiple views like SMILES, 2D graphs, and 3D conformations have advanced the field, integrating them effectively to produce superior representations remains a challenge.

Results: To bridge this gap, we propose a novel multi-view molecular pretraining method termed MMPCS, which explicitly factorizes representations into consistency and specific information. Our approach utilizes the Graph Isomorphism Network and the RoBERTa model to encode 2D molecular topological graphs and SMILES sequences, respectively. Each resulting molecular embedding is decomposed into a shared consistency component and a view-specific remainder. An autoencoder then aligns the consistency information across views. The combined consistency and view-specific representations serve as input for downstream tasks, enabling precise and task-aware predictions. When benchmarked against 16 state-of-the-art molecular pretraining methods, MMPCS achieved the highest average performance across both classification and regression tasks for molecular property prediction. It also delivered outstanding results in predicting drug-target binding affinity and cancer drug response, demonstrating its robustness and broad applicability. Additionally, a case study on the SARS-CoV-2 Omicron variant highlights the potential of MMPCS in facilitating drug repurposing efforts.

Availability and implementation: The source code and datasets supporting this study are publicly available at GitHub (https://github.com/xmubiocode/MMPCS) and Zenodo (https://doi.org/10.5281/zenodo.18182748).

Citations: 0
ASTRO: Automated Spatial-Transcriptome whole RNA Output.
IF 5.4 Pub Date : 2026-01-03 DOI: 10.1093/bioinformatics/btaf688
Dingyao Zhang, Zhiyuan Chu, Yiran Huo, Yunzhe Jiang, Yuhang Chen, Zhiliang Bai, Rong Fan, Jun Lu, Mark Gerstein

Motivation: Despite significant advances in spatial transcriptomics, the analysis of formalin-fixed paraffin-embedded (FFPE) tissues, which constitute most clinically available samples, remains challenging. Additionally, capturing both coding and non-coding RNAs in a spatial context poses significant challenges. We recently introduced Patho-DBiT, a technology designed to address these unmet needs. However, the marked differences between Patho-DBiT and existing spatial transcriptomics protocols necessitate specialized computational tools for comprehensive whole-transcriptome analysis in FFPE samples.

Results: Here, we present ASTRO, an automated pipeline developed to process spatial transcriptomics data. In addition to supporting standard datasets, ASTRO is optimized for whole-transcriptome analyses of FFPE samples, enabling the detection of various RNA species, including non-coding RNAs such as miRNAs. To compensate for the reduced RNA quality in FFPE tissues, ASTRO incorporates a specialized filtering step and optimizes spatial barcode calling, increasing the mapping rate. These optimizations allow ASTRO to spatially quantify coding and non-coding RNA species in the entire transcriptome and achieve robust performance in FFPE samples.

Availability and implementation: The code is available at GitHub (https://github.com/gersteinlab/ASTRO) and Zenodo (doi: 10.5281/zenodo.17913760).

Citations: 0
FracFixR: a compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data.
IF 5.4 Pub Date : 2026-01-03 DOI: 10.1093/bioinformatics/btaf615
Alice Cleynen, Agin Ravindran, Nikolay E Shirokikh

Summary: RNA fractionation followed by high-throughput sequencing (RNA-seq) is widely used to study RNA localization, translation, structure, stability and subcellular compartmentalization. Interpreting fractionated RNA-seq data poses a fundamental compositional challenge: library preparation and sequencing depth obscure the original proportions of RNA fractions, which can bias comparisons, particularly when biological changes shift RNA distribution across fractions. This bias compromises comparisons of fraction-specific RNA profiles and limits the utility of standard differential expression methods. Existing approaches using transcript frequency ratios or standard normalization fail to account for the compositional nature of fractionated samples and cannot estimate the unrecoverable "lost" fraction. We developed FracFixR, a statistical framework that reconstructs original fraction proportions by modeling the compositional relationship between the whole and the fractionated RNA samples. Using non-negative linear regression on carefully selected transcripts, FracFixR estimates global fraction weights, corrects individual transcript frequencies, and quantifies the unrecoverable material. The framework includes methods for differential proportion testing between conditions using binomial GLM, logit, or beta-binomial models. We rigorously validated FracFixR using synthetic data with known ground truth based on naturally observed aligned read distributions and real polysome profiling data from multiple cell lines, demonstrating accurate reconstruction of fraction weights (Pearson correlation >0.85) and enabling detection of differentially translated transcripts between cancer subtypes.

Availability and implementation: FracFixR is implemented as an R package freely available on GitHub at https://github.com/Arnaroo/FracFixR as well as on the CRAN repository.
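
The non-negative regression step described in the summary can be illustrated with a small sketch. This is not FracFixR's code (the package itself is written in R): it is a pure-Python toy that assumes the whole-sample count of each transcript is approximately a non-negative weighted sum of its fraction counts, and it solves the fit by projected gradient descent. The function name `nnls_projected_gradient` and all counts below are hypothetical.

```python
def nnls_projected_gradient(A, b, steps=5000):
    """Minimise 0.5 * ||A w - b||^2 subject to w >= 0.

    A: list of rows (one per transcript), one column per fraction.
    b: whole-sample counts, one per transcript.
    """
    n_rows, n_cols = len(A), len(A[0])
    # trace(A^T A) bounds the largest eigenvalue of A^T A, so 1/trace is a safe step size.
    trace = sum(A[i][j] ** 2 for i in range(n_rows) for j in range(n_cols))
    lr = 1.0 / trace
    w = [0.0] * n_cols
    for _ in range(steps):
        residual = [sum(A[i][j] * w[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[j] - lr * grad[j]) for j in range(n_cols)]
    return w


# Hypothetical counts for four transcripts in two fractions.
fraction_counts = [[10.0, 40.0], [30.0, 20.0], [50.0, 5.0], [20.0, 20.0]]
true_weights = [2.0, 0.5]  # assumed ground truth for this toy example
whole_counts = [sum(row[j] * true_weights[j] for j in range(2))
                for row in fraction_counts]

weights = nnls_projected_gradient(fraction_counts, whole_counts)
print([round(x, 4) for x in weights])
```

On real count data a dedicated solver (for example `scipy.optimize.nnls` in Python) would be the more usual choice; the hand-written loop is used here only to keep the sketch dependency-free.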

Citations: 0
Malaria-GENOMAP: a web-based tool for exploring genomic variation of malaria parasites.
IF 5.4 Pub Date : 2026-01-03 DOI: 10.1093/bioinformatics/btag016
Joseph Thorpe, Nina Billows, Gabrielle C Ngwana-Joseph, Amy Ibrahim, Deborah Nolder, Colin J Sutherland, Thi Hong Ngoc Nguyen, Thi Huong Binh Nguyen, Quang Thieu Nguyen, Jamille G Dombrowski, Silvia Maria Di Santi, Claudio R F Marinho, Jody E Phelan, Tomasz Kurowski, Fady Mohareb, Susana Campino, Taane G Clark

Motivation: Malaria, caused by Plasmodium parasites, imposes a significant public health burden. While Plasmodium falciparum remains the primary target of elimination strategies due to its high mortality rate, lesser-known species such as P. malariae, P. vivax, and P. knowlesi continue to contribute to substantial human morbidity. Genomic approaches, including whole-genome sequencing, offer powerful tools for understanding the biology, transmission, and emerging drug resistance of these neglected Plasmodium species. However, there is an urgent need for informatic tools to summarize and visualize the high-dimensional and complex genomic data generated.

Results: We developed Malaria-GENOMAP, a user-friendly web-based tool, which integrates genomic variant data, such as allele frequencies, with geographical maps and chromosome-wide to gene views for in-depth exploration. The tool includes variation from P. knowlesi (n = 139), P. malariae (n = 158), P. ovale curtisi (n = 36), P. ovale wallikeri (n = 47), P. simium (n = 38), and P. vivax (n = 1359). It enables the investigation of population structure, geographic associations of mutations, and putative drug resistance markers, offering valuable insights for malaria control efforts.

Availability and implementation: Malaria-GENOMAP is available online at https://genomics.lshtm.ac.uk/malaria-genomaps.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
MCOAN: multimodal contrastive representation learning for cross-omics adaptive disease regulatory network prediction.
IF 5.4 Pub Date : 2026-01-03 DOI: 10.1093/bioinformatics/btag033
Junqi Long, Bo Liu, Jianqiang Li, Shuangtao Zhao

Motivation: Interactions among long noncoding RNAs, circular RNAs, microRNAs, and messenger RNAs form complex gene expression regulatory networks, which are of great significance for the diagnosis, prevention, and treatment of complex diseases. Although existing computational methods have been developed to predict interactions among certain molecular types, they are generally limited to single-modality perspectives, overlooking competitive specificity and co-target cooperativity across multi-omics molecules, and thereby limiting their ability to elucidate cross-omics regulatory mechanisms.

Results: We proposed a novel cross-omics adaptive multimodal contrastive learning framework (MCOAN) that learns multimodal regulatory mechanisms and effectively predicts disease-associated molecular regulatory networks. Specifically, we first constructed a five-layer heterogeneous graph architecture to comprehensively integrate the complex regulatory associations among multi-omics nodes. Then, we proposed an unsupervised multimodal contrastive learning strategy that maximizes mutual information across distinct regulatory views, thereby enhancing node representations by efficiently capturing local neighborhood structure and global semantic information. Meanwhile, we also proposed a cross-omics adaptive learning mechanism that captures complex competitive specificity and co-target cooperativity across distinct regulatory networks, thereby further enhancing the structural awareness in node representations. Furthermore, we evaluated multiple downstream classifiers to accurately predict multimodal molecular regulatory networks. Finally, extensive experiments show that MCOAN consistently outperforms existing methods, achieving strong predictive accuracy and generalization (max AUC = 0.9881; max AUPR = 0.9826), and further confirm its real-world predictive performance through case studies.

Availability and implementation: All resources are available at https://github.com/JunqiLab/MCOAN.git.
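
The idea of a contrastive objective that maximizes agreement between two views of the same node can be sketched as follows. The abstract does not state which loss MCOAN uses, so this example assumes an InfoNCE-style loss, a common choice in contrastive learning; the function names and the toy embeddings are hypothetical and are not taken from the MCOAN repository.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over nodes.

    Node i in view_a is paired with node i in view_b (the positive);
    every other node in view_b serves as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Hypothetical 2-D embeddings of three nodes under two views.
view_one = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_view = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
shuffled_view = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]

loss_aligned = info_nce(view_one, aligned_view)
loss_shuffled = info_nce(view_one, shuffled_view)
print(loss_aligned < loss_shuffled)
```

The loss is lower when each node's two embeddings agree than when the pairing is shuffled, which is the behaviour a training loop would exploit when updating the encoders.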

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0