CMFHMDA: a prediction framework for human disease-microbe associations based on cross-domain matrix factorization.

Briefings in Bioinformatics (IF 7.7; Zone 2, Biology; JCR Q1, Biochemical Research Methods) Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae481

Jing Chen, Ran Tao, Yi Qiu, Qun Yuan

Briefings in Bioinformatics, vol. 25, no. 6. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11427075/pdf/

Citations: 0

Abstract

Predicting associations between microbes and diseases opens up new avenues for developing diagnostic, preventive, and therapeutic strategies. Given that laboratory-based biological tests to verify these associations are often time-consuming and expensive, there is a critical need for innovative computational frameworks to predict new microbe-disease associations. In this work, we introduce a novel prediction algorithm called Predicting Human Disease-Microbe Associations using Cross-Domain Matrix Factorization (CMFHMDA). Initially, we calculate the composite similarity of diseases and the Gaussian interaction profile similarity of microbes. We then apply the Weighted K Nearest Known Neighbors (WKNKN) algorithm to refine the microbe-disease association matrix. Our CMFHMDA model is subsequently developed by integrating the network data of both microbes and diseases to predict potential associations. The key innovations of this method include using the WKNKN algorithm to preprocess missing values in the association matrix and incorporating cross-domain information from microbes and diseases into the CMFHMDA model. To validate CMFHMDA, we employed three different cross-validation techniques to evaluate the model's accuracy. The results indicate that the CMFHMDA model achieved area under the receiver operating characteristic curve (AUC) scores of 0.9172, 0.8551, and 0.9351 ± 0.0052 in global Leave-One-Out Cross-Validation (LOOCV), local LOOCV, and five-fold CV, respectively. Furthermore, many predicted associations have been confirmed by published experimental studies, establishing CMFHMDA as an effective tool for predicting potential disease-associated microbes.
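The two preprocessing steps named in the abstract, Gaussian interaction profile (GIP) similarity and WKNKN, can be illustrated with a minimal NumPy sketch. This follows the formulations commonly described in the association-prediction literature and is not taken from the CMFHMDA paper itself. The function names, the toy matrix, the row/column orientation (diseases as rows, microbes as columns), and the parameter values `k`, `eta`, and `gamma_prime` are all assumptions made for illustration. For simplicity the sketch uses GIP similarity on both sides, whereas the paper uses a composite similarity for diseases.

```python
import numpy as np


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        # No known associations at all: fall back to the identity.
        return np.eye(profiles.shape[0])
    # Bandwidth normalised by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distance between every pair of rows.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def _row_wknkn(assoc, sim, k, eta):
    """Estimate each row from its K most similar rows that have known links."""
    known = np.flatnonzero(assoc.any(axis=1))  # rows with >= 1 association
    decay = eta ** np.arange(k)                # eta^0, eta^1, ..., eta^(k-1)
    est = np.zeros_like(assoc, dtype=float)
    for i in range(assoc.shape[0]):
        cand = known[known != i]               # never use the row itself
        if cand.size == 0:
            continue
        # Candidates in descending order of similarity, truncated to K.
        order = cand[np.argsort(-sim[i, cand], kind="stable")][:k]
        s = sim[i, order]
        z = s.sum()
        if z > 0:
            est[i] = (decay[: order.size] * s) @ assoc[order] / z
    return est


def wknkn(assoc, sim_rows, sim_cols, k=5, eta=0.7):
    """Fill unknown entries with neighbour-based likelihoods in [0, 1]."""
    assoc = np.asarray(assoc, dtype=float)
    est_rows = _row_wknkn(assoc, sim_rows, k, eta)
    est_cols = _row_wknkn(assoc.T, sim_cols, k, eta).T
    # Known associations (1s) are never lowered.
    return np.maximum(assoc, (est_rows + est_cols) / 2.0)


# Toy example (hypothetical data): 4 diseases x 3 microbes.
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 0],
              [0, 0, 1]])
S_d = gip_similarity(A)      # disease-disease similarity
S_m = gip_similarity(A.T)    # microbe-microbe similarity
A_filled = wknkn(A, S_d, S_m, k=2, eta=0.7)
```

In this sketch the third disease has no known associations, so its row in `A_filled` is populated entirely from its neighbours. A factorization model would then be trained on `A_filled` rather than on the sparse binary matrix.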




Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.