{"title":"A comparative analysis of mutual information methods for pairwise relationship detection in metagenomic data.","authors":"Dallace Francis, Fengzhu Sun","doi":"10.1186/s12859-024-05883-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Construction of co-occurrence networks in metagenomic data often employs correlation to infer pairwise relationships between microbes. However, biological systems are complex and often display qualities non-linear in nature. Therefore, the reliance on correlation alone may overlook important relationships and fail to capture the full breadth of intricacies presented in underlying interaction networks. It is of interest to incorporate metrics that are not only robust in detecting linear relationships, but non-linear ones as well.</p><p><strong>Results: </strong>In this paper, we explore the use of various mutual information (MI) estimation approaches for quantifying pairwise relationships in biological data and compare their performances against two traditional measures-Pearson's correlation coefficient, r, and Spearman's rank correlation coefficient, ρ. Metrics are tested on both simulated data designed to mimic pairwise relationships that may be found in ecological systems and real data from a previous study on C. diff infection. The results demonstrate that, in the case of asymmetric relationships, mutual information estimators can provide better detection ability than Pearson's or Spearman's correlation coefficients. Specifically, we find that these estimators have elevated performances in the detection of exploitative relationships, demonstrating the potential benefit of including them in future metagenomic studies.</p><p><strong>Conclusions: </strong>Mutual information (MI) can uncover complex pairwise relationships in biological data that may be missed by traditional measures of association. The inclusion of such relationships when constructing co-occurrence networks can result in a more comprehensive analysis than the use of correlation alone.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323399/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05883-7","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Construction of co-occurrence networks in metagenomic data often employs correlation to infer pairwise relationships between microbes. However, biological systems are complex and often display qualities non-linear in nature. Therefore, the reliance on correlation alone may overlook important relationships and fail to capture the full breadth of intricacies presented in underlying interaction networks. It is of interest to incorporate metrics that are not only robust in detecting linear relationships, but non-linear ones as well.
Results: In this paper, we explore the use of various mutual information (MI) estimation approaches for quantifying pairwise relationships in biological data and compare their performances against two traditional measures-Pearson's correlation coefficient, r, and Spearman's rank correlation coefficient, ρ. Metrics are tested on both simulated data designed to mimic pairwise relationships that may be found in ecological systems and real data from a previous study on C. diff infection. The results demonstrate that, in the case of asymmetric relationships, mutual information estimators can provide better detection ability than Pearson's or Spearman's correlation coefficients. Specifically, we find that these estimators have elevated performances in the detection of exploitative relationships, demonstrating the potential benefit of including them in future metagenomic studies.
Conclusions: Mutual information (MI) can uncover complex pairwise relationships in biological data that may be missed by traditional measures of association. The inclusion of such relationships when constructing co-occurrence networks can result in a more comprehensive analysis than the use of correlation alone.
背景:元基因组数据中共生网络的构建通常采用相关性来推断微生物之间的配对关系。然而,生物系统是复杂的,而且往往表现出非线性的性质。因此,仅仅依靠相关性可能会忽略重要的关系,也无法捕捉潜在交互网络中错综复杂的全部内容。因此,我们有兴趣采用不仅能检测线性关系,而且能检测非线性关系的指标:本文探讨了使用各种互信息(MI)估算方法来量化生物数据中的配对关系,并将它们的性能与两个传统指标--皮尔逊相关系数 r 和斯皮尔曼等级相关系数 ρ 进行了比较。结果表明,在非对称关系的情况下,互信息估计器比皮尔逊或斯皮尔曼相关系数能提供更好的检测能力。具体来说,我们发现这些估计值在检测利用关系时表现更佳,这表明将它们纳入未来的元基因组研究可能会带来益处:结论:互信息(MI)可以发现生物数据中复杂的成对关系,而传统的关联测量方法可能会忽略这些关系。结论:互信息(MI)可以发现生物数据中复杂的成对关系,而传统的关联测量方法可能会忽略这些关系。在构建共现网络时,如果将这些关系纳入其中,就能获得比单独使用相关性更全面的分析结果。
期刊介绍:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.