Mutual information for detecting multi-class biomarkers when integrating multiple bulk or single-cell transcriptomic studies.

Bioinformatics (Oxford, England), IF 5.4. Published 2024-11-28. DOI: 10.1093/bioinformatics/btae696

Jian Zou, Zheqi Li, Neil Carleton, Steffi Oesterreich, Adrian V Lee, George C Tseng

Abstract

Motivation: Biomarker detection plays a pivotal role in biomedical research. Integrating omics studies from multiple cohorts can enhance the statistical power, accuracy, and robustness of detection results. However, existing methods for horizontally combining omics studies are mostly designed for two-class scenarios (e.g. cases versus controls) and are not directly applicable to studies with a multi-class design (e.g. samples from multiple disease subtypes, treatments, tissues, or cell types).

Results: We propose a statistical framework, Mutual Information Concordance Analysis (MICA), that takes an information-theoretic perspective to detect biomarkers with concordant multi-class expression patterns across multiple omics studies. Our approach first uses a global test based on mutual information to detect biomarkers whose multi-class patterns are concordant across some or all of the omics studies. A post hoc analysis is then performed for each detected biomarker to identify the studies that share the concordant pattern. Extensive simulations show that, compared with an existing multi-class correlation method, MICA achieves improved accuracy and successful false discovery rate control. The method is then applied to two practical scenarios: mouse metabolism-related transcriptomic studies from four tissues, and estrogen treatment expression profiles from three sources. Biomarkers detected by MICA yield intriguing biological insights and functional annotations. Additionally, we applied MICA to single-cell RNA-Seq data to detect tumor progression biomarkers, highlighting critical roles of ribosomal function in the tumor microenvironment of triple-negative breast cancer and underscoring the potential of MICA for detecting novel therapeutic targets.
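The abstract describes MICA only at a high level, so the Python sketch below is not the MICA statistic and not the API of the MICA R package. It illustrates the general shape of a "global mutual-information test, then FDR control" screen under simplifying assumptions that are not taken from the paper: one gene's expression is rank-binned within each study, the mutual information between those bins and the class labels is computed on the pooled samples (so concordant class patterns reinforce each other while discordant ones partly cancel), a permutation test that shuffles labels within each study gives a global p-value, and Benjamini-Hochberg adjustment across genes controls the false discovery rate. The post hoc step that identifies which studies carry the concordant pattern is omitted. All function names (`discretize`, `pooled_mi`, `permutation_pvalue`, `concordance_screen`) are hypothetical.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-frequency (rank-based) binning within ONE study.

    Binning per study removes study-specific location/scale shifts
    (a simplifying assumption of this sketch, not part of MICA).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def pooled_mi(studies, n_bins=3):
    """MI between class label and within-study expression bins, pooled over studies.

    studies: list of (expression_values, class_labels) for ONE gene.
    Concordant class patterns reinforce each other in the pooled table;
    discordant patterns partially cancel.
    """
    xs, ys = [], []
    for values, labels in studies:
        xs.extend(discretize(values, n_bins))
        ys.extend(labels)
    return mutual_information(xs, ys)


def permutation_pvalue(studies, n_perm=999, n_bins=3, seed=0):
    """Global test: permute class labels independently within each study."""
    rng = random.Random(seed)
    observed = pooled_mi(studies, n_bins)
    exceed = 0
    for _ in range(n_perm):
        shuffled = []
        for values, labels in studies:
            perm = list(labels)
            rng.shuffle(perm)  # labels never move between studies
            shuffled.append((values, perm))
        # small tolerance so floating-point ties count as "at least as extreme"
        if pooled_mi(shuffled, n_bins) >= observed - 1e-12:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def concordance_screen(genes, n_perm=999, seed=0):
    """genes: dict gene_name -> list of (values, labels). Returns name -> (p, q)."""
    names = list(genes)
    pvals = [permutation_pvalue(genes[g], n_perm=n_perm, seed=seed) for g in names]
    qvals = benjamini_hochberg(pvals)
    return {g: (p, q) for g, p, q in zip(names, pvals, qvals)}


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]       # three classes, same design per study
    up = [float(i) for i in range(9)]          # expression increases with class
    down = [float(8 - i) for i in range(9)]    # expression decreases with class
    genes = {
        "concordant": [(up, labels), (up, labels), (up, labels)],
        "discordant": [(up, labels), (down, labels), (up, labels)],
    }
    for name, (p, q) in concordance_screen(genes, n_perm=199).items():
        print(f"{name}: p={p:.3f}, q={q:.3f}")
```

Shuffling labels within, rather than across, studies keeps each study's sample size and class proportions fixed under the null. The plug-in mutual information estimate is biased upward in small samples; because the permutation null is computed with the same estimator, the test remains valid despite that bias in this sketch.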

Availability and implementation: The source code is available on Figshare at https://doi.org/10.6084/m9.figshare.27635436, and the R package can be installed directly from GitHub at https://github.com/jianzou75/MICA.

Supplementary information: Supplementary data are available at Bioinformatics online.
