Mutual information for detecting multi-class biomarkers when integrating multiple bulk or single-cell transcriptomic studies.

Jian Zou, Zheqi Li, Neil Carleton, Steffi Oesterreich, Adrian V Lee, George C Tseng
Bioinformatics (Oxford, England), published 2024-11-19. DOI: 10.1093/bioinformatics/btae696

Abstract

Motivation: Biomarker detection plays a pivotal role in biomedical research. Integrating omics studies from multiple cohorts can enhance the statistical power, accuracy and robustness of the detection results. However, existing methods for horizontally combining omics studies are mostly designed for two-class scenarios (e.g., cases versus controls) and are not directly applicable to studies with a multi-class design (e.g., samples from multiple disease subtypes, treatments, tissues, or cell types).

Results: We propose a statistical framework, Mutual Information Concordance Analysis (MICA), to detect biomarkers with concordant multi-class expression patterns across multiple omics studies from an information-theoretic perspective. Our approach first uses a global mutual information test to detect biomarkers with concordant multi-class patterns across some or all of the omics studies. A post hoc analysis is then performed for each detected biomarker to identify the studies with a concordant pattern. Extensive simulations demonstrate that, compared with an existing method (MCC), MICA achieves improved accuracy and successful false discovery rate control. The method is then applied to two practical scenarios: mouse metabolism-related transcriptomic studies from four tissues, and estrogen treatment expression profiles from three sources. Biomarkers detected by MICA yield intriguing biological insights and functional annotations. Additionally, we applied MICA to single-cell RNA-Seq data to detect tumor progression biomarkers, highlighting critical roles of ribosomal function in the tumor microenvironment of triple-negative breast cancer and underscoring the potential of MICA for detecting novel therapeutic targets.
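To make the idea of a global mutual information test across studies concrete, here is a minimal sketch. It is not the MICA statistic. It assumes a simplified stand-in: one gene's expression is standardized within each study, the studies are pooled, the pooled values are discretized into equal-frequency bins, and a plug-in mutual information with the class labels is computed. Significance then comes from permuting the labels within each study. All function names (`discretize`, `mutual_information`, `pooled_mi`, `permutation_pvalue`), the binning scheme and the synthetic data are hypothetical. The post hoc step that identifies which studies are concordant is not sketched. For the actual method, see the MICA R package at https://github.com/jianzou75/MICA.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def discretize(values, n_bins=3):
    """Equal-frequency binning: assign each value a bin 0..n_bins-1 by its rank.

    Ties are broken by original order (Python's sort is stable).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # Sum over observed (a, b): p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def pooled_mi(studies, n_bins=3):
    """Hypothetical global statistic for one gene (NOT the MICA statistic).

    Each study is a (values, class_labels) pair. Values are standardized within
    study so that studies on different scales can be pooled. Concordant class
    patterns reinforce each other after pooling; discordant ones blur the signal.
    """
    expr, labels = [], []
    for values, classes in studies:
        mu, sd = mean(values), pstdev(values) or 1.0  # guard against zero variance
        expr.extend((v - mu) / sd for v in values)
        labels.extend(classes)
    return mutual_information(discretize(expr, n_bins), labels)


def permutation_pvalue(studies, n_perm=999, n_bins=3, seed=0):
    """Permutation p-value for pooled_mi, shuffling class labels within each study."""
    rng = random.Random(seed)
    observed = pooled_mi(studies, n_bins)
    hits = 0
    for _ in range(n_perm):
        permuted = []
        for values, classes in studies:
            shuffled = list(classes)
            rng.shuffle(shuffled)
            permuted.append((values, shuffled))
        if pooled_mi(permuted, n_bins) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example: two hypothetical studies share an A < B < C pattern
# but differ in offset and scale.
rng = random.Random(1)
classes = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
shift = {"A": 0.0, "B": 2.0, "C": 4.0}
study1 = ([shift[c] + rng.gauss(0, 0.5) for c in classes], classes)
study2 = ([10 + 3 * shift[c] + rng.gauss(0, 1.5) for c in classes], classes)
mi, p = permutation_pvalue([study1, study2], n_perm=199)
print(f"pooled MI = {mi:.3f} nats, permutation p = {p:.3f}")
```

In this sketch, within-study standardization lets studies measured on different scales (here, study 2 has a larger offset and spread) contribute to a single statistic. Permuting labels only within each study keeps each study's class composition fixed under the null hypothesis.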

Availability: The source code is available on Figshare at https://doi.org/10.6084/m9.figshare.27635436. Additionally, the R package can be installed directly from GitHub at https://github.com/jianzou75/MICA.

Supplementary information: Supplementary data are available at Bioinformatics online.
