From correlation to causation using directed topological overlap matrix: Applications in genomics

Journal: Methods. Impact Factor: 4.2. Zone 3 (Biology). JCR Q1, Biochemical Research Methods. Pub Date: 2023-11-01. DOI: 10.1016/j.ymeth.2023.09.005
Borzou Alipourfard, Jean Gao
Methods, volume 219, pages 58-67. Journal Article. https://www.sciencedirect.com/science/article/pii/S1046202323001597
Citations: 0

Abstract

Most causal discovery tools assume the local causal Markov condition. However, the theoretical assumptions that underlie this condition are often not met in practice. This is especially marked in genomics, where measurement errors, averaging effects, and feedback loops significantly undermine its legitimacy. Furthermore, these causal discovery algorithms require very large samples, orders of magnitude above what is often available. In this paper, relaxing the local causal Markov condition and using Reichenbach's common cause principle instead, we present a more flexible approach to causal discovery, the directed topological overlap matrix (DTOM). DTOM is robust to measurement errors, averaging effects, and feedback loops, and is significantly more sample efficient. We study the utility of DTOM for discovering causal relations in biological data using three real gene expression datasets. We first examine whether DTOM can help distinguish the Myostatin mutation in Piedmontese cattle by contrasting the muscle transcriptomes of Piedmontese and Wagyu crosses; the Myostatin mutation is the cause of the double-muscling for which Piedmontese cattle are famous. We then consider a large-scale gene deletion study in yeast and show that DTOM allows us to distinguish the deleted gene in a sample knowing only the set of differentially expressed genes in that sample. Finally, we examine the progression of Alzheimer's disease (AD) through the lens of DTOM. The genes that our DTOM analysis implicated as having a causal role in the progression of AD were significantly enriched in cellular components that have been repeatedly implicated in the progression of AD.
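
As background for the name only, the classic undirected topological overlap measure from gene co-expression network analysis can be sketched as below. This is not the DTOM of the paper: the abstract does not give the directed definition, so the sketch shows only the standard symmetric form, in which two nodes overlap strongly when they share many neighbors. The function name `topological_overlap` and the toy adjacency matrix are illustrative assumptions.

```python
def topological_overlap(adj):
    """Classic undirected topological overlap for a symmetric adjacency
    matrix with entries in [0, 1] and a zero diagonal.

    NOTE: this is the standard symmetric measure, not the paper's DTOM.
    """
    n = len(adj)
    # Connectivity k_i: total connection strength of node i (no self-loops).
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # a node overlaps fully with itself
                continue
            # Connection strength routed through shared neighbors u.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Hypothetical toy network: nodes 0 and 1 are not directly linked,
# but both are linked to nodes 2 and 3.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
tom = topological_overlap(adj)
print(tom[0][1])  # overlap driven purely by shared neighbors
print(tom[0][2])  # overlap driven purely by the direct link
```

In this toy network, nodes 0 and 1 receive a nonzero overlap despite having no direct edge, which is the property that makes overlap-based measures less sensitive to a single noisy pairwise correlation.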

Source journal
Methods (Biology: Biochemical Research Methods)
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks
Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches, with emphasis on clear, detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.