Vistla: identifying influence paths with information theory.

Bioinformatics (Oxford, England), IF 5.4. Pub Date: 2025-02-04. DOI: 10.1093/bioinformatics/btaf036

Miron B Kursa



Abstract

Motivation: It is a challenging task to decipher the mechanisms of a complex system from observational data, especially in biology, where systems are sophisticated, measurements coarse, and multi-modality common. The typical approaches of inferring a network of relationships between a system's components struggle with the quality and feasibility of estimation, as well as with the interpretability of the results they yield. Said issues can be avoided, however, when dealing with a simpler problem of tracking only the influence paths, defined as circuits relying on the information of an experimental perturbation as it spreads through the system. Such an approach can be formalized with information theory and leads to a relatively streamlined, interpretable output, in contrast to the incomprehensibly dense 'haystack' networks produced by typical tools.

Results: Following this idea, the paper introduces Vistla, a novel method built around tri-variate mutual information and data processing inequality, combined with a higher-order generalization of the widest path problem. Vistla can be used standalone, in a machine learning pipeline to aid interpretability, or as a tool for mediation analysis; the paper demonstrates its efficiency both in synthetic and real-world problems.
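The two classical ingredients named in the Results can be illustrated in isolation. The sketch below is written in Python for illustration only; it is not the Vistla algorithm and does not use the API of the vistla R package. It shows (i) a plug-in estimate of ordinary pairwise mutual information on toy discrete data, with a check of the data processing inequality on a chain X → A → B, and (ii) the standard widest path (maximum-bottleneck path) problem solved with a Dijkstra-style search. All function names, toy variables and edge weights are assumptions made for the example; Vistla itself relies on tri-variate mutual information and a higher-order generalization of the widest path, which this sketch does not attempt.

```python
import heapq
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) p(y))).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def widest_path(weights, source, target):
    """Classic widest path: maximise the smallest edge weight along the path.

    weights maps node -> {neighbour: edge weight}.
    Returns (bottleneck, path), or (0.0, []) if target is unreachable.
    """
    best = {source: float("inf")}  # best known bottleneck per node
    prev = {}
    heap = [(-best[source], source)]  # max-heap emulated with negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        if node == target:
            break
        for nxt, w in weights.get(node, {}).items():
            cand = min(width, w)
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Toy chain X -> A -> B: each step is a deterministic coarsening of the last.
xs = list(range(8))
a = [x // 2 for x in xs]
b = [v // 2 for v in a]

i_xa = mutual_information(xs, a)
i_ab = mutual_information(a, b)
i_xb = mutual_information(xs, b)
# Data processing inequality: information about X cannot grow along the chain.
print(i_xa, i_ab, i_xb, i_xb <= min(i_xa, i_ab))

# Hypothetical edge weights, chosen by hand for the example.
toy_graph = {
    "X": {"A": 0.9, "C": 0.4},
    "A": {"B": 0.6},
    "C": {"B": 0.8},
    "B": {},
}
print(widest_path(toy_graph, "X", "B"))
```

In the toy graph the route through A wins because its weakest edge (0.6) is stronger than the weakest edge on the route through C (0.4); this "strongest weakest link" criterion is what makes the widest path a natural fit for tracing how much information about a perturbation can survive along a route.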

Availability and implementation: The R package implementing the method is available at https://gitlab.com/mbq/vistla, as well as on CRAN.


Supplementary information: Supplementary data are available at Bioinformatics online.
