Interesting paths in the mapper complex

A. Kalyanaraman, M. Kamruzzaman, Bala Krishnamoorthy
DOI: 10.20382/jocg.v10i1a17 (https://doi.org/10.20382/jocg.v10i1a17)
Journal: International Journal of Computational Geometry & Applications, vol. 24 1, pp. 500-531
Published: 2019-01-01
Citations: 6

Abstract

Given a high-dimensional point cloud of data with functions defined on the points, the mapper algorithm produces a compact summary in the form of a simplicial complex connecting the points. We study the problem of quantifying the interestingness of subpopulations in a given mapper complex. First, we create a weighted directed graph G = (V,E) using the 1-skeleton of the mapper complex. We use the average values at the vertices of a target function (dependent variable) to direct the edges from low to high values, and assign the difference (high−low) as the weight of the edge. Covariation of the remaining h functions (independent variables) is captured by an h-bit binary signature assigned to the edge. An interesting path in G is a directed path whose edges all have the same signature. We define the interestingness score of such a path as the sum of its edge weights, each multiplied by a nonlinear function of the corresponding edge's rank, i.e., the depth of the edge along the path. Such a nonlinear function could model application use-cases where the growth in the dependent variable values is expected to be concentrated in specific intervals of a path. Second, we study three optimization problems on this graph G to quantify interesting subpopulations. In the problem Max-IP, the goal is to find the most interesting path in G, i.e., an interesting path with the maximum interestingness score. For the case where G is a directed acyclic graph (DAG), we show that Max-IP can be solved in polynomial time. In the more general problem IP, the goal is to find a collection of edge-disjoint interesting paths such that the sum of the interestingness scores of all paths is maximized. We also study a variant of IP termed k-IP, where the goal is to identify a collection of edge-disjoint interesting paths, each with k edges, such that the total interestingness score of all paths is maximized. While k-IP can be solved in polynomial time for k ≤ 2, we show that k-IP is NP-complete for k ≥ 3 even when G is a DAG. We develop heuristics for IP and k-IP on DAGs, which use the algorithm for Max-IP on DAGs as a subroutine. We have released open source implementations of our algorithms to find interesting paths. We also present a detailed experimental evaluation of this software framework on a real-world maize plant phenomics data set. We use interesting paths identified on several mapper graphs to explain how the genotype and environmental factors influence the growth rate, both in isolation and in combination.

∗ School of Electrical Engineering and Computer Science, Washington State University, Pullman, USA
† Department of Mathematics and Statistics, Washington State University, Vancouver, USA
{ananth,md.kamruzzaman,kbala}@wsu.edu
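The graph construction and the Max-IP objective described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released implementation. The function names `build_graph` and `max_interesting_path` are hypothetical, and three details that the abstract does not specify are assumptions: signature bit j is set to 1 when the j-th independent function increases along the directed edge, edges whose endpoints tie on the target value are skipped, and the rank function is taken to be log(1 + i). Because every edge points from a strictly lower to a strictly higher target value, the resulting graph is a DAG, so a dynamic program over (end vertex, rank) pairs terminates and runs in polynomial time.

```python
import math
from collections import defaultdict


def build_graph(target, features, edges):
    """Direct each undirected edge from low to high target value.

    target:   {vertex: average target value}
    features: {vertex: tuple of h average independent-variable values}
    edges:    iterable of undirected (u, v) pairs from the 1-skeleton
    Returns a list of (low, high, weight, signature) tuples.
    """
    directed = []
    for u, v in edges:
        if target[u] == target[v]:
            continue  # assumption: ties give no direction, so skip them
        lo, hi = (u, v) if target[u] < target[v] else (v, u)
        weight = target[hi] - target[lo]
        # assumption: bit j is 1 if feature j increases from lo to hi
        sig = tuple(int(b > a) for a, b in zip(features[lo], features[hi]))
        directed.append((lo, hi, weight, sig))
    return directed


def max_interesting_path(directed, rank_fn=lambda i: math.log(1 + i)):
    """Return (score, path) for the best same-signature directed path.

    score = sum over edges of weight * rank_fn(rank), where rank counts
    edges along the path starting at 1. rank_fn is an assumed choice.
    """
    by_sig = defaultdict(list)
    for lo, hi, w, sig in directed:
        by_sig[sig].append((lo, hi, w))

    best_score, best_path = 0.0, ()
    for sig_edges in by_sig.values():
        prev = {}  # vertex -> (score, path) for paths with rank - 1 edges
        rank = 1
        while True:
            cur = {}  # vertex -> best (score, path) with exactly `rank` edges
            for u, v, w in sig_edges:
                if rank == 1:
                    base, path = 0.0, (u,)
                elif u in prev:
                    base, path = prev[u]
                else:
                    continue
                cand = base + w * rank_fn(rank)
                if v not in cur or cand > cur[v][0]:
                    cur[v] = (cand, path + (v,))
            if not cur:
                break  # no path with this many edges exists in the DAG
            for score, path in cur.values():
                if score > best_score:
                    best_score, best_path = score, path
            prev, rank = cur, rank + 1
    return best_score, list(best_path)
```

The rank has to be part of the dynamic-programming state because the contribution of an edge depends on its position in the path; a single best-score-per-vertex table would not be sufficient once the rank function is nonlinear.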