Persistent obstruction theory for a model category of measures with applications to data merging

Abraham Smith, Paul Bendich, J. Harer
Transactions of the American Mathematical Society, Series B (2019-11-26). DOI: 10.1090/BTRAN/56

Abstract

Collections of measures on compact metric spaces form a model category ("data complexes") whose morphisms are marginalization integrals. The fibrant objects in this category represent collections of measures for which there is a measure on a product space that marginalizes to the given measures on each pair of its factors. The homotopy and homology of this category measure the obstructions to finding measures on larger and larger product spaces. The obstruction theory is compatible with a fibrant filtration built from the Wasserstein distance on measures.

Despite the abstract tools, this work is motivated by a widespread problem in data science. Data complexes provide a mathematical foundation for the semi-automated data-alignment tools that are common in commercial database software. Practically speaking, the theory shows that database JOIN operations are subject to genuine topological obstructions. Those obstructions can be detected by an obstruction cocycle and resolved by moving through a filtration. Thus, any collection of databases has a persistence level, which measures the difficulty of JOINing those databases. Because of its general formulation, this persistent obstruction theory also encompasses multi-modal data fusion problems, some forms of Bayesian inference, and probability couplings.
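The kind of obstruction described above can already be seen in a tiny finite example. The sketch below is a hypothetical, discrete toy analog, not the paper's construction on compact metric spaces and not its obstruction cocycle: the three binary attributes `X`, `Y`, `Z`, the tables, and all function names are illustrative assumptions. Each pair of attributes carries a probability table (one "database"), and marginalization is a finite sum. The tables agree on every shared attribute, yet no joint measure on all three attributes marginalizes to them, so the three tables cannot be JOINed. The check used here is a simple support argument: any joint measure must live on triples whose pairwise projections all carry positive mass, so an empty candidate set rules out a joint.

```python
from itertools import product

# Hypothetical toy "data complex": three binary attributes, one probability
# table per pair of attributes (value pair -> probability).
VARS = ("X", "Y", "Z")
pairwise = {
    ("X", "Y"): {(0, 0): 0.5, (1, 1): 0.5},  # X always equals Y
    ("Y", "Z"): {(0, 0): 0.5, (1, 1): 0.5},  # Y always equals Z
    ("X", "Z"): {(0, 1): 0.5, (1, 0): 0.5},  # X never equals Z
}


def marginal(table, names, keep):
    """Discrete marginalization: sum out every coordinate not listed in `keep`."""
    idx = [names.index(k) for k in keep]
    out = {}
    for point, p in table.items():
        key = tuple(point[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def same_measure(a, b, tol=1e-12):
    """Compare two finite measures given as dicts, treating missing keys as 0."""
    return all(abs(a.get(k, 0.0) - b.get(k, 0.0)) <= tol for k in set(a) | set(b))


def single_marginals_agree(pairwise):
    """Check that overlapping tables induce the same one-attribute marginal."""
    seen = {}
    for names, table in pairwise.items():
        for v in names:
            m = marginal(table, names, (v,))
            if v in seen and not same_measure(seen[v], m):
                return False
            seen.setdefault(v, m)
    return True


def joint_support_candidates(pairwise):
    """Triples whose every pairwise projection has positive mass.

    A joint measure marginalizing to all tables must be supported inside
    this set, so an empty result proves that no such joint exists.
    """
    return [
        point
        for point in product((0, 1), repeat=len(VARS))
        if all(table.get(tuple(point[VARS.index(v)] for v in names), 0.0) > 0
               for names, table in pairwise.items())
    ]


def is_joint(joint, pairwise):
    """Does `joint` (a measure on all of VARS) marginalize to every table?"""
    return all(same_measure(marginal(joint, VARS, names), table)
               for names, table in pairwise.items())


if __name__ == "__main__":
    print(single_marginals_agree(pairwise))    # True: locally consistent
    print(joint_support_candidates(pairwise))  # []: no joint measure exists
```

Replacing the `("X", "Z")` table with `{(0, 0): 0.5, (1, 1): 0.5}` removes the obstruction in this toy setting: the candidate set becomes `[(0, 0, 0), (1, 1, 1)]`, and the uniform measure on those two triples marginalizes to all three tables.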