Programmable Dataflows: Abstraction and Programming Model for Data Sharing

Siyuan Xia, Chris Zhu, Tapan Srivastava, Bridget Fahey, Raul Castro Fernandez
arXiv - CS - Databases, published 2024-08-07. DOI: arxiv-2408.04092 (https://doi.org/arxiv-2408.04092)

Abstract

Data sharing is central to a wide variety of applications, such as fraud detection, ad matching, and research. The lack of data sharing abstractions makes the solution to each data sharing problem bespoke and costly, hampering value generation. In this paper, we first introduce a data sharing model that represents every data sharing problem as a sequence of dataflows. From the model, we distill an abstraction, the contract, which agents use to communicate the intent of a dataflow and evaluate its consequences before the dataflow takes place. This helps agents move towards a common sharing goal without violating any regulatory or privacy constraints. Then, we design and implement the contract programming model (CPM), which allows agents to program data sharing applications tailored to each problem's needs. Contracts permit data sharing, but their interactive nature may introduce inefficiencies. To mitigate those inefficiencies, we extend the CPM so that it can save intermediate outputs of dataflows and skip computation if a dataflow tries to access data that it is not permitted to access. In our evaluation, we show that 1) the contract abstraction is general enough to represent a wide range of sharing problems, 2) we can write programs for complex data sharing problems that exhibit qualitative improvements over alternative technologies, and 3) quantitatively, our optimizations make sharing programs written with the CPM efficient.
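
To make the contract idea concrete, the following is a minimal Python sketch of how agents might propose and approve a dataflow before it runs. It is not the CPM API from the paper: every name here (`Contract`, `Platform`, `propose`, `approve`, `run`, `AccessDenied`) and the rule that every owner of the referenced data must approve are assumptions made for illustration. The sketch also mirrors the two optimizations the abstract mentions, in simplified form: results of a dataflow are cached for reuse, and the access check happens before any computation, so a dataflow that is not permitted is rejected without running.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple


class AccessDenied(Exception):
    """The caller has no fully approved contract covering this dataflow."""


@dataclass(frozen=True)
class Contract:
    # Hypothetical fields: the function to run, the data elements it reads,
    # and the agents allowed to receive its output.
    function: str
    data_elements: FrozenSet[str]
    destination_agents: FrozenSet[str]


@dataclass
class Platform:
    owners: Dict[str, str]                       # data element -> owning agent
    data: Dict[str, list]                        # data element -> contents
    functions: Dict[str, Callable[..., object]]  # registered dataflow functions
    approvals: Dict[Contract, Set[str]] = field(default_factory=dict)
    cache: Dict[Tuple[str, FrozenSet[str]], object] = field(default_factory=dict)
    executions: int = 0                          # counts real computations

    def propose(self, contract: Contract) -> None:
        self.approvals.setdefault(contract, set())

    def approve(self, contract: Contract, agent: str) -> None:
        self.approvals.setdefault(contract, set()).add(agent)

    def is_approved(self, contract: Contract) -> bool:
        # Assumed rule: every owner of a referenced data element must approve.
        required = {self.owners[d] for d in contract.data_elements}
        return required <= self.approvals.get(contract, set())

    def run(self, caller: str, function: str, data_elements: Iterable[str]) -> object:
        wanted = frozenset(data_elements)
        allowed = any(
            c.function == function
            and wanted <= c.data_elements
            and caller in c.destination_agents
            and self.is_approved(c)
            for c in self.approvals
        )
        if not allowed:
            # Reject before touching any data, so no computation is wasted.
            raise AccessDenied(f"{caller} may not run {function} on {sorted(wanted)}")
        key = (function, wanted)
        if key not in self.cache:
            # Compute once; later permitted callers reuse the saved output.
            args = [self.data[d] for d in sorted(wanted)]
            self.cache[key] = self.functions[function](*args)
            self.executions += 1
        return self.cache[key]


def shared_ids(a: list, b: list) -> list:
    """Toy dataflow: identifiers that appear in both inputs."""
    return sorted(set(a) & set(b))
```

In this sketch, a fraud detection scenario would have two banks each own one data element, one bank propose a contract naming `shared_ids` over both elements, and `run` succeed only once both owners have called `approve`. Because the permission check precedes the cache lookup, a cached result is never returned to an agent that is not a destination of an approved contract.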