使用Explain-Da-V解释语义数据版本控制的数据集更改(技术报告)

Proc. VLDB Endow. Pub Date : 2023-01-30 DOI:10.48550/arXiv.2301.13095

Roee Shraga, Renée J. Miller

{"title":"使用Explain-Da-V解释语义数据版本控制的数据集更改(技术报告)","authors":"Roee Shraga, Renée J. Miller","doi":"10.48550/arXiv.2301.13095","DOIUrl":null,"url":null,"abstract":"\n In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature of such changes has remained under-explored. In this work, we introduce Explain-Da-V, a framework aiming to explain changes between two given dataset versions. Explain-Da-V generates\n explanations\n that use\n data transformations\n to explain changes. We further introduce a set of measures that evaluate the validity, generalizability, and explainability of these explanations. We empirically show, using an adapted existing benchmark and a newly created benchmark, that Explain-Da-V generates better explanations than existing data transformation synthesis methods.\n","PeriodicalId":20467,"journal":{"name":"Proc. VLDB Endow.","volume":"84 1","pages":"1587-1600"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V (Technical Report)\",\"authors\":\"Roee Shraga, Renée J. Miller\",\"doi\":\"10.48550/arXiv.2301.13095\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature of such changes has remained under-explored. In this work, we introduce Explain-Da-V, a framework aiming to explain changes between two given dataset versions. Explain-Da-V generates\\n explanations\\n that use\\n data transformations\\n to explain changes. We further introduce a set of measures that evaluate the validity, generalizability, and explainability of these explanations. We empirically show, using an adapted existing benchmark and a newly created benchmark, that Explain-Da-V generates better explanations than existing data transformation synthesis methods.\\n\",\"PeriodicalId\":20467,\"journal\":{\"name\":\"Proc. VLDB Endow.\",\"volume\":\"84 1\",\"pages\":\"1587-1600\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proc. VLDB Endow.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2301.13095\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. VLDB Endow.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.13095","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在数据科学和分析是协作的多用户环境中，会生成相同数据集的多个版本。虽然管理和存储数据版本在研究文献中受到了一些关注，但这种变化的语义性质仍未得到充分探索。在这项工作中，我们引入了explain - da - v框架，旨在解释两个给定数据集版本之间的变化。explain - da - v生成使用数据转换来解释更改的解释。我们进一步介绍了一套评估这些解释的有效性、概括性和可解释性的措施。我们通过经验证明，使用一个改编的现有基准和一个新创建的基准，Explain-Da-V生成的解释比现有的数据转换综合方法更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Explaining Dataset Changes for Semantic Data Versioning with Explain-Da-V (Technical Report)

In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature of such changes has remained under-explored. In this work, we introduce Explain-Da-V, a framework aiming to explain changes between two given dataset versions. Explain-Da-V generates explanations that use data transformations to explain changes. We further introduce a set of measures that evaluate the validity, generalizability, and explainability of these explanations. We empirically show, using an adapted existing benchmark and a newly created benchmark, that Explain-Da-V generates better explanations than existing data transformation synthesis methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proc. VLDB Endow.

自引率

0.00%

发文量

期刊最新文献

Cryptographically Secure Private Record Linkage Using Locality-Sensitive Hashing Utility-aware Payment Channel Network Rebalance Relational Query Synthesis ⋈ Decision Tree Learning Billion-Scale Bipartite Graph Embedding: A Global-Local Induced Approach Query Refinement for Diversity Constraint Satisfaction