Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph

Proc. VLDB Endow., pp. 2578-2590. Pub Date: 2023-06-01. DOI: 10.48550/arXiv.2306.12515

Yiming Lin, Yeye He, S. Chaudhuri


Citations: 3

Abstract

Business Intelligence (BI) is crucial in modern enterprises and is a billion-dollar business. Traditionally, technical experts such as database administrators manually prepare BI-models (e.g., in star or snowflake schemas) that join tables in data warehouses before less-technical business users can run analytics using end-user dashboarding tools. However, the popularity of self-service BI tools (e.g., Tableau and Power-BI) in recent years has created strong demand for less-technical end-users to build BI-models themselves.

We develop an Auto-BI system that can accurately predict BI-models given a set of input tables. It is based on a principled graph-based optimization problem we propose, called k-Min-Cost-Arborescence (k-MCA), which holistically considers both local join prediction and global schema-graph structures by leveraging a graph-theoretical structure called an arborescence. While we prove that k-MCA is intractable and inapproximable in general, we develop novel algorithms that solve k-MCA optimally; these are shown to be efficient in practice, with sub-second latency, and scale to the largest BI-models we encounter (close to 100 tables).

Auto-BI is rigorously evaluated on a unique dataset of over 100K real BI-models that we harvested, as well as on 4 popular TPC benchmarks. It is shown to be both efficient and accurate, achieving an F1-score above 0.9 on both real and synthetic benchmarks.
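To make the arborescence idea concrete, the following is a minimal sketch, not the authors' algorithm. An arborescence is a directed rooted tree in which every node is reachable from the root along exactly one directed path. The sketch assumes that each candidate join is a directed edge with an integer cost (lower meaning a more plausible join) and that the BI-model is a single arborescence. It finds the cheapest one by exhaustive search, which is only feasible for a handful of tables. The table names, the costs, the edge direction and the function names are all hypothetical, and the k-MCA formulation in the paper is more general than this single-tree case.

```python
from itertools import product


def reaches_root(node, parent, root):
    """Follow parent pointers from node; True if we arrive at root without a cycle."""
    seen = set()
    while node != root:
        if node in seen:
            return False  # walked in a circle, so this is not a tree
        seen.add(node)
        node = parent[node]
    return True


def min_cost_arborescence(nodes, edges):
    """Brute-force minimum-cost spanning arborescence (illustrative only).

    nodes: list of table names.
    edges: dict mapping (parent, child) -> cost of that candidate join.
    Returns (cost, root, sorted_edges), or None if no arborescence exists.
    """
    best = None
    for root in nodes:
        others = [n for n in nodes if n != root]
        # Every non-root node must pick exactly one incoming edge.
        incoming = [[(u, v) for (u, v) in edges if v == n] for n in others]
        if any(not candidates for candidates in incoming):
            continue  # some table cannot be reached with this root
        for choice in product(*incoming):
            parent = {v: u for (u, v) in choice}
            if not all(reaches_root(n, parent, root) for n in others):
                continue  # the chosen edges contain a cycle
            cost = sum(edges[e] for e in choice)
            if best is None or cost < best[0]:
                best = (cost, root, sorted(choice))
    return best


# Hypothetical snowflake-like example: costs are made up for illustration.
tables = ["sales", "customer", "product", "category"]
candidate_joins = {
    ("sales", "customer"): 1,
    ("sales", "product"): 2,
    ("product", "category"): 1,
    ("sales", "category"): 9,
    ("customer", "product"): 8,
    ("category", "customer"): 7,
}

if __name__ == "__main__":
    print(min_cost_arborescence(tables, candidate_joins))
```

In this toy input the search selects `sales` as the root and keeps the three low-cost joins, discarding the higher-cost alternatives even though each of them is a valid edge on its own. That is the sense in which a global tree constraint can override purely local join scores.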



