Supporting Complex Query Time Enrichment For Analytics

Advances in Database Technology: Proceedings. International Conference on Extending Database Technology. Pub Date: 2023-01-01. DOI: 10.48786/edbt.2023.08

Dhrubajyoti Ghosh, Peeyush Gupta, S. Mehrotra, Shantanu Sharma

Volume 91 1, pages 92-104. URL: https://doi.org/10.48786/edbt.2023.08

Citations: 1

Abstract

Several application domains require data to be enriched prior to its use. Data enrichment is often performed using expensive machine learning models that interpret low-level data (e.g., models for face detection) into semantically meaningful observations. Collecting and enriching data offline before loading it into a database is infeasible if one desires online analysis of data as it arrives. Enriching data on the fly at insertion could result in redundant work (if applications require only a fraction of the data to be enriched) and could become a bottleneck (if enrichment functions are expensive). Any scalable solution requires enrichment during query processing. This paper explores two different architectures for integrating enrichment into query processing: a loosely coupled approach, wherein enrichment is performed outside of the DBMS, and a tightly coupled approach, wherein it is performed within the DBMS. The paper addresses the challenge of increased query latency due to query-time enrichment.
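The contrast between insertion-time and query-time enrichment can be illustrated with a minimal Python sketch. It is not the paper's implementation of either architecture. The names `LazyEnricher`, `run_query`, and `fake_face_detector`, as well as the row layout, are hypothetical and chosen only for illustration. The sketch assumes that a cheap predicate on stored attributes is evaluated first, so the expensive enrichment function runs only on rows the query actually needs, and that results are cached for reuse by later queries.

```python
from typing import Any, Callable, Dict, Hashable, List


class LazyEnricher:
    """Runs an expensive enrichment function on demand and caches results."""

    def __init__(self, enrich_fn: Callable[[Any], Any]) -> None:
        self.enrich_fn = enrich_fn
        self.cache: Dict[Hashable, Any] = {}
        self.calls = 0  # number of times the expensive function actually ran

    def get(self, row_id: Hashable, raw: Any) -> Any:
        if row_id not in self.cache:
            self.cache[row_id] = self.enrich_fn(raw)
            self.calls += 1
        return self.cache[row_id]


def run_query(
    rows: List[Dict[str, Any]],
    enricher: LazyEnricher,
    cheap_pred: Callable[[Dict[str, Any]], bool],
    derived_pred: Callable[[Any], bool],
) -> List[Any]:
    """Return ids of rows passing both predicates, enriching only when needed."""
    result = []
    for row in rows:
        if not cheap_pred(row):
            continue  # filtered on stored attributes; no enrichment cost paid
        label = enricher.get(row["id"], row["raw"])
        if derived_pred(label):
            result.append(row["id"])
    return result


def fake_face_detector(raw: str) -> bool:
    # Hypothetical stand-in for an expensive ML model.
    return "face" in raw


ROWS = [
    {"id": 1, "camera": "lobby", "raw": "frame with face"},
    {"id": 2, "camera": "lobby", "raw": "empty frame"},
    {"id": 3, "camera": "garage", "raw": "frame with face"},
]

enricher = LazyEnricher(fake_face_detector)
hits = run_query(
    ROWS,
    enricher,
    cheap_pred=lambda r: r["camera"] == "lobby",
    derived_pred=lambda has_face: has_face,
)
```

In this toy setup only the two lobby rows are enriched; the garage row never reaches the detector, and repeating the same query reuses the cached results instead of calling the detector again.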

