Proceedings of the VLDB Endowment: Latest Articles

Efficient Execution of User-Defined Functions in SQL Queries
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611574
Yannis Foufoulas, Alkis Simitsis
User-defined functions (UDFs) have been widely used to overcome the expressivity limitations of SQL and complement its declarative nature with functional capabilities. UDFs are particularly useful in today's applications that involve complex data analytics and machine learning algorithms and logic. However, UDFs pose significant performance challenges in query processing and optimization, largely due to the mismatch of the UDF execution and SQL processing environments. In this tutorial, we present state-of-the-art methods and systems towards efficient execution of UDFs in SQL queries. We focus on low-level techniques for physical optimization and compilation of UDF queries, describe and compare the core, recent approaches in the area, discuss their advantages and limitations, identify critical gaps in theory and practice, and propose promising future research directions.
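The environment mismatch mentioned in the abstract can be seen in a small sketch. It assumes SQLite through Python's standard `sqlite3` module; the table `readings` and the UDF `to_fahrenheit` are hypothetical names chosen for illustration, and the hand-inlined query is only a point of comparison, not a technique taken from the tutorial.

```python
import sqlite3


def to_fahrenheit(celsius):
    """Scalar UDF: the SQL engine calls back into Python once per row."""
    return celsius * 9.0 / 5.0 + 32.0


def run_queries():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, celsius REAL)")
    conn.executemany(
        "INSERT INTO readings (celsius) VALUES (?)",
        [(0.0,), (25.0,), (100.0,)],
    )
    # Register the Python function so SQL can call it by name.
    conn.create_function("to_fahrenheit", 1, to_fahrenheit)
    # Query 1: every row crosses the SQL/Python boundary.
    via_udf = conn.execute(
        "SELECT to_fahrenheit(celsius) FROM readings ORDER BY id"
    ).fetchall()
    # Query 2: the same arithmetic written as a native SQL expression,
    # which the engine evaluates without leaving its own runtime.
    inlined = conn.execute(
        "SELECT celsius * 9.0 / 5.0 + 32.0 FROM readings ORDER BY id"
    ).fetchall()
    conn.close()
    return via_udf, inlined
```

Both queries return the same rows; the difference is where the work runs. The UDF is opaque to the query optimizer and is invoked per row, which is the kind of overhead the tutorial's techniques aim to reduce.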
Citations: 0
Lynx: A Graph Query Framework for Multiple Heterogeneous Data Sources
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611587
Zhihong Shen, Chuan Hu, Zihao Zhao
Graph models are increasingly popular in modern applications for their ability to model complex relationships between entities. Users tend to query the data as a graph with graph operations (e.g., graph navigation and exploration). However, a large fraction of the data resides in relational databases or other storage systems. Challenges arise in uniformly querying multiple heterogeneous data sources as a graph. Traditional solutions are limited by time-consuming data integration, expensive development effort, and incomplete query requirements. Thus, we developed Lynx, a general graph query framework, to simplify querying graph data by converting complex statements into basic graph operations. Instead of connecting directly to the data sources, Lynx retrieves data through user-implemented interfaces for those graph operations. We demonstrate Lynx's capabilities through real-world scenarios, showcasing Lynx's ability to process graph queries on multiple heterogeneous data sources and also to be used as a generic graph query engine development framework.
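The idea of retrieving data through user-implemented interfaces for basic graph operations can be sketched as follows. This is not Lynx's actual API: the names `GraphSource`, `TableBackedSource`, and `neighbors` are hypothetical, and the relational source is simulated with in-memory rows.

```python
from abc import ABC, abstractmethod


class GraphSource(ABC):
    """Hypothetical interface: the basic operations a data source must provide."""

    @abstractmethod
    def nodes(self):
        """Yield (node_id, properties) pairs."""

    @abstractmethod
    def relationships(self):
        """Yield (source_id, type, target_id) triples."""


class TableBackedSource(GraphSource):
    """Exposes two relational-style tables as a graph."""

    def __init__(self, person_rows, knows_rows):
        self.person_rows = person_rows  # rows of (id, name)
        self.knows_rows = knows_rows    # rows of (src_id, dst_id)

    def nodes(self):
        for person_id, name in self.person_rows:
            yield person_id, {"name": name}

    def relationships(self):
        for src, dst in self.knows_rows:
            yield src, "KNOWS", dst


def neighbors(source, node_id, rel_type):
    """A higher-level query expressed only through the basic operations."""
    props = dict(source.nodes())
    return sorted(
        props[dst]["name"]
        for src, typ, dst in source.relationships()
        if src == node_id and typ == rel_type
    )
```

Because `neighbors` touches the data only through `nodes()` and `relationships()`, swapping in a different backend means writing another `GraphSource` subclass rather than changing the query logic.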
Citations: 0
Interpretable Clustering of Multivariate Time Series with Time2Feat
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611604
Angela Bonifati, Francesco Del Buono, Francesco Guerra, Miki Lombardi, Donato Tiano
This paper showcases Time2Feat, an end-to-end machine learning system for Multivariate Time Series (MTS) clustering. The system relies on interpretable inter-signal and intra-signal features extracted from the time series. Then, a dimensionality reduction technique is applied to select a subset of features that retain most of the information, thus enhancing the interpretability of the results. In addition, the system enables domain specialists to semi-supervise the process by submitting a small collection of MTS with a target cluster. This process further improves both accuracy and interpretability by reducing the number of features used by the clustering process. The demonstration shows the application of Time2Feat to various MTS datasets by creating clusters from MTS datasets of interest, experimenting with different settings, and using the approach's capabilities to interpret the generated clusters.
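The first two stages described in the abstract, interpretable feature extraction followed by feature selection, can be sketched with the standard library alone. This is an illustration under stated assumptions, not Time2Feat's implementation: the chosen features (mean, standard deviation, Pearson correlation) and the variance-based ranking are stand-ins, and all function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def pearson(x, y):
    """Pearson correlation of two equally long signals (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def extract_features(mts):
    """mts maps a signal name to its list of values; returns named features."""
    names = sorted(mts)
    feats = {}
    for name in names:  # intra-signal features
        feats[f"mean({name})"] = mean(mts[name])
        feats[f"std({name})"] = pstdev(mts[name])
    for i, a in enumerate(names):  # inter-signal features
        for b in names[i + 1:]:
            feats[f"corr({a},{b})"] = pearson(mts[a], mts[b])
    return feats


def select_features(feature_rows, k):
    """Keep the k features whose values vary most across the dataset."""
    keys = sorted(feature_rows[0])
    ranked = sorted(
        keys,
        key=lambda key: pvariance([row[key] for row in feature_rows]),
        reverse=True,
    )
    return ranked[:k]
```

Because every retained feature has a readable name such as `corr(x,y)`, a cluster built on the selected features can be explained in terms of the original signals.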
Citations: 0
Generations of Knowledge Graphs: The Crazy Ideas and the Business Impact
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611636
Xin Luna Dong
Knowledge Graphs (KGs) have been used to support a wide range of applications, from web search to personal assistants. In this paper, we describe three generations of knowledge graphs: entity-based KGs, which have been supporting general search and question answering (e.g., at Google and Bing); text-rich KGs, which have been supporting search and recommendations for products, bio-informatics, etc. (e.g., at Amazon and Alibaba); and the emerging integration of KGs and LLMs, which we call dual neural KGs. We describe the characteristics of each generation of KGs, the crazy ideas behind the scenes in constructing such KGs, and the techniques developed over time to enable industry impact. In addition, we use KGs as examples to demonstrate a recipe to evolve research ideas from innovations to production practice, and then to the next level of innovations, to advance both science and business.
Citations: 1
Automatic SQL Error Mitigation in Oracle
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611568
Krishna Kantikiran Pasupuleti, Jiakun Li, Hong Su, Mohamed Ziauddin
Despite best coding practices, software bugs are inevitable in a large codebase. In traditional databases, when errors occur during query processing, they disrupt user workflow until workarounds are found and applied. Manual identification of workarounds often relies on a trial-and-error method. The process is not only time-consuming but also requires domain expertise that users often lack. In this paper, we propose a framework to automatically mitigate errors that occur during query compilation (including optimization and code generation) without any user intervention. An error is intercepted by the database internally, a workaround is identified for it, and the query is recompiled using the workaround. The entire process remains transparent to the user, with the query being executed seamlessly. The proposed technique handles SQL errors during query compilation and provides three types of mitigation strategies: (i) quickly fail over to one of the readily available historical plans for the statement; (ii) apply targeted error-correcting directives (hints) identified from the optimizer context at the time of the error; and (iii) modify the global configuration of the optimizer using hints. This feature has been implemented and will be released in an upcoming version of Oracle Autonomous Database.
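The intercept, find-a-workaround, recompile loop can be sketched as a toy simulation. Nothing here is Oracle's interface: `compile_query`, the hint name `NO_BUGGY_REWRITE`, and the order in which the three strategies are tried are all assumptions made for illustration.

```python
class CompileError(Exception):
    """Stands in for an internal error raised during query compilation."""


def compile_query(sql, hints=()):
    """Toy compiler: fails unless the hypothetical hint NO_BUGGY_REWRITE is set."""
    if "NO_BUGGY_REWRITE" not in hints:
        raise CompileError("internal error in query rewrite")
    return f"plan({sql}; hints={sorted(hints)})"


def compile_with_mitigation(sql, historical_plans, targeted_hints, global_hints):
    """Return (strategy_used, plan); the caller never sees the original error."""
    try:
        return "normal", compile_query(sql)
    except CompileError:
        pass  # intercepted internally; fall through to the mitigations
    # (i) Fail over to a historical plan for the same statement, if one exists.
    if sql in historical_plans:
        return "historical_plan", historical_plans[sql]
    # (ii) Recompile with targeted hints, then (iii) with global-configuration hints.
    for strategy, hints in (
        ("targeted_hints", targeted_hints),
        ("global_hints", global_hints),
    ):
        try:
            return strategy, compile_query(sql, hints)
        except CompileError:
            continue
    raise CompileError(f"no mitigation succeeded for: {sql}")
```

Returning the strategy label alongside the plan is a design choice for the sketch: it makes the otherwise transparent recovery observable for logging or testing.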
Citations: 0
Natural Language Interfaces for Databases with Deep Learning
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611575
George Katsogiannis-Meimarakis, Mike Xydas, Georgia Koutrika
In the age of the Digital Revolution, almost all human activities, from industrial and business operations to medical and academic research, are reliant on the constant integration and utilisation of ever-increasing volumes of data. However, the explosive volume and complexity of data make data querying and exploration challenging even for experts, and make the need to democratise access to data, even for non-technical users, all the more evident. It is time to lift all technical barriers by empowering users to access relational databases through conversation. We consider three main research areas that a natural language data interface is based on: Text-to-SQL, SQL-to-Text, and Data-to-Text. The purpose of this tutorial is a deep dive into these areas, covering state-of-the-art techniques and models, and explaining how progress in the deep learning field has led to impressive advancements. We will present benchmarks that sparked research and competition, and discuss open problems and research opportunities, with one of the most important challenges being the integration of these three research areas into one conversational system.
Citations: 0
Interactive Demonstration of EVA
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611626
Gaurav Tarlok Kakkar, Aryan Rajoria, Myna Prasanna Kalluraya, Ashmita Raju, Jiashen Cao, Kexin Rong, Joy Arulraj
In this demonstration, we will present EVA, an end-to-end AI-Relational database management system. We will demonstrate the capabilities and utility of EVA using three usage scenarios: (1) EVA serves as a backend for an exploratory video analytics interface developed using Streamlit and React, (2) EVA seamlessly integrates with the Python and Data Science ecosystems by allowing users to access EVA in a Python notebook alongside other popular libraries such as Pandas and Matplotlib, and (3) EVA facilitates bulk labeling with Label Studio, a widely-used labeling framework. By optimizing complex vision queries, we illustrate how EVA allows a wide range of application developers to harness the recent advances in computer vision.
Citations: 1
PAINE Demo: Optimizing Video Selection Queries with Commonsense Knowledge
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611581
Wenjia He, Ibrahim Sabek, Yuze Lou, Michael Cafarella
Because video is becoming more popular and constitutes a major part of data collection, we need to process video selection queries, that is, queries selecting videos that contain target objects. However, a naïve scan of a video corpus without optimization would be extremely inefficient because it applies complex detectors to irrelevant videos. This demo presents Paine, a video query system that employs a novel index mechanism to optimize video selection queries via commonsense knowledge. Paine samples video frames to build an inexpensive lossy index, then leverages probabilistic models based on existing commonsense knowledge sources to capture the semantic-level correlation among video frames, thereby allowing Paine to predict the content of unindexed video. These models can predict which videos are likely to satisfy selection predicates, so that Paine avoids processing irrelevant videos. We will demonstrate a system prototype of Paine for accelerating the processing of video selection queries, allowing VLDB'23 participants to use the Paine interface to run queries. Users can compare Paine with the baseline, the SCAN method.
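The pruning idea in this abstract can be illustrated with a minimal sketch: a lossy index of labels seen in sampled frames, a table of commonsense co-occurrence scores, and an expensive detector that runs only on promising videos. The scores in `COOCCURRENCE`, the threshold, and every function name are hypothetical; Paine's actual probabilistic models are not reproduced here.

```python
# Hypothetical commonsense scores:
# P(target appears in the video | label was seen in a sampled frame).
COOCCURRENCE = {
    ("car", "road"): 0.9,
    ("car", "kitchen"): 0.05,
}


def likely_contains(sampled_labels, target, cooccurrence, threshold=0.5):
    """Predict from the lossy index whether a video may contain the target."""
    if target in sampled_labels:
        return True
    best = max(
        (cooccurrence.get((target, label), 0.0) for label in sampled_labels),
        default=0.0,
    )
    return best >= threshold


def select_videos(index, target, cooccurrence, detector, threshold=0.5):
    """Run the expensive detector only on videos the model considers likely."""
    candidates = [
        video
        for video, labels in index.items()
        if likely_contains(labels, target, cooccurrence, threshold)
    ]
    matches = [video for video in candidates if detector(video, target)]
    return matches, candidates
```

A full scan would call the detector on every video; here a video whose sampled frames show only a kitchen is skipped for a `car` query. The trade-off is that a too-aggressive threshold can drop videos that do contain the target.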
Citations: 0
ChainDash: An Ad-Hoc Blockchain Data Analytics System
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611611
Yushi Liu, Liwei Yuan, Zhihao Chen, Yekai Yu, Zhao Zhang, Cheqing Jin, Ying Yan
The emergence of digital asset applications, driven by Web 3.0 and powered by blockchain technology, has led to a growing demand for blockchain-specific graph analytics to unearth insights. However, current blockchain data analytics systems are unable to perform efficient ad-hoc graph analytics over both live and past time windows due to their inefficient data synchronization and slow retrieval of graph snapshots. To address these issues, we propose ChainDash, a blockchain data analytics system that features a highly parallelized data synchronization component and a retrieval-optimized temporal graph store. By leveraging these techniques, ChainDash supports efficient ad-hoc graph analytics of smart contract activities over arbitrary time windows. In the demonstration, we showcase the interactive visualization interfaces of ChainDash, where attendees will execute customized queries for ad-hoc graph analytics of blockchain data.
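To make "graph analytics over arbitrary time windows" concrete, here is a minimal sketch of a temporal edge store that keeps edges sorted by timestamp and uses binary search to cut out a window. It is a generic illustration, not ChainDash's storage design; `TemporalEdgeStore` and `out_degree` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class TemporalEdgeStore:
    """Keeps (timestamp, src, dst) edges sorted so a window is one slice."""

    def __init__(self, edges):
        self._edges = sorted(edges)
        self._times = [timestamp for timestamp, _, _ in self._edges]

    def window(self, start, end):
        """Edges with start <= timestamp <= end, found by binary search."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return [(src, dst) for _, src, dst in self._edges[lo:hi]]


def out_degree(edges):
    """A simple analytic computed on a window snapshot."""
    counts = {}
    for src, _ in edges:
        counts[src] = counts.get(src, 0) + 1
    return counts
```

Locating a window costs two binary searches plus the size of the result, so a query over a past window does not require replaying the full history.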
Citations: 0
Machine Learning for Subgraph Extraction: Methods, Applications and Challenges
Zone 3 Computer Science | Q1 Computer Science | Pub Date: 2023-08-01 | DOI: 10.14778/3611540.3611571
Kai Siong Yow, Ningyi Liao, Siqiang Luo, Reynold Cheng
Subgraphs are obtained by extracting a subset of vertices and a subset of edges from the associated original graphs, and many graph properties are known to be inherited by subgraphs. Subgraphs can be applied in many areas such as social networks, recommender systems, biochemistry and fraud discovery. Researchers from various communities have devoted a great deal of attention to investigating numerous subgraph problems, proposing algorithms that mainly extract important structures of a given graph. There are, however, some limitations that should be addressed with regard to the efficiency, effectiveness and scalability of these traditional algorithms. As a consequence, machine learning techniques, one of the latest trends, have recently been employed in the database community to address various subgraph problems, since they have been shown to be beneficial in dealing with graph-related problems. In this tutorial, we discuss learning-based approaches for four well-known subgraph problems, namely subgraph isomorphism, maximum common subgraph, community detection and community search. We give a general description of each proposed model, and analyse its design and performance. To allow further investigation of relevant subgraph problems, we suggest some potential future directions in this area. We believe that this work can serve as one of the primary resources for researchers who intend to develop learning models for solving problems that are closely related to subgraphs.
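The definition in the first sentence of this abstract can be written down directly. The sketch below uses directed edges stored as `(u, v)` tuples; `is_subgraph` checks the definition, and `induced_subgraph` builds one common special case (keep every edge between the chosen vertices). Both function names are hypothetical, and no learning-based method from the tutorial is shown.

```python
def is_subgraph(sub_vertices, sub_edges, vertices, edges):
    """True if (sub_vertices, sub_edges) is a subgraph of (vertices, edges)."""
    sub_vertices, sub_edges = set(sub_vertices), set(sub_edges)
    # Every kept edge must connect two kept vertices.
    endpoints_ok = all(
        u in sub_vertices and v in sub_vertices for u, v in sub_edges
    )
    return (
        sub_vertices <= set(vertices)
        and sub_edges <= set(edges)
        and endpoints_ok
    )


def induced_subgraph(keep, edges):
    """Keep the chosen vertices and every original edge between them."""
    keep = set(keep)
    return keep, {(u, v) for u, v in edges if u in keep and v in keep}
```

For a path 1 → 2 → 3 → 4, keeping vertices {1, 2, 3} induces the edges (1, 2) and (2, 3); the edge (3, 4) is dropped because vertex 4 was not kept.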
Citations: 0