
Latest publications from the Proceedings of the 2018 International Conference on Management of Data

Session details: Research 5: Graph Data Management
S. Bhowmick
{"title":"Session details: Research 5: Graph Data Management","authors":"S. Bhowmick","doi":"10.1145/3258009","DOIUrl":"https://doi.org/10.1145/3258009","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78300931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FastQRE: Fast Query Reverse Engineering
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183727
D. Kalashnikov, L. Lakshmanan, D. Srivastava
We study the problem of Query Reverse Engineering (QRE): given a database and an output table, the task is to find a simple project-join SQL query that generates that table when applied to the database. This problem is known for its efficiency challenges, mainly for two reasons. First, the problem has a very large search space, and its various variants are known to be NP-hard. Second, executing even a single candidate SQL query can be very computationally expensive. In this work we propose a novel approach for solving the QRE problem efficiently. Our solution outperforms the existing state of the art by 2-3 orders of magnitude for complex queries, resolving those queries in seconds rather than days, thus making our approach more practical in real-life settings.
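The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the naive baseline that FastQRE improves on, using a hypothetical SQLite schema, data, and candidate space: enumerate simple project-join candidates, execute each one, and keep those whose result equals the given output table.

```python
# Naive QRE baseline over a toy in-memory SQLite database (hypothetical data).
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp(id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE dept(dept_id INTEGER, dept_name TEXT);
    INSERT INTO emp VALUES (1, 'Ann', 10), (2, 'Bob', 20);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'HR');
""")

# The given output table, represented as a set of rows.
target = {("Ann", "Sales"), ("Bob", "HR")}

# Candidate project-join queries: one fixed join, every ordered column pair.
columns = ["emp.id", "emp.name", "emp.dept_id", "dept.dept_name"]
matches = []
for proj in itertools.permutations(columns, 2):
    sql = (f"SELECT {', '.join(proj)} FROM emp "
           "JOIN dept ON emp.dept_id = dept.dept_id")
    if set(conn.execute(sql).fetchall()) == target:
        matches.append(sql)  # this candidate reproduces the output table

print(matches)
```

Even on this toy schema the candidate space grows combinatorially with the number of tables and columns, which is the first of the two efficiency challenges the abstract points to.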
Citations: 45
DITA: A Distributed In-Memory Trajectory Analytics System
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193553
Zeyuan Shang, Guoliang Li, Z. Bao
Trajectory analytics can benefit many real-world applications, e.g., frequent-trajectory-based navigation systems, road planning, car pooling, and transportation optimization. In this paper, we demonstrate DITA, a distributed in-memory trajectory analytics system that supports large-scale trajectory data analytics. DITA exhibits three unique features. First, DITA supports threshold-based and KNN-based trajectory similarity search and join operations, as well as range queries (i.e., over space and time). Second, DITA is versatile, supporting most existing similarity functions to cater for different analytic purposes and scenarios. Last, DITA is seamlessly integrated into Spark SQL to support easy-to-use SQL and DataFrame API interfaces. Technically, DITA proposes an effective partitioning method, with a global index and a local index, to address the data locality problem. It also devises cost-based techniques to balance the workload, and develops a filter-verification framework for efficient and scalable search and join.
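As a single-machine illustration of the threshold-based similarity search that DITA supports (plain Python rather than DITA's distributed Spark implementation; the trajectories, the choice of DTW as the similarity function, and the threshold are made up):

```python
# Threshold-based trajectory similarity search, sequential sketch.
import math

def dtw(a, b):
    """Dynamic time warping distance between two trajectories of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def threshold_search(query, trajectories, tau):
    """Return the ids of all trajectories within DTW distance tau of the query."""
    return [tid for tid, t in trajectories.items() if dtw(query, t) <= tau]

trajectories = {
    "t1": [(0, 0), (1, 0), (2, 0)],
    "t2": [(0, 1), (1, 1), (2, 1)],
    "t3": [(5, 5), (6, 6)],
}
print(threshold_search([(0, 0), (1, 0), (2, 1)], trajectories, tau=2.0))
```

A distributed version would additionally need the partitioning, global/local indexing, and filter-verification steps listed in the abstract, which is where DITA's contribution lies.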
Citations: 13
Session details: Research 3: Transactions and Indexing
Pınar Tözün
{"title":"Session details: Research 3: Transactions and Indexing","authors":"Pınar Tözün","doi":"10.1145/3258007","DOIUrl":"https://doi.org/10.1145/3258007","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74467015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Session details: Research 2: Usability and Security/Privacy
Ashwin Machanavajjhala
{"title":"Session details: Research 2: Usability and Security/Privacy","authors":"Ashwin Machanavajjhala","doi":"10.1145/3258005","DOIUrl":"https://doi.org/10.1145/3258005","url":null,"abstract":"","PeriodicalId":20430,"journal":{"name":"Proceedings of the 2018 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76021761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Qetch: Time Series Querying with Expressive Sketches
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193547
M. Mannino, A. Abouzeid
Query-by-sketch tools allow users to sketch a pattern to search a time series database for matches. Prior work adopts a bottom-up design approach: the sketching interface is built to reflect the inner workings of popular matching algorithms like Dynamic time warping (DTW) or Euclidean distance (ED). We design Qetch, a query-by-sketch tool for time series data, top-down. Users freely sketch patterns on a scale-less canvas. By studying how humans sketch time series patterns we develop a matching algorithm that accounts for human sketching errors. Qetch's top-down design and novel matching algorithm enable the easy construction of expressive queries that include regular expressions over sketches and queries over multiple time series. Our demonstration showcases Qetch and summarizes results from our evaluation of Qetch's effectiveness.
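The matching algorithm that models human sketching errors is the paper's contribution and is not reproduced here; the sketch below is only a hedged baseline for matching a pattern drawn on a scale-less canvas, with made-up data: z-normalize the drawn pattern and every candidate window so that amplitude and offset do not matter, then compare with Euclidean distance.

```python
# Scale-invariant pattern matching baseline (not Qetch's algorithm).
import numpy as np

def znorm(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-9)

def best_match(sketch, series):
    """Slide a window of the sketch's length over the series and return the
    offset whose z-normalized shape is closest to the sketch."""
    w = len(sketch)
    q = znorm(sketch)
    dists = [np.linalg.norm(q - znorm(series[i:i + w]))
             for i in range(len(series) - w + 1)]
    return int(np.argmin(dists)), float(min(dists))

series = np.sin(np.linspace(0, 6 * np.pi, 300)) * 10 + 50  # toy time series
sketch = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0]  # a hand-drawn bump
print(best_match(sketch, series))
```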
Citations: 16
IoT-Detective: Analyzing IoT Data Under Differential Privacy
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193571
Sameera Ghayyur, Yan Chen, Roberto Yus, Ashwin Machanavajjhala, Michael Hay, G. Miklau, S. Mehrotra
Emerging IoT technologies promise to bring revolutionary changes to many domains including health, transportation, and building management. However, continuous monitoring of individuals threatens privacy. The success of IoT thus depends on integrating privacy protections into IoT infrastructures. This demonstration integrates a recently proposed system, PeGaSus, which releases streaming data under the formal guarantee of differential privacy, with a state-of-the-art IoT testbed (TIPPERS) located at UC Irvine. PeGaSus protects individuals' data by introducing distortion into the output stream. While PeGaSus has been shown to offer lower numerical error than competing methods, assessing the usefulness of the output is application-dependent. The goal of the demonstration is to assess the usefulness of private streaming data in a real-world IoT application setting. The demo consists of a game, IoT-Detective, in which participants carry out visual data analysis tasks on private data streams, earning points when they achieve results similar to those on the true data stream. The demo will educate participants about the impact of privacy mechanisms on IoT data while generating insights into privacy-utility trade-offs in IoT applications.
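PeGaSus's actual release mechanism is more involved than what is shown here; the sketch below illustrates only the basic building block such a release relies on, adding Laplace noise calibrated to a privacy budget epsilon to each count in the stream. The counts, the per-time-step sensitivity of 1, and the value of epsilon are illustrative assumptions.

```python
# Per-time-step Laplace mechanism on a stream of counts (illustrative only).
import numpy as np

def private_stream(counts, epsilon, seed=0):
    """Release noisy counts; sensitivity 1 assumes one person changes one count by 1."""
    rng = np.random.default_rng(seed)
    return [c + rng.laplace(scale=1.0 / epsilon) for c in counts]

true_counts = [12, 15, 14, 30, 28, 9]  # e.g. occupancy per room per minute
print([round(c, 1) for c in private_stream(true_counts, epsilon=0.5)])
```

Smaller values of epsilon give stronger privacy but larger distortion, which is exactly the privacy-utility trade-off the IoT-Detective game lets participants experience.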
Citations: 24
Dynamic Pricing in Spatial Crowdsourcing: A Matching-Based Approach
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196929
Yongxin Tong, Libin Wang, Zimu Zhou, Lei Chen, Bowen Du, Jieping Ye
In spatial crowdsourcing, requesters submit their task-related locations, increasing demand in a local area. The platform prices these tasks and assigns spatial workers to serve them if the prices are accepted by the requesters. Mature pricing strategies exist that specialize in tackling the imbalance between supply and demand in a local market. In global optimization, however, the platform should consider the mobility of workers: any single worker is a potential supply for several areas, but becomes the true supply of only one area once assigned by the platform. The hardness lies in the uncertainty of the true supply of each area, so the existing pricing strategies do not work. In this paper, we formally define this Global Dynamic Pricing (GDP) problem in spatial crowdsourcing. Since the objective concerns how the platform matches the supply to areas, we let the matching algorithm guide the pricing. We propose a MAtching-based Pricing Strategy (MAPS) with a guaranteed bound. Extensive experiments conducted on synthetic and real datasets demonstrate the effectiveness of MAPS.
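This is not the MAPS strategy itself, only a toy illustration of the matching subproblem it builds on: assign each worker to at most one area so that total expected revenue (price times the probability that the requesters in that area accept the price) is maximized. The prices, acceptance probabilities, and reachability constraints below are made up.

```python
# Maximum-weight assignment of workers to areas (illustrative, not MAPS).
import numpy as np
from scipy.optimize import linear_sum_assignment

prices = np.array([8.0, 12.0, 20.0])       # candidate price per area
accept_prob = np.array([0.9, 0.7, 0.4])    # chance requesters accept that price

# Expected revenue if worker w serves area a (2 workers x 3 areas).
revenue = np.array([
    prices * accept_prob,                                            # worker 0 reaches every area
    [prices[0] * accept_prob[0], 0.0, prices[2] * accept_prob[2]],   # worker 1 cannot reach area 1
])

rows, cols = linear_sum_assignment(revenue, maximize=True)
print(list(zip(rows.tolist(), cols.tolist())))  # worker -> area assignment
```

Because each worker can end up serving only one area, the price chosen for every other area that worker could have reached must anticipate losing that supply, which is the coupling between matching and pricing described in the abstract.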
Citations: 98
GeoFlux: Hands-Off Data Integration Leveraging Join Key Knowledge
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193546
Jie Song, Danai Koutra, Murali Mani, H. Jagadish
Data integration is frequently required to obtain the full value of data from multiple sources. In spite of extensive research on tools to assist users, data integration remains hard, particularly for users with limited technical proficiency. To address this barrier, we study how much we can do with no user guidance. Our vision is that the user should merely specify two input datasets to be joined and get a meaningful integrated result. It turns out that our vision can be realized if the system can correctly determine the join key, for example based on domain knowledge. We demonstrate this notion by considering a broad domain: socioeconomic data aggregated by geography, a widespread category that accounts for 80% of the data published by government agencies. Intuitively, two such datasets can be integrated by joining on the geographic unit column. Although it sounds easy, this task has many challenges: How can we automatically identify the columns corresponding to geographic units, other dimension variables, and measure variables, respectively? If multiple geographic types exist, which one should be chosen for the join? How do we join tables with idiosyncratic schemas, different geographic units of aggregation, or no aggregation at all? We have developed GeoFlux, a data integration system that handles all these challenges and joins tabular data by automatically aggregating geographic information with a new, advanced crosswalk algorithm. In this demo paper, we overview the architecture of the system and its user-friendly interfaces, and then demonstrate via a real-world example that it is general, fully automatic, and easy to use. In the demonstration, we invite users to interact with GeoFlux to integrate more sample socioeconomic data from data.ny.gov.
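GeoFlux automates all of this, including detecting which columns are geographic units; as a hedged illustration of the basic aggregate-then-join step that a crosswalk enables when the two inputs are reported at different geographic units, here is a pandas sketch with made-up tables and a made-up zip-to-county mapping.

```python
# Aggregate a zip-level table up to counties via a crosswalk, then join.
import pandas as pd

income_by_zip = pd.DataFrame({
    "zip": ["10001", "10002", "14201"],
    "median_income": [65000, 52000, 41000],
})
population_by_county = pd.DataFrame({
    "county": ["New York", "Erie"],
    "population": [1_600_000, 950_000],
})
crosswalk = pd.DataFrame({  # maps the finer geographic unit onto the coarser one
    "zip": ["10001", "10002", "14201"],
    "county": ["New York", "New York", "Erie"],
})

income_by_county = (income_by_zip.merge(crosswalk, on="zip")
                    .groupby("county", as_index=False)["median_income"].mean())
result = income_by_county.merge(population_by_county, on="county")
print(result)
```

The hard parts GeoFlux addresses are everything this sketch assumes away: identifying the geographic columns, choosing the right unit when several exist, and building the crosswalk itself.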
Citations: 3
Pipelined Query Processing in Coprocessor Environments
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183734
Henning Funke, S. Breß, Stefan Noll, V. Markl, J. Teubner
Query processing on GPU-style coprocessors is severely limited by the movement of data. With teraflops of compute throughput in one device, even high-bandwidth memory cannot provision enough data for reasonable utilization. Query compilation is a proven technique to improve memory efficiency. However, its inherent tuple-at-a-time processing style does not suit the massively parallel execution model of GPU-style coprocessors. This compromises the improvements in efficiency offered by query compilation. In this paper, we show how query compilation and GPU-style parallelism can nevertheless be made to play in unison. We describe a compiler strategy that merges multiple operations into a single GPU kernel, thereby significantly reducing bandwidth demand. Compared to operator-at-a-time execution, we show reductions in memory access volume by factors of up to 7.5x, resulting in kernel execution times that are shorter by factors of up to 9.5x.
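As a language-level analogy only (plain Python, not the GPU compiler strategy described in the paper): operator-at-a-time evaluation writes every intermediate result back to memory, while a fused pipeline applies selection, projection, and aggregation in a single pass, which is the memory-traffic saving that merging operators into one kernel targets. The query and data are made up.

```python
# Materializing intermediates vs. a fused single pass (conceptual analogy).
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(1, 100, 100_000)
qty = rng.integers(1, 10, 100_000)

# Operator-at-a-time: each step produces a full intermediate array.
mask = prices > 50               # selection
sel_prices = prices[mask]        # materialized intermediate
sel_qty = qty[mask]              # materialized intermediate
revenue = sel_prices * sel_qty   # materialized intermediate
total_materialized = revenue.sum()

# Fused pipeline: one pass over the data, only a running aggregate survives.
total_fused = sum(p * q for p, q in zip(prices, qty) if p > 50)

print(total_materialized, total_fused)  # the totals agree up to float rounding
```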
Citations: 75