
Proceedings of the 2018 International Conference on Management of Data: Latest Articles

Top-k Sorting Under Partial Order Information
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3199672
Eyal Dushkin, T. Milo
We address the problem of sorting the top-k elements of a set, given a predefined partial order over the set elements. Our means to obtain missing order information is via a comparison operator that interacts with a crowd of domain experts to determine the order between two unordered items. The practical motivation for studying this problem is the common scenario where elements cannot be easily compared by machines and thus human experts are harnessed for this task. As some initial partial order is given, our goal is to optimally exploit it in order to minimize the domain experts' work. The problem lies at the intersection of two well-studied problems in the theory and crowdsourcing communities: full sorting under partial order information and top-k sorting with no prior partial order information. As we show, resorting to one of the existing state-of-the-art algorithms for these two problems turns out to be extravagant in terms of the number of comparisons performed by the users. In light of this, we present a dedicated algorithm for top-k sorting that aims to minimize the number of comparisons by thoroughly leveraging the partial order information. We examine two possible interpretations of the comparison operator, taken from the theory and crowdsourcing communities, and demonstrate the efficiency and effectiveness of our algorithm for both of them. We further demonstrate the utility of our algorithm, beyond identifying the top-k elements in a dataset, as a vehicle to improve the performance of Learning-to-Rank algorithms in a machine learning context. We conduct a comprehensive experimental evaluation in both synthetic and real-world settings.
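To make the role of the partial order concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not describe. It assumes a hypothetical oracle `ask(a, b)` standing in for the crowd comparison and a hypothetical set `known` of `(higher, lower)` pairs encoding the given partial order. It only illustrates how known order information can reduce the number of oracle calls.

```python
def top_k(items, known, k, ask):
    """Return (top-k list, number of oracle calls).

    items: list of elements (hypothetical input format).
    known: set of (hi, lo) pairs meaning hi is known to rank above lo.
    ask:   hypothetical costly oracle; ask(a, b) is True if a ranks above b.
    """
    # beats[x] holds every element currently known to rank directly below x.
    beats = {x: set() for x in items}
    for hi, lo in known:
        beats[hi].add(lo)

    remaining = set(items)
    result, calls = [], 0
    while remaining and len(result) < k:
        # An element that some remaining element beats cannot be next.
        dominated = set()
        for x in remaining:
            dominated |= beats[x]
        candidates = [x for x in items if x in remaining and x not in dominated]

        # Tournament among the undominated candidates; record each answer
        # so later rounds can reuse it instead of asking again.
        best = candidates[0]
        for challenger in candidates[1:]:
            calls += 1
            if ask(challenger, best):
                beats[challenger].add(best)
                best = challenger
            else:
                beats[best].add(challenger)
        result.append(best)
        remaining.remove(best)
    return result, calls


# Toy example: the hidden total order is a > b > c > d > e.
rank = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
known_pairs = {("a", "c"), ("b", "d"), ("c", "e")}
top, n_calls = top_k(list(rank), known_pairs, 2, lambda x, y: rank[x] < rank[y])
```

In this toy run the known pairs rule out `c`, `d`, and `e` as candidates for first place, so the oracle is consulted only for pairs whose order is genuinely unknown.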
Citations: 9
Efficient Top-K Query Processing on Massively Parallel Hardware
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183735
Anil Shanbhag, H. Pirk, S. Madden
A common operation in many data analytics workloads is to find the top-k items, i.e., the largest or smallest items according to some sort order (implemented via LIMIT and ORDER BY expressions in SQL). A naive implementation of top-k is to sort all of the items and then return the first k, but this does much more work than needed. Although efficient implementations for top-k have been explored on traditional multi-core processors, there has been no prior systematic study of top-k implementations on GPUs, despite open requests for such implementations in GPU-based frameworks like TensorFlow and ArrayFire. In this work, we present several top-k algorithms for GPUs, including a new algorithm based on bitonic sort called bitonic top-k. The bitonic top-k algorithm is up to 15x faster than sort and 4x faster than a variety of other possible implementations for values of k up to 256. We also develop a cost model to predict the performance of several of our algorithms, and show that it accurately predicts actual performance on modern GPUs.
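The gap between the naive approach and a selection-based one can be illustrated on the CPU with the Python standard library. This is only a sketch of the general idea; it is not the bitonic top-k GPU algorithm from the paper, and the function names are made up for illustration.

```python
import heapq
import random


def topk_by_sort(items, k):
    """Naive baseline: sort everything in descending order, keep the first k."""
    return sorted(items, reverse=True)[:k]


def topk_by_heap(items, k):
    """Selection-based alternative: heapq.nlargest tracks only k candidates
    while scanning the input once, and returns them largest first."""
    return heapq.nlargest(k, items)


random.seed(0)
data = [random.random() for _ in range(10_000)]
# Both strategies return the same k items in the same order.
assert topk_by_sort(data, 5) == topk_by_heap(data, 5)
```

The full sort costs O(n log n) comparisons, while the bounded heap needs roughly O(n log k), which is the kind of saving the abstract alludes to when it says sorting "does much more work than needed".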
Citations: 44
Session details: Keynote1
Pub Date : 2018-05-27 DOI: 10.1145/3258003
P. Bernstein
Citations: 0
Session details: Research 6: Storage & Indexing
Pub Date : 2018-05-27 DOI: 10.1145/3258010
K. A. Ross
Citations: 0
Session details: Research 13: Machine Learning & Knowledge-base Construction
Pub Date : 2018-05-27 DOI: 10.1145/3258020
Guoliang Li
Citations: 0
Query-based Workload Forecasting for Self-Driving Database Management Systems
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196908
Lin Ma, Dana Van Aken, Ahmed S. Hefny, Gustavo Mezerhane, Andrew Pavlo, Geoffrey J. Gordon
The first step towards an autonomous database management system (DBMS) is the ability to model the target application's workload. This is necessary to allow the system to anticipate future workload needs and select the proper optimizations in a timely manner. Previous forecasting techniques model the resource utilization of the queries. Such metrics, however, change whenever the physical design of the database and the hardware resources change, thereby rendering previous forecasting models useless. We present a robust forecasting framework called QueryBot 5000 that allows a DBMS to predict the expected arrival rate of queries in the future based on historical data. To better support highly dynamic environments, our approach uses the logical composition of queries in the workload rather than the amount of physical resources used for query execution. It provides multiple horizons (short- vs. long-term) with different aggregation intervals. We also present a clustering-based technique for reducing the total number of forecasting models to maintain. To evaluate our approach, we compare our forecasting models against other state-of-the-art models on three real-world database traces. We implemented our models in an external controller for PostgreSQL and MySQL and demonstrate their effectiveness in selecting indexes.
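One way to picture forecasting from the logical composition of queries is to group queries that differ only in their constants and count arrivals per time interval. The sketch below does just that; it is an illustration under assumptions, not the QueryBot 5000 implementation. The helper names, the regex-based templating, and the `(timestamp, sql)` trace format are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def to_template(sql):
    """Replace literals with '?' so queries differing only in constants match.
    Regex-based and deliberately crude; a real system would use a SQL parser."""
    sql = re.sub(r"'[^']*'", "?", sql)            # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)    # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def arrival_rates(trace, interval_seconds):
    """Count arrivals per template per time bucket.

    trace: iterable of (timestamp_in_seconds, sql_text) pairs (assumed format).
    Returns {template: Counter({bucket_index: count})}.
    """
    counts = defaultdict(Counter)
    for ts, sql in trace:
        counts[to_template(sql)][int(ts // interval_seconds)] += 1
    return counts


trace = [
    (0, "SELECT * FROM users WHERE id = 42"),
    (30, "SELECT * FROM users WHERE id = 7"),
    (70, "SELECT * FROM users WHERE name = 'Ann'"),
    (80, "select * from users where id = 99"),
]
rates = arrival_rates(trace, interval_seconds=60)
```

The resulting per-template series is the kind of historical data a forecasting model could be trained on. Changing `interval_seconds` gives coarser or finer aggregation, loosely mirroring the multiple horizons mentioned in the abstract.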
Citations: 154
Session details: Industry 2: Real-time Analytics
Pub Date : 2018-05-27 DOI: 10.1145/3258011
Barzan Mozafari
Citations: 0
DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196892
Jiawei Jiang, B. Cui, Ce Zhang, Fangcheng Fu
Gradient boosting decision tree (GBDT) is one of the most popular machine learning models, widely used in both academia and industry. Although GBDT has been widely supported by existing systems such as XGBoost, LightGBM, and MLlib, one system bottleneck appears when the dimensionality of the data becomes high. As a result, when we tried to support our industrial partner on datasets with dimensionality up to 330K, we observed suboptimal performance for all of the aforementioned systems. In this paper, we ask "Can we build a scalable GBDT training system whose performance scales better with respect to dimensionality of the data?" The first contribution of this paper is a careful investigation of existing systems by developing a performance model with respect to the dimensionality of the data. We find that the collective communication operations in many existing systems only implement the algorithm designed for small messages. By just fixing this problem, we are able to speed up these systems by up to 2X. Our second contribution is a series of optimizations to further improve the performance of collective communications. These optimizations include a task scheduler, a two-phase split finding method, and low-precision gradient histograms. Our third contribution is a sparsity-aware algorithm to build gradient histograms and a novel index structure to build histograms in parallel. We implement these optimizations in DimBoost and show that it can be 2-9X faster than existing systems.
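The abstract does not spell out the sparsity-aware histogram algorithm, so the following is only a sketch of one commonly used idea, not DimBoost's method: visit just the nonzero feature values and credit all remaining gradient mass to the bin that contains zero. Function names, the dictionary-based sparse format, and the bin layout are assumptions made for illustration.

```python
import bisect


def bin_index(value, bin_edges):
    """Map a feature value to a bin; bin i covers [edges[i-1], edges[i])."""
    return bisect.bisect_right(bin_edges, value)


def dense_histogram(values, grads, bin_edges):
    """Baseline: touch every instance, including the zeros."""
    hist = [0.0] * (len(bin_edges) + 1)
    for v, g in zip(values, grads):
        hist[bin_index(v, bin_edges)] += g
    return hist


def sparse_histogram(nonzeros, grads, bin_edges):
    """Sparsity-aware variant: touch only the nonzero entries.

    nonzeros: {row_index: nonzero_value} for one feature (assumed format).
    The gradient sum could be computed once and shared across all features;
    it is recomputed here only to keep the sketch self-contained.
    """
    hist = [0.0] * (len(bin_edges) + 1)
    seen = 0.0
    for row, v in nonzeros.items():
        hist[bin_index(v, bin_edges)] += grads[row]
        seen += grads[row]
    # Every row not listed in nonzeros has value 0 and lands in zero's bin.
    hist[bin_index(0.0, bin_edges)] += sum(grads) - seen
    return hist


grads = [0.5, -0.25, 1.0, 0.25, -0.5, 2.0]
values = [0.0, 1.5, 0.0, 0.0, -2.0, 0.0]
edges = [-1.0, 1.0]
assert dense_histogram(values, grads, edges) == sparse_histogram({1: 1.5, 4: -2.0}, grads, edges)
```

With high-dimensional data most feature values are zero, so the per-feature work drops from the number of instances to the number of nonzeros.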
Citations: 34
Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3190663
Peilin Yang, S. Thiagarajan, Jimmy J. Lin
Twitter's data engineering team is faced with the challenge of processing billions of events every day in batch and in real time, and we have built various tools to meet these demands. In this paper, we describe TSAR (TimeSeries AggregatoR), a robust, scalable, real-time event time series aggregation framework built primarily for engagement monitoring: aggregating interactions with Tweets, segmented along a multitude of dimensions such as device, engagement type, etc. TSAR is built on top of Summingbird, an open-source framework for integrating batch and online MapReduce computations, and removes much of the tedium associated with building end-to-end aggregation pipelines, from the ingestion and processing of events to the publication of results in heterogeneous datastores. Clients are provided a query interface that powers dashboards and supports downstream ad hoc analytics.
Citations: 5
POIsam: a System for Efficient Selection of Large-scale Geospatial Data on Maps
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193549
Tao Guo, Mingzhao Li, Peishan Li, Z. Bao, G. Cong
In this demonstration we present POIsam, a visualization system supporting the following desirable features: representativeness, visibility constraint, zooming consistency, and panning consistency. The first two constraints aim to efficiently select a small set of representative objects from the user's current region of interest, such that no two selected objects are too close to each other for users to distinguish in the limited space of a screen. One unique feature of POIsam is that any similarity metric can be plugged in to meet the user's specific needs in different scenarios. The latter two consistencies are fundamental challenges in efficiently updating the selection result w.r.t. the user's zoom-in, zoom-out, and panning operations when they interact with the map. POIsam drops a common assumption from all previous work, i.e., that the zoom levels and region cells are pre-defined and indexed, and that objects are selected from such region cells at a particular zoom level rather than from the user's current region of interest (which in most cases does not correspond to the pre-defined cells). This results in an extra challenge, as we need to do object selection via online computation. To the best of our knowledge, this is the first system that is able to meet all four features to achieve an interactive visualization map exploration system.
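The first two features can be pictured with a small greedy sketch: pick objects that best represent the others under a pluggable similarity function, while rejecting any candidate that sits too close to an already selected object. This is an assumed illustration, not POIsam's actual selection algorithm; the object layout (`pos`, `tag`), the coverage-style objective, and all function names are hypothetical.

```python
import math


def select_representatives(objects, k, min_dist, similarity):
    """Greedily pick up to k object indices.

    objects:    list of dicts with a "pos" (x, y) entry (assumed layout).
    min_dist:   visibility constraint, minimum distance between selected objects.
    similarity: pluggable function (a, b) -> float in [0, 1].
    """
    selected = []
    # covered[i]: best similarity between object i and any selected object.
    covered = [0.0] * len(objects)
    while len(selected) < k:
        best_gain, best_idx = 0.0, None
        for i, cand in enumerate(objects):
            if i in selected:
                continue
            # Visibility constraint: skip candidates too close to a selection.
            if any(math.dist(cand["pos"], objects[j]["pos"]) < min_dist for j in selected):
                continue
            # Representativeness: how much extra coverage this candidate adds.
            gain = sum(max(similarity(cand, o) - c, 0.0) for o, c in zip(objects, covered))
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # nothing admissible adds coverage
            break
        selected.append(best_idx)
        chosen = objects[best_idx]
        covered = [max(c, similarity(chosen, o)) for o, c in zip(objects, covered)]
    return selected


def same_tag(a, b):
    """One possible similarity metric: 1.0 if the tags match, else 0.0."""
    return 1.0 if a["tag"] == b["tag"] else 0.0


pois = [
    {"pos": (0.0, 0.0), "tag": "cafe"},
    {"pos": (0.1, 0.0), "tag": "cafe"},
    {"pos": (5.0, 5.0), "tag": "museum"},
    {"pos": (5.0, 0.0), "tag": "cafe"},
]
picked = select_representatives(pois, k=2, min_dist=1.0, similarity=same_tag)
```

Here the first cafe is chosen because it represents all three cafes, the second cafe is rejected for being too close, and the museum is chosen next because it covers an object that is not yet represented. Swapping `same_tag` for another function changes the notion of representativeness without touching the selection loop.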
Citations: 4