Proceedings. ACM-SIGMOD International Conference on Management of Data: latest articles

CHIC: a combination-based recommendation system
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465270
Manasi Vartak, S. Madden
Current recommender systems focus largely on recommending items based on similarity. For instance, Netflix can recommend movies similar to previously viewed movies, and Amazon can recommend items based on the ratings of similar users. Although similarity-based recommendation works well for books and movies, it provides an incomplete solution for items such as clothing or furniture, which are inherently used in combination with other items of the same type, e.g., a shirt with pants, or a desk with a chair. As a result, the decision to buy a clothing or furniture item depends not only on the item itself, but also on how well it works with other items of that type. Recommending such items therefore requires a combination-based recommendation system that, given an item, can suggest interesting and diverse combinations containing that item. This problem is challenging for three reasons: the features affecting combination quality are often difficult to identify; quality, being a function of all items in the combination, cannot be computed for each item independently; and the number of combinations to explore is exponential. In this demonstration, we present CHIC, a first-of-its-kind, combination-based recommendation system for clothing. The audience will interact with our system through the CHIC mobile app, which allows the user to take a picture of a clothing item and instantly search for interesting combinations containing the item. The audience can also compete with CHIC to create alternative ensembles and compare their quality. Finally, we use visualizations to highlight the core modules of CHIC, including model building and our novel search and classification algorithm, C-Search.
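The search difficulty described above (quality depends on every item in a combination, and the number of combinations is exponential) can be illustrated with a small sketch. Everything here is a hypothetical assumption: the catalogue, the pairwise `compat` scores, and the beam width are made up, and beam search is a generic stand-in, not CHIC's C-Search algorithm. Quality is scored over all pairs in an outfit, and beam search keeps only the best partial outfits instead of enumerating every combination.

```python
from itertools import combinations

# Hypothetical catalogue: one slot per clothing category.
CATALOGUE = {
    "top": ["white shirt", "black tee", "striped shirt"],
    "bottom": ["navy chinos", "black jeans", "grey slacks"],
    "shoes": ["brown loafers", "white sneakers", "black boots"],
}

# Hypothetical learned pairwise compatibility scores in [0, 1];
# pairs not listed default to a neutral 0.5.
PAIR_SCORES = {
    frozenset({"white shirt", "navy chinos"}): 0.9,
    frozenset({"navy chinos", "brown loafers"}): 0.95,
    frozenset({"white shirt", "brown loafers"}): 0.8,
    frozenset({"black tee", "black jeans"}): 0.85,
    frozenset({"black jeans", "white sneakers"}): 0.9,
    frozenset({"striped shirt", "grey slacks"}): 0.3,
}


def compat(a, b):
    return PAIR_SCORES.get(frozenset({a, b}), 0.5)


def quality(outfit):
    """Quality depends on every pair in the outfit, not on items in isolation."""
    return sum(compat(a, b) for a, b in combinations(outfit, 2))


def beam_search(query_item, query_slot, beam_width=2):
    """Extend outfits slot by slot, keeping only the best `beam_width` partial outfits."""
    beam = [(query_item,)]
    for slot, items in CATALOGUE.items():
        if slot == query_slot:
            continue  # the query item already fills this slot
        candidates = [outfit + (item,) for outfit in beam for item in items]
        candidates.sort(key=quality, reverse=True)
        beam = candidates[:beam_width]
    return beam
```

For example, `beam_search("white shirt", "top")` ranks the outfit with navy chinos and brown loafers first. With s slots of n items each, exhaustive search scores n^(s-1) outfits per query, whereas the beam scores at most `beam_width * n` candidates per slot.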
Citations: 15
COCCUS: self-configured cost-based query services in the cloud
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465233
I. Konstantinou, Verena Kantere, Dimitrios Tsoumakos, N. Koziris
Recently, a large number of pay-as-you-go data services have been offered over cloud infrastructures. Data service providers need appropriate and flexible query charging mechanisms, and query optimization that takes into consideration cloud operational expenses, pricing strategies, and user preferences. Yet existing solutions are static and non-configurable. We demonstrate COCCUS, a modular system for cost-aware query execution, adaptive query charging, and optimization of cloud data services. The audience can submit queries along with their execution preferences and budget constraints, while COCCUS adaptively determines the query charge and manages secondary data structures according to various economic policies. We demonstrate COCCUS's operation over centralized and shared-nothing CloudDBMS architectures on top of public and private IaaS clouds. Through a comprehensive GUI, the audience can set economic policies and execute various workloads. COCCUS's adaptability is showcased using real-time graphs of a number of key performance metrics.
Citations: 10
Automatic synthesis of out-of-core algorithms
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465334
Yannis Klonatos, Andres Nötzli, A. Spielmann, Christoph E. Koch, Viktor Kunčak
We present a system for the automatic synthesis of efficient algorithms specialized for a particular memory hierarchy and a set of storage devices. The developer provides two independent inputs: 1) an algorithm that ignores memory hierarchy and external storage aspects; and 2) a description of the target memory hierarchy, including its topology and parameters. Our system is able to automatically synthesize memory-hierarchy and storage-device-aware algorithms out of those specifications, for tasks such as joins and sorting. The framework is extensible and allows developers to quickly synthesize custom out-of-core algorithms as new storage technologies become available.
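To make "memory-hierarchy-aware algorithm" concrete, here is a classic external merge sort, written by hand as an illustration; it is not output of the synthesizer. The hierarchy-oblivious input would simply be `sorted(records)`; this version takes a single hypothetical hierarchy parameter, `memory_limit` (how many records fit in RAM at once), spills sorted runs to temporary files, and merges them with a k-way heap merge.

```python
import heapq
import os
import tempfile


def external_sort(records, memory_limit):
    """Sort an iterable of ints, holding at most `memory_limit` records per in-memory run."""
    run_paths = []
    buffer = []

    def flush():
        # Sort the in-memory buffer and spill it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{x}\n" for x in buffer)
        run_paths.append(path)
        buffer.clear()

    for r in records:
        buffer.append(r)
        if len(buffer) >= memory_limit:
            flush()
    if buffer:
        flush()

    # k-way merge: the heap holds only the current head record of each run.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For instance, `list(external_sort([5, 3, 9, 1, 7, 2, 8], memory_limit=3))` produces three runs and merges them into `[1, 2, 3, 5, 7, 8, 9]`. A synthesizer targeting a different hierarchy (say, an SSD tier) would vary choices such as run size and merge fan-in, which are fixed here.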
Citations: 17
FAST: differentially private real-time aggregate monitor with filtering and adaptive sampling
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465253
Liyue Fan, Li Xiong, V. Sunderam
Sharing aggregate statistics of private data can be of great value when data mining can be performed in real time to understand important phenomena such as influenza outbreaks or traffic congestion. However, to date there have been no tools for releasing real-time aggregated data with differential privacy, a strong and provable privacy guarantee. We propose FAST, a real-time system that enables differentially private aggregate sharing and time-series analytics. FAST employs a set of novel, adaptive strategies to improve the utility of shared/released data while guaranteeing the user-specified level of differential privacy. We will demonstrate the challenges and our solutions using prepared data sets as well as live participation data collected dynamically from SIGMOD'13 attendees.
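The baseline that a system like this improves on can be sketched as follows. This is an assumption-laden illustration, not FAST itself: each released count gets Laplace noise with scale `T / epsilon` (the total budget is split evenly over all T releases by sequential composition, assuming each individual changes each count by at most 1), and a one-dimensional Kalman-style filter, assumed here as the smoothing step, post-processes the noisy values. Filtering only touches already-private outputs, so it consumes no extra budget. FAST's adaptive sampling is omitted, and `process_var` is a hypothetical tuning parameter.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_series(counts, epsilon, process_var=1.0, seed=None):
    """Release a noisy, filtered version of `counts` under total privacy budget `epsilon`.

    Assumes sensitivity 1 per timestamp and an even budget split over all timestamps.
    """
    rng = random.Random(seed)
    if not counts:
        return []
    scale = len(counts) / epsilon          # per-release Laplace scale b = T / epsilon
    noise_var = 2 * scale ** 2             # variance of Laplace(0, b)
    estimate, est_var = 0.0, noise_var     # hypothetical prior for the filter
    released = []
    for c in counts:
        noisy = c + laplace(scale, rng)    # the only step that reads private data
        est_var += process_var             # predict: the true count may drift
        gain = est_var / (est_var + noise_var)
        estimate += gain * (noisy - estimate)  # correct toward the noisy observation
        est_var *= 1 - gain
        released.append(estimate)
    return released
```

With a small budget the filter pulls the noisy releases toward a smoothed trend; with a very large `epsilon` the noise vanishes and the output tracks the raw counts.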
Citations: 57
Stat!: an interactive analytics environment for big data
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463683
Mike Barnett, B. Chandramouli, R. Deline, S. Drucker, Danyel Fisher, J. Goldstein, P. Morrison, John C. Platt
Exploratory analysis on big data requires us to rethink data management across the entire stack -- from the underlying data processing techniques to the user experience. We demonstrate Stat! -- a visualization and analytics environment that allows users to rapidly experiment with exploratory queries over big data. Data scientists can use Stat! to converge quickly on the correct query, getting immediate feedback after only a fraction of the data has been processed. Stat! can work with multiple processing engines in the backend; in this demo, we use Stat! with the Microsoft StreamInsight streaming engine. StreamInsight is used to generate incremental early results for queries and to refine these results as more data is processed. Stat! allows data scientists to explore data, dynamically compose multiple queries to generate streams of partial results, and display partial results in both textual and visual form.
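Incremental early results of the kind described above can be sketched as a running aggregate that emits a refined estimate after every batch, in the style of online aggregation. This is a hypothetical plain-Python illustration; it does not use StreamInsight or Stat! APIs, and the normal-approximation 95% interval assumes the rows arrive in random order.

```python
import math


def _half_width(n, m2):
    """95% normal-approximation half-width of the mean, given n rows and squared-deviation sum m2."""
    if n < 2:
        return float("inf")
    std_err = math.sqrt(m2 / (n - 1) / n)
    return 1.96 * std_err


def progressive_mean(values, batch_size):
    """Yield (rows_seen, mean, half_width_95) after each batch of `values`."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0:
            yield n, mean, _half_width(n, m2)
    if n % batch_size:  # report the final partial batch, if any
        yield n, mean, _half_width(n, m2)
```

For example, `list(progressive_mean([2, 4, 6, 8, 10], batch_size=2))` reports estimates after 2, 4, and 5 rows, ending at a mean of 6.0. A front end could redraw a chart on every yielded tuple, which is the interaction pattern the abstract describes.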
Citations: 41
Building, maintaining, and using knowledge bases: a report from the trenches
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465297
Omkar Deshpande, Digvijay S. Lamba, Michel Tourn, Sanjib Das, S. Subramaniam, A. Rajaraman, Venky Harinarayan, A. Doan
A knowledge base (KB) contains a set of concepts, instances, and relationships. Over the past decade, numerous KBs have been built and used to power a growing array of applications. Despite this flurry of activity, however, surprisingly little has been published about the end-to-end process of building, maintaining, and using such KBs in industry. In this paper we describe such a process. In particular, we describe how we build, update, and curate a large KB at Kosmix, a Bay Area startup, and later at WalmartLabs, a development and research lab of Walmart. We discuss how we use this KB to power a range of applications, including query understanding, Deep Web search, in-context advertising, event monitoring in social media, product search, social gifting, and social mining. Finally, we discuss how the KB team is organized, and the lessons learned. Our goal with this paper is to provide a real-world case study, and to contribute to the emerging direction of building, maintaining, and using knowledge bases for data management applications.
Citations: 95
Performance and resource modeling in highly-concurrent OLTP workloads
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2467800
Barzan Mozafari, C. Curino, Alekh Jindal, S. Madden
Database administrators of Online Transaction Processing (OLTP) systems constantly face difficult questions. For example, "What is the maximum throughput I can sustain with my current hardware?", "How much disk I/O will my system perform if the requests per second double?", or "What will happen if the ratio of transactions in my system changes?". Resource prediction and performance analysis are both vital and difficult in this setting. Here the challenge is due to high degrees of concurrency, competition for resources, and complex interactions between transactions, all of which non-linearly impact performance. Although difficult, such analysis is a key component in enabling database administrators to understand which queries are eating up the resources, and how their system would scale under load. In this paper, we introduce our framework, called DBSeer, that addresses this problem by employing statistical models that provide resource and performance analysis and prediction for highly concurrent OLTP workloads. Our models are built on a small amount of training data from standard log information collected during normal system operation. These models are capable of accurately measuring several performance metrics, including resource consumption on a per-transaction-type basis, resource bottlenecks, and throughput at different load levels. We have validated these models on MySQL/Linux with numerous experiments on standard benchmarks (TPC-C) and real workloads (Wikipedia), observing high accuracy (within a few percent error) when predicting all of the above metrics.
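A much-simplified version of per-transaction-type resource modeling can be sketched as ordinary least squares. The model treats a resource (here, disk writes per second) as a linear combination of the per-type transaction rates found in the logs, and the fitted coefficients then answer what-if questions. This is an assumption for illustration, not DBSeer's model: the abstract notes that contention makes real performance non-linear, which a purely linear model cannot capture. The transaction types and log samples below are made up.

```python
def fit_least_squares(rows, targets):
    """Solve min ||X c - y||^2 via the normal equations (X^T X) c = X^T y."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coeffs, rates):
    """Predicted resource usage for a given mix of per-type transaction rates."""
    return sum(c * r for c, r in zip(coeffs, rates))


# Hypothetical log samples: (new_order/s, payment/s) -> disk writes/s,
# generated without noise from 3.0 writes per new_order and 1.5 per payment.
samples = [(100, 50), (200, 80), (150, 150), (300, 100)]
writes = [3.0 * n + 1.5 * p for n, p in samples]
coeffs = fit_least_squares(samples, writes)
```

Here `coeffs` recovers roughly `[3.0, 1.5]`, the per-type write cost. Under this linear model, doubling both rates from `(200, 100)` to `(400, 200)` doubles the predicted writes from 750 to 1500 per second. That prediction is exactly what breaks down once lock contention or disk saturation sets in.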
Citations: 105
Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2463691
D. Georgiadis, Maria Kontaki, A. Gounaris, A. Papadopoulos, K. Tsichlas, Y. Manolopoulos
Anomaly detection is an important data mining task that aims to discover elements showing significant deviation from the expected behavior; such elements are termed outliers. One of the most widely employed criteria for determining whether an element is an outlier is whether the number of neighboring elements within a fixed distance (R) falls below a fixed threshold (k). Such outliers are referred to as distance-based outliers and are the focus of this work. In this demo, we show both an extensible framework for outlier detection algorithms and specific algorithms for the demanding case where outlier detection is performed continuously over a data stream. More specifically: i) first, we demonstrate a novel flavor of the open-source, publicly available Massive Online Analysis (MOA) tool, endowed with the capability to encapsulate algorithms that continuously detect outliers; and ii) second, we present four online outlier detection algorithms. Two of these algorithms were designed by the authors of this demo, with a view to improving key aspects of outlier mining, such as running time, flexibility, and space requirements.
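The distance-based definition above (an element is an outlier if fewer than k neighbors lie within distance R) can be checked over a count-based sliding window with a brute-force sketch. It is a baseline for illustration only, not one of the four algorithms in the demo and not MOA code. The window size, `radius` (R), and `k` are assumed parameters, and the data are one-dimensional for brevity.

```python
from collections import deque


def stream_outliers(stream, window, radius, k):
    """After each arrival, yield the points in the current window that are outliers.

    A point is an outlier if fewer than `k` other points in the window lie
    within distance `radius` of it.
    """
    win = deque(maxlen=window)  # the oldest point is evicted automatically
    for x in stream:
        win.append(x)
        outliers = [
            p for i, p in enumerate(win)
            if sum(1 for j, q in enumerate(win) if j != i and abs(p - q) <= radius) < k
        ]
        yield outliers
```

For example, over the stream `[1.0, 1.2, 0.9, 5.0, 1.1]` with `window=5, radius=0.5, k=2`, the final window reports only `5.0` as an outlier. This recomputes all O(W^2) distances on every arrival; continuous algorithms avoid that cost by maintaining neighbor counts incrementally as points enter and expire.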
Citations: 49
Lightweight authentication of linear algebraic queries on data streams
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465281
Stavros Papadopoulos, Graham Cormode, Antonios Deligiannakis, M. Garofalakis
We consider a stream outsourcing setting, where a data owner delegates the management of a set of disjoint data streams to an untrusted server. The owner authenticates his streams via signatures. The server processes continuous queries on the union of the streams for clients trusted by the owner. Along with the results, the server sends proofs of result correctness derived from the owner's signatures, which the clients can easily verify. We design novel constructions for a collection of fundamental problems over streams, represented as linear algebraic queries. In particular, our basic schemes authenticate dynamic vector sums and dot products, as well as dynamic matrix products. These techniques can be adapted for authenticating a wide range of important operations in streaming environments, including group-by queries, joins, in-network aggregation, similarity matching, and event processing. All our schemes are very lightweight, and offer strong cryptographic guarantees derived from formal definitions and proofs. We experimentally confirm the practicality of our schemes.
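The following toy secret-key check over a prime field shows why linearity makes vector sums cheap to authenticate. It is not the construction from the paper, which uses owner signatures. The key sharing, the modulus, and every name below are assumptions. The owner keeps a random key vector `s` (assumed to be shared with trusted clients over a secure channel) and folds every stream update into a short tag t = <s, v> mod p. A client accepts a claimed vector v' only if <s, v'> = t mod p. Because the tag is linear, the tags of disjoint streams add up to the tag of their summed vector. A server that learns nothing about `s` passes off a specific wrong vector with probability about 1/p.

```python
import secrets

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1, assumed field modulus


def new_key(dim):
    """Owner's secret key: one uniformly random field element per vector coordinate."""
    return [secrets.randbelow(P) for _ in range(dim)]


class Owner:
    """Keeps one short tag per stream; all streams share the owner's key."""

    def __init__(self, key):
        self.key = key
        self.tags = {}  # stream id -> <key, v_stream> mod P

    def update(self, stream_id, index, delta):
        # Stream update v[index] += delta changes the tag linearly.
        old = self.tags.get(stream_id, 0)
        self.tags[stream_id] = (old + self.key[index] * delta) % P

    def tag_of_sum(self, stream_ids):
        # Linearity: the tag of a sum of vectors is the sum of their tags.
        return sum(self.tags.get(s, 0) for s in stream_ids) % P


def verify(key, claimed_vector, tag):
    """Client check: accept the server's vector only if it matches the tag."""
    return sum(k * x for k, x in zip(key, claimed_vector)) % P == tag
```

Suppose stream `"s1"` receives updates `(0, +5)` and `(2, +1)` and stream `"s2"` receives `(1, +4)` and `(0, +2)`. A server that reports the summed vector `[7, 4, 1]` passes `verify` against `tag_of_sum(["s1", "s2"])`, while a tampered `[7, 4, 2]` is rejected. Each update costs the owner one multiplication, which is the sense in which such schemes are lightweight.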
Citations: 29
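The Papadopoulos et al. abstract above describes authenticating vector sums and dot products over outsourced streams. The sketch below is not the authors' construction. It is a textbook linearly homomorphic MAC, and it assumes the owner and trusted clients share a secret key (a PRF key plus a secret scalar `alpha`) instead of using public-key signatures. Each tag has the form `alpha * x_i + PRF(i) mod P`, so any linear combination of tags authenticates the same linear combination of values. The names `prf`, `tag`, `server_dot`, `client_verify`, and the modulus `P = 2**61 - 1` are illustrative assumptions.

```python
import hashlib
import hmac

P = 2**61 - 1  # prime modulus for the field arithmetic (illustrative choice)

def prf(key: bytes, index: int) -> int:
    """Pseudorandom field element for position `index` (HMAC-SHA256 reduced mod P)."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P

def tag(key: bytes, alpha: int, index: int, value: int) -> int:
    """Owner side: tag = alpha * value + PRF(index) mod P."""
    return (alpha * value + prf(key, index)) % P

def server_dot(values, tags, query):
    """Untrusted server: computes q . x and folds the tags with the same weights."""
    result = sum(q * x for q, x in zip(query, values)) % P
    proof = sum(q * t for q, t in zip(query, tags)) % P
    return result, proof

def client_verify(key: bytes, alpha: int, query, result: int, proof: int) -> bool:
    """Trusted client: recompute the expected proof from the shared secrets."""
    masks = sum(q * prf(key, i) for i, q in enumerate(query))
    return (alpha * result + masks) % P == proof

key = b"owner-secret-key"  # hypothetical key shared by owner and clients
alpha = 123456789          # hypothetical secret scalar, nonzero mod P
values = [3, 1, 4, 1, 5]
tags = [tag(key, alpha, i, v) for i, v in enumerate(values)]
ones = [1] * len(values)   # a vector sum is a dot product with all-ones weights
result, proof = server_dot(values, tags, ones)
print(result, client_verify(key, alpha, ones, result, proof))  # 14 True
```

A wrong result only passes if its proof is shifted by `alpha` times the error, so a server that does not know `alpha` is roughly reduced to guessing. Handling truly dynamic streams, where updates arrive over time, would need per-update tags, which this sketch omits.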
Secure database-as-a-service with Cipherbase
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2467797
A. Arasu, Spyros Blanas, Ken Eguro, Manas R. Joglekar, R. Kaushik, Donald Kossmann, Ravishankar Ramamurthy, P. Upadhyaya, R. Venkatesan
Data confidentiality is one of the main concerns for users of public cloud services. The key problem is protecting sensitive data from being accessed by cloud administrators who have root privileges and can remotely inspect the memory and disk contents of the cloud servers. While encryption is the basic mechanism that can be leveraged to provide data confidentiality, providing an efficient database-as-a-service that can run on encrypted data raises several interesting challenges. In this demonstration we outline the functionality of Cipherbase, a full-fledged SQL database system that supports the full generality of a database system while providing high data confidentiality. Cipherbase has a novel architecture that tightly integrates custom-designed trusted hardware for performing operations on encrypted data securely, such that an administrator cannot access any plaintext corresponding to sensitive data.
Citations: 62
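To make the Cipherbase architecture more concrete, the following sketch models the split the abstract describes. It is not Cipherbase's actual interface. The untrusted server stores only ciphertext and asks a trusted component, the only server-side holder of the key, to decrypt and compare values, receiving nothing but the comparison outcome. `TrustedModule`, `encrypt`, `decrypt`, `server_filter_greater`, and `server_sort` are hypothetical names, and the HMAC-based toy cipher merely stands in for a real one such as AES; it should not be used for actual data.

```python
import functools
import hashlib
import hmac
import os

NONCE_LEN = 16

def _keystream(key: bytes, nonce: bytes) -> bytes:
    # Toy PRF-based keystream (HMAC-SHA256), standing in for a real cipher.
    return hmac.new(key, nonce, hashlib.sha256).digest()[:8]

def encrypt(key: bytes, value: int) -> bytes:
    """Client side: encrypt a signed 64-bit integer under a fresh random nonce."""
    nonce = os.urandom(NONCE_LEN)
    plain = value.to_bytes(8, "big", signed=True)
    return nonce + bytes(p ^ k for p, k in zip(plain, _keystream(key, nonce)))

def decrypt(key: bytes, ciphertext: bytes) -> int:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    plain = bytes(c ^ k for c, k in zip(body, _keystream(key, nonce)))
    return int.from_bytes(plain, "big", signed=True)

class TrustedModule:
    """Hypothetical trusted component: holds the key, exposes only comparisons."""

    def __init__(self, key: bytes):
        self._key = key

    def compare(self, ct_a: bytes, ct_b: bytes) -> int:
        # Plaintext exists only inside this method; the caller sees -1, 0, or 1.
        a, b = decrypt(self._key, ct_a), decrypt(self._key, ct_b)
        return (a > b) - (a < b)

# Untrusted server: manipulates ciphertext only, delegating comparisons to the module.
def server_filter_greater(tm, column, ct_threshold):
    return [ct for ct in column if tm.compare(ct, ct_threshold) > 0]

def server_sort(tm, column):
    return sorted(column, key=functools.cmp_to_key(tm.compare))

key = os.urandom(32)  # shared by the client and the trusted module
tm = TrustedModule(key)
salaries = [52000, 71000, 38000, 90000]
column = [encrypt(key, s) for s in salaries]
hits = server_filter_greater(tm, column, encrypt(key, 60000))
print([decrypt(key, ct) for ct in hits])                     # [71000, 90000]
print([decrypt(key, ct) for ct in server_sort(tm, column)])  # [38000, 52000, 71000, 90000]
```

Even in this simplified model the server learns ordering information from the comparison results, so the confidentiality achieved depends on which operations the trusted component is willing to answer.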
Journal: Proceedings. ACM-SIGMOD International Conference on Management of Data