
Proceedings. ACM-SIGMOD International Conference on Management of Data: Latest Publications

RTP: robust tenant placement for elastic in-memory database clusters
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2465302
J. Schaffner, Tim Januschowski, Mary H. Kercher, Tim Kraska, H. Plattner, M. Franklin, D. Jacobs
In the cloud services industry, a key issue for cloud operators is to minimize operational costs. In this paper, we consider algorithms that elastically contract and expand a cluster of in-memory databases depending on tenants' behavior over time while maintaining response time guarantees. We evaluate our tenant placement algorithms using traces obtained from one of SAP's production on-demand applications. Our experiments reveal that our approach lowers operating costs for the database cluster of this application by a factor of 2.2 to 10, measured in Amazon EC2 hourly rates, in comparison to the state of the art. In addition, we carefully study the trade-off between cost savings obtained by continuously migrating tenants and the robustness of servers towards load spikes and failures.
Pages: 773-784
Citations: 48
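The elastic placement problem the abstract describes can be illustrated with a minimal greedy bin-packing sketch. The tenant loads, server capacity, and best-fit heuristic below are hypothetical; the paper's actual RTP algorithms additionally handle migrations, replication, and robustness to load spikes and failures.

```python
# Minimal illustration of tenant placement as capacity-constrained bin packing.
# Hypothetical loads/capacity; not the paper's algorithm.

def place_tenants(loads, capacity):
    """Greedy best-fit decreasing: assign each tenant to the fullest
    server that still fits it, opening a new server when none does."""
    servers = []      # remaining capacity per server
    assignment = {}   # tenant -> server index
    for tenant, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        # among servers that can still hold this tenant, pick the fullest
        candidates = [i for i, free in enumerate(servers) if free >= load]
        if candidates:
            i = min(candidates, key=lambda i: servers[i])
            servers[i] -= load
        else:
            servers.append(capacity - load)
            i = len(servers) - 1
        assignment[tenant] = i
    return assignment, len(servers)

loads = {"t1": 40, "t2": 35, "t3": 30, "t4": 20, "t5": 15}
assignment, n_servers = place_tenants(loads, capacity=70)
print(n_servers)  # -> 2 (total load 140 packs exactly into two servers of 70)
```

Minimizing the number of servers is what drives the operating-cost savings the abstract reports; the robustness trade-off comes from how much headroom each server is left with.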
Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2463706
M. Yakout, Laure Berti-Équille, A. Elmagarmid
Various computational procedures or constraint-based methods for data repairing have been proposed over the last decades to identify errors and, when possible, correct them. However, these approaches have several limitations including the scalability and quality of the values to be used in replacement of the errors. In this paper, we propose a new data repairing approach that is based on maximizing the likelihood of replacement data given the data distribution, which can be modeled using statistical machine learning techniques. This is a novel approach combining machine learning and likelihood methods for cleaning dirty databases by value modification. We develop a quality measure of the repairing updates based on the likelihood benefit and the amount of changes applied to the database. We propose SCARE (SCalable Automatic REpairing), a systematic scalable framework that follows our approach. SCARE relies on a robust mechanism for horizontal data partitioning and a combination of machine learning techniques to predict the set of possible updates. Due to data partitioning, several updates can be predicted for a single record based on local views on each data partition. Therefore, we propose a mechanism to combine the local predictions and obtain accurate final predictions. Finally, we experimentally demonstrate the effectiveness, efficiency, and scalability of our approach on real-world datasets in comparison to recent data cleaning approaches.
Pages: 553-564
Citations: 144
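The core idea, choosing the replacement value with maximal likelihood given the rest of the tuple, can be sketched in miniature. The city/zip data and the raw co-occurrence counts below are hypothetical stand-ins; SCARE itself uses learned statistical models over horizontal partitions and combines predictions across partitions.

```python
# Toy sketch of likelihood-based value repair: replace a suspect value
# with the most likely candidate given the tuple's other attributes.
# Hypothetical data; not the SCARE framework itself.
from collections import Counter

clean = [("NYC", "10001"), ("NYC", "10001"), ("NYC", "10002"),
         ("LA", "90001"), ("LA", "90001")]

def repair_zip(city, dirty_zip, data):
    """Return the zip maximizing empirical P(zip | city), keeping the
    current value unless a candidate is strictly more likely."""
    counts = Counter(z for c, z in data if c == city)
    if not counts:
        return dirty_zip
    best, best_n = counts.most_common(1)[0]
    return best if counts[dirty_zip] < best_n else dirty_zip

print(repair_zip("NYC", "90210", clean))  # -> "10001"
```

The paper's "bounded changes" aspect corresponds to only applying a repair when the likelihood benefit justifies the update, rather than rewriting values freely.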
On optimal worst-case matching
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2465321
Cheng Long, R. C. Wong, Philip S. Yu, Minhao Jiang
Bichromatic reverse nearest neighbor (BRNN) queries have been studied extensively in the spatial database literature. Given a set P of service-providers and a set O of customers, a BRNN query finds which customers in O are "interested" in a given service-provider in P. Recently, it has been observed that this kind of query does not take into account the capacities of service-providers or the demands of customers. To address this issue, several spatial matching problems have been proposed; these, however, cannot be used for real-life applications such as emergency facility allocation, where the maximum matching cost (or distance) should be minimized. In this paper, we propose a new problem called Spatial Matching for Minimizing Maximum matching distance (SPM-MM). We then design two algorithms for SPM-MM, Threshold-Adapt and Swap-Chain. Threshold-Adapt is simple and easy to understand, but not scalable to large datasets due to its relatively high time/space complexity. Swap-Chain, which follows a fundamentally different idea from Threshold-Adapt, runs faster than Threshold-Adapt by orders of magnitude and uses significantly less memory. We conducted extensive empirical studies which verified the efficiency and scalability of Swap-Chain.
Pages: 845-856
Citations: 30
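The threshold idea behind minimizing the maximum matching distance can be sketched directly: scan candidate distances in increasing order and test whether a perfect bipartite matching exists using only edges at or below the threshold. The 1-D points below are hypothetical, and this brute-force check is in the spirit of Threshold-Adapt only; the paper's Swap-Chain algorithm is a far more scalable alternative.

```python
# Sketch: smallest threshold t such that every provider can be matched to a
# distinct customer using only pairs with distance <= t (Kuhn's augmenting
# paths). Toy data; not the paper's Swap-Chain algorithm.

def has_perfect_matching(dist, threshold, n):
    match = [-1] * n  # match[j] = provider currently matched to customer j

    def try_assign(i, seen):
        for j in range(n):
            if dist[i][j] <= threshold and j not in seen:
                seen.add(j)
                if match[j] == -1 or try_assign(match[j], seen):
                    match[j] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(n))

providers = [0, 10]
customers = [2, 9]
dist = [[abs(p - c) for c in customers] for p in providers]
candidates = sorted({d for row in dist for d in row})
best = next(t for t in candidates if has_perfect_matching(dist, t, len(providers)))
print(best)  # -> 2 (pairing 0-2 and 10-9 keeps the worst distance at 2)
```

Note how this objective differs from minimizing total distance: the pairing 0-9 and 10-2 has a larger maximum (9) even though other pairings could look acceptable on average.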
Enhancements to SQL server column stores
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2463708
P. Larson, C. Clinciu, Campbell Fraser, E. Hanson, Mostafa Mokhtar, Michal Nowakiewicz, Vassilis Papadimos, Susan Price, Srikumar Rangarajan, Remus Rusanu, Mayukh Saubhasik
SQL Server 2012 introduced two innovations targeted for data warehousing workloads: column store indexes and batch (vectorized) processing mode. Together they greatly improve performance of typical data warehouse queries, routinely by 10X and in some cases by a 100X or more. The main limitations of the initial version are addressed in the upcoming release. Column store indexes are updatable and can be used as the base storage for a table. The repertoire of batch mode operators has been expanded, existing operators have been improved, and query optimization has been enhanced. This paper gives an overview of SQL Server's column stores and batch processing, in particular the enhancements introduced in the upcoming release.
Pages: 1159-1168
Citations: 67
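Why column stores plus batch mode speed up warehouse queries can be shown with a tiny sketch: the table is decomposed into one array per column, and a scan-filter-aggregate touches only the columns it needs, a batch at a time. The toy rows and the pure-Python operators below are illustrative assumptions; SQL Server's batch mode operates on compressed column segments.

```python
# Illustration of columnar layout + batch execution for a scan-heavy query.
# Toy data; real column stores work on compressed segments, not Python lists.

rows = [("2013-06-22", "widget", 10), ("2013-06-22", "gadget", 25),
        ("2013-06-23", "widget", 7)]

# Decompose the row store into one array per column.
dates, products, amounts = map(list, zip(*rows))

def batch_sum_where(pred_col, pred_value, value_col):
    """SELECT SUM(value_col) WHERE pred_col = pred_value, as two column
    passes: a batch filter producing selected positions, then a batch sum."""
    selected = [i for i, v in enumerate(pred_col) if v == pred_value]
    return sum(value_col[i] for i in selected)

print(batch_sum_where(products, "widget", amounts))  # -> 17
```

The `dates` column is never touched by this query, which is exactly the I/O saving a columnar layout buys; batch mode then amortizes per-tuple interpretation overhead across whole vectors of positions.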
Photon: fault-tolerant and scalable joining of continuous data streams
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2465272
R. Ananthanarayanan, Venkatesh Basker, Sumit Das, A. Gupta, H. Jiang, Tianhao Qiu, Alexey Reznichenko, D.Yu. Ryabkov, Manpreet Singh, S. Venkataraman
Web-based enterprises process events generated by millions of users interacting with their websites. Rich statistical data distilled from combining such interactions in near real-time generates enormous business value. In this paper, we describe the architecture of Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency, where the streams may be unordered or delayed. The system fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention. Photon guarantees that there will be no duplicates in the joined output (at-most-once semantics) at any point in time, that most joinable events will be present in the output in real-time (near-exact semantics), and exactly-once semantics eventually. Photon is deployed within Google Advertising System to join data streams such as web search queries and user clicks on advertisements. It produces joined logs that are used to derive key business metrics, including billing for advertisers. Our production deployment processes millions of events per minute at peak with an average end-to-end latency of less than 10 seconds. We also present challenges and solutions in maintaining large persistent state across geographically distant locations, and highlight the design principles that emerged from our experience.
Pages: 577-588
Citations: 146
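The at-most-once guarantee in the abstract can be sketched with an id registry: a click is joined and emitted only if its id has never been recorded, so retries and duplicate deliveries can never produce a second output. The in-memory dict and set below are hypothetical simplifications; Photon's IdRegistry is a replicated, fault-tolerant store spanning datacenters.

```python
# Minimal sketch of dedup-before-emit in a query/click stream join.
# Hypothetical in-memory state; Photon uses a geo-replicated IdRegistry.

queries = {"q1": "flowers", "q2": "shoes"}   # query_id -> search term
id_registry = set()                           # click ids already joined
joined = []                                   # the joined output log

def on_click(click_id, query_id):
    if click_id in id_registry:
        return False            # duplicate delivery: drop, never double-bill
    if query_id not in queries:
        return False            # query event delayed/unordered: retry later
    id_registry.add(click_id)   # record in the registry, then emit
    joined.append((click_id, queries[query_id]))
    return True

on_click("c1", "q1")
on_click("c1", "q1")            # retried delivery of the same click
print(joined)                   # -> [('c1', 'flowers')]
```

Retrying delayed events is what moves the system from near-exact semantics in real time toward exactly-once semantics eventually, as the abstract describes.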
CS2: a new database synopsis for query estimation
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2463701
Feng Yu, W. Hou, Cheng Luo, D. Che, Mengxia Zhu
Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike the state-of-the-art techniques, CS2 does not completely rely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget.
Pages: 469-480
Citations: 36
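The correlated-sample idea can be illustrated in a few lines: draw a random sample of one relation, then keep the tuples of the other relation that join with it, so join relationships survive inside the synopsis and the join size can be estimated by scaling the sampled matches. The synthetic tables and the simple scale-up estimator below are illustrative assumptions; the paper's reverse sample and reverse estimator are more refined.

```python
# Toy sketch of a correlated sample for join-size estimation.
# Synthetic uniform data; not the paper's reverse estimator.
import random

random.seed(7)
R = [(i, i % 10) for i in range(1000)]   # (r_id, join key)
S = [(j, j % 10) for j in range(500)]    # (s_id, join key)

# Simple random sample of R, plus the S-tuples joining with it.
sample_R = random.sample(R, 100)
correlated_S = [(s, r) for r in sample_R for s in S if s[1] == r[1]]

# Scale sampled matches by the inverse sampling fraction of R.
estimate = len(correlated_S) * (len(R) / len(sample_R))
true_size = sum(1 for r in R for s in S if r[1] == s[1])
print(estimate, true_size)  # -> 50000.0 50000
```

On this perfectly uniform data the estimate is exact; on skewed data the accuracy depends on the sample capturing the heavy join keys, which is what motivates storing correlated tuples rather than independent samples of each table.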
Big data in capital markets
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2486082
A. Nazaruk, M. Rauchman
Over the past decade, global securities markets have changed dramatically. The evolution of market structure, combined with advances in computer technologies, led to the emergence of electronic securities trading. Securities transactions that used to be conducted in person and over the phone are now predominantly executed by automated trading systems. This has resulted in significant fragmentation of the markets, a vast increase in exchange volumes, and an even greater increase in the number of orders. In this talk we present and analyze the forces behind the wide proliferation of electronic securities trading in US stocks and options markets. We also give a high-level introduction to electronic securities market structure. We discuss the trading objectives of different classes of market participants and analyze how their activity affects data volumes. We also present a typical securities trading firm's data flow and analyze the various types of data it uses in its trading operations. We close with the implications this "sea change" has on DBMS requirements in capital markets.
Pages: 917-918
Citations: 19
CARTILAGE: adding flexibility to the Hadoop skeleton
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2465258
Alekh Jindal, Jorge-Arnulfo Quiané-Ruiz, S. Madden
Modern enterprises have to deal with a variety of analytical queries over very large datasets. In this respect, Hadoop has gained much popularity since it scales to thousands of nodes and terabytes of data. However, Hadoop suffers from poor performance, especially in I/O performance. Several works have proposed alternate data storage for Hadoop in order to improve query performance. However, many of these works end up making deep changes in Hadoop or HDFS. As a result, they are (i) difficult for users to adopt, and (ii) not compatible with future Hadoop releases. In this paper, we present CARTILAGE, a comprehensive data storage framework built on top of HDFS. CARTILAGE allows users full control over their data storage, including data partitioning, data replication, data layouts, and data placement. Furthermore, CARTILAGE can be layered on top of an existing HDFS installation. This means that Hadoop, as well as other query engines, can readily make use of CARTILAGE. We describe several use-cases of CARTILAGE and propose to demonstrate the flexibility and efficiency of CARTILAGE through a set of novel scenarios.
Pages: 1057-1060
Citations: 13
A demonstration of SQLVM: performance isolation in multi-tenant relational database-as-a-service
Pub Date: 2013-06-22 DOI: 10.1145/2463676.2463686
Vivek R. Narasayya, Sudipto Das, M. Syamala, B. Chandramouli, S. Chaudhuri
Sharing resources of a single database server among multiple tenants is common in multi-tenant Database-as-a-Service providers, such as Microsoft SQL Azure. Multi-tenancy enables cost reduction for the cloud service provider which it can pass on as savings to the tenants. However, resource sharing can adversely affect a tenant's performance due to other tenants' workloads contending for shared resources. Service providers today do not provide any assurances to a tenant in terms of isolating its performance from other co-located tenants. SQLVM, a project at Microsoft Research, is an abstraction for performance isolation which is built on a promise of reserving key database server resources, such as CPU, I/O and memory, for each tenant. The key challenge is in supporting this abstraction within a RDBMS without statically allocating resources to tenants, while ensuring low overheads and scaling to large numbers of tenants. This demonstration will show how SQLVM can effectively isolate a tenant's performance from other tenant workloads co-located at the same database server. Our demonstration will use various scripted scenarios and a data collection and visualization framework to illustrate performance isolation using SQLVM.
Pages: 1077-1080
Citations: 74
PARAS: interactive parameter space exploration for association rule mining
Pub Date : 2013-06-22 DOI: 10.1145/2463676.2465245
Abhishek Mukherji, Xika Lin, Christopher R. Botaish, Jason Whitehouse, Elke A. Rundensteiner, M. Ward, Carolina Ruiz
We demonstrate our PARAS technology for supporting interactive association mining at near real-time speeds. Key technical innovations of PARAS, in particular, stable region abstractions and rule redundancy management supporting novel parameter space-centric exploratory queries will be showcased. The audience will be able to interactively explore the parameter space view of rules. They will experience near real-time speeds achieved by PARAS for operations, such as comparing rule sets mined using different parameter values, that would otherwise take hours of computation and much manual investigation. Overall, we will demonstrate that the PARAS system provides a rich experience to data analysts through parameter tuning recommendations while significantly reducing the trial-and-error interactions.
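PARAS's near real-time speeds come from answering parameter-space queries against a precomputed structure rather than re-mining for every (support, confidence) setting. The sketch below illustrates that general idea on a toy dataset; it is not the PARAS algorithm (stable regions are simplified to filtering a pre-mined rule index), and all helper names are hypothetical.

```python
# Hypothetical sketch: mine all association rules once, recording each
# rule's support and confidence; then any (min_support, min_confidence)
# query point is answered by filtering the index, not by re-mining.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def all_rules():
    """Enumerate every rule LHS -> RHS with its support and confidence."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in (2, 3):
        for itemset in combinations(items, size):
            s = support(set(itemset))
            if s == 0:
                continue
            for k in range(1, size):
                for lhs in combinations(itemset, k):
                    conf = s / support(set(lhs))
                    rules.append((frozenset(lhs),
                                  frozenset(itemset) - frozenset(lhs),
                                  s, conf))
    return rules

RULES = all_rules()  # mined once, up front

def query(min_support, min_confidence):
    """Answer a parameter-space query from the precomputed index."""
    return {(l, r) for l, r, s, c in RULES
            if s >= min_support and c >= min_confidence}
```

Two query points that fall in the same region of parameter space return the identical rule set, so comparing rule sets across parameter values reduces to set operations over the index instead of hours of repeated mining.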
{"title":"PARAS: interactive parameter space exploration for association rule mining","authors":"Abhishek Mukherji, Xika Lin, Christopher R. Botaish, Jason Whitehouse, Elke A. Rundensteiner, M. Ward, Carolina Ruiz","doi":"10.1145/2463676.2465245","DOIUrl":"https://doi.org/10.1145/2463676.2465245","url":null,"abstract":"We demonstrate our PARAS technology for supporting interactive association mining at near real-time speeds. Key technical innovations of PARAS, in particular, stable region abstractions and rule redundancy management supporting novel parameter space-centric exploratory queries will be showcased. The audience will be able to interactively explore the parameter space view of rules. They will experience near real-time speeds achieved by PARAS for operations, such as comparing rule sets mined using different parameter values, that would otherwise take hours of computation and much manual investigation. Overall, we will demonstrate that the PARAS system provides a rich experience to data analysts through parameter tuning recommendations while significantly reducing the trial-and-error interactions.","PeriodicalId":87344,"journal":{"name":"Proceedings. ACM-SIGMOD International Conference on Management of Data","volume":"39 1","pages":"1017-1020"},"PeriodicalIF":0.0,"publicationDate":"2013-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88163181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7