
Proceedings of the 2018 International Conference on Management of Data: Latest Papers

Session details: Research 4: Query Processing
H. Pirk
Pub Date : 2018-05-27 DOI: 10.1145/3258008
Citations: 0
Online Processing Algorithms for Influence Maximization
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183749
Jing Tang, Xueyan Tang, Xiaokui Xiao, Junsong Yuan
Influence maximization is a classic and extensively studied problem with important applications in viral marketing. Existing algorithms for influence maximization, however, mostly focus on offline processing, in the sense that they do not provide any output to the user until the final answer is derived, and that the user is not allowed to terminate the algorithm early to trade the quality of solution for efficiency. Such lack of interactiveness and flexibility leads to poor user experience, especially when the algorithm incurs long running time. To address the above problem, this paper studies algorithms for online processing of influence maximization (OPIM), where the user can pause the algorithm at any time and ask for a solution (to the influence maximization problem) and its approximation guarantee, and can resume the algorithm to let it improve the quality of solution by giving it more time to run. (This interactive paradigm is similar in spirit to online query processing in database systems.) We show that the only existing algorithm for OPIM is vastly ineffective in practice, and that adopting existing influence maximization methods for OPIM yields unsatisfactory results. Motivated by this, we propose a new algorithm for OPIM with both superior empirical effectiveness and strong theoretical guarantees, and we show that it can also be extended to handle conventional influence maximization. Extensive experiments on real data demonstrate that our solutions outperform the state of the art for both OPIM and conventional influence maximization.
Citations: 122
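The pause-and-resume interaction described in the abstract above can be illustrated with a Python generator. This is only a minimal sketch, not the authors' algorithm: it assumes the independent cascade model, uses reverse-reachable (RR) set sampling with greedy coverage as a stand-in technique, and omits the approximation guarantee that the paper reports alongside each solution. The names `sample_rr_set`, `greedy_cover`, `online_seeds` and the `in_edges` layout are hypothetical.

```python
import random

def sample_rr_set(in_edges, nodes, rng):
    """Sample one reverse-reachable set (assumed independent cascade model).

    in_edges maps a node v to a list of (u, p): u activates v with probability p.
    """
    root = rng.choice(nodes)
    reached = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        for u, p in in_edges.get(v, ()):
            if u not in reached and rng.random() < p:
                reached.add(u)
                frontier.append(u)
    return reached

def greedy_cover(rr_sets, k):
    """Greedily pick up to k nodes that cover as many RR sets as possible."""
    uncovered = list(rr_sets)
    seeds = []
    for _ in range(k):
        counts = {}
        for rr in uncovered:
            for v in rr:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            break  # every sampled RR set is already covered
        # sorted() makes tie-breaking deterministic (nodes must be comparable)
        best = max(sorted(counts), key=counts.get)
        seeds.append(best)
        uncovered = [rr for rr in uncovered if best not in rr]
    return seeds

def online_seeds(in_edges, nodes, k, batch, rng):
    """Yield (seed set, samples used) after every batch of new RR sets."""
    rr_sets = []
    while True:
        rr_sets.extend(sample_rr_set(in_edges, nodes, rng) for _ in range(batch))
        yield greedy_cover(rr_sets, k), len(rr_sets)
```

Each call to `next()` on the generator plays the role of "resume, then pause and ask for the current answer"; the caller may stop at any point and keep the last seed set it received.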
Adaptive Optimization of Very Large Join Queries
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183733
Thomas Neumann, Bernhard Radke
The use of business intelligence tools and other means to generate queries has led to great variety in the size of join queries. While most queries are reasonably small, join queries with up to a hundred relations are not that exotic anymore, and the distribution of query sizes has an incredibly long tail. The largest real-world query that we are aware of accesses more than 4,000 relations. This large spread makes query optimization very challenging. Join ordering is known to be NP-hard, which means that we cannot hope to solve such large problems exactly. On the other hand, most queries are much smaller, and there is no reason to sacrifice optimality there. This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins. A key component is a novel search space linearization technique that leads to near-optimal execution plans for large classes of queries. In addition, we describe implementation techniques that are necessary to scale join ordering algorithms to these extremely large queries. Extensive experiments with over 10 different approaches show that the new adaptive approach proposed here performs excellently over a huge spectrum of query sizes, and produces optimal or near-optimal solutions for most common queries.
Citations: 52
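The general idea in the abstract above, solving small join queries exactly and falling back to a cheaper strategy for large ones, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's framework: it does not implement search space linearization, the cost model (sum of intermediate result sizes), the greedy fallback, and the `exact_limit` threshold are all assumptions, and every function name is hypothetical. Selectivities are keyed by alphabetically ordered relation pairs; missing pairs are treated as cross products.

```python
from itertools import combinations

def cardinality(subset, cards, sel):
    """Estimated result size of joining all relations in subset."""
    size = 1.0
    for r in subset:
        size *= cards[r]
    for a, b in combinations(sorted(subset), 2):
        size *= sel.get((a, b), 1.0)  # no predicate means cross product
    return size

def exact_plan(relations, cards, sel):
    """Dynamic programming over subsets; returns (cost, bushy join tree)."""
    rels = list(relations)
    best = {frozenset([r]): (0.0, r) for r in rels}
    for n in range(2, len(rels) + 1):
        for combo in combinations(rels, n):
            whole = frozenset(combo)
            out = cardinality(whole, cards, sel)
            options = []
            for k in range(1, n):
                for left in combinations(combo, k):
                    l = frozenset(left)
                    r = whole - l
                    cost = best[l][0] + best[r][0] + out
                    options.append((cost, (best[l][1], best[r][1])))
            best[whole] = min(options, key=lambda o: o[0])
    return best[frozenset(rels)]

def greedy_plan(relations, cards, sel):
    """Repeatedly join the pair with the smallest estimated result."""
    parts = [(frozenset([r]), 0.0, r) for r in relations]
    while len(parts) > 1:
        i, j = min(
            combinations(range(len(parts)), 2),
            key=lambda ij: cardinality(parts[ij[0]][0] | parts[ij[1]][0], cards, sel),
        )
        a, b = parts[i], parts[j]
        merged = a[0] | b[0]
        cost = a[1] + b[1] + cardinality(merged, cards, sel)
        parts = [p for n, p in enumerate(parts) if n not in (i, j)]
        parts.append((merged, cost, (a[2], b[2])))
    return parts[0][1], parts[0][2]

def adaptive_plan(relations, cards, sel, exact_limit=10):
    """Exact search for small queries, greedy heuristic beyond the limit."""
    if len(relations) <= exact_limit:
        return exact_plan(relations, cards, sel)
    return greedy_plan(relations, cards, sel)
```

For example, with `cards = {"A": 1000, "B": 10, "C": 1000}` and `sel = {("A", "B"): 0.01, ("B", "C"): 0.01}`, both strategies avoid the A-C cross product and join through B first.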
Approximate Triangle Count and Clustering Coefficient
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183715
Siddharth Bhatia
Two important metrics used to characterise a graph are its triangle count and clustering coefficient. In this paper, we present methods to approximate these metrics for graphs.
Citations: 2
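For readers unfamiliar with the two metrics named in the abstract above, the following Python sketch computes them exactly and then approximates the global clustering coefficient by uniform wedge sampling. Wedge sampling is a standard estimator chosen here for illustration; the abstract does not say which approximation methods the paper uses, and all function names are hypothetical. A wedge is a path a-centre-b, and the global clustering coefficient is the fraction of wedges that are closed by an edge a-b, which equals three times the triangle count divided by the wedge count.

```python
import random
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def exact_counts(adj):
    """Return (triangle count, wedge count) by checking every neighbour pair."""
    closed = 0
    wedges = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            wedges += 1
            if b in adj[a]:
                closed += 1
    # each triangle closes three wedges, one per corner
    return closed // 3, wedges

def estimate_clustering(adj, samples, rng):
    """Estimate the global clustering coefficient from uniformly sampled wedges."""
    centres = [v for v in adj if len(adj[v]) >= 2]
    if not centres:
        return 0.0
    # a node of degree d is the centre of d*(d-1)/2 wedges
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centres]
    hits = 0
    for centre in rng.choices(centres, weights=weights, k=samples):
        a, b = rng.sample(list(adj[centre]), 2)
        if b in adj[a]:
            hits += 1
    return hits / samples
```

Given the estimate `c` and the exact wedge count `w` (cheap to compute from degrees alone), an approximate triangle count follows as `c * w / 3`.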
Big Data Linkage for Product Specification Pages
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3183757
Disheng Qiu, Luciano Barbosa, Valter Crescenzi, P. Merialdo, D. Srivastava
An increasing number of product pages are available from thousands of web sources, each page associated with a product, containing its attributes and one or more product identifiers. The sources provide overlapping information about the products, using diverse schemas, making web-scale integration extremely challenging. In this paper, we take advantage of the opportunity that sources publish product identifiers to perform big data linkage across sources at the beginning of the data integration pipeline, before schema alignment. To realize this opportunity, several challenges need to be addressed: identifiers need to be discovered on product pages, made difficult by the diversity of identifiers; the main product identifier on the page needs to be identified, made difficult by the many related products presented on the page; and identifiers across pages need to be resolved, made difficult by the ambiguity between identifiers across product categories. We present our RaF (Redundancy as Friend) solution to the problem of big data linkage for product specification pages, which takes advantage of the redundancy of identifiers at a global level, and the homogeneity of structure and semantics at the local source level, to effectively and efficiently link millions of pages of head and tail products across thousands of head and tail sources. We perform a thorough empirical evaluation of our RaF approach using the publicly available Dexter dataset consisting of 1.9M product pages from 7.1k sources of 3.5k websites, and demonstrate its effectiveness in practice.
Citations: 5
Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3193539
Yeye He, Kris Ganjam, Kukjin Lee, Yue Wang, Vivek R. Narasayya, S. Chaudhuri, Xu Chu, Yudian Zheng
Business analysts and data scientists today increasingly need to clean, standardize and transform diverse data sets, such as name, address, date time, phone number, etc., before they can perform analysis. These ad-hoc transformation problems are typically solved by one-off scripts, which is both difficult and time-consuming. Our observation is that these domain-specific transformation problems have long been solved by developers with code libraries, which are often shared in places like GitHub. We thus develop an extensible data transformation system called Transform-Data-by-Example (TDE) that can leverage rich transformation logic in source code, DLLs, web services and mapping tables, so that end-users only need to provide a few (typically 3) input/output examples, and TDE can synthesize desired programs using relevant transformation logic from these sources. The beta version of TDE was released in Office Store for Excel.
Citations: 15
Data Integration and Machine Learning: A Natural Synergy
Pub Date : 2018-05-27 DOI: 10.14778/3229863.3229876
X. Dong, Theodoros Rekatsinas
There is now more data to analyze than ever before. As data volume and variety have increased, so have the ties between machine learning and data integration become stronger. For machine learning to be effective, one must utilize data from the greatest possible variety of sources; and this is why data integration plays a key role. At the same time machine learning is driving automation in data integration, resulting in overall reduction of integration costs and improved accuracy. This tutorial focuses on three aspects of the synergistic relationship between data integration and machine learning: (1) we survey how state-of-the-art data integration solutions rely on machine learning-based approaches for accurate results and effective human-in-the-loop pipelines, (2) we review how end-to-end machine learning applications rely on data integration to identify accurate, clean, and relevant data for their analytics exercises, and (3) we discuss open research challenges and opportunities that span across data integration and machine learning.
Citations: 95
DPaxos: Managing Data Closer to Users for Low-Latency and Mobile Applications
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3196928
Faisal Nawab, D. Agrawal, A. E. Abbadi
In this paper, we propose Dynamic Paxos (DPaxos), a Paxos-based consensus protocol to manage access to partitioned data across globally-distributed datacenters and edge nodes. DPaxos is intended to implement a State Machine Replication component in data management systems for the edge. DPaxos targets the unique opportunities of utilizing edge computing resources to support emerging applications with stringent mobility and real-time requirements such as Augmented and Virtual Reality and vehicular applications. The main objective of DPaxos is to reduce the latency of serving user requests, recovering from failures, and reacting to mobility. DPaxos achieves these objectives by a few proposed changes to the traditional Paxos protocol. Most notably, DPaxos proposes a dynamic allocation of quorums (i.e., groups of nodes) that are needed for Paxos Leader Election. Leader Election quorums in DPaxos are smaller than traditional Paxos and expand only in the presence of conflicts.
Citations: 57
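As background for the abstract above, the sketch below illustrates why Leader Election quorums can be smaller than a majority at all. It relies on the general observation, known from work on flexible quorums in Paxos, that safety only requires every Leader Election quorum to intersect every replication quorum. It is not DPaxos's actual quorum-allocation rule, which the abstract does not spell out; the fixed-size quorum systems and the names `all_quorums` and `always_intersect` are assumptions made for illustration.

```python
from itertools import combinations

def all_quorums(nodes, size):
    """Every subset of the given size, as one simple quorum system."""
    return [frozenset(c) for c in combinations(sorted(nodes), size)]

def always_intersect(election_quorums, replication_quorums):
    """True if each election quorum shares a node with each replication quorum."""
    return all(e & r for e in election_quorums for r in replication_quorums)

if __name__ == "__main__":
    nodes = range(5)
    # classic majorities: 3 + 3 > 5, so any two quorums overlap
    print(always_intersect(all_quorums(nodes, 3), all_quorums(nodes, 3)))
    # small election quorums are fine if replication quorums grow: 2 + 4 > 5
    print(always_intersect(all_quorums(nodes, 2), all_quorums(nodes, 4)))
    # 2 + 3 = 5 leaves room for disjoint quorums, so this combination is unsafe
    print(always_intersect(all_quorums(nodes, 2), all_quorums(nodes, 3)))
```

With fixed sizes over n nodes, the check passes exactly when the two sizes sum to more than n, which is the trade-off a protocol can exploit to keep one kind of quorum small.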
Special Session: A Technical Research Agenda in Data Ethics and Responsible Data Management
Pub Date : 2018-05-27 DOI: 10.1145/3183713.3205185
Julia Stoyanovich, Bill Howe, H. Jagadish
SESSION DESCRIPTION Recently, there has begun a movement towards fairness, accountability, and transparency (FAT) in algorithmic decision making, and in data science more broadly [1–4]. The database community has not been significantly involved in this movement, despite “owning” the models, languages, and systems that produce the input to the machine learning applications that are often the focus in data science. If training data are biased, or have errors, it stands to reason that the algorithmic result will also be unfair or erroneous. Similarly, transparency of just the algorithm is usually insufficient to understand why certain results were obtained: one needs also to know the data used. In short, FAT depend not just on the algorithm, but also on the data. This observation raises several important questions: What are the core data management issues to which the objectives of fairness, accountability and transparency give rise? What role should the database community play in this movement? Will emphasis on these topics dilute our core competency in techniques and technologies for data, or can it reinforce our central role in technology stacks ranging from startups to the enterprise, and from local non-profits to the federal government? This special session features leading researchers from machine learning, software engineering, security and privacy, and natural language processing, who are doing exciting technical work in FAT. The goal of this session is to outline a technical research agenda in data management foundations and systems around data ethics.
Citations: 4
Session details: Industry 1: Adaptive Query Processing
J. Dittrich
Pub Date : 2018-05-27 DOI: 10.1145/3258006
Citations: 0