2013 IEEE 29th International Conference on Data Engineering (ICDE): Latest Papers

Big data integration
Pub Date : 2013-08-27 DOI: 10.1109/ICDE.2013.6544914
X. Dong, D. Srivastava
The Big Data era is upon us: data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of Big Data. BDI differs from traditional data integration in many dimensions: (i) the number of data sources, even for a single domain, has grown to be in the tens of thousands, (ii) many of the data sources are very dynamic, as a huge amount of newly collected data are continuously made available, (iii) the data sources are extremely heterogeneous in their structure, with considerable variety even for substantially similar entities, and (iv) the data sources are of widely differing qualities, with significant differences in the coverage, accuracy and timeliness of data provided. This seminar explores the progress that has been made by the data integration community on the topics of schema mapping, record linkage and data fusion in addressing these novel challenges faced by big data integration, and identifies a range of open problems for the community.
Citations: 513
Machine learning on Big Data
Pub Date : 2013-04-08 DOI: 10.1145/2463676.2465338
Tyson Condie, Paul Mineiro, N. Polyzotis, Markus Weimer
Statistical Machine Learning has undergone a phase transition from a purely academic endeavor to being one of the main drivers of modern commerce and science. Moreover, recent results such as those on tera-scale learning [1] and on very large neural networks [2] suggest that scale is an important ingredient in quality modeling. This tutorial introduces current applications, techniques and systems with the aim of cross-fertilizing research between the database and machine learning communities. The tutorial covers current large-scale applications of Machine Learning, their computational models and the workflows behind building them. Based on this foundation, we present the current state of the art in systems support in the bulk of the tutorial. We also identify critical gaps in the state of the art. This leads to the closing of the seminar, where we introduce two sets of open research questions: better systems support for the already established use cases of Machine Learning, and support for recent advances in Machine Learning research.
Citations: 191
CrowdPlanr: Planning made easy with crowd
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544940
Ilia Lotosh, T. Milo, Slava Novgorodov
Recent research has shown that crowdsourcing can be used effectively to solve problems that are difficult for computers, e.g., optical character recognition and identification of the structural configuration of natural proteins [1]. In this demo we propose to use the power of the crowd to address yet another difficult problem that frequently occurs in daily life: planning a sequence of actions when the goal is hard to formalize. Examples include planning the sequence of places or attractions to visit in the course of a vacation, where the goal is to enjoy the resulting vacation the most, or planning the sequence of courses to take in an academic schedule, where the goal is to obtain solid knowledge of a given subject domain. Such goals may be easily understandable by humans, but hard or even impossible to formalize for a computer. We present a novel algorithm for efficiently harnessing the crowd to assist in solving such planning problems. The algorithm builds the desired plans incrementally, optimally choosing at each step the 'best' questions so that the overall number of questions that need to be asked is minimized. We demonstrate the effectiveness of our solution in CrowdPlanr, a system for vacation travel planning. Given a destination, dates, preferred activities and other constraints, CrowdPlanr employs the crowd to build a vacation plan (a sequence of places to visit) that is expected to maximize the "enjoyment" of the vacation.
Citations: 23
Query time scaling of attribute values in interval timestamped databases
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544930
Anton Dignös, Michael H. Böhlen, J. Gamper
In valid-time databases with interval timestamping, each tuple is associated with a time interval over which the recorded fact is true in the modeled reality. The adjustment of these intervals is an essential part of processing interval-timestamped data. Some attribute values remain valid if the associated interval changes, whereas others have to be scaled along with the time interval. For example, attributes that record total (cumulative) quantities over time, such as project budgets, total sales or total costs, often must be scaled if the timestamp is adjusted. The goal of this demo is to show how to support the scaling of attribute values in SQL at query time.
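To make the idea of scaling concrete, here is a minimal Python sketch of one possible scaling rule. It is not the SQL mechanism shown in the demo: the function name `scale_value`, the half-open integer intervals, and the assumption that a cumulative value is spread uniformly over its interval are all illustrative choices made here.

```python
from fractions import Fraction


def scale_value(value, old_interval, new_interval):
    """Scale a cumulative value from old_interval down to new_interval.

    Intervals are half-open (start, end) pairs in integer time units.
    Assumption: the value is spread uniformly over old_interval, and
    new_interval is a non-empty sub-interval of old_interval.
    """
    old_start, old_end = old_interval
    new_start, new_end = new_interval
    if not (old_start <= new_start < new_end <= old_end):
        raise ValueError("new_interval must be a non-empty sub-interval of old_interval")
    # Exact rational arithmetic avoids floating-point rounding in the ratio.
    return value * Fraction(new_end - new_start, old_end - old_start)


# Hypothetical example: a budget of 1200 recorded for months [0, 12),
# restricted to the three months [3, 6).
quarter_budget = scale_value(1200, (0, 12), (3, 6))
```

An attribute such as a project name would be copied unchanged when the interval is adjusted; only cumulative attributes would pass through a rule like this one.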
Citations: 10
Cleaning uncertain data for top-k queries
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544820
Luyi Mo, Reynold Cheng, Xiang Li, D. Cheung, Xuan S. Yang
The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have recently been developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database, in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this "cleaning operation" may produce a better query result, it incurs a cost and may fail. We investigate the problem of selecting entities to be cleaned under a limited budget. In particular, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal.
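The budgeted selection problem can be illustrated with a generic greedy heuristic. The sketch below is a standard benefit-per-cost greedy in Python, not the algorithm from the paper: the function name `greedy_clean`, the tuple layout, and the cost and gain figures are hypothetical, and the possibility that a cleaning operation fails is ignored.

```python
def greedy_clean(candidates, budget):
    """Pick entities to clean, highest expected gain per unit cost first.

    candidates: list of (name, cost, expected_gain) tuples with cost > 0.
    Returns the chosen names in selection order.
    """
    # Rank by gain/cost ratio; the numbers are assumed inputs, not derived here.
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    chosen, remaining = [], budget
    for name, cost, _gain in ranked:
        if cost <= remaining:  # skip anything that no longer fits the budget
            chosen.append(name)
            remaining -= cost
    return chosen


# Hypothetical sensors: (name, cleaning cost, expected quality gain).
sensors = [("s1", 4, 0.8), ("s2", 2, 0.5), ("s3", 3, 0.3), ("s4", 1, 0.05)]
selected = greedy_clean(sensors, budget=6)
```

Here `s2` and `s1` have the best ratios and together exhaust the budget of 6, so the cheaper but less useful `s3` and `s4` are left uncleaned.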
Citations: 32
SHARE: Secure information sharing framework for emergency management
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544938
B. Carminati, E. Ferrari, M. Guglielmi
9/11, Katrina, Fukushima and other recent emergencies demonstrate the need for effective information sharing across government agencies as well as non-governmental and private organizations to assess emergency situations and generate proper response plans. In this demo, we present a system to enforce timely and controlled information sharing in emergency situations. The framework is able to detect emergencies, enforce temporary access control policies and obligations to be activated during emergencies, simulate emergency situations for demonstration purposes, and show statistical results related to emergency activation/deactivation and the consequent triggering of access control policies.
Citations: 4
A unified model for stable and temporal topic detection from social media data
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544864
Hongzhi Yin, B. Cui, Hua Lu, Yuxin Huang, Junjie Yao
Web 2.0 users generate and spread huge numbers of messages in online social media. Such user-generated content is a mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Because the two kinds of topics differ in nature, it is important and useful to distinguish temporal topics from stable topics in social media. However, such discrimination is very challenging because user-generated texts in social media are very short and thus lack useful linguistic features for precise analysis using traditional approaches. In this paper, we propose a novel solution to detect both stable and temporal topics simultaneously from social media data. Specifically, a unified user-temporal mixture model is proposed to distinguish temporal topics from stable topics. To improve this model's performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits temporal prior information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model is able to distinguish temporal topics from stable topics in a single detection process. Our mixture model, enhanced with the spatial regularization and the burst-weighted smoothing scheme, significantly outperforms competitor approaches in terms of topic detection accuracy and discrimination between stable and temporal topics.
Citations: 77
SAP HANA distributed in-memory database system: Transaction, session, and metadata management
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544906
Juchang Lee, Y. Kwon, Franz Färber, Michael Muehle, Chulwon Lee, Christian Bensberg, Joo-Yeon Lee, Arthur H. Lee, Wolfgang Lehner
One of the core principles of the SAP HANA database system is comprehensive support for distributed query processing. Supporting scale-out scenarios was one of the major design principles of the system from the very beginning. In this paper, we first give an overview of the overall functionality with respect to data allocation, metadata caching and query routing. We then go into more detail on specific topics and explain features and methods not common in traditional disk-based database systems. In summary, the paper provides a comprehensive overview of distributed query processing in the SAP HANA database, which achieves the scalability needed to handle large databases and heterogeneous types of workloads.
Citations: 50
Twitter+: Build personalized newspaper for Twitter
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544922
Chen Liu, A. Tung
Nowadays, microblogging services such as Twitter play important roles in people's everyday lives. They enable users to publish and read text-based posts, known as "tweets", and to interact with each other through re-tweeting or commenting. In the literature, much effort has been devoted to exploiting the social property of Twitter. However, beyond the social component, Twitter itself has become an indispensable source for users to acquire useful information. To maximize its value, we expect to pay more attention to the media property of Twitter. To be good media, the first requirement is that it should present its news effectively so that users can read it easily. Currently, all tweets from the accounts a user follows are presented to the user and are usually organized by publication time or by source. However, having too few dimensions for presenting tweets hinders users from conveniently finding the information that interests them. In this demo, we present "Twitter+", which aims to enrich users' reading experience in Twitter by providing multiple ways for them to explore tweets, such as keyword presentation and topic finding. It offers users an alternative interface to browse tweets more effectively.
Citations: 1
Time travel in column stores
Pub Date : 2013-04-08 DOI: 10.1109/ICDE.2013.6544818
Martin Kaufmann, Amin Amiri Manjili, Stefan Hildenbrand, Donald Kossmann, Andreas Tonder
Recent studies have shown that column stores can outperform row stores significantly. This paper explores alternative approaches to extend column stores with versioning, i.e., time travel queries and the maintenance of historic data. On the one hand, adding versioning can actually simplify the design of a column store because it provides a solution for the implementation of updates, traditionally a weak point in the design of column stores. On the other hand, implementing a versioned column store is challenging because it poses a two-dimensional clustering problem: should the data be clustered by row or by version? This paper works out the details of three memory layouts: clustering by row, clustering by version, and hybrid clustering. Performance experiments demonstrate that all three approaches outperform a (traditional) versioned row store. The efficiency of these three memory layouts depends on the query and update workload. Furthermore, the performance experiments analyze the time-space tradeoff that can be made in the implementation of versioned column stores.
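The clustering question can be pictured with a toy example. The Python sketch below is only an illustration of the two orderings and of a time travel read; it does not reproduce the memory layouts from the paper. The record layout `(row_id, version, value)`, the sample data, and the function name `as_of` are assumptions made here, and deletions are not modeled.

```python
# Hypothetical versioned records: (row_id, version, value).
records = [
    (1, 1, "a"), (2, 1, "b"), (1, 2, "a2"), (3, 2, "c"), (2, 3, "b3"),
]

# Clustering by row keeps all versions of one row adjacent.
by_row = sorted(records, key=lambda r: (r[0], r[1]))

# Clustering by version keeps everything written in one version adjacent.
by_version = sorted(records, key=lambda r: (r[1], r[0]))


def as_of(recs, version):
    """Return {row_id: value} as visible at the given version."""
    snapshot = {}
    # Replay writes in version order; later versions overwrite earlier ones.
    for row_id, ver, value in sorted(recs, key=lambda r: r[1]):
        if ver <= version:
            snapshot[row_id] = value
    return snapshot


snapshot_v2 = as_of(records, 2)
```

With rows clustered together, reading the history of a single row touches one contiguous run; with versions clustered together, reconstructing a snapshot scans a prefix of the data. Which access pattern dominates depends on the workload, as the abstract notes.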
Citations: 2
Journal
2013 IEEE 29th International Conference on Data Engineering (ICDE)