Pub Date: 2013-08-27, DOI: 10.1109/ICDE.2013.6544914
X. Dong, D. Srivastava
The Big Data era is upon us: data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of Big Data. BDI differs from traditional data integration in many dimensions: (i) the number of data sources, even for a single domain, has grown to be in the tens of thousands, (ii) many of the data sources are very dynamic, as huge amounts of newly collected data are continuously made available, (iii) the data sources are extremely heterogeneous in their structure, with considerable variety even for substantially similar entities, and (iv) the data sources are of widely differing quality, with significant differences in the coverage, accuracy and timeliness of the data provided. This seminar explores the progress that the data integration community has made on schema mapping, record linkage and data fusion in addressing these novel challenges, and identifies a range of open problems for the community.
Title: Big data integration. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
Tyson Condie, Paul Mineiro, N. Polyzotis, Markus Weimer
Statistical Machine Learning has undergone a phase transition from a purely academic endeavor to being one of the main drivers of modern commerce and science. Moreover, recent results such as those on tera-scale learning [1] and on very large neural networks [2] suggest that scale is an important ingredient in quality modeling. This tutorial introduces current applications, techniques and systems with the aim of cross-fertilizing research between the database and machine learning communities. The tutorial covers current large-scale applications of Machine Learning, their computational models and the workflows behind building them. Based on this foundation, we present the current state of the art in systems support in the bulk of the tutorial. We also identify critical gaps in the state of the art. This leads to the closing of the tutorial, where we introduce two sets of open research questions: better systems support for the already established use cases of Machine Learning, and support for recent advances in Machine Learning research.
Title: Machine learning on Big Data. Pub Date: 2013-04-08, DOI: 10.1145/2463676.2465338. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544940
Ilia Lotosh, T. Milo, Slava Novgorodov
Recent research has shown that crowdsourcing can be used effectively to solve problems that are difficult for computers, e.g., optical character recognition and identification of the structural configuration of natural proteins [1]. In this demo we propose to use the power of the crowd to address yet another difficult problem that frequently occurs in daily life: planning a sequence of actions when the goal is hard to formalize. Examples include planning the sequence of places or attractions to visit during a vacation, where the goal is to enjoy the vacation as much as possible, or planning the sequence of courses to take in an academic schedule, where the goal is to obtain solid knowledge of a given subject domain. Such goals may be easily understandable by humans, but hard or even impossible to formalize for a computer. We present a novel algorithm for efficiently harnessing the crowd to assist in solving such planning problems. The algorithm builds the desired plans incrementally, optimally choosing at each step the 'best' questions so that the overall number of questions that need to be asked is minimized. We demonstrate the effectiveness of our solution in CrowdPlanr, a system for vacation travel planning. Given a destination, dates, preferred activities and other constraints, CrowdPlanr employs the crowd to build a vacation plan (a sequence of places to visit) that is expected to maximize the “enjoyment” of the vacation.
Title: CrowdPlanr: Planning made easy with crowd. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544930
Anton Dignös, Michael H. Böhlen, J. Gamper
In valid-time databases with interval timestamping, each tuple is associated with a time interval over which the recorded fact is true in the modeled reality. The adjustment of these intervals is an essential part of processing interval-timestamped data. Some attribute values remain valid if the associated interval changes, whereas others have to be scaled along with the time interval. For example, attributes that record total (cumulative) quantities over time, such as project budgets, total sales or total costs, often must be scaled if the timestamp is adjusted. The goal of this demo is to show how to support the scaling of attribute values in SQL at query time.
Title: Query time scaling of attribute values in interval timestamped databases. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
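To make the scaling idea in the abstract above concrete, here is a minimal sketch in Python rather than SQL. It assumes that a cumulative value is spread uniformly over its original interval and that intervals are half-open, [start, end); both are illustrative assumptions, and the function name `scale_value` is hypothetical rather than taken from the demo.

```python
from datetime import date


def scale_value(value, old_start, old_end, new_start, new_end):
    """Scale a cumulative value to the part of [old_start, old_end)
    that survives in the adjusted interval [new_start, new_end).

    Assumption (not from the source): the value is distributed
    uniformly over the original interval.
    """
    old_days = (old_end - old_start).days
    if old_days <= 0:
        raise ValueError("original interval must be non-empty")
    # Intersect the original interval with the adjusted one.
    overlap_start = max(old_start, new_start)
    overlap_end = min(old_end, new_end)
    kept_days = max((overlap_end - overlap_start).days, 0)
    return value * kept_days / old_days


# A budget of 12000 covers 2013-01-01 to 2013-05-01 (120 days).
# Restricting the tuple to its first 30 days keeps a quarter of it.
january_share = scale_value(
    12000,
    date(2013, 1, 1), date(2013, 5, 1),
    date(2013, 1, 1), date(2013, 1, 31),
)
print(january_share)  # 3000.0
```

A non-cumulative attribute, such as a project name, would simply be copied unchanged when the interval is adjusted.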
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544820
Luyi Mo, Reynold Cheng, Xiang Li, D. Cheung, Xuan S. Yang
The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have recently been developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this “cleaning operation” may produce a better query result, it may incur a cost and may fail. We investigate the problem of selecting entities to be cleaned under a limited budget. In particular, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal.
Title: Cleaning uncertain data for top-k queries. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
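The budgeted selection problem described in the abstract above can be illustrated with a generic greedy heuristic. The sketch below is not the paper's algorithm: it assumes each entity has a known cleaning cost, a success probability and an independent, additive quality gain, and it ranks entities by expected gain per unit cost. The `Candidate` fields and `greedy_select` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # entity that could be cleaned
    cost: int            # cost of one cleaning attempt (must be > 0)
    success_prob: float  # probability that cleaning succeeds
    quality_gain: float  # assumed quality improvement if it succeeds


def greedy_select(candidates, budget):
    """Pick entities by expected gain per unit cost until the budget runs out.

    Assumption (not from the source): gains are independent and additive,
    which a real top-k quality measure need not satisfy.
    """
    ranked = sorted(
        candidates,
        key=lambda c: c.success_prob * c.quality_gain / c.cost,
        reverse=True,
    )
    chosen, remaining = [], budget
    for c in ranked:
        if c.cost <= remaining:
            chosen.append(c.name)
            remaining -= c.cost
    return chosen


sensors = [
    Candidate("s1", cost=4, success_prob=0.9, quality_gain=2.0),  # ratio 0.45
    Candidate("s2", cost=2, success_prob=0.5, quality_gain=3.0),  # ratio 0.75
    Candidate("s3", cost=3, success_prob=0.8, quality_gain=1.0),  # ratio ~0.27
]
print(greedy_select(sensors, budget=6))  # ['s2', 's1']
```

A ratio-based greedy like this is fast but not guaranteed optimal in general, which is why the abstract distinguishes the optimal solution from the heuristics.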
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544938
B. Carminati, E. Ferrari, M. Guglielmi
9/11, Katrina, Fukushima and other recent emergencies demonstrate the need for effective information sharing across government agencies as well as non-governmental and private organizations in order to assess emergency situations and generate proper response plans. In this demo, we present a system to enforce timely and controlled information sharing in emergency situations. The framework is able to detect emergencies, enforce temporary access control policies and obligations that are activated during emergencies, simulate emergency situations for demonstration purposes, and show statistics on emergency activation and deactivation and on the resulting triggering of access control policies.
Title: SHARE: Secure information sharing framework for emergency management. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544864
Hongzhi Yin, B. Cui, Hua Lu, Yuxin Huang, Junjie Yao
Web 2.0 users generate and spread huge amounts of messages in online social media. Such user-generated content is a mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Because the two kinds of topics differ in nature, it is important and useful to distinguish temporal topics from stable topics in social media. However, this distinction is very challenging because user-generated texts in social media are very short and thus lack the linguistic features needed for precise analysis using traditional approaches. In this paper, we propose a novel solution to detect both stable and temporal topics simultaneously from social media data. Specifically, a unified user-temporal mixture model is proposed to distinguish temporal topics from stable topics. To improve this model's performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits temporal prior information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model is able to distinguish temporal topics from stable topics in a single detection process. Our mixture model, enhanced with the spatial regularization and the burst-weighted smoothing scheme, significantly outperforms competitor approaches in terms of topic detection accuracy and discrimination between stable and temporal topics.
Title: A unified model for stable and temporal topic detection from social media data. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544906
Juchang Lee, Y. Kwon, Franz Färber, Michael Muehle, Chulwon Lee, Christian Bensberg, Joo-Yeon Lee, Arthur H. Lee, Wolfgang Lehner
One of the core principles of the SAP HANA database system is comprehensive support for a distributed query facility. Supporting scale-out scenarios was one of the major design principles of the system from the very beginning. In this paper, we first give an overview of the overall functionality with respect to data allocation, metadata caching and query routing. We then go into more detail on specific topics and explain features and methods not common in traditional disk-based database systems. In summary, the paper provides a comprehensive overview of distributed query processing in the SAP HANA database, which is designed to scale to large databases and heterogeneous types of workloads.
Title: SAP HANA distributed in-memory database system: Transaction, session, and metadata management. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544922
Chen Liu, A. Tung
Microblogging services such as Twitter now play an important role in people's everyday lives. Twitter enables users to publish and read text-based posts, known as “tweets”, and to interact with each other through re-tweeting or commenting. In the literature, much effort has been devoted to exploiting the social properties of Twitter. Beyond its social component, however, Twitter has become an indispensable source of useful information for its users. To maximize its value, we argue that more attention should be paid to Twitter as a medium. To be a good medium, it must first present its news effectively so that it is easy to read. Currently, all tweets from the accounts a user follows are presented to that user, usually organized by publication time or by source. Having so few dimensions for presenting tweets makes it hard for users to find the information they are interested in. In this demo, we present “Twitter+”, which aims to enrich users' reading experience on Twitter by providing multiple ways to explore tweets, such as keyword presentation and topic finding. It offers users an alternative interface for browsing tweets more effectively.
Title: Twitter+: Build personalized newspaper for Twitter. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
Pub Date: 2013-04-08, DOI: 10.1109/ICDE.2013.6544818
Martin Kaufmann, Amin Amiri Manjili, Stefan Hildenbrand, Donald Kossmann, Andreas Tonder
Recent studies have shown that column stores can outperform row stores significantly. This paper explores alternative approaches to extend column stores with versioning, i.e., time travel queries and the maintenance of historic data. On the one hand, adding versioning can actually simplify the design of a column store because it provides a solution for the implementation of updates, traditionally a weak point in the design of column stores. On the other hand, implementing a versioned column store is challenging because it imposes a two-dimensional clustering problem: should the data be clustered by row or by version? This paper details three memory layouts: clustering by row, clustering by version, and hybrid clustering. Performance experiments demonstrate that all three approaches outperform a (traditional) versioned row store. The efficiency of these three memory layouts depends on the query and update workload. Furthermore, the performance experiments analyze the time-space tradeoff that can be made in the implementation of versioned column stores.
Title: Time travel in column stores. Venue: 2013 IEEE 29th International Conference on Data Engineering (ICDE).
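As a rough illustration of what a time travel query over a versioned column involves, the sketch below stores one column clustered by row: each row keeps its own version history, and a lookup returns the value visible as of a given version. The class `RowClusteredColumn` and its methods are hypothetical and greatly simplified; they are not the memory layouts evaluated in the paper.

```python
import bisect


class RowClusteredColumn:
    """One column whose versions are clustered by row (illustrative only)."""

    def __init__(self):
        self._versions = {}  # row_id -> ascending list of version numbers
        self._values = {}    # row_id -> values aligned with _versions

    def update(self, row_id, version, value):
        """Append a new version of a row instead of overwriting in place."""
        versions = self._versions.setdefault(row_id, [])
        values = self._values.setdefault(row_id, [])
        if versions and version <= versions[-1]:
            raise ValueError("versions must be appended in increasing order")
        versions.append(version)
        values.append(value)

    def value_as_of(self, row_id, version):
        """Return the value visible at `version`, or None if none existed yet."""
        versions = self._versions.get(row_id, [])
        # Index of the first stored version strictly greater than `version`.
        idx = bisect.bisect_right(versions, version)
        return self._values[row_id][idx - 1] if idx else None


price = RowClusteredColumn()
price.update(row_id=1, version=10, value="a")
price.update(row_id=1, version=20, value="b")
print(price.value_as_of(1, 15))  # a
print(price.value_as_of(1, 20))  # b
print(price.value_as_of(1, 5))   # None
```

Clustering by version would instead append every update to a single log ordered by version. That favors scans of one snapshot or of recent changes, whereas the per-row layout above favors looking up the history of individual rows.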