Trajectory classification has many useful applications. Existing works on trajectory classification do not consider the duration information of trajectories. In this paper, we extract duration-aware features from trajectories to build a classifier. Our method utilizes information theory to obtain regions where the trajectories have similar speeds and directions. Further, trajectories are summarized into a network based on the MDL principle that takes into account the duration difference among trajectories of different classes. A graph traversal is performed on this trajectory network to obtain the top-k covering path rules for each trajectory. Based on the discovered regions and top-k path rules, we build a classifier to predict the class labels of new trajectories. Experimental results on real-world datasets show that the proposed duration-aware classifier can obtain higher classification accuracy than the state-of-the-art trajectory classifier.
{"title":"Incorporating Duration Information for Trajectory Classification","authors":"D. Patel, Chang Sheng, W. Hsu, M. Lee","doi":"10.1109/ICDE.2012.72","DOIUrl":"https://doi.org/10.1109/ICDE.2012.72","url":null,"abstract":"Trajectory classification has many useful applications. Existing works on trajectory classification do not consider the duration information of trajectories. In this paper, we extract duration-aware features from trajectories to build a classifier. Our method utilizes information theory to obtain regions where the trajectories have similar speeds and directions. Further, trajectories are summarized into a network based on the MDL principle that takes into account the duration difference among trajectories of different classes. A graph traversal is performed on this trajectory network to obtain the top-k covering path rules for each trajectory. Based on the discovered regions and top-k path rules, we build a classifier to predict the class labels of new trajectories. Experimental results on real-world datasets show that the proposed duration-aware classifier can obtain higher classification accuracy than the state-of-the-art trajectory classifier.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127065914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When compared to a gold standard, the set of mappings that are generated by an automatic ontology matching process is neither complete nor are the individual mappings always correct. However, given the explosion in the number, size, and complexity of available ontologies, domain experts no longer have the capability to create ontology mappings without considerable effort. We present a solution to this problem that consists of making the ontology matching process interactive so as to incorporate user feedback in the loop. Our approach clusters mappings to identify where user feedback will be most beneficial in reducing the number of user interactions and system iterations. This feedback process has been implemented in the Agreement Maker system and is supported by visual analytic techniques that help users to better understand the matching process. Experimental results using the OAEI benchmarks show the effectiveness of our approach. We will demonstrate how users can interact with the ontology matching process through the Agreement Maker user interface to match real-world ontologies.
{"title":"Interactive User Feedback in Ontology Matching Using Signature Vectors","authors":"I. Cruz, Cosmin Stroe, M. Palmonari","doi":"10.1109/ICDE.2012.137","DOIUrl":"https://doi.org/10.1109/ICDE.2012.137","url":null,"abstract":"When compared to a gold standard, the set of mappings that are generated by an automatic ontology matching process is neither complete nor are the individual mappings always correct. However, given the explosion in the number, size, and complexity of available ontologies, domain experts no longer have the capability to create ontology mappings without considerable effort. We present a solution to this problem that consists of making the ontology matching process interactive so as to incorporate user feedback in the loop. Our approach clusters mappings to identify where user feedback will be most beneficial in reducing the number of user interactions and system iterations. This feedback process has been implemented in the Agreement Maker system and is supported by visual analytic techniques that help users to better understand the matching process. Experimental results using the OAEI benchmarks show the effectiveness of our approach. We will demonstrate how users can interact with the ontology matching process through the Agreement Maker user interface to match real-world ontologies.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128825347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Witnessing the emergence of Twitter, we propose a Twitter-based Event Detection and Analysis System (TEDAS), which helps to (1) detect new events, (2) analyze the spatial and temporal patterns of an event, and (3) identify the importance of events. In this demonstration, we show the overall system architecture, explain in detail the implementation of the components that crawl, classify, and rank tweets and extract locations from tweets, and present some interesting results of our system.
{"title":"TEDAS: A Twitter-based Event Detection and Analysis System","authors":"Rui Li, Kin Hou Lei, Ravi V. Khadiwala, K. Chang","doi":"10.1109/ICDE.2012.125","DOIUrl":"https://doi.org/10.1109/ICDE.2012.125","url":null,"abstract":"Witnessing the emergence of Twitter, we propose a Twitter-based Event Detection and Analysis System (TEDAS), which helps to (1) detect new events, (2) analyze the spatial and temporal patterns of an event, and (3) identify the importance of events. In this demonstration, we show the overall system architecture, explain in detail the implementation of the components that crawl, classify, and rank tweets and extract locations from tweets, and present some interesting results of our system.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121619881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In a wide array of disciplines, data can be modeled as an interconnected network of entities, where various attributes could be associated with both the entities and the relations among them. Knowledge is often hidden in the complex structure and attributes inside these networks. While querying and mining these linked datasets are essential for various applications, traditional graph queries may not be able to capture the rich semantics in these networks. With the advent of complex information networks, new graph queries are emerging, including graph pattern matching and mining, similarity search, ranking and expert finding, and graph aggregation and OLAP. These queries require both the topology and content information of the network data and hence differ from classical graph algorithms such as shortest path, reachability, and minimum cut, which depend only on the structure of the network. In this tutorial, we give an introduction to the emerging graph queries, their indexing and resolution techniques, current challenges, and future research directions.
{"title":"Emerging Graph Queries in Linked Data","authors":"Arijit Khan, Yinghui Wu, Xifeng Yan","doi":"10.1109/ICDE.2012.143","DOIUrl":"https://doi.org/10.1109/ICDE.2012.143","url":null,"abstract":"In a wide array of disciplines, data can be modeled as an interconnected network of entities, where various attributes could be associated with both the entities and the relations among them. Knowledge is often hidden in the complex structure and attributes inside these networks. While querying and mining these linked datasets are essential for various applications, traditional graph queries may not be able to capture the rich semantics in these networks. With the advent of complex information networks, new graph queries are emerging, including graph pattern matching and mining, similarity search, ranking and expert finding, and graph aggregation and OLAP. These queries require both the topology and content information of the network data and hence differ from classical graph algorithms such as shortest path, reachability, and minimum cut, which depend only on the structure of the network. In this tutorial, we give an introduction to the emerging graph queries, their indexing and resolution techniques, current challenges, and future research directions.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123064627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The topic of outlier detection in sensor networks has received significant attention in recent years. Detecting when the measurements of a node become "abnormal" is interesting, because this event may help detect either a malfunctioning node, or a node that starts observing a local interesting phenomenon (e.g., a fire). In this paper we present a new algorithm for detecting outliers in sensor networks, based on the geometric approach. Unlike prior work, our algorithms perform distributed monitoring of outlier readings, exhibit 100% accuracy in their monitoring (assuming no message losses), and require the transmission of messages only at a fraction of the epochs, thus allowing nodes to safely refrain from transmitting in many epochs. Our approach is based on transforming common similarity metrics in a way that admits the application of the recently proposed geometric approach. We then propose a general framework and suggest multiple modes of operation, which allow each sensor node to accurately monitor its similarity to other nodes. Our experiments demonstrate that our algorithms can accurately detect outliers at a fraction of the communication cost that a centralized approach would require (even in the case where the central node lies just one hop away from all sensor nodes). Moreover, we demonstrate that these bandwidth savings become even larger as we incorporate further optimizations in our proposed modes of operation.
{"title":"Detecting Outliers in Sensor Networks Using the Geometric Approach","authors":"Sabbas Burdakis, Antonios Deligiannakis","doi":"10.1109/ICDE.2012.85","DOIUrl":"https://doi.org/10.1109/ICDE.2012.85","url":null,"abstract":"The topic of outlier detection in sensor networks has received significant attention in recent years. Detecting when the measurements of a node become \"abnormal\" is interesting, because this event may help detect either a malfunctioning node, or a node that starts observing a local interesting phenomenon (e.g., a fire). In this paper we present a new algorithm for detecting outliers in sensor networks, based on the geometric approach. Unlike prior work, our algorithms perform distributed monitoring of outlier readings, exhibit 100% accuracy in their monitoring (assuming no message losses), and require the transmission of messages only at a fraction of the epochs, thus allowing nodes to safely refrain from transmitting in many epochs. Our approach is based on transforming common similarity metrics in a way that admits the application of the recently proposed geometric approach. We then propose a general framework and suggest multiple modes of operation, which allow each sensor node to accurately monitor its similarity to other nodes. Our experiments demonstrate that our algorithms can accurately detect outliers at a fraction of the communication cost that a centralized approach would require (even in the case where the central node lies just one hop away from all sensor nodes). Moreover, we demonstrate that these bandwidth savings become even larger as we incorporate further optimizations in our proposed modes of operation.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125729429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
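The key trick behind such geometric monitoring — each node checking a purely local "safe" condition that jointly guarantees the global condition, so silent epochs cost nothing — can be illustrated with a heavily simplified one-dimensional sketch. To be clear about assumptions: this is not the paper's algorithm; the monitored function f(x) = x², the interval-based safety check, and all names below are illustrative only.

```python
def drift(e, v_now, v_sync):
    # 1-D drift value: shared global estimate plus local change since last sync.
    # The average of all drift values equals the true current global average.
    return e + (v_now - v_sync)

def node_is_safe(e, u, f, threshold):
    # The node's "ball" in 1-D degenerates to the interval between the
    # estimate e and its drift value u. The node may stay silent iff f stays
    # strictly on one side of the threshold over that interval (monochromatic).
    lo, hi = min(e, u), max(e, u)
    f_min = min(f(lo), f(hi))
    if lo <= 0.0 <= hi:          # f(x) = x^2 attains its minimum at x = 0
        f_min = min(f_min, f(0.0))
    f_max = max(f(lo), f(hi))
    return not (f_min <= threshold <= f_max)

f = lambda x: x * x
THRESHOLD = 1.0

v_sync = [0.5, 0.7, 0.9]            # node values at the last synchronization
e = sum(v_sync) / len(v_sync)       # global estimate shared by all nodes
v_now = [0.55, 0.65, 0.85]          # current local values (small drift)

drifts = [drift(e, vn, vs) for vn, vs in zip(v_now, v_sync)]
all_safe = all(node_is_safe(e, u, f, THRESHOLD) for u in drifts)

# If every node is safe, the global value f(average) provably stays on the
# same side of the threshold as f(e), so no node transmits this epoch.
global_val = f(sum(v_now) / len(v_now))
```

The soundness argument (in its full multi-dimensional form, due to the geometric-monitoring literature) is that the true global average lies in the convex hull of the drift values, which the union of the nodes' balls covers.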
Regulations and societal expectations have recently emphasized the need to mediate access to valuable databases, even access by insiders. Fraud occurs when a person, often an insider, tries to hide illegal activity. Companies would like to be assured that such tampering has not occurred, or, if it has, that it will be quickly discovered and used to identify the perpetrator. At one end of the compliance spectrum lies the approach of restricting access to information, and at the other, that of information accountability. We focus on effecting information accountability of data stored in high-performance databases. The demonstrated work ensures appropriate use and thus end-to-end accountability of database information via a continuous assurance technology based on cryptographic hashing techniques. A prototype tamper detection and forensic analysis system named DRAGOON was designed and implemented to determine when tampering(s) occurred and what data were tampered with. DRAGOON is scalable, customizable, and intuitive. This work will show that information accountability is a viable alternative to information restriction for ensuring the correct storage, use, and maintenance of databases on extant DBMSes.
{"title":"DRAGOON: An Information Accountability System for High-Performance Databases","authors":"Kyriacos E. Pavlou, R. Snodgrass","doi":"10.1109/ICDE.2012.139","DOIUrl":"https://doi.org/10.1109/ICDE.2012.139","url":null,"abstract":"Regulations and societal expectations have recently emphasized the need to mediate access to valuable databases, even access by insiders. Fraud occurs when a person, often an insider, tries to hide illegal activity. Companies would like to be assured that such tampering has not occurred, or, if it has, that it will be quickly discovered and used to identify the perpetrator. At one end of the compliance spectrum lies the approach of restricting access to information, and at the other, that of information accountability. We focus on effecting information accountability of data stored in high-performance databases. The demonstrated work ensures appropriate use and thus end-to-end accountability of database information via a continuous assurance technology based on cryptographic hashing techniques. A prototype tamper detection and forensic analysis system named DRAGOON was designed and implemented to determine when tampering(s) occurred and what data were tampered with. DRAGOON is scalable, customizable, and intuitive. This work will show that information accountability is a viable alternative to information restriction for ensuring the correct storage, use, and maintenance of databases on extant DBMSes.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125237388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We demonstrate MXQuery/H, a modified version of MXQuery that uses hardware acceleration to speed up XML processing. The main goal of this demonstration is to give an interactive example of hardware/software co-design and to show how system performance and energy efficiency can be improved by off-loading tasks to FPGA hardware. To this end, we equipped MXQuery/H with various hooks to inspect the different parts of the system. In addition, our system can fully leverage the idea of XML projection. Though the idea of projection has been around for a while, its effectiveness has remained limited because of the unavoidable and high parsing overhead. By performing this task in hardware, we relieve the software part of this overhead and achieve several-fold processing speed-ups.
{"title":"MXQuery with Hardware Acceleration","authors":"Peter M. Fischer, J. Teubner","doi":"10.1109/ICDE.2012.130","DOIUrl":"https://doi.org/10.1109/ICDE.2012.130","url":null,"abstract":"We demonstrate MXQuery/H, a modified version of MXQuery that uses hardware acceleration to speed up XML processing. The main goal of this demonstration is to give an interactive example of hardware/software co-design and to show how system performance and energy efficiency can be improved by off-loading tasks to FPGA hardware. To this end, we equipped MXQuery/H with various hooks to inspect the different parts of the system. In addition, our system can fully leverage the idea of XML projection. Though the idea of projection has been around for a while, its effectiveness has remained limited because of the unavoidable and high parsing overhead. By performing this task in hardware, we relieve the software part of this overhead and achieve several-fold processing speed-ups.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134542450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary form only given. Cooperation and trust play an increasingly important role in today's information systems. For instance, peer-to-peer systems like BitTorrent, Sopcast, and Skype are powered by resource contributions from participating users; federated systems like the Internet have to respect the interests, policies, and laws of participating organizations and countries; and in the Cloud, users entrust their data and computation to third-party infrastructure. In this talk, we consider accountability as a way to facilitate transparency and trust in cooperative systems. We look at practical techniques to account for the integrity of distributed, cooperative computations, and at some of the difficulties and open problems in accountability.
{"title":"Accountability and Trust in Cooperative Information Systems","authors":"P. Druschel","doi":"10.1109/ICDE.2012.152","DOIUrl":"https://doi.org/10.1109/ICDE.2012.152","url":null,"abstract":"Summary form only given. Cooperation and trust play an increasingly important role in today's information systems. For instance, peer-to-peer systems like BitTorrent, Sopcast, and Skype are powered by resource contributions from participating users; federated systems like the Internet have to respect the interests, policies, and laws of participating organizations and countries; and in the Cloud, users entrust their data and computation to third-party infrastructure. In this talk, we consider accountability as a way to facilitate transparency and trust in cooperative systems. We look at practical techniques to account for the integrity of distributed, cooperative computations, and at some of the difficulties and open problems in accountability.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124784481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We demonstrate DPCube, a component in our Health Information DE-identification (HIDE) framework, for releasing differentially private data cubes (or multi-dimensional histograms) for sensitive data. HIDE is a framework we developed for integrating heterogeneous structured and unstructured health information, and it provides methods for privacy-preserving data publishing. The DPCube component uses differentially private access mechanisms and an innovative 2-phase multidimensional partitioning strategy to publish a multi-dimensional data cube or histogram that achieves good utility while satisfying differential privacy. We demonstrate that the released data cubes can serve as a sanitized synopsis of the raw database and, together with an optional synthesized dataset based on the data cubes, can support various Online Analytical Processing (OLAP) queries and learning tasks.
{"title":"DPCube: Releasing Differentially Private Data Cubes for Health Information","authors":"Yonghui Xiao, James J. Gardner, Li Xiong","doi":"10.1109/ICDE.2012.135","DOIUrl":"https://doi.org/10.1109/ICDE.2012.135","url":null,"abstract":"We demonstrate DPCube, a component in our Health Information DE-identification (HIDE) framework, for releasing differentially private data cubes (or multi-dimensional histograms) for sensitive data. HIDE is a framework we developed for integrating heterogeneous structured and unstructured health information, and it provides methods for privacy-preserving data publishing. The DPCube component uses differentially private access mechanisms and an innovative 2-phase multidimensional partitioning strategy to publish a multi-dimensional data cube or histogram that achieves good utility while satisfying differential privacy. We demonstrate that the released data cubes can serve as a sanitized synopsis of the raw database and, together with an optional synthesized dataset based on the data cubes, can support various Online Analytical Processing (OLAP) queries and learning tasks.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117162025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
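DPCube's 2-phase partitioning strategy is beyond an abstract-length sketch, but the primitive it builds on — publishing histogram counts with calibrated Laplace noise so that the release satisfies ε-differential privacy — can be sketched in a few lines. The function names and bin edges below are illustrative, not DPCube's API; sensitivity is 1 here because each record falls into exactly one bin.

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_histogram(values, edges, epsilon):
    """Return noisy counts for the bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    # One record changes exactly one count by 1, so the L1-sensitivity is 1
    # and noise of scale 1/epsilon gives epsilon-differential privacy.
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale) for c in counts]

ages = [23, 25, 31, 34, 35, 41, 58, 62]       # toy sensitive attribute
noisy = dp_histogram(ages, edges=[20, 30, 40, 50, 70], epsilon=0.5)
```

A smaller ε means more noise per bin; OLAP-style range queries are then answered by summing the noisy bins, which is the sense in which the released histogram acts as a sanitized synopsis.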
Shenoda Guirguis, M. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis
Aggregate Continuous Queries (ACQs) are a very popular class of Continuous Queries (CQs) with a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary in window specifications and pre-aggregation filters, existing multiple-ACQ optimization schemes assume a processing model where each ACQ is computed as a final-aggregation of a sub-aggregation. In this paper, we propose a novel processing model for ACQs, called Tri Ops, with the goal of minimizing the repetition of operator execution at the sub-aggregation level. We also propose Tri Weave, a Tri Ops-aware multi-query optimizer. We analytically and experimentally demonstrate the performance gains of our proposed schemes, which show their superiority over alternative schemes. Finally, we generalize Tri Weave to incorporate the classical subsumption-based multi-query optimization techniques.
{"title":"Three-Level Processing of Multiple Aggregate Continuous Queries","authors":"Shenoda Guirguis, M. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis","doi":"10.1109/ICDE.2012.112","DOIUrl":"https://doi.org/10.1109/ICDE.2012.112","url":null,"abstract":"Aggregate Continuous Queries (ACQs) are a very popular class of Continuous Queries (CQs) with a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary in window specifications and pre-aggregation filters, existing multiple-ACQ optimization schemes assume a processing model where each ACQ is computed as a final-aggregation of a sub-aggregation. In this paper, we propose a novel processing model for ACQs, called Tri Ops, with the goal of minimizing the repetition of operator execution at the sub-aggregation level. We also propose Tri Weave, a Tri Ops-aware multi-query optimizer. We analytically and experimentally demonstrate the performance gains of our proposed schemes, which show their superiority over alternative schemes. Finally, we generalize Tri Weave to incorporate the classical subsumption-based multi-query optimization techniques.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120959231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
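The final-aggregation-of-a-sub-aggregation model that these optimizers assume (and that Tri Ops refines) is easy to sketch for SUM: the stream is cut into non-overlapping panes, each pane is pre-aggregated once, and every ACQ then combines only the pane partials its window covers, so ACQs with different window sizes share the sub-aggregation work. The function names below are illustrative, not the paper's.

```python
def pane_sums(stream, pane_size):
    # Sub-aggregation: one partial SUM per non-overlapping pane.
    return [sum(stream[i:i + pane_size])
            for i in range(0, len(stream), pane_size)]

def windowed_sums(panes, panes_per_window):
    # Final-aggregation: combine pane partials for each sliding window
    # (window slides by one pane, i.e. slide == pane_size tuples).
    return [sum(panes[i:i + panes_per_window])
            for i in range(len(panes) - panes_per_window + 1)]

stream = [3, 1, 4, 1, 5, 9, 2, 6]
panes = pane_sums(stream, pane_size=2)   # computed once, shared by all ACQs

acq_small = windowed_sums(panes, 2)      # ACQ 1: SUM over window 4, slide 2
acq_large = windowed_sums(panes, 4)      # ACQ 2: SUM over window 8, slide 2
```

Both ACQs touch each input tuple only once (inside `pane_sums`); without sharing, every window of every ACQ would re-scan its tuples from scratch.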