Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452762
Christoph Böhm, Felix Naumann, Ziawasch Abedjan, D. Fenz, Toni Grütze, Daniel Hefenbrock, M. Pohl, David Sonnabend
Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often in the form of enormous files covering many domains. Moreover, the data usually has a loose structure when it is derived from end-user-generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome: it may be incomplete, poorly formatted, inconsistent, etc. Traditional data profiling methods do not suffice to understand and profile such linked open data. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of a source for the problem at hand (e.g., some integration task), and focus on and explore the relevant subset.
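The data-level profiling the abstract mentions (data type detection, value distribution) can be illustrated with a minimal sketch over RDF-style triples. This is not ProLOD's implementation; the regex-based type classifier and the toy triples are illustrative assumptions.

```python
import re
from collections import Counter

def detect_type(value):
    """Classify a literal into a coarse data type (sketch of data type detection)."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "string"

def profile_property(triples, prop):
    """Data-level profile (type mix, value distribution) for one property."""
    values = [o for (_, p, o) in triples if p == prop]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "types": Counter(detect_type(v) for v in values),
        "top_values": Counter(values).most_common(3),
    }

# Toy triples; real LOD sources ship files with millions of them.
triples = [
    ("Berlin", "population", "3769495"),
    ("Potsdam", "population", "182112"),
    ("Berlin", "founded", "1237-01-01"),
]
profile = profile_property(triples, "population")
```

Run per property, such a profile quickly exposes mixed types, skewed value distributions, or suspiciously low distinct counts in an unfamiliar source.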
Title: Profiling linked open data with ProLOD
Published in: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452741
Karsten Schmidt, T. Härder
Autonomous index management in native XML DBMSs has to address XML's flexibility and storage mapping features, which provide a rich set of indexing options. Changing workload characteristics, indexes selected by the query optimizer's "magic", subtle differences in the expressiveness of indexes, and tailor-made index properties call, in addition to (long-range) manual index selection, for rapid autonomic reactions and self-tuning options in the DBMS. Hence, when managing an existing set of indexes (i.e., a configuration), its cost trade-off has to be steadily controlled by observing query runtimes, index creation and maintenance, and space constraints.
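The cost trade-off under a space constraint can be sketched as a greedy benefit-per-byte selection over a candidate configuration. This is a generic self-tuning heuristic, not the paper's algorithm; the index names, benefit numbers, and sizes are made up for illustration.

```python
def tune_indexes(indexes, space_budget):
    """Greedy self-tuning step: keep indexes with the best observed
    benefit per unit of space until the budget is exhausted."""
    ranked = sorted(indexes, key=lambda ix: ix["benefit"] / ix["size"], reverse=True)
    keep, used = [], 0
    for ix in ranked:
        if used + ix["size"] <= space_budget:
            keep.append(ix["name"])
            used += ix["size"]
    return keep

# Hypothetical candidates: benefit = observed query-runtime savings,
# size = storage cost, both in arbitrary units.
candidates = [
    {"name": "path",    "benefit": 100, "size": 50},
    {"name": "content", "benefit": 40,  "size": 10},
    {"name": "cas",     "benefit": 5,   "size": 50},
]
kept = tune_indexes(candidates, space_budget=60)
```

A real controller would refresh the benefit estimates continuously from query runtimes and fold in index creation and maintenance costs before dropping or building anything.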
Title: On the use of query-driven XML auto-indexing
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452743
G. Graefe, Harumi A. Kuno
Adaptive indexing schemes such as database cracking and adaptive merging have been investigated to date only in the context of range queries. These are typical for non-key columns in relational databases. For complete self-managing indexing, adaptive indexing must also apply to key columns. The present paper proposes a design and offers a first performance evaluation in the context of keys. Adaptive merging for keys also enables further improvements in B-tree indexes. First, partitions can be matched to levels in the memory hierarchy, such as a CPU cache and an in-memory buffer pool. Second, adaptive merging in merged B-trees enables automatic master-detail clustering.
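The core idea behind database cracking, which the abstract builds on, is that each range query physically reorganizes the column around its own predicate, so the index emerges as a side effect of the workload. A minimal single-step sketch (lists instead of in-place array cracking, and without the cracker index that records the pivots):

```python
def crack(column, low, high):
    """One cracking step: partition `column` in place around [low, high)
    and return the matching slice. Repeated queries refine the partitions."""
    lt  = [v for v in column if v < low]
    mid = [v for v in column if low <= v < high]
    ge  = [v for v in column if v >= high]
    column[:] = lt + mid + ge  # physical reorganization as a query side effect
    return mid
```

After a few queries, frequently requested ranges become contiguous, so later lookups scan ever smaller pieces; the paper's contribution is extending this kind of incremental refinement from range predicates to key columns.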
Title: Adaptive indexing for relational keys
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452722
Ruilin Liu, Wendy Hui Wang
Data publishing has generated much concern over individual privacy. Recent work has focused on different kinds of background knowledge and the various threats they pose to the privacy of published data. However, several types of adversary knowledge remain to be investigated. In this paper, I explain my research on privacy-preserving data publishing (PPDP) using full functional dependencies (FFDs) as part of the adversary's knowledge. I also briefly outline my research plan.
Title: Privacy-preserving data publishing
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452726
Gabriele Tolomei, S. Orlando, F. Silvestri
People are increasingly interested in exploiting Web search engines (WSEs) not only to access simple Web pages, but also to carry out complex activities, namely Web-mediated processes (or taskflows). Users' information needs will therefore become more complex, and (Web) search and recommender systems should change accordingly to deal with this shift. We claim that such taskflows and their constituent tasks are implicitly present in users' minds when they interact with a WSE to access the Web. Our first research challenge is thus to evaluate this belief by analyzing a very large, long-term log of queries submitted to a WSE, and by associating meaningful semantic labels with the extracted tasks (i.e., clusters of task-related queries) and taskflows. This large knowledge base constitutes a good starting point for building a model of users' behaviors. The second research challenge is to devise a novel recommender system that goes beyond the simple query suggestion of modern WSEs. Our system has to exploit the knowledge base of Web-mediated processes and the learned model of users' behaviors to generate complex insights and task-based suggestions for incoming users while they interact with a WSE.
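The "clusters of task-related queries" step can be sketched with a greedy term-overlap clustering over a query log. The Jaccard measure, the 0.3 threshold, and the toy queries are illustrative assumptions, not the authors' method.

```python
def jaccard(a, b):
    """Term-set overlap between two query strings."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def cluster_queries(queries, threshold=0.3):
    """Greedy single-pass clustering: a query joins the first cluster
    containing a sufficiently similar member, else it starts a new task."""
    clusters = []
    for q in queries:
        for c in clusters:
            if any(jaccard(q, member) >= threshold for member in c):
                c.append(q)
                break
        else:
            clusters.append([q])
    return clusters

log = ["cheap flights rome", "flights rome april", "python list sort"]
tasks = cluster_queries(log)
```

On real logs, session boundaries, time gaps, and click data would also feed the similarity, and each resulting cluster would then receive the semantic task label the abstract describes.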
Title: Towards a task-based search and recommender systems
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452767
M. Yakout, A. Elmagarmid, Jennifer Neville
Improving data quality is a time-consuming, labor-intensive and often domain-specific operation. A recent principled approach for repairing dirty databases is to use data quality rules in the form of database constraints to identify dirty tuples and then use the rules to derive data repairs. Most existing data repair approaches focus on providing fully automated solutions, which can be risky to depend upon, especially for critical data. To guarantee optimal-quality repairs applied to the database, users should be involved to confirm each repair. This highlights the need for an interactive approach that combines the best of both: automatically generating repairs, while efficiently employing the user's effort to verify them. In such an approach, the user guides an online repairing process that incrementally generates repairs. A key challenge is the response time within the user's interactive sessions, because generating the repairs is time-consuming due to the large search space of possible repairs. To this end, we present in this paper a mechanism to continuously generate repairs only for the current top-k important violated data quality rules. Moreover, the repairs are grouped and ranked such that those most beneficial for improving data quality come first when consulting the user for verification and feedback. Our experiments on a real-world dataset demonstrate the effectiveness of our ranking mechanism in providing fast response times for the user while improving the data quality as quickly as possible.
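The "top-k important violated rules" idea can be sketched by counting violations per rule and surfacing the worst offenders first. The rules (`zip_is_digits`, `age_in_range`) and tuples below are hypothetical; the paper's ranking also weighs the expected quality benefit of the repairs, not just violation counts.

```python
from collections import Counter

def top_k_violated(rules, tuples, k=2):
    """Rank data-quality rules by how many tuples violate them, so the
    user is consulted about the most impactful repairs first."""
    counts = Counter()
    for rule in rules:
        counts[rule["name"]] = sum(1 for t in tuples if not rule["check"](t))
    return [name for name, n in counts.most_common(k) if n > 0]

rules = [
    {"name": "zip_is_digits", "check": lambda t: t["zip"].isdigit()},
    {"name": "age_in_range",  "check": lambda t: 0 <= t["age"] <= 120},
]
tuples = [
    {"zip": "4711x", "age": 34},
    {"zip": "10115", "age": 200},
    {"zip": "ABCDE", "age": 28},
]
ranked = top_k_violated(rules, tuples)
```

Restricting repair generation to this short list is what keeps the interactive session responsive: the expensive repair search runs only for rules the user is about to see.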
Title: Ranking for data repairs
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452714
I. Assent
Recommender systems provide an automatic means of filtering out interesting items, usually based on past similarity of user ratings. In previous work, we have suggested a model that allows users to actively build a recommender network. Users express trust, obtain transparency, and grow (anonymous) recommender connections. In this work, we propose mining such active systems to generate easily understandable representations of the recommender network. Users may review these representations to provide active feedback. This approach further enhances the quality of recommendations, especially as topics of interest change over time. Most notably, it extends the amount of control users have over the model that the recommender network builds of their interests.
{"title":"Mining and representing recommendations in actively evolving recommender systems","authors":"I. Assent","doi":"10.1109/ICDEW.2010.5452714","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452714","url":null,"abstract":"Recommender systems provide an automatic means of filtering out interesting items, usually based on past similarity of user ratings. In previous work, we have suggested a model that allows users to actively build a recommender network. Users express trust, obtain transparency, and grow (anonymous) recommender connections. In this work, we propose mining such active systems to generate easily understandable representations of the recommender network. Users may review these representations to provide active feedback. This approach further enhances the quality of recommendations, especially as topics of interest change over time. Most notably, it extends the amount of control users have over the model that the recommender network builds of their interests.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"82 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123234774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452772
C. Beecks, M. S. Uysal, T. Seidl
A frequently encountered query type in multimedia databases is the k-nearest-neighbor query, which finds the k nearest neighbors of a given query object. To speed up such queries and to meet user requirements for low response time, approximation techniques play an important role. In this paper, we present an efficient approximation technique applicable to distance measures defined over flexible feature representations, i.e., feature signatures. We apply our approximation technique to the recently proposed Signature Quadratic Form Distance over feature signatures. We performed our experiments on numerous image databases, obtaining k-nearest-neighbor query rankings in significantly lower computation time, with an average speed-up factor of 13.
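A common way such approximations accelerate k-NN search is filter-and-refine: a cheap bound rules out candidates before the expensive exact distance is computed. The sketch below is generic, not the paper's SQFD approximation; the halved absolute difference used as `lower_bound` in the test is a hypothetical stand-in (any valid underestimate of the exact distance preserves correctness).

```python
import heapq

def knn_filter_refine(query, items, lower_bound, exact_dist, k=3):
    """Filter-and-refine k-NN: skip the expensive exact distance whenever
    the cheap lower bound already exceeds the current k-th best distance."""
    heap = []  # max-heap of the best k so far, via negated distances
    for item in items:
        if len(heap) == k and lower_bound(query, item) >= -heap[0][0]:
            continue  # cannot beat the current k-th neighbor
        d = exact_dist(query, item)
        if len(heap) < k:
            heapq.heappush(heap, (-d, item))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, item))
    return [item for _, item in sorted((-nd, it) for nd, it in heap)]
```

The tighter (and cheaper) the bound relative to the exact measure, the more exact-distance computations are avoided, which is where speed-ups like the reported factor of 13 come from.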
Title: Efficient k-nearest neighbor queries with the Signature Quadratic Form Distance
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452738
M. Holze, A. Haschimi, N. Ritter
The workloads of enterprise DBSs often show periodic patterns, e.g., because there are mainly OLTP transactions during the day and analysis operations at night. However, current DBS self-management functions do not consider these periodic patterns in their analysis. Instead, they either adapt the DBS configuration to an overall "average" workload, or they reactively try to adapt the DBS configuration after every periodic change as if the workload had never been observed before. In this paper we propose a periodicity detection approach, which allows the prediction of workload changes for DBS self-management functions. For this purpose, we first describe how recurring DBS workloads, i.e., workloads similar to those observed in the past, can be identified. We then propose two different approaches for detecting periodic patterns in the history of recurring DBS workloads. Finally, we show how this knowledge of periodic patterns can be used to predict workload changes, and how it can be adapted when the periodic patterns themselves change over time.
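One standard way to detect such periodic patterns, sketched here as an assumption rather than as the paper's method, is to pick the lag with maximal autocorrelation over a history of workload measurements:

```python
def detect_period(signal, max_lag=None):
    """Return the lag with maximal autocorrelation, a candidate
    workload period (e.g., hours per day/night cycle)."""
    n = len(signal)
    max_lag = max_lag or n // 2
    mean = sum(signal) / n
    centered = [x - mean for x in signal]

    def acf(lag):
        # Unnormalized autocorrelation at the given lag.
        return sum(centered[i] * centered[i + lag] for i in range(n - lag))

    return max(range(1, max_lag + 1), key=acf)
```

Given such a period, a self-management function can prepare the configuration for the next phase shortly before it starts instead of reacting after the shift has already hurt performance.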
Title: Towards workload-aware self-management: Predicting significant workload shifts
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452725
Ruiwen Chen, I. Kiringa, Yongyi Mao
Dependencies and queries are two fundamental concepts in databases. In particular, hypergraph representations of join dependencies and conjunctive queries have been widely investigated for conventional relational databases. However, we still lack a systematic study of such graphical representations for uncertain and probabilistic databases. In this paper, we initiate a comprehensive study of the role of graphical models in representing uncertainty and evaluating queries.
Title: Graphical models for dependencies and queries in uncertain data