The Entity Name System: Enabling the web of entities
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452708
Heiko Stoermer, Themis Palpanas, George Giannakopoulos
We are currently witnessing increasing interest in the use of the web as an information and knowledge source. Much of the information sought on the web relates to named entities (i.e., persons, locations, organizations, etc.), and the entity identification problem lies at the core of many applications in this context. To address this problem, we propose the Entity Name System (ENS), a large-scale, distributed infrastructure for assigning and managing unique identifiers for entities on the web. In this paper, we examine the special requirements for the storage and management of entities in the context of the ENS. We present a conceptual model for the representation of entities, and discuss problems related to data quality as well as the management of the entity lifecycle. Finally, we describe the architecture of the current prototype of the system.
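The abstract does not give implementation details; as a rough, hypothetical sketch of what an ENS-style entity record with a system-assigned unique identifier might look like (all names and fields are assumptions, not the authors' model), consider:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class EntityRecord:
    """Hypothetical ENS-style entity record: a unique identifier plus
    a loose bag of descriptive attributes used for matching."""
    entity_id: str = field(default_factory=lambda: "ens:" + uuid.uuid4().hex)
    entity_type: str = "unknown"          # e.g. person, location, organization
    attributes: dict = field(default_factory=dict)

# A client would look up an entity by its attributes and reuse the returned
# identifier instead of minting a new one for the same real-world entity.
author = EntityRecord(entity_type="person",
                      attributes={"name": "Themis Palpanas"})
print(author.entity_id)
```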
{"title":"The Entity Name System: Enabling the web of entities","authors":"Heiko Stoermer, Themis Palpanas, George Giannakopoulos","doi":"10.1109/ICDEW.2010.5452708","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452708","url":null,"abstract":"We are currently witnessing an increasing interest in the use of the web as an information and knowledge source. Much of the information sought after in the web is in this case relevant to named entities (i.e., persons, locations, organizations, etc.). An important observation is that the entity identification problem lies at the core of many applications in this context. In order to deal with this problem, we propose the Entity Name System (ENS), a large scale, distributed infrastructure for assigning and managing unique identifiers for entities in the web. In this paper, we examine the special requirements for storage and management of entities, in the context of the ENS. We present a conceptual model for the representation of entities, and discuss problems related to data quality, as well as the management of the entity lifecycle. Finally, we describe the architecture of the current prototype of the system.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129755260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A framework for automatic schema mapping verification through reasoning
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452703
P. Cappellari, Denilson Barbosa, P. Atzeni
We advocate an automated approach for verifying mappings between source and target databases that takes semantics into account and avoids two serious limitations of current verification approaches: reliance on the availability of sample source and target instances, and reliance on strong statistical assumptions. We discuss how our approach can be integrated into the workflow of state-of-the-art mapping design systems, and describe all of its necessary inputs. Our approach relies on checking the entailment of verification statements derived directly from the schema mappings and from semantic annotations on the variables used in those mappings. We discuss how such verification statements can be produced and how such annotations can be extracted from different kinds of alignments of schemas onto domain ontologies. Such alignments can be derived semi-automatically; our framework may thus also prove useful in greatly reducing the amount of input required from domain experts when developing mappings.
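The abstract stays at the conceptual level; as a loose, hypothetical sketch of the kind of check involved (the paper uses full ontology reasoning, not this toy hierarchy), one can think of each mapping variable as annotated with an ontology concept and of a verification statement as a subsumption test between the source-side and target-side concepts:

```python
# Hypothetical sketch: a "verification statement" reduced to a subsumption
# check over a toy class hierarchy.
SUBCLASS_OF = {
    "GraduateStudent": "Student",
    "Student": "Person",
    "Professor": "Person",
}

def is_subsumed_by(concept: str, ancestor: str) -> bool:
    """True if `concept` is (transitively) a subclass of `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

# A mapping variable annotated as GraduateStudent on the source side and
# Person on the target side is plausible; Professor -> Student is flagged.
print(is_subsumed_by("GraduateStudent", "Person"))   # True  -> statement holds
print(is_subsumed_by("Professor", "Student"))        # False -> flag the mapping
```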
{"title":"A framework for automatic schema mapping verification through reasoning","authors":"P. Cappellari, Denilson Barbosa, P. Atzeni","doi":"10.1109/ICDEW.2010.5452703","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452703","url":null,"abstract":"We advocate an automated approach for verifying mappings between source and target databases in which semantics are taken into account, and that avoids two serious limitations of current verification approaches: reliance on availability of sample source and target instances, and reliance on strong statistical assumptions. We discuss how our approach can be integrated into the workflow of state-of-the-art mapping design systems, and all its necessary inputs. Our approach relies on checking the entailment of verification statements derived directly from the schema mappings and from semantic annotations to the variables used in such mappings. We discuss how such verification statements can be produced and how such annotations can be extracted from different kinds of alignments of schemas into domain ontologies. Such alignments can be derived semi-automatically; thus, our framework might prove useful in also greatly reducing the amount of input from domain experts in the development of mappings.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"376 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123110484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized data access for efficient execution of Semantic Services
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452701
Thorsten Möller, H. Schuldt
Executing Semantic Services requires, in contrast to traditional SOAP-based Web Services, frequent read and write accesses to graph-based semantic data stores - for instance, for the evaluation of preconditions or the materialization of service effects. Therefore, the overall performance of semantic service execution, in particular for composite services, is strongly affected by the efficiency of these reads and writes. In this paper we present two data access optimization techniques for semantic data stores: Prepared Queries and Frame Caching. The former reduces the costs for repeated query evaluation, e.g., in loops. The latter provides rapid access to frequently read triples or subgraphs based on materialized views using a Frame-based data structure. The described techniques have been implemented and evaluated on the basis of OSIRIS Next, our open infrastructure for Semantic Service support.
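The techniques are implemented inside the authors' own store in OSIRIS Next; as a rough analogue of the two ideas using the rdflib Python library (not the authors' implementation), a prepared SPARQL query avoids re-parsing inside a loop, and a small dictionary cache plays the role of the materialized frame:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.plugins.sparql import prepareQuery

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.svc1, EX.hasPrecondition, Literal("stock > 0")))
g.add((EX.svc2, EX.hasPrecondition, Literal("user is authenticated")))

# Prepared query: parsed and compiled once, reused with different bindings,
# which is the effect the paper's Prepared Queries technique aims for.
q = prepareQuery(
    "SELECT ?pre WHERE { ?s ex:hasPrecondition ?pre }",
    initNs={"ex": EX},
)

# Frame-style cache: subject -> list of objects, so that hot subgraphs are
# served from memory instead of re-querying the store on every access.
frame_cache = {}

def preconditions(service: URIRef):
    if service not in frame_cache:
        frame_cache[service] = [row.pre for row in g.query(q, initBindings={"s": service})]
    return frame_cache[service]

for svc in (EX.svc1, EX.svc2, EX.svc1):       # the repeated access hits the cache
    print(svc, preconditions(svc))
```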
{"title":"Optimized data access for efficient execution of Semantic Services","authors":"Thorsten Möller, H. Schuldt","doi":"10.1109/ICDEW.2010.5452701","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452701","url":null,"abstract":"Executing Semantic Services requires, in contrast to traditional SOAP-based Web Services, frequent read and write accesses to graph-based semantic data stores - for instance, for the evaluation of preconditions or the materialization of service effects. Therefore, the overall performance of semantic service execution, in particular for composite services, is strongly affected by the efficiency of these reads and writes. In this paper we present two data access optimization techniques for semantic data stores: Prepared Queries and Frame Caching. The former reduces the costs for repeated query evaluation, e.g., in loops. The latter provides rapid access to frequently read triples or subgraphs based on materialized views using a Frame-based data structure. The described techniques have been implemented and evaluated on the basis of OSIRIS Next, our open infrastructure for Semantic Service support.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123216672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The HiBench benchmark suite: Characterization of the MapReduce-based data analysis
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452747
Shengsheng Huang, Jie Huang, J. Dai, T. Xie, Bo Huang
The MapReduce model is becoming prominent for large-scale data analysis in the cloud. In this paper, we present the benchmarking, evaluation, and characterization of Hadoop, an open-source implementation of MapReduce. We first introduce HiBench, a new benchmark suite for Hadoop. It consists of a set of Hadoop programs, including both synthetic micro-benchmarks and real-world Hadoop applications. We then evaluate and characterize the Hadoop framework using HiBench in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource utilization (e.g., CPU, memory, and I/O), and data access patterns.
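HiBench derives these metrics from Hadoop job counters; as a minimal illustration of the two headline metrics only (job running time and tasks completed per minute), here is a sketch over hypothetical job records, not HiBench's own reporting code:

```python
from datetime import datetime

# Hypothetical job records: (job name, start, finish, number of map+reduce tasks)
jobs = [
    ("wordcount", datetime(2010, 3, 1, 10, 0, 0), datetime(2010, 3, 1, 10, 6, 30), 480),
    ("terasort",  datetime(2010, 3, 1, 11, 0, 0), datetime(2010, 3, 1, 11, 25, 0), 2400),
]

for name, start, end, tasks in jobs:
    running_time = (end - start).total_seconds()   # "speed" metric: job running time
    throughput = tasks / (running_time / 60.0)     # tasks completed per minute
    print(f"{name}: {running_time:.0f} s, {throughput:.1f} tasks/min")
```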
{"title":"The HiBench benchmark suite: Characterization of the MapReduce-based data analysis","authors":"Shengsheng Huang, Jie Huang, J. Dai, T. Xie, Bo Huang","doi":"10.1109/ICDEW.2010.5452747","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452747","url":null,"abstract":"The MapReduce model is becoming prominent for the large-scale data analysis in the cloud. In this paper, we present the benchmarking, evaluation and characterization of Hadoop, an open-source implementation of MapReduce. We first introduce HiBench, a new benchmark suite for Hadoop. It consists of a set of Hadoop programs, including both synthetic micro-benchmarks and real-world Hadoop applications. We then evaluate and characterize the Hadoop framework using HiBench, in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilizations, and data access patterns.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115912500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cleansing uncertain databases leveraging aggregate constraints
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452733
Haiquan Chen, Wei-Shinn Ku, Haixun Wang
Emerging uncertain database applications often involve the cleansing (conditioning) of uncertain databases using additional information as new evidence for reducing the uncertainty. However, past research on conditioning probabilistic databases has, unfortunately, focused only on functional dependencies. In real-world applications, most additional information on uncertain data sets can be acquired in the form of aggregate constraints (e.g., aggregate results published online for various statistical purposes). Therefore, if these aggregate constraints are taken into account, uncertainty in data sets can be greatly reduced. However, finding a practical method to exploit aggregate constraints to decrease uncertainty is a very challenging problem. In this paper, we present three approaches to cleanse (condition) uncertain databases by employing aggregate constraints. Because the problem is NP-hard, we focus on two approximation strategies: modeling the problem as a nonlinear optimization problem and then using Simulated Annealing (SA) and an Evolutionary Algorithm (EA) to sample from the entire solution space of possible worlds. In order to favor possible worlds that hold higher probabilities while satisfying all the constraints, we define Satisfaction Degree Functions (SDF) and construct the objective function accordingly. Subsequently, based on the sample result, we remove duplicates, re-normalize the probabilities of all the qualified possible worlds, and derive the posterior probabilistic database. Our experiments verify the efficiency and effectiveness of our algorithms and show that our approximate approaches scale well to large databases.
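The paper's algorithms operate over possible worlds of a probabilistic database; as a heavily simplified, hypothetical sketch of the simulated-annealing idea with an SDF-style objective on a single SUM constraint (the actual SDF definition and annealing schedule are the paper's own), consider:

```python
import math
import random

random.seed(7)

# Hypothetical uncertain table: each tuple has alternative values with probabilities.
# A "possible world" picks one alternative per tuple.
tuples = [
    [(10, 0.6), (20, 0.4)],
    [(30, 0.7), (50, 0.3)],
    [(15, 0.5), (25, 0.5)],
]
TARGET_SUM = 65          # published aggregate used as evidence

def world_prob(world):
    return math.prod(tuples[i][c][1] for i, c in enumerate(world))

def score(world):
    """SDF-style score: decays with the violation of the aggregate constraint,
    weighted by the probability of the possible world."""
    total = sum(tuples[i][c][0] for i, c in enumerate(world))
    return math.exp(-abs(total - TARGET_SUM) / TARGET_SUM) * world_prob(world)

def anneal(steps=2000, temp=1.0, cooling=0.995):
    world = [0] * len(tuples)
    best, best_score = list(world), score(world)
    current = best_score
    for _ in range(steps):
        cand = list(world)
        i = random.randrange(len(tuples))
        cand[i] = random.randrange(len(tuples[i]))   # flip one tuple's choice
        cand_score = score(cand)
        if cand_score > current or random.random() < math.exp((cand_score - current) / temp):
            world, current = cand, cand_score
        if current > best_score:
            best, best_score = list(world), current
        temp *= cooling
    return best, best_score

print(anneal())
```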
{"title":"Cleansing uncertain databases leveraging aggregate constraints","authors":"Haiquan Chen, Wei-Shinn Ku, Haixun Wang","doi":"10.1109/ICDEW.2010.5452733","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452733","url":null,"abstract":"Emerging uncertain database applications often involve the cleansing (conditioning) of uncertain databases using additional information as new evidence for reducing the uncertainty. However, past researches on conditioning probabilistic databases, unfortunately, only focus on functional dependency. In real world applications, most additional information on uncertain data sets can be acquired in the form of aggregate constraints (e.g., the aggregate results are published online for various statistical purposes). Therefore, if these aggregate constraints can be taken into account, uncertainty in data sets can be largely reduced. However, finding a practical method to exploit aggregate constraints to decrease uncertainty is a very challenging problem. In this paper, we present three approaches to cleanse (condition) uncertain databases by employing aggregate constraints. Because the problem is NP-hard, we focus on the two approximation strategies by modeling the problem as a nonlinear optimization problem and then utilizing Simulated Annealing (SA) and Evolutionary Algorithm (EA) to sample from the entire solution space of possible worlds. In order to favor those possible worlds holding higher probabilities and satisfying all the constraints at the same time, we define Satisfaction Degree Functions (SDF) and then construct the objective function accordingly. Subsequently, based on the sample result, we remove duplicates, re-normalize the probabilities of all the qualified possible worlds, and derive the posterior probabilistic database. Our experiments verify the efficiency and effectiveness of our algorithms and show that our approximate approaches scale well to large-sized databases.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128866989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BI-style relation discovery among entities in text
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452755
Wojciech M. Barczynski, Falk Brauer, Adrian Mocan, M. Schramm, Jan Froemberg
Business Intelligence (BI) over unstructured text is under intense scrutiny in both industry and research. Recent work in this field includes the automatic integration of unstructured text into BI systems, model recognition, and probabilistic databases to handle the uncertainty of Information Extraction (IE) results. Our aim is to use analytics to discover statistically relevant, previously unknown relationships between entities in document fragments. We present a method for transforming IE results into an OLAP model and demonstrate it in a real-world scenario for the SAP Community Network.
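The paper's transformation targets a full OLAP model; as a minimal, hypothetical illustration of that direction (not the authors' SAP-internal tooling), extracted entity pairs can be loaded into a fact table and aggregated along dimensions with pandas:

```python
import pandas as pd

# Hypothetical IE output: one row per extracted entity pair in a document fragment.
facts = pd.DataFrame([
    {"doc": "d1", "fragment": 3, "entity_a": "ProductX", "entity_b": "ErrorY", "confidence": 0.9},
    {"doc": "d1", "fragment": 7, "entity_a": "ProductX", "entity_b": "PatchZ", "confidence": 0.7},
    {"doc": "d2", "fragment": 1, "entity_a": "ProductX", "entity_b": "ErrorY", "confidence": 0.8},
])

# OLAP-style aggregation: the (entity_a, entity_b) pair plays the role of the
# dimensions; co-occurrence count and mean confidence are the measures.
cube = (facts.groupby(["entity_a", "entity_b"])
             .agg(cooccurrences=("doc", "count"),
                  mean_confidence=("confidence", "mean"))
             .reset_index())
print(cube)
```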
{"title":"BI-style relation discovery among entities in text","authors":"Wojciech M. Barczynski, Falk Brauer, Adrian Mocan, M. Schramm, Jan Froemberg","doi":"10.1109/ICDEW.2010.5452755","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452755","url":null,"abstract":"Business Intelligence (BI) over unstructured text is under intense scrutiny both in industry and research. Recent work in this field includes automatic integration of unstructured text into BI systems, model recognition, and probabilistic databases to handle uncertainty of Information Extraction (IE) results. Our aim is to use analytics to discover statistically relevant and unknown relationship between entities in documents' fragments. We present a method for transforming IE results to an OLAP model and we demonstrate it in a real world scenario for the SAP Community Network.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"969 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123075645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving product search with economic theory
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452727
Beibei Li, Panagiotis G. Ipeirotis, A. Ghose
With the growing pervasiveness of the Internet, online search for commercial goods and services is constantly increasing, as more and more people search for and purchase goods on the Internet. Most current algorithms for product search are based on adaptations of theoretical models devised for “classic” information retrieval. However, the decision mechanism that underlies the process of buying a product is different from the process of judging a document as relevant or not, so applying theories of relevance to the task of product search may not be the best approach. We propose a model for product search based on expected utility theory from economics. Specifically, we propose a ranking technique in which we rank highest the products that generate the highest consumer surplus after the purchase. In a sense, we rank highest the products that are the “best value for money” for a specific user. Our approach builds naturally on decades of research in economics and provides a solid theoretical foundation on which further research can build. We instantiate our research by building a search engine for hotels, and show how to build algorithms that naturally take into account consumer demographics, the heterogeneity of consumer preferences, and the varying prices of hotel rooms. Our extensive user studies demonstrate an overwhelming preference for the rankings generated by our techniques, compared to a large number of existing strong baselines.
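The exact utility model is specific to the paper; as a toy, hypothetical version of surplus-based ranking (the utility figures below are made up purely for illustration), each hotel is scored by an estimated willingness-to-pay minus its price and results are ranked by the difference:

```python
# Hypothetical surplus-based ranking sketch: consumer surplus is modelled as
# the gap between an assumed estimated utility (in dollars) and the price.
hotels = [
    {"name": "Harbor Inn",  "price": 120, "estimated_utility": 180},
    {"name": "City Suites", "price": 200, "estimated_utility": 230},
    {"name": "Budget Stay", "price": 60,  "estimated_utility": 95},
]

def consumer_surplus(hotel, price_sensitivity=1.0):
    # A real model would estimate utility from demographics and preferences;
    # here the utility values are fixed constants for the sketch.
    return hotel["estimated_utility"] - price_sensitivity * hotel["price"]

for h in sorted(hotels, key=consumer_surplus, reverse=True):
    print(f"{h['name']}: surplus ≈ ${consumer_surplus(h):.0f}")
```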
{"title":"Improving product search with economic theory","authors":"Beibei Li, Panagiotis G. Ipeirotis, A. Ghose","doi":"10.1109/ICDEW.2010.5452727","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452727","url":null,"abstract":"With the growing pervasiveness of the Internet, online search for commercial goods and services is constantly increasing, as more and more people search and purchase goods from the Internet. Most of the current algorithms for product search are based on adaptations of theoretical models devised for “classic” information retrieval. However, the decision mechanism that underlies the process of buying a product is different than the process of judging a document as relevant or not. So, applying theories of relevance for the task of product search may not be the best approach. We propose a theory model for product search based on expected utility theory from economics. Specifically, we propose a ranking technique in which we rank highest the products that generate the highest consumer surplus after the purchase. In a sense, we rank highest the products that are the “best value for money” for a specific user. Our approach naturally builds on decades of research in the field of economics and presents a solid theoretical foundation in which further research can build on. We instantiate our research by building a search engine for hotels, and show how we can build algorithms that naturally take into account consumer demographics, heterogeneity of consumer preferences, and also account for the varying price of the hotel rooms. Our extensive user studies demonstrate an overwhelming preference for the rankings generated by our techniques, compared to a large number of existing strong baselines.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130469851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in constrained clustering
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452728
Zijie Qi, Yinghui (Catherine) Yang
Constrained clustering (semi-supervised learning) techniques have attracted increasing attention in recent years. However, the commonly used constraints are restricted to the instance level, so we introduce two new classes of constraints: decision constraints and non-decision constraints. We implement applications involving non-decision constraints to find alternative clusterings. Because randomly generated constraints can adversely impact performance, we discuss the main reasons for carefully generating a subset of useful constraints, and define two basic questions on how to generate useful constraints.
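For context on what the abstract contrasts its new constraint classes against, the standard instance-level constraints are must-link and cannot-link pairs; a small hypothetical sketch (not from the paper) of checking a clustering against such constraints:

```python
# Hypothetical sketch of instance-level constraints: must-link pairs should
# share a cluster, cannot-link pairs should not.
labels      = {"x1": 0, "x2": 0, "x3": 1, "x4": 1}   # clustering to check
must_link   = [("x1", "x2"), ("x3", "x4")]
cannot_link = [("x1", "x3")]

def violations(labels, must_link, cannot_link):
    bad = [(a, b) for a, b in must_link if labels[a] != labels[b]]
    bad += [(a, b) for a, b in cannot_link if labels[a] == labels[b]]
    return bad

print(violations(labels, must_link, cannot_link))    # [] -> all constraints satisfied
```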
{"title":"Advances in constrained clustering","authors":"Zijie Qi, Yinghui (Catherine) Yang","doi":"10.1109/ICDEW.2010.5452728","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452728","url":null,"abstract":"Constrained clustering (semi-supervised learning) techniques have attracted more attention in recent years. However, the commonly used constraints are restricted to the instance level, thus we introduced two new classifications for the type of constraints: decision constraints and non-decision constraints. We implemented applications involving non-decision constraints to find alternative clusterings. Due to the fact that randomly generated constraints might adversely impact the performance, we discussed the main reasons for carefully generating a subset of useful constraints, and defined two basic questions on how to generate useful constraints.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131380160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Autonomic workload execution control using throttling
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452744
W. Powley, Patrick Martin, Mingyi Zhang, Paul Bird, Keith McDonald
Database Management Systems (DBMSs) are often required to simultaneously process multiple diverse workloads while enforcing business policies that govern workload performance. Workload control mechanisms such as admission control, query scheduling, and workload execution control serve to ensure that such policies are enforced and that individual workload goals are met. Query throttling can be used as a workload execution control method whereby problematic queries are slowed down, thus freeing resources to allow the more important work to complete more rapidly. In a self-managed system, a controller would determine the level of throttling necessary to allow the important workload to meet its goals, increasing or decreasing the throttling depending on the current system performance. In this paper, we explore two techniques for maintaining an appropriate level of query throttling. The first uses a simple controller based on a diminishing step function to determine the amount of throttling. The second adopts a control theory approach that uses a black-box modelling technique to model the system and determine the appropriate throttle value given current performance. We present a set of experiments that illustrate the effectiveness of each controller, then propose and evaluate a hybrid controller that combines the two techniques.
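As a rough, hypothetical sketch of just the first idea (the paper's actual controller and parameters may differ), a diminishing-step-function controller nudges the throttle level up or down and halves its step whenever the adjustment direction reverses:

```python
# Hypothetical diminishing-step controller sketch: increase throttling of the
# low-priority work when the important workload misses its target, decrease it
# otherwise, and shrink the step size whenever the direction flips.
def step_controller(measurements, target, initial_step=40.0, min_step=1.0):
    throttle, step, last_direction = 0.0, initial_step, 0
    for perf in measurements:                      # e.g. response time of the important workload
        direction = 1 if perf > target else -1     # too slow -> throttle competitors harder
        if last_direction and direction != last_direction:
            step = max(min_step, step / 2.0)       # diminish the step on oscillation
        throttle = min(100.0, max(0.0, throttle + direction * step))
        last_direction = direction
        yield throttle

observed = [950, 900, 820, 760, 710, 690, 705, 698]   # hypothetical response times (ms)
for t in step_controller(observed, target=700):
    print(f"throttle level: {t:.0f}%")
```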
{"title":"Autonomic workload execution control using throttling","authors":"W. Powley, Patrick Martin, Mingyi Zhang, Paul Bird, Keith McDonald","doi":"10.1109/ICDEW.2010.5452744","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452744","url":null,"abstract":"Database Management Systems (DBMSs) are often required to simultaneously process multiple diverse workloads while enforcing business policies that govern workload performance. Workload control mechanisms such as admission control, query scheduling, and workload execution control serve to ensure that such policies are enforced and that individual workload goals are met. Query throttling can be used as a workload execution control method whereby problematic queries are slowed down, thus freeing resources to allow the more important work to complete more rapidly. In a self-managed system, a controller would be used to determine the appropriate level of throttling necessary to allow the important workload to meet is goals. The throttling would be increased or decreased depending upon the current system performance. In this paper, we explore two techniques to maintain an appropriate level of query throttling. The first technique uses a simple controller based on a diminishing step function to determine the amount of throttling. The second technique adopts a control theory approach that uses a black-box modelling technique to model the system and to determine the appropriate throttle value given current performance. We present a set of experiments that illustrate the effectiveness of each controller, then propose and evaluate a hybrid controller that combines the two techniques.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115833054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maximizing visibility of objects
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452730
M. Miah
In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. We refer to this problem as the “attribute selection” problem. Package design based on user input is a problem that has also attracted recent interest. Given a set of elements and a set of user preferences (where each preference is a conjunction of positive or negative preferences for individual elements), we investigate the problem of designing the most “popular package”, i.e., a subset of the elements that maximizes the number of satisfied users. Numerous instances of this problem occur in practice. We refer to this latter problem as the “package design” problem. We develop several formulations of both problems. Even for the NP-complete problems, we give several exact (optimal) and approximation algorithms that work well in practice. Our experimental evaluation on real and synthetic datasets shows that the optimal and approximate algorithms are efficient for moderate and large datasets, respectively, and that the approximate algorithms have small approximation error.
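For the package-design side, a standard greedy heuristic (not necessarily the paper's algorithm) illustrates the objective of maximizing the number of satisfied users, where each user's preference is a conjunction of required and forbidden elements:

```python
# Hypothetical greedy sketch for the package design objective: pick elements
# so that as many users as possible have all their required elements included
# and none of their forbidden ones.
users = [
    {"require": {"wifi", "pool"}, "forbid": {"casino"}},
    {"require": {"wifi"},         "forbid": set()},
    {"require": {"casino"},       "forbid": {"pool"}},
]
elements = {"wifi", "pool", "casino", "spa"}

def satisfied(package, user):
    return user["require"] <= package and not (user["forbid"] & package)

def greedy_package(elements, users):
    package = set()
    while True:
        current = sum(satisfied(package, u) for u in users)
        best_gain, best_elem = 0, None
        for e in elements - package:
            gain = sum(satisfied(package | {e}, u) for u in users) - current
            if gain > best_gain:
                best_gain, best_elem = gain, e
        if best_elem is None:          # no element adds satisfied users
            return package
        package.add(best_elem)

print(greedy_package(elements, users))   # prints a package satisfying two of the three users
```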
{"title":"Maximizing visibility of objects","authors":"M. Miah","doi":"10.1109/ICDEW.2010.5452730","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452730","url":null,"abstract":"In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. We refer this problem as “attributes selection” problem. Package design based on user input is a problem that has also attracted recent interest. Given a set of elements, and a set of user preferences (where each preference is a conjunction of positive or negative preferences for individual elements), we investigate the problem of designing the most “popular package”, i.e., a subset of the elements that maximizes the number of satisfied users. Numerous instances of this problem occur in practice. We refer this later problem as “package design” problem. We develop several formulations of both the problems. Even for the NP-complete problems, we give several exact (optimal) and approximation algorithms that work well in practice. Our experimental evaluation on real and synthetic datasets shows that the optimal and approximate algorithms are efficient for moderate and large datasets respectively, and also that the approximate algorithms have small approximation error.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121125941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}