
Latest publications: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)

The Entity Name System: Enabling the web of entities
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452708
Heiko Stoermer, Themis Palpanas, George Giannakopoulos
We are currently witnessing an increasing interest in the use of the web as an information and knowledge source. Much of the information sought after in the web is in this case relevant to named entities (i.e., persons, locations, organizations, etc.). An important observation is that the entity identification problem lies at the core of many applications in this context. In order to deal with this problem, we propose the Entity Name System (ENS), a large scale, distributed infrastructure for assigning and managing unique identifiers for entities in the web. In this paper, we examine the special requirements for storage and management of entities, in the context of the ENS. We present a conceptual model for the representation of entities, and discuss problems related to data quality, as well as the management of the entity lifecycle. Finally, we describe the architecture of the current prototype of the system.
Citations: 3
A framework for automatic schema mapping verification through reasoning
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452703
P. Cappellari, Denilson Barbosa, P. Atzeni
We advocate an automated approach for verifying mappings between source and target databases in which semantics are taken into account, and that avoids two serious limitations of current verification approaches: reliance on availability of sample source and target instances, and reliance on strong statistical assumptions. We discuss how our approach can be integrated into the workflow of state-of-the-art mapping design systems, and all its necessary inputs. Our approach relies on checking the entailment of verification statements derived directly from the schema mappings and from semantic annotations to the variables used in such mappings. We discuss how such verification statements can be produced and how such annotations can be extracted from different kinds of alignments of schemas into domain ontologies. Such alignments can be derived semi-automatically; thus, our framework might prove useful in also greatly reducing the amount of input from domain experts in the development of mappings.
Citations: 6
Optimized data access for efficient execution of Semantic Services
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452701
Thorsten Möller, H. Schuldt
Executing Semantic Services requires, in contrast to traditional SOAP-based Web Services, frequent read and write accesses to graph-based semantic data stores - for instance, for the evaluation of preconditions or the materialization of service effects. Therefore, the overall performance of semantic service execution, in particular for composite services, is strongly affected by the efficiency of these reads and writes. In this paper we present two data access optimization techniques for semantic data stores: Prepared Queries and Frame Caching. The former reduces the costs for repeated query evaluation, e.g., in loops. The latter provides rapid access to frequently read triples or subgraphs based on materialized views using a Frame-based data structure. The described techniques have been implemented and evaluated on the basis of OSIRIS Next, our open infrastructure for Semantic Service support.
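The two techniques lend themselves to a compact illustration. Below is a minimal Python sketch, assuming a toy in-memory triple store; the class names and query pattern are hypothetical and not taken from the OSIRIS Next implementation.

```python
# Illustrative sketch only: a toy in-memory triple store with a prepared-query
# object and a "frame" cache for frequently read per-subject subgraphs.
from collections import defaultdict

class TripleStore:
    def __init__(self):
        self.triples = set()                 # (subject, predicate, object)
        self.by_subject = defaultdict(set)   # frame cache: subject -> its triples

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self.by_subject[s].add((s, p, o))    # keep the materialized frame in sync

    def frame(self, subject):
        # Frame Caching idea: return all triples about a subject without scanning.
        return self.by_subject[subject]

class PreparedQuery:
    """Prepared Queries idea: plan a pattern once, re-execute it cheaply (e.g., in loops)."""
    def __init__(self, store, pattern):
        self.store = store
        self.pattern = pattern               # e.g. (None, "hasEffect", None)

    def execute(self, **bindings):
        s, p, o = self.pattern
        s = bindings.get("s", s)
        p = bindings.get("p", p)
        o = bindings.get("o", o)
        return [t for t in self.store.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
store.add("service1", "hasPrecondition", "cond1")
store.add("service1", "hasEffect", "effect1")
q = PreparedQuery(store, (None, "hasEffect", None))
print(q.execute(s="service1"))               # reused repeatedly without re-planning
print(store.frame("service1"))               # fast access to the subject's subgraph
```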
Citations: 0
The HiBench benchmark suite: Characterization of the MapReduce-based data analysis
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452747
Shengsheng Huang, Jie Huang, J. Dai, T. Xie, Bo Huang
The MapReduce model is becoming prominent for the large-scale data analysis in the cloud. In this paper, we present the benchmarking, evaluation and characterization of Hadoop, an open-source implementation of MapReduce. We first introduce HiBench, a new benchmark suite for Hadoop. It consists of a set of Hadoop programs, including both synthetic micro-benchmarks and real-world Hadoop applications. We then evaluate and characterize the Hadoop framework using HiBench, in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilizations, and data access patterns.
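As a rough illustration of the metrics listed above, the following Python sketch computes job running time, tasks per minute, and HDFS bandwidth from hypothetical per-job records; the record format and the numbers are assumptions, not HiBench output.

```python
# Illustrative sketch only: deriving speed, throughput, and HDFS bandwidth
# from assumed per-job records.
jobs = [
    # name, start (s), end (s), tasks completed, bytes read+written on HDFS
    {"name": "wordcount", "start": 0.0, "end": 182.0, "tasks": 420, "hdfs_bytes": 12_884_901_888},
    {"name": "terasort",  "start": 0.0, "end": 341.0, "tasks": 960, "hdfs_bytes": 42_949_672_960},
]

for job in jobs:
    running_time = job["end"] - job["start"]                  # "speed": job running time
    throughput   = job["tasks"] / (running_time / 60.0)       # tasks completed per minute
    bandwidth    = job["hdfs_bytes"] / running_time / 2**20   # HDFS bandwidth in MB/s
    print(f'{job["name"]}: {running_time:.0f} s, '
          f'{throughput:.1f} tasks/min, {bandwidth:.1f} MB/s')
```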
Citations: 748
Cleansing uncertain databases leveraging aggregate constraints
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452733
Haiquan Chen, Wei-Shinn Ku, Haixun Wang
Emerging uncertain database applications often involve the cleansing (conditioning) of uncertain databases using additional information as new evidence for reducing the uncertainty. However, past research on conditioning probabilistic databases has, unfortunately, focused only on functional dependencies. In real world applications, most additional information on uncertain data sets can be acquired in the form of aggregate constraints (e.g., the aggregate results are published online for various statistical purposes). Therefore, if these aggregate constraints can be taken into account, uncertainty in data sets can be largely reduced. However, finding a practical method to exploit aggregate constraints to decrease uncertainty is a very challenging problem. In this paper, we present three approaches to cleanse (condition) uncertain databases by employing aggregate constraints. Because the problem is NP-hard, we focus on two approximation strategies by modeling the problem as a nonlinear optimization problem and then utilizing Simulated Annealing (SA) and an Evolutionary Algorithm (EA) to sample from the entire solution space of possible worlds. In order to favor those possible worlds holding higher probabilities and satisfying all the constraints at the same time, we define Satisfaction Degree Functions (SDF) and then construct the objective function accordingly. Subsequently, based on the sampling results, we remove duplicates, re-normalize the probabilities of all the qualified possible worlds, and derive the posterior probabilistic database. Our experiments verify the efficiency and effectiveness of our algorithms and show that our approximate approaches scale well to large-sized databases.
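A minimal Python sketch of the simulated-annealing idea over possible worlds follows; the tuple probabilities, the single SUM constraint, and the satisfaction-degree function are invented for illustration and do not reproduce the paper's SDF definition or its EA variant.

```python
# Illustrative sketch only: simulated annealing over possible worlds of a tiny
# uncertain table, scoring each world by its prior probability times a
# satisfaction-degree term for one aggregate constraint (SUM(value) close to a target).
import math
import random

tuples = [  # (value, probability that the tuple exists)
    (10, 0.9), (20, 0.5), (30, 0.4), (40, 0.2),
]
TARGET_SUM = 60  # assumed aggregate constraint: values present should sum to 60

def world_probability(world):
    p = 1.0
    for present, (_, prob) in zip(world, tuples):
        p *= prob if present else (1.0 - prob)
    return p

def satisfaction_degree(world):
    total = sum(v for present, (v, _) in zip(world, tuples) if present)
    return math.exp(-abs(total - TARGET_SUM) / TARGET_SUM)  # 1.0 when satisfied exactly

def objective(world):
    return world_probability(world) * satisfaction_degree(world)

random.seed(0)
world = [random.random() < 0.5 for _ in tuples]   # a random possible world
best, best_score, temperature = world[:], objective(world), 1.0
for step in range(2000):
    candidate = world[:]
    candidate[random.randrange(len(candidate))] ^= True   # flip one tuple in/out
    delta = objective(candidate) - objective(world)
    if delta > 0 or random.random() < math.exp(delta / temperature):
        world = candidate
    if objective(world) > best_score:
        best, best_score = world[:], objective(world)
    temperature *= 0.995                                   # cooling schedule
print(best, best_score)
```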
Citations: 4
BI-style relation discovery among entities in text
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452755
Wojciech M. Barczynski, Falk Brauer, Adrian Mocan, M. Schramm, Jan Froemberg
Business Intelligence (BI) over unstructured text is under intense scrutiny both in industry and research. Recent work in this field includes automatic integration of unstructured text into BI systems, model recognition, and probabilistic databases to handle the uncertainty of Information Extraction (IE) results. Our aim is to use analytics to discover statistically relevant, previously unknown relationships between entities in document fragments. We present a method for transforming IE results into an OLAP model and demonstrate it in a real-world scenario for the SAP Community Network.
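As a hedged illustration of the general idea (not the paper's method), the Python sketch below counts entity co-occurrences per document fragment and emits rows that could feed an OLAP-style fact table; the extraction output format is assumed.

```python
# Illustrative sketch only: turning hypothetical IE output (entities per fragment)
# into (entity_a, entity_b, support) rows, i.e. candidate relations with frequencies
# that can be aggregated and sliced like any other BI measure.
from collections import Counter
from itertools import combinations

fragments = [  # assumed IE output: entities found in each document fragment
    {"doc": "post-1", "entities": ["NetWeaver", "ABAP", "Java"]},
    {"doc": "post-2", "entities": ["NetWeaver", "ABAP"]},
    {"doc": "post-3", "entities": ["Java", "ABAP"]},
]

cooccurrence = Counter()
for frag in fragments:
    for a, b in combinations(sorted(set(frag["entities"])), 2):
        cooccurrence[(a, b)] += 1

for (a, b), support in cooccurrence.most_common():
    print(a, b, support)
```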
Citations: 2
Improving product search with economic theory
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452727
Beibei Li, Panagiotis G. Ipeirotis, A. Ghose
With the growing pervasiveness of the Internet, online search for commercial goods and services is constantly increasing, as more and more people search and purchase goods from the Internet. Most of the current algorithms for product search are based on adaptations of theoretical models devised for “classic” information retrieval. However, the decision mechanism that underlies the process of buying a product is different than the process of judging a document as relevant or not. So, applying theories of relevance for the task of product search may not be the best approach. We propose a theory model for product search based on expected utility theory from economics. Specifically, we propose a ranking technique in which we rank highest the products that generate the highest consumer surplus after the purchase. In a sense, we rank highest the products that are the “best value for money” for a specific user. Our approach naturally builds on decades of research in the field of economics and presents a solid theoretical foundation in which further research can build on. We instantiate our research by building a search engine for hotels, and show how we can build algorithms that naturally take into account consumer demographics, heterogeneity of consumer preferences, and also account for the varying price of the hotel rooms. Our extensive user studies demonstrate an overwhelming preference for the rankings generated by our techniques, compared to a large number of existing strong baselines.
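The consumer-surplus ranking can be illustrated with a small Python sketch; the hotels, feature weights, and linear willingness-to-pay model are assumptions made for illustration, not the paper's econometric model.

```python
# Illustrative sketch only: rank products by estimated consumer surplus,
# i.e. willingness to pay minus price ("best value for money" first).
hotels = [  # assumed data: name, nightly price, features relevant to this user
    {"name": "Harbor Inn",  "price": 120, "stars": 3, "near_beach": 1},
    {"name": "City Suites", "price": 200, "stars": 4, "near_beach": 0},
    {"name": "Grand Plaza", "price": 310, "stars": 5, "near_beach": 1},
]

# Hypothetical per-user willingness-to-pay weights (dollars per feature unit).
user_weights = {"stars": 55, "near_beach": 40}

def willingness_to_pay(hotel):
    return sum(user_weights[f] * hotel[f] for f in user_weights)

def consumer_surplus(hotel):
    return willingness_to_pay(hotel) - hotel["price"]

# Ranking: highest expected surplus after the purchase comes first.
for hotel in sorted(hotels, key=consumer_surplus, reverse=True):
    print(hotel["name"], consumer_surplus(hotel))
```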
Citations: 0
Advances in constrained clustering
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452728
Zijie Qi, Yinghui (Catherine) Yang
Constrained clustering (semi-supervised learning) techniques have attracted more attention in recent years. However, the commonly used constraints are restricted to the instance level, thus we introduced two new classifications for the type of constraints: decision constraints and non-decision constraints. We implemented applications involving non-decision constraints to find alternative clusterings. Due to the fact that randomly generated constraints might adversely impact the performance, we discussed the main reasons for carefully generating a subset of useful constraints, and defined two basic questions on how to generate useful constraints.
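For context, the instance-level constraints that the abstract contrasts against can be sketched in a few lines of Python; the decision and non-decision constraint types introduced by the paper are not implemented here.

```python
# Illustrative sketch only: checking classic instance-level constraints
# (must-link / cannot-link pairs) against a candidate clustering.
must_link   = [("a", "b")]          # these pairs should share a cluster
cannot_link = [("a", "c")]          # these pairs should be split apart

clustering = {"a": 0, "b": 0, "c": 1, "d": 1}   # point -> cluster id

def violations(clustering, must_link, cannot_link):
    bad = []
    for x, y in must_link:
        if clustering[x] != clustering[y]:
            bad.append(("must-link", x, y))
    for x, y in cannot_link:
        if clustering[x] == clustering[y]:
            bad.append(("cannot-link", x, y))
    return bad

print(violations(clustering, must_link, cannot_link))   # [] means all constraints hold
```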
Citations: 0
Autonomic workload execution control using throttling
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452744
W. Powley, Patrick Martin, Mingyi Zhang, Paul Bird, Keith McDonald
Database Management Systems (DBMSs) are often required to simultaneously process multiple diverse workloads while enforcing business policies that govern workload performance. Workload control mechanisms such as admission control, query scheduling, and workload execution control serve to ensure that such policies are enforced and that individual workload goals are met. Query throttling can be used as a workload execution control method whereby problematic queries are slowed down, thus freeing resources to allow the more important work to complete more rapidly. In a self-managed system, a controller would be used to determine the appropriate level of throttling necessary to allow the important workload to meet its goals. The throttling would be increased or decreased depending upon the current system performance. In this paper, we explore two techniques to maintain an appropriate level of query throttling. The first technique uses a simple controller based on a diminishing step function to determine the amount of throttling. The second technique adopts a control theory approach that uses a black-box modelling technique to model the system and to determine the appropriate throttle value given current performance. We present a set of experiments that illustrate the effectiveness of each controller, then propose and evaluate a hybrid controller that combines the two techniques.
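The first technique can be illustrated with a small Python sketch of a feedback loop with a diminishing step; the measurement function, the goal, and the constants are hypothetical.

```python
# Illustrative sketch only: adjust a throttle level for low-priority queries
# each control interval, using a step that shrinks over time (a diminishing
# step function), based on whether the important workload meets its goal.
import random

def important_workload_response_time():
    # Placeholder for a real measurement (e.g., average query response time).
    return random.uniform(0.5, 2.0)

GOAL_SECONDS = 1.0      # assumed performance goal for the important workload
throttle = 0.0          # fraction of time low-priority queries are paused
step = 0.32             # initial adjustment, halved each interval

for interval in range(8):
    observed = important_workload_response_time()
    if observed > GOAL_SECONDS:
        throttle = min(1.0, throttle + step)   # slow the competing queries down
    else:
        throttle = max(0.0, throttle - step)   # release resources again
    step /= 2                                  # diminishing step
    print(f"interval {interval}: observed={observed:.2f}s throttle={throttle:.2f}")
```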
Citations: 14
Maximizing visibility of objects
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452730
M. Miah
In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. We refer to this problem as the “attribute selection” problem. Package design based on user input is a problem that has also attracted recent interest. Given a set of elements, and a set of user preferences (where each preference is a conjunction of positive or negative preferences for individual elements), we investigate the problem of designing the most “popular package”, i.e., a subset of the elements that maximizes the number of satisfied users. Numerous instances of this problem occur in practice. We refer to this latter problem as the “package design” problem. We develop several formulations of both problems. Even for the NP-complete problems, we give several exact (optimal) and approximation algorithms that work well in practice. Our experimental evaluation on real and synthetic datasets shows that the optimal and approximate algorithms are efficient for moderate and large datasets respectively, and also that the approximate algorithms have small approximation error.
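A greedy heuristic conveys the flavor of the attribute-selection problem; the following Python sketch is illustrative only and is not one of the paper's exact or approximation algorithms.

```python
# Illustrative sketch only: pick a limited number of attributes of a new product
# to highlight so that it is returned by as many buyer queries as possible.
new_product_attrs = {"wifi", "pool", "parking", "gym", "breakfast"}

buyer_queries = [          # each query is the set of attributes a buyer requires
    {"wifi", "parking"},
    {"pool"},
    {"wifi", "breakfast"},
    {"gym", "pool"},
]

BUDGET = 2                 # how many attributes may be highlighted

def satisfied(highlighted):
    # A query is satisfied when every attribute it requires is highlighted.
    return sum(1 for q in buyer_queries if q <= highlighted)

highlighted = set()
for _ in range(BUDGET):
    best = max(new_product_attrs - highlighted,
               key=lambda a: satisfied(highlighted | {a}))   # greedy choice
    highlighted.add(best)

print(highlighted, satisfied(highlighted), "queries satisfied")
```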
Citations: 0