22nd International Conference on Data Engineering (ICDE'06): Latest Papers

How to Determine a Good Multi-Programming Level for External Scheduling
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.78
Bianca Schroeder, Mor Harchol-Balter, A. Iyengar, E. Nahum, A. Wierman
Scheduling/prioritization of DBMS transactions is important for many applications that rely on database backends. A convenient way to achieve scheduling is to limit the number of transactions within the database, maintaining most of the transactions in an external queue, which can be ordered as desired by the application. While external scheduling has many advantages in that it doesn’t require changes to internal resources, it is also difficult to get right in that its performance depends critically on the particular multiprogramming limit used (the MPL), i.e. the number of transactions allowed into the database. If the MPL is too low, throughput will suffer, since not all DBMS resources will be utilized. On the other hand, if the MPL is too high, there is insufficient control on scheduling. The question of how to adjust the MPL to achieve both goals simultaneously is an open problem, not just for databases but in system design in general. Herein we study this problem in the context of transactional workloads, both via extensive experimentation and queueing theoretic analysis. We find that the two most critical factors in adjusting the MPL are the number of resources that the workload utilizes and the variability of the transactions’ service demands. We develop a feedback-based controller, augmented by queueing theoretic models, for automatically adjusting the MPL. Finally, we apply our methods to the specific problem of external prioritization of transactions. We find that external prioritization can be nearly as effective as internal prioritization, without any negative consequences, when the MPL is set appropriately.
Citations: 127
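The external-scheduling idea in the abstract above (admit at most MPL transactions into the database, keep the rest in an application-ordered queue) can be illustrated with a small sketch. This is not the authors' controller: the class name `ExternalScheduler`, its methods, and the convention that a lower number means higher priority are all assumptions made for illustration, and the MPL is fixed here rather than tuned by feedback.

```python
import heapq
from itertools import count


class ExternalScheduler:
    """Hypothetical sketch: admit at most `mpl` transactions at a time and
    hold the rest in an external priority queue (lower number = higher priority)."""

    def __init__(self, mpl):
        self.mpl = mpl
        self.running = set()      # transactions currently inside the database
        self._queue = []          # heap of (priority, arrival order, txn_id)
        self._seq = count()       # tie-breaker: FIFO within one priority level

    def submit(self, txn_id, priority):
        """Queue a transaction; return the list of transactions admitted now."""
        heapq.heappush(self._queue, (priority, next(self._seq), txn_id))
        return self._dispatch()

    def complete(self, txn_id):
        """Mark a transaction as finished; return the transactions admitted now."""
        self.running.discard(txn_id)
        return self._dispatch()

    def _dispatch(self):
        admitted = []
        # Fill free slots from the head of the external queue.
        while self._queue and len(self.running) < self.mpl:
            _, _, txn_id = heapq.heappop(self._queue)
            self.running.add(txn_id)
            admitted.append(txn_id)
        return admitted
```

With `mpl=2`, a third and fourth submission wait outside the database, and when a slot frees up the waiting transaction with the best priority is admitted first, regardless of arrival order. A larger `mpl` leaves fewer transactions in the queue and therefore less room for reordering, which is the trade-off the abstract describes.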
Query Selection Techniques for Efficient Crawling of Structured Web Sources
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.124
Ping Wu, Ji-Rong Wen, Huan Liu, Wei-Ying Ma
The high quality, structured data from Web structured sources is invaluable for many applications. Hidden Web databases are not directly crawlable by Web search engines and are only accessible through Web query forms or via Web service interfaces. Recent research efforts have been focusing on understanding these Web query forms. A critical but still largely unresolved question is: how to efficiently acquire the structured information inside Web databases through iteratively issuing meaningful queries? In this paper we focus on the central issue of enabling efficient Web database crawling through query selection, i.e. how to select good queries to rapidly harvest data records from Web databases. We model each structured Web database as a distinct attribute-value graph. Under this theoretical framework, the database crawling problem is transformed into a graph traversal one that follows "relational" links. We show that finding an optimal query selection plan is equivalent to finding a Minimum Weighted Dominating Set of the corresponding database graph, a well-known NP-Complete problem. We propose a suite of query selection techniques aiming at optimizing the query harvest rate. Extensive experimental evaluations over real Web sources and simulations over controlled database servers validate the effectiveness of our techniques and provide insights for future efforts in this
Citations: 152
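The abstract above reduces query selection to a Minimum Weighted Dominating Set problem on the database graph. Since that problem is NP-complete, a common generic heuristic is greedy selection by coverage per unit weight. The sketch below shows that textbook heuristic only; it is not claimed to be one of the paper's techniques, and the function name, the adjacency-dict graph representation, and the interpretation of weights as query costs are assumptions.

```python
def greedy_weighted_dominating_set(adj, weight):
    """Greedy heuristic for a weighted dominating set.

    adj:    dict mapping each node to the set of its neighbours (undirected graph)
    weight: dict mapping each node to a positive cost (assumed: cost of one query)
    Returns the chosen nodes in selection order.
    """
    undominated = set(adj)
    chosen = []

    def gain(v):
        # Newly dominated nodes (v itself plus neighbours) per unit of cost.
        return len(({v} | adj[v]) & undominated) / weight[v]

    while undominated:
        # sorted() makes tie-breaking deterministic: the first maximum wins.
        best = max(sorted(adj), key=gain)
        chosen.append(best)
        undominated -= {best} | adj[best]
    return chosen
```

On a path graph a-b-c-d-e with unit weights, the heuristic picks `b` (covering a, b, c) and then `d` (covering d, e). The greedy result is not guaranteed to be optimal in general.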
MONDRIAN: Annotating and Querying Databases through Colors and Blocks
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.102
Floris Geerts, Anastasios Kementsietsidis, D. Milano
Annotations play a central role in the curation of scientific databases. Despite their importance, data formats and schemas are not designed to manage the increasing variety of annotations. Moreover, DBMS’s often lack support for storing and querying annotations. Furthermore, annotations and data are only loosely coupled. This paper introduces an annotation-oriented data model for the manipulation and querying of both data and annotations. In particular, the model allows for the specification of annotations on sets of values and for effectively querying the information on their association. We use the concept of block to represent an annotated set of values. Different colors applied to the blocks represent different annotations. We introduce a color query language for our model and prove it to be both complete (it can express all possible queries over the class of annotated databases), and minimal (all the algebra operators are primitive). We present MONDRIAN, a prototype implementation of our annotation mechanism, and we conduct experiments that investigate the set of parameters which influence the evaluation cost for color queries.
Citations: 109
cgmOLAP: Efficient Parallel Generation and Querying of Terabyte Size ROLAP Data Cubes
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.32
Ying Chen, A. Rau-Chaplin, F. Dehne, Todd Eavis, D. Green, E. Sithirasenan
We present the cgmOLAP server, the first fully functional parallel OLAP system able to build data cubes at a rate of more than 1 Terabyte per hour. cgmOLAP incorporates a variety of novel approaches for the parallel computation of full cubes, partial cubes, and iceberg cubes as well as new parallel cube indexing schemes. The cgmOLAP system consists of an application interface, a parallel query engine, a parallel cube materialization engine, meta data and cost model repositories, and shared server components that provide uniform management of I/O, memory, communications, and disk resources.
Citations: 20
Approximately Processing Multi-granularity Aggregate Queries over Data Streams
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.22
Shouke Qin, Weining Qian, Aoying Zhou
Aggregate monitoring over data streams is attracting more and more attention in the research community due to its broad potential applications. Existing methods suffer from two problems: 1) the aggregate functions that can be monitored are restricted to first-order statistics or to functions that are monotonic with respect to the window size; 2) only a limited number of granularities and time scales can be monitored over a stream, so some interesting patterns might be neglected, and users might be misled by an incomplete profile of how the current data streams are changing. These two problems impede the development of online mining techniques over data streams, and a breakthrough is needed. In this paper, we employ fractal analysis to enable the monitoring of both monotonic and non-monotonic aggregates on time-changing data streams. The monotonicity property of aggregate monitoring is revealed, and a monotonic search space is built to decrease the time overhead for accessing the synopsis from O(m) to O(log m), where m is the number of windows to be monitored. With the help of a novel inverted histogram, the statistical summary is compressed to fit in limited main memory, so that high aggregates on windows of any length can be detected accurately and efficiently online. Theoretical analysis shows that the space and time complexity bounds of this method are relatively low, while experimental results demonstrate the applicability and efficiency of the proposed algorithm in different application settings.
Citations: 22
Answering Imprecise Queries over Autonomous Web Databases
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.20
Ullas Nambiar, S. Kambhampati
Current approaches for answering queries with imprecise constraints require user-specific distance metrics and importance measures for attributes of interest - metrics that are hard to elicit from lay users. We present AIMQ, a domain- and user-independent approach for answering imprecise queries over autonomous Web databases. We developed methods for query relaxation that use approximate functional dependencies. We also present an approach to automatically estimate the similarity between values of categorical attributes. Experimental results demonstrating the robustness, efficiency and effectiveness of AIMQ are presented. Results of a preliminary user study demonstrating the high precision of the AIMQ system are also provided.
Citations: 66
Robust Cardinality and Cost Estimation for Skyline Operator
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.131
S. Chaudhuri, Nilesh N. Dalvi, R. Kaushik
Incorporating the skyline operator inside the relational engine requires solving the cardinality estimation and the cost estimation problem, hitherto unaddressed. We propose robust techniques to estimate the cardinality and the computational cost of Skyline, and through an empirical comparison, show that our technique is substantially more effective than traditional approaches. Finally, we show through an implementation in Microsoft SQL Server that skyline queries can substantially benefit from our techniques.
Citations: 157
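For readers unfamiliar with the skyline operator whose cardinality the abstract above sets out to estimate, the following sketch computes a skyline by brute force. It is a definitional illustration, not the paper's estimator: it assumes smaller values are preferred in every dimension, and the names `dominates` and `skyline` are chosen here for illustration.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere
    (assumption: smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the points not dominated by any other point (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the points `(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)`, the skyline is `(1, 5), (2, 2), (5, 1)`, so its cardinality is 3. The quadratic cost of this naive version hints at why an optimizer needs cardinality and cost estimates before choosing a plan.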
LB-Index: A Multi-Resolution Index Structure for Images
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.85
Vebjorn Ljosa, Arnab Bhattacharya, Ambuj K. Singh
In many domains, the similarity between two images depends on the spatial locations of their features. The earth mover’s distance (EMD), first proposed by Werman et al. [8], measures such similarity. It yields higher-quality image retrieval results than the Lp-norm, quadratic-form distance, and Jeffrey divergence [6], and has also been used for similarity search on contours [3], melodies [7], and graphs [2].
Citations: 8
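The earth mover's distance mentioned in the abstract above has a simple closed form in one special case, which may help build intuition. The sketch assumes two one-dimensional histograms over the same bins, equal total mass, and a ground distance of 1 between adjacent bins; under those assumptions the EMD equals the sum of absolute cumulative differences. Images require a two-dimensional ground distance and a general transportation solver, which this sketch does not cover. The function name `emd_1d` is hypothetical.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b):
    """EMD between two 1-D histograms on the same bins with equal total mass,
    assuming a ground distance of 1 between adjacent bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(hist_a) - sum(hist_b)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # The running surplus after each bin is the mass that must cross to the next bin.
    surplus = accumulate(a - b for a, b in zip(hist_a, hist_b))
    return sum(abs(s) for s in surplus)
```

Moving one unit of mass from the first bin to the third costs 2, so `emd_1d([1, 0, 0], [0, 0, 1])` evaluates to 2, whereas a bin-by-bin L1 comparison would report the same difference for any two non-overlapping bins.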
Cluster Hull: A Technique for Summarizing Spatial Data Streams
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.38
J. Hershberger, Nisheeth Shrivastava, S. Suri
Recently there has been a growing interest in detecting patterns and analyzing trends in data that are generated continuously, often delivered in some fixed order and at a rapid rate, in the form of a data stream [5, 6]. When the stream consists of spatial data, its geometric "shape" can convey important qualitative aspects of the data set more effectively than many numerical statistics. In a stream setting, where the data must be constantly discarded and compressed, special care must be taken to ensure that the compressed summary faithfully captures the overall shape of the point distribution. We propose a novel scheme, ClusterHulls, to represent the shape of a stream of two-dimensional points. Our scheme is particularly useful when the input contains clusters with widely varying shapes and sizes, and the boundary shape, orientation, or volume of those clusters may be important in the analysis.
Citations: 6
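The abstract above does not spell out how ClusterHulls represents shape, so the following is only a building-block sketch under an assumption: that each cluster is summarized by the convex hull of its points. It shows the standard monotone-chain algorithm for a static batch of two-dimensional points; the streaming maintenance, clustering, and merging steps of the actual scheme are not modeled.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

For the six points `(0,0), (2,0), (2,2), (0,2), (1,1), (1,0)`, only the four square corners remain, which illustrates how a hull can compress a point set while preserving its boundary shape.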
Supporting Keyword Columns with Ontology-based Referential Constraints in DBMS
Pub Date : 2006-04-03 DOI: 10.1109/ICDE.2006.151
E. Chong, Souripriya Das, G. Eadon, Jagannathan Srinivasan
Keywords are typically used to qualify rows in a table. However, the fact that a keyword denotes a concept, which belongs to a specific knowledge domain, is not semantically enforced in current database systems. This paper proposes defining ontology based referential constraint for such keyword columns. A query on ontology, specified as part of the referential constraint, is used to identify the domain for the keyword column. Furthermore, since ontology may evolve causing change to the domain of the keyword column, the paper proposes use of ontology based transformation functions to either automatically evolve or to recommend refinements for the values in the keyword column. Also, queries on a keyword column can perform semantic match, that is, match a keyword to related terms based on the associated ontology. Thus, the proposed approach of semantically connecting keyword columns to ontologies 1) enhances semantic data integrity, 2) facilitates evolution of keyword columns with the referenced ontology, and 3) enables semantic match queries on keyword columns.
Citations: 5