
Proceedings 17th International Conference on Data Engineering: Latest Publications

An index structure for efficient reverse nearest neighbor queries
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914862
Congjun Yang, King-Ip Lin
The Reverse Nearest Neighbor (RNN) problem is to find all points in a given data set whose nearest neighbor is a given query point. Just like the Nearest Neighbor (NN) queries, the RNN queries appear in many practical situations such as marketing and resource management. Thus, efficient methods for the RNN queries in databases are required. The paper introduces a new index structure, the Rdnn-tree, that answers both RNN and NN queries efficiently. A single index structure is employed for a dynamic database, in contrast to the use of multiple indexes in previous work. This leads to significant savings in dynamically maintaining the index structure. The Rdnn-tree outperforms existing methods in various aspects. Experiments on both synthetic and real world data show that our index structure outperforms previous methods by a significant margin (more than 90% in terms of number of leaf nodes accessed) in RNN queries. It also shows improvement in NN queries over standard techniques. Furthermore, performance in insertion and deletion is significantly enhanced by the ability to combine multiple queries (NN and RNN) in one traversal of the tree. These facts make our index structure extremely preferable in both static and dynamic cases.
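As commonly described, the Rdnn-tree stores each point's nearest-neighbor distance inside an R-tree-style index so that whole subtrees can be pruned; the abstract does not spell out those details, so the sketch below is only a brute-force baseline that makes the RNN definition concrete. Helper names and the toy data are invented for the illustration.

```python
import math

def nearest_neighbor(p, points):
    """Return the point in `points` (other than p itself) closest to p."""
    return min((q for q in points if q != p), key=lambda q: math.dist(p, q))

def reverse_nearest_neighbors(query, points):
    """Brute-force RNN: all data points whose nearest neighbor is `query`.

    O(n^2) per query; an index such as the Rdnn-tree answers the same
    question while touching only a small fraction of the data.
    """
    data = list(points) + [query]
    return [p for p in points if nearest_neighbor(p, data) == query]

pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.2, 5.1)]
print(reverse_nearest_neighbors((0.4, 0.1), pts))   # the two points near the origin
```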
Citations: 186
An efficient approximation scheme for data mining tasks
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914858
G. Kollios, D. Gunopulos, Nick Koudas, Stefan Berchtold
We investigate the use of biased sampling according to the density of the dataset, to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional datasets. In density biased sampling, the probability that a given point will be included in the sample depends on the local density of the dataset. We propose a general technique for density-biased sampling that can factor in user requirements to sample for properties of interest, and can be tuned for specific data mining tasks. This allows great flexibility and improved accuracy of the results over simple random sampling. We describe our approach in detail, we analytically evaluate it, and show how it can be optimized for approximate clustering and outlier detection. Finally we present a thorough experimental evaluation of the proposed method, applying density-biased sampling on real and synthetic data sets, and employing clustering and outlier detection algorithms, thus highlighting the utility of our approach.
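A minimal sketch of the density-biased idea, assuming a crude grid-based density estimate and an inverse-density acceptance probability; the paper derives the sampling probabilities analytically and tunes them to the target task, which this toy function does not attempt.

```python
import random
from collections import Counter

def density_biased_sample(points, cell_width, base_rate=0.2, seed=0):
    """Include each point with probability inversely related to its local
    density, approximated here by the population of its grid cell, so sparse
    regions (outliers, small clusters) are better represented than under
    uniform random sampling."""
    rng = random.Random(seed)
    cell = lambda p: tuple(int(x // cell_width) for x in p)
    counts = Counter(cell(p) for p in points)
    sample = []
    for p in points:
        expected_per_cell = len(points) / len(counts)
        accept = min(1.0, base_rate * expected_per_cell / counts[cell(p)])
        if rng.random() < accept:
            sample.append(p)
    return sample
```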
Citations: 29
Tuning an SQL-based PDM system in a worldwide client/server environment
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914818
Erich Müller, P. Dadam, Jost Enderle, M. Feltes
The management of product-related data in a uniform and consistent way is a big challenge for many manufacturing enterprises, especially the large ones, like DaimlerChrysler. So-called product data management (PDM) systems are a promising way to achieve this goal. For various reasons, PDM systems often sit on top of a relational DBMS, using it (more or less) as a simple record manager. User interactions with the PDM systems are translated into a series of SQL queries. This does not cause too much harm when the DBMS and PDM system are located in the same local area network, with high bandwidth and short latency times. The picture may change dramatically, however, if the users are working in geographically distributed environments. Response times may rise by orders of magnitude, e.g. from 1-2 minutes in the local context to 30 minutes and even more in the "inter-continental" context. This paper shows how a more sophisticated utilization of the (advanced) SQL features coming along with SQL:1999 can help to cut down response times significantly.
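Because the abstract attributes the slowdown mainly to the number of network round trips, a back-of-the-envelope model (numbers are illustrative, not taken from the paper) shows why folding many small SQL statements into fewer, more expressive SQL:1999 queries pays off over a high-latency WAN even if each combined query costs a bit more on the server.

```python
def response_time(round_trips, latency_s, server_work_s):
    """Toy cost model: total time = network round trips plus server-side work."""
    return round_trips * latency_s + server_work_s

# e.g. a product-structure expansion issued as 500 single-row lookups versus
# one set-oriented (e.g. recursive) query returning the same rows
for label, latency in [("LAN", 0.001), ("intercontinental WAN", 0.3)]:
    chatty  = response_time(500, latency, server_work_s=5.0)
    batched = response_time(1, latency, server_work_s=6.0)
    print(f"{label}: chatty {chatty:.1f}s vs. batched {batched:.1f}s")
```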
Citations: 6
Measuring and optimizing a system for persistent database sessions
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914810
R. Barga, D. Lomet
High availability for both data and applications is rapidly becoming a business requirement. While database systems support recovery, providing high database availability, applications may still lose work because of server outages. When a server crashes, any volatile state associated with the application's database session is lost and the application may require an operator-assisted restart. This exposes server failures to end-users and always degrades application availability. Our Phoenix/ODBC system supports persistent database sessions that can survive a database crash without the application being aware of the outage, except for possible timing considerations. This improves application availability and eliminates the application programming needed to cope with database crashes. Phoenix/ODBC requires no changes to the database system, data access routines or applications. Hence, it can be deployed in any application that uses ODBC to access a database. Further, our generic approach can be exploited for a variety of data access protocols. In this paper, we describe the design of Phoenix/ODBC and introduce an extension to optimize the response time and to reduce overhead for OLTP workloads. We present a performance evaluation using the TPC-C and TPC-H benchmarks that demonstrate Phoenix/ODBC's extra overhead is modest.
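Phoenix/ODBC's actual mechanism (transparent logging and replay beneath the ODBC interface) is not described in enough detail in the abstract to reproduce, so the sketch below only illustrates the general pattern: a wrapper records the statements that created the session's volatile state and replays them after reconnecting, so the application never sees the outage. All class and method names are invented for the illustration.

```python
class PersistentSession:
    """Toy wrapper that masks a lost database connection by replaying the
    statements that rebuilt the session's volatile state (SET options,
    temporary tables, and so on)."""

    def __init__(self, connect_fn):
        self._connect_fn = connect_fn   # callable returning a fresh connection
        self._log = []                  # statements to replay after a crash
        self._conn = connect_fn()

    def execute(self, sql, rebuilds_state=True):
        try:
            result = self._conn.execute(sql)
        except ConnectionError:
            self._recover()
            result = self._conn.execute(sql)
        if rebuilds_state:
            self._log.append(sql)
        return result

    def _recover(self):
        # Reconnect and replay the log; the application only notices latency.
        self._conn = self._connect_fn()
        for sql in self._log:
            self._conn.execute(sql)
```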
Citations: 9
Mining frequent itemsets with convertible constraints
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914856
J. Pei, Jiawei Han, L. Lakshmanan
Recent work has highlighted the importance of the constraint-based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. The authors study constraints which cannot be handled with existing theory and techniques. For example, avg(S) θ ν, median(S) θ ν, sum(S) θ ν (S can contain items of arbitrary values, θ ∈ {≥, ≤}) are customarily regarded as "tough" constraints in that they cannot be pushed inside an algorithm such as Apriori. We develop a notion of convertible constraints and systematically analyze, classify, and characterize this class. We also develop techniques which enable them to be readily pushed deep inside the recently developed FP-growth algorithm for frequent itemset mining. Results from our detailed experiments show the effectiveness of the techniques developed.
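The key observation behind convertibility is that a constraint such as avg(S) ≥ ν, which is neither antimonotone nor monotone over arbitrary itemsets, becomes prefix-antimonotone once items are enumerated in value-descending order: the running average can only stay the same or fall as items are appended, so a failing prefix prunes all of its extensions. A small numeric check of that property (not the FP-growth integration itself); the item values are made up:

```python
def prefix_averages(items, value):
    """Average of every prefix when items are sorted by descending value."""
    ordered = sorted(items, key=value.__getitem__, reverse=True)
    return [sum(value[i] for i in ordered[:k]) / k
            for k in range(1, len(ordered) + 1)]

value = {"a": 90, "b": 40, "c": 25, "d": 5}
avgs = prefix_averages(value, value)
print(avgs)                                          # [90.0, 65.0, 51.66..., 40.0]
assert all(x >= y for x, y in zip(avgs, avgs[1:]))   # non-increasing => prunable
```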
Citations: 385
On dual mining: from patterns to circumstances, and back
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914828
G. Grahne, L. Lakshmanan, Xiaohong Wang, M. Xie
Previous work on frequent item set mining has focused on finding all itemsets that are frequent in a specified part of a database. We motivate the dual question of finding under what circumstances a given item set satisfies a pattern of interest (e.g., frequency) in a database. Circumstances form a lattice that generalizes the instance lattice associated with datacube. Exploiting this, we adapt known cube algorithms and propose our own, minCirc, for mining the strongest (e.g., minimal) circumstances under which an itemset satisfies a pattern. Our experiments show that minCirc is competitive with the adapted algorithms. We motivate mining queries involving migration between item set and circumstance lattices and propose the notion of Armstrong Basis as a structure that provides efficient support for such migration queries, as well as a simple algorithm for computing it.
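A brute-force illustration of the dual question (not the minCirc algorithm): given transactions tagged with circumstance attributes, enumerate circumstance patterns from most to least general and keep only those not subsumed by an already-reported, more general pattern under which the given itemset is frequent. The attribute names, relative-support test, and notion of generality are simplifications for the sketch.

```python
from itertools import combinations

def general_frequent_circumstances(transactions, itemset, dims, min_support):
    """Most general circumstance patterns {dim: value} under which `itemset`
    reaches relative support `min_support` within the covered transactions."""
    itemset = set(itemset)
    values = {d: sorted({t[d] for t in transactions}) for d in dims}
    found = []
    for k in range(len(dims) + 1):                       # most general first
        for chosen in combinations(dims, k):
            for combo in _assignments(chosen, values):
                covered = [t for t in transactions
                           if all(t[d] == v for d, v in combo.items())]
                if not covered:
                    continue
                support = sum(itemset <= t["items"] for t in covered) / len(covered)
                subsumed = any(g.items() <= combo.items() for g in found)
                if support >= min_support and not subsumed:
                    found.append(combo)
    return found

def _assignments(dims, values):
    if not dims:
        yield {}
        return
    head, rest = dims[0], dims[1:]
    for v in values[head]:
        for tail in _assignments(rest, values):
            yield {head: v, **tail}

txns = [
    {"store": "east", "month": "jan", "items": {"milk", "bread"}},
    {"store": "east", "month": "jan", "items": {"milk", "bread"}},
    {"store": "east", "month": "feb", "items": {"milk"}},
    {"store": "west", "month": "jan", "items": {"bread"}},
    {"store": "west", "month": "feb", "items": {"beer"}},
]
print(general_frequent_circumstances(txns, {"milk", "bread"}, ("store", "month"), 0.5))
# [{'store': 'east'}, {'month': 'jan'}]
```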
Citations: 19
Exactly-once semantics in a replicated messaging system
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914808
Yongqiang Huang, H. Garcia-Molina
A wide-area distributed message delivery system can use replication to improve performance and availability. However, without safeguards, replicated messages may be delivered to a mobile device more than once, making the device's user repeat actions (e.g. making unnecessary phone calls, firing weapons repeatedly). Alternatively, they may not be delivered at all, making the user miss important messages. In this paper, we address the problem of exactly-once delivery to mobile clients when messages are replicated globally. We define exactly-once semantics and propose algorithms to guarantee it. We also propose and define a relaxed version of exactly-once semantics which is appropriate for limited-capability mobile devices. We study the relative performance of our algorithms compared to the weaker at-least-once semantics, and find that the performance overhead of exactly-once can be minimized in most cases by careful design of the system.
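A toy illustration of the standard building block behind exactly-once delivery, duplicate suppression with per-message identifiers on the receiving device; the paper's contribution is guaranteeing this when the messages themselves are replicated across wide-area servers and the mobile client can keep only limited state, which the sketch does not attempt.

```python
class ExactlyOnceReceiver:
    """Hands each message to the application at most once even if replicated
    servers deliver duplicates; together with sender-side retransmission
    until an acknowledgement arrives, this yields exactly-once delivery."""

    def __init__(self, deliver_fn):
        self._deliver_fn = deliver_fn
        self._seen = set()              # identifiers already delivered

    def on_message(self, msg_id, payload):
        if msg_id in self._seen:
            return "ack"                # re-acknowledge so retries stop
        self._deliver_fn(payload)       # reaches the application once
        self._seen.add(msg_id)
        return "ack"

inbox = []
rx = ExactlyOnceReceiver(inbox.append)
for mid, text in [(1, "call home"), (1, "call home"), (2, "meeting at 9")]:
    rx.on_message(mid, text)
print(inbox)    # ['call home', 'meeting at 9']
```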
Citations: 17
An index-based approach for similarity search supporting time warping in large sequence databases
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914875
Sang-Wook Kim, Sanghyun Park, W. Chu
This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for processing similarity search that supports time warping fail to employ multi-dimensional indexes without false dismissal since the time warping distance does not satisfy the triangular inequality. Our primary goal is to innovate on search performance without permitting any false dismissal. To attain this goal, we devise a new distance function D_tw-lb that consistently underestimates the time warping distance and also satisfies the triangular inequality. D_tw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and D_tw-lb as a distance function. The extensive experimental results reveal that our method achieves significant speedup of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data.
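The abstract names the ingredients (a 4-tuple feature vector per sequence and a lower-bounding distance D_tw-lb) but not the formulas, so the sketch below reconstructs the idea as it is commonly described for this work: the features are the first, last, greatest, and smallest elements, and the bound is the largest componentwise gap, which cannot exceed a time-warping distance that sums absolute differences along the optimal alignment. Treat the exact definitions as assumptions.

```python
from functools import lru_cache

def dtw(x, y):
    """Time-warping distance: sum of |x_i - y_j| along the cheapest alignment."""
    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 and j == 0:
            return abs(x[0] - y[0])
        if i == 0:
            return d(0, j - 1) + abs(x[0] - y[j])
        if j == 0:
            return d(i - 1, 0) + abs(x[i] - y[0])
        return abs(x[i] - y[j]) + min(d(i - 1, j), d(i, j - 1), d(i - 1, j - 1))
    return d(len(x) - 1, len(y) - 1)

def features(x):
    """4-tuple feature vector: first, last, greatest, smallest element."""
    return (x[0], x[-1], max(x), min(x))

def lower_bound(x, y):
    """Never exceeds dtw(x, y), so filtering with it in a multi-dimensional
    index discards candidates without false dismissal."""
    return max(abs(a - b) for a, b in zip(features(x), features(y)))

x = [1.0, 2.0, 9.0, 3.0]
y = [1.5, 1.0, 2.0, 8.0, 2.5]
assert lower_bound(x, y) <= dtw(x, y)
print(lower_bound(x, y), dtw(x, y))    # 1.0 2.0
```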
Citations: 324
Rewriting OLAP queries using materialized views and dimension hierarchies in data warehouses
Pub Date : 2001-04-02 DOI: 10.1109/ICDE.2001.914865
Chang-Sup Park, Myoung-Ho Kim, Yoon-Joon Lee
OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method for rewriting a given OLAP query using the various kinds of materialized aggregate views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the lattice of dimension hierarchies and the semantic information in data warehouses. Conditions for the usability of a materialized view in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that effectively utilizes existing materialized views. The proposed algorithm can make use of materialized views having different selection granularities, selection regions and aggregation granularities together, to generate an efficient rewritten query.
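A minimal sketch of the usability test at the heart of such rewriting: a materialized aggregate view can answer a query if, on every dimension, the view's grouping level is at least as fine as the query's in the dimension hierarchy, in which case the rewritten query is a further roll-up of the view. Selection regions are ignored here, and the hierarchies, view descriptions, and function names are illustrative rather than the paper's normal forms.

```python
# each hierarchy is listed from finest to coarsest level
HIERARCHIES = {
    "time":     ["day", "month", "year", "all"],
    "location": ["store", "city", "country", "all"],
}

def at_least_as_fine(dim, view_level, query_level):
    levels = HIERARCHIES[dim]
    return levels.index(view_level) <= levels.index(query_level)

def view_usable(view_levels, query_levels):
    """True if the view's group-by can be rolled up to the query's group-by."""
    return all(at_least_as_fine(d, view_levels[d], query_levels[d])
               for d in query_levels)

view  = {"time": "month", "location": "city"}      # SUM(sales) BY month, city
query = {"time": "year",  "location": "country"}   # SUM(sales) BY year, country
print(view_usable(view, query))   # True: answer the query by rolling up the view
```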
Citations: 37
Bringing the Internet to your database: using SQL server 2000 and XML to build loosely-coupled systems
Pub Date : 2001-03-07 DOI: 10.1109/ICDE.2001.914859
M. Rys
Loosely-coupled, distributed system architectures need to be flexible enough to allow individual components to join or leave the heterogeneous conglomerate of services and components and to change their internal design and data models without jeopardizing the whole architecture. A well-established approach is to use XML as the lingua franca for the integration layer that hides the heterogeneity among the components and provides the glue that allows the individual components to take part in the loosely integrated system. The article focuses on how to provide the basic technology to enable a relational database to become a component in such loosely-coupled systems and it provides an overview of the features that are needed to provide access via HTTP and XML.
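A sketch of the consuming side of such a loosely-coupled system: a component fetches query results as XML over HTTP and binds only to the XML element names, not the underlying relational schema. The endpoint URL and element names are hypothetical; SQL Server 2000's actual HTTP/XML configuration (virtual directories, templates, FOR XML queries) is outside what the abstract spells out.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical URL of an XML-enabled database endpoint.
ENDPOINT = "http://orders.example.com/data/open_orders"

def fetch_open_orders(url=ENDPOINT):
    """Fetch an XML document over HTTP and pull out (order id, customer) pairs.

    The consumer depends only on the XML shape, so the component behind the
    endpoint can change its internal data model without breaking callers.
    """
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    return [(o.get("OrderID"), o.get("Customer"))
            for o in tree.getroot().iter("Order")]
```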
Citations: 57