Proceedings 18th International Conference on Data Engineering: latest papers

A framework towards efficient and effective sequence clustering
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994736
Wei Wang, Jiong Yang
Analyzing sequence data (particularly in categorical domains) has become increasingly important, partially due to the significant advances in biology and other fields. Examples of sequence data include DNA sequences, unfolded protein sequences, text documents, Web usage data, system traces, etc. Previous work on mining sequence data has mainly focused on frequent pattern discovery. In this project, we focus on the problem of clustering sequence data.
Citations: 0
Content-based video indexing for the support of digital library search
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994766
M. Petkovic, R. V. Zwol, H. Blok, W. Jonker, P. Apers, Menzo Windhouwer, M. Kersten
Presents a digital library search engine that combines efforts of the AMIS and DMW research projects, each covering significant parts of the problem of finding the required information in an enormous mass of data. The most important contributions of our work are the following: (1) We demonstrate a flexible solution for the extraction and querying of meta-data from multimedia documents in general. (2) Scalability and efficiency support are illustrated for full-text indexing and retrieval. (3) We show how, for a more limited domain, like an intranet, conceptual modelling can offer additional and more powerful query facilities. (4) In the limited domain case, we demonstrate how domain knowledge can be used to interpret low-level features into semantic content. In this short description, we focus on the first and fourth items.
Citations: 14
Exploiting local similarity for indexing paths in graph-structured data
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994703
R. Kaushik, P. Shenoy, P. Bohannon, E. Gudes
XML and other semi-structured data may have partially specified or missing schema information, motivating the use of a structural summary which can be automatically computed from the data. These summaries also serve as indices for evaluating the complex path expressions common to XML and semi-structured query languages. However, to answer all path queries accurately, summaries must encode information about long, seldom-queried paths, leading to increased size and complexity with little added value. We introduce the A(k)-indices, a family of approximate structural summaries. They are based on the concept of k-bisimilarity, in which nodes are grouped based on local structure, i.e., the incoming paths of length up to k. The parameter k thus smoothly varies the level of detail (and accuracy) of the A(k)-index. For small values of k, the size of the index is substantially reduced. While smaller, the A(k)-index is approximate, and we describe techniques for efficiently extracting exact answers to regular path queries. Our experiments show that, for moderate values of k, path evaluation using the A(k)-index ranges from being very efficient for simple queries to competitive for most complex queries, while using significantly less space than comparable structures.
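The grouping by k-bisimilarity described in this abstract can be sketched as an iterative partition refinement. This is a minimal illustration, not the authors' implementation: the function names `k_bisim_partition` and `_renumber`, the dict-and-edge-list graph encoding, and the sample graph are all assumptions made for the example. Round 0 groups nodes by label; each later round splits a group when its members have parents in different groups, so after k rounds two nodes share a block only if their incoming paths of length up to k look alike.

```python
from collections import defaultdict


def _renumber(signature):
    """Map arbitrary hashable signatures to small consecutive integers."""
    ids = {}
    return {v: ids.setdefault(sig, len(ids)) for v, sig in signature.items()}


def k_bisim_partition(labels, edges, k):
    """Group nodes by k-bisimilarity over incoming edges.

    labels: dict node -> label (hypothetical encoding for this sketch)
    edges:  iterable of (parent, child) pairs
    Returns a dict mapping each node to an integer block id.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    # Round 0: nodes are equivalent when they carry the same label.
    block = _renumber(dict(labels))
    for _ in range(k):
        # Refine: keep the old block and add the set of the parents' blocks.
        signature = {
            v: (block[v], frozenset(block[p] for p in parents[v]))
            for v in labels
        }
        refined = _renumber(signature)
        if len(set(refined.values())) == len(set(block.values())):
            break  # refinement only splits blocks, so equal counts mean stable
        block = refined
    return block


# Hypothetical sample graph: two 'a' nodes reached along different paths.
labels = {"r": "root", "a1": "a", "a2": "a", "b": "b", "c1": "c", "c2": "c"}
edges = [("r", "a1"), ("r", "b"), ("b", "a2"), ("a1", "c1"), ("a2", "c2")]
```

With this sample graph, `a1` and `a2` share a block at k = 0 but are separated at k = 1, while `c1` and `c2` stay together until k = 2. That is the size-versus-accuracy trade-off the abstract attributes to the parameter k.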
Citations: 295
Out from under the trees [linear file template]
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994719
C. Jermaine, E. Omiecinski, Wai Gen Yee
We introduce the linear file template, which is a generic data organization suitable for use with many different types of data. The linear file is specifically designed to handle intense database update loads concurrently with processing of analytic queries.
Citations: 0
Predator-Miner: ad hoc mining of associations rules within a database management system
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994741
W. Tok, Twee-Hee Ong, Wai Lup Low, I. Atmosukarto, S. Bressan
We present a prototype system, Predator-Miner, which extends Predator with a relational-like association rule mining operator to support data mining operations. Predator-Miner allows a user to combine association rule mining queries with SQL queries. This approach towards tight integration differs in two respects from existing techniques that use user-defined functions (UDFs) or stored procedures, or that re-express a mining query as several SQL queries. First, by encapsulating the task of association rule mining in a relational operator, we allow association rule mining to be considered as part of the query plan, so that query optimization can be applied to the mining query holistically. Second, by integrating it as a relational operator, we can leverage the mature field of relational database technology. We extend Predator to support a variant of DMQL, and allow SQL and DMQL to be intermixed in a query. We also demonstrate a cost-based mining query optimization framework.
Citations: 0
How good are association-rule mining algorithms?
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994730
Vikram Pudi, J. Haritsa
Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm has to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present a new mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and is guaranteed to complete in two passes over the database. This is in marked contrast to the earlier approaches which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.
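The core idea of the Oracle, counting the supports of item sets that are already known to be frequent in a single scan, can be sketched in a few lines. This is a deliberately naive illustration: the function name `count_supports` and the list-of-lists transaction format are assumptions for the example, and the abstract stresses that the real Oracle depends on carefully chosen data structures and database organizations, which this sketch does not attempt to reproduce.

```python
def count_supports(transactions, frequent_itemsets):
    """Count, in one pass, how many transactions contain each given item set.

    transactions:      iterable of item collections (hypothetical format)
    frequent_itemsets: item sets assumed to be known in advance
    Returns a dict mapping frozenset -> support count.
    """
    candidates = [frozenset(s) for s in frequent_itemsets]
    support = {c: 0 for c in candidates}
    for transaction in transactions:  # the single scan over the database
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                support[candidate] += 1
    return support


# Hypothetical toy database and item sets, for illustration only.
transactions = [["a", "b", "c"], ["a", "c"], ["b"], ["a", "b"]]
known_frequent = [{"a"}, {"a", "b"}, {"a", "c"}]
supports = count_supports(transactions, known_frequent)
```

The nested subset test costs time proportional to the number of item sets per transaction, which is exactly the kind of overhead an engineered counting structure would aim to reduce.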
Citations: 6
BestPeer: a self-configurable peer-to-peer system
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994726
W. Ng, B. Ooi, K. Tan
We present BestPeer, a prototype P2P system that we have implemented at the National University of Singapore. BestPeer is a generic P2P system designed to serve as a platform on which P2P applications can be developed easily and efficiently. The network consists of two types of entities: a large number of computers (nodes), and a relatively small number of location-independent global name lookup (LIGLO) servers. Each participating node runs the BestPeer (Java-based) software and will be able to communicate or share resources with any other nodes (i.e., peers) in the BestPeer network. Each node comprises two types of data: private data and sharable data. Nodes can only access peers' data that are sharable.
Citations: 83
Fast mining of massive tabular data via approximate distance computations
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994778
Graham Cormode, P. Indyk, Nick Koudas, S. Muthukrishnan
Tabular data abound in many data stores: traditional relational databases store tables, and new applications also generate massive tabular datasets. We present methods for determining similar regions in massive tabular data. Our methods are for computing the "distance" between any two subregions of tabular data: they are approximate, but highly accurate as we prove mathematically, and they are fast, running in time nearly linear in the table size. Our methods are general since these distance computations can be applied to any mining or similarity algorithms that use $L_p$ norms. A novelty of our distance computation procedures is that they work for any $L_p$ norms, not only the traditional p = 2 or p = 1, but for all $p \le 2$; the choice of p, say fractional p, provides an interesting alternative similarity behavior! We use our algorithms in a detailed experimental study of the clustering patterns in real tabular data obtained from one of AT&T's data stores and show that our methods are substantially faster than straightforward methods while remaining highly accurate, and able to detect interesting patterns by varying the value of p.
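One well-known way to approximate distances under an L_p norm for 0 < p <= 2 is random projection with p-stable distributions; the abstract does not say which technique the paper uses, so the sketch below is an assumption chosen for illustration. It relies on the stability property: a sum of coordinates weighted by independent p-stable variates is distributed as the L_p norm of the vector times a single p-stable variate, so the median of the absolute sketch differences, divided by the median of |X| for a p-stable X, estimates the distance. The names `stable_sample` and `make_sketcher`, the sketch size, and the empirical calibration of the median are all choices made for this example.

```python
import math
import random
from statistics import median


def stable_sample(p, rng):
    """Draw one symmetric p-stable variate (Chambers-Mallows-Stuck), 0 < p <= 2."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = max(rng.expovariate(1.0), 1e-12)  # guard against a zero draw
    return (math.sin(p * theta) / math.cos(theta) ** (1 / p)
            * (math.cos((1 - p) * theta) / w) ** ((1 - p) / p))


def make_sketcher(dim, p, size=2000, seed=0):
    """Return (sketch, distance) functions for vectors of length `dim`."""
    rng = random.Random(seed)
    rows = [[stable_sample(p, rng) for _ in range(dim)] for _ in range(size)]
    # Median of |X| for X p-stable, estimated empirically for calibration.
    scale = median(abs(stable_sample(p, rng)) for _ in range(20001))

    def sketch(vec):
        # Each sketch entry is a dot product with one row of stable variates.
        return [sum(r * v for r, v in zip(row, vec)) for row in rows]

    def distance(sketch_a, sketch_b):
        # Approximates the L_p distance between the original vectors.
        diffs = (abs(a - b) for a, b in zip(sketch_a, sketch_b))
        return median(diffs) / scale

    return sketch, distance
```

Sketches are computed once per region and then compared in time proportional to the sketch size rather than the region size. For p = 1 the variates are Cauchy and for p = 2 they are Gaussian (up to scale); a fractional p uses the same code path.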
Citations: 33
A non-blocking parallel spatial join algorithm
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994786
Gang Luo, J. Naughton, Curt J. Ellmann
Interest in incremental and adaptive query processing has led to the investigation of equijoin evaluation algorithms that are non-blocking. This investigation has yielded a number of algorithms, including the symmetric hash join, the XJoin, the Ripple Join, and their variants. However, to our knowledge no one has proposed a non-blocking spatial join algorithm. In this paper, we propose a parallel non-blocking spatial join algorithm that uses duplicate avoidance rather than duplicate elimination. Results from a prototype implementation in a commercial parallel object-relational DBMS show that it generates answer tuples steadily even in the presence of memory overflow, and that its rate of producing answer tuples scales with the number of processors. Also, when allowed to run to completion, its performance is comparable with the state-of-the-art blocking parallel spatial join algorithm.
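To make "non-blocking" concrete, here is a minimal sketch of the symmetric hash join that the abstract cites as prior work on equijoins. It is not the spatial join proposed in the paper; the function name `symmetric_hash_join`, the `(side, row)` stream encoding, and the sample rows are assumptions for the example. Each arriving row is inserted into its own side's hash table and immediately probed against the other side's table, so results are emitted as soon as both matching rows have been seen, without waiting for either input to finish.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key):
    """Yield (left_row, right_row) pairs as soon as both halves have arrived.

    stream: iterable of (side, row) with side in {"L", "R"} (hypothetical encoding)
    key:    function extracting the join key from a row
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, row in stream:
        other = "R" if side == "L" else "L"
        k = key(row)
        tables[side][k].append(row)  # build step for this side
        for match in tables[other].get(k, []):  # probe step against the other side
            yield (row, match) if side == "L" else (match, row)


# Hypothetical interleaved input, for illustration only.
arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
results = list(symmetric_hash_join(arrivals, key=lambda row: row[0]))
```

Because every pair is produced exactly once, at the moment its later row arrives, this in-memory version needs no duplicate elimination.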
Citations: 58
Database replication for the mobile era
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994761
A. Wolski
Citations: 5