
Proceedings. 20th International Conference on Data Engineering: Latest Publications

GenExplore: interactive exploration of gene interactions from microarray data
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320088
Yong Ye, Xintao Wu, K. Subramanian, Liying Zhang
DNA microarrays provide a powerful basis for the analysis of gene expression. Data mining methods such as clustering have been widely applied to microarray data to link genes that show similar expression patterns. However, this approach usually fails to unveil gene-gene interactions within the same cluster. For this purpose, we propose to combine graphical-model-based interaction analysis with other data mining techniques (e.g., association rules and hierarchical clustering). For interaction analysis, we propose using a graphical Gaussian model to discover pairwise gene interactions and a loglinear model to discover multigene interactions. We have constructed a prototype system that permits rapid, interactive exploration of gene relationships.
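A graphical Gaussian model links two genes when their partial correlation, conditioned on all other genes, is non-zero. The following is a minimal pure-Python illustration of that idea and is not GenExplore's implementation. It assumes a small expression matrix in which rows are samples and columns are genes. It estimates partial correlations from the inverse of the sample covariance matrix. A hypothetical cut-off of 0.3 stands in for a proper significance test.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def covariance(data: Matrix) -> Matrix:
    """Sample covariance between the columns (genes) of data (rows are samples)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; assumes m is non-singular."""
    size = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def partial_correlations(data: Matrix) -> Matrix:
    """rho_ij = -P_ij / sqrt(P_ii * P_jj), with P the inverse covariance (precision) matrix."""
    prec = invert(covariance(data))
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]


def gene_edges(data: Matrix, threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Gene pairs whose |partial correlation| exceeds a hypothetical cut-off."""
    pc = partial_correlations(data)
    return [(i, j) for i in range(len(pc)) for j in range(i + 1, len(pc))
            if abs(pc[i][j]) > threshold]
```

Microarray data often has more genes than samples, which makes the sample covariance matrix singular. A real system would therefore need a regularized or shrinkage estimate. The plain inversion here only works for a non-singular matrix.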
Citations: 1
Making the pyramid technique robust to query types and workloads
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320007
Rui Zhang, B. Ooi, K. Tan
The effectiveness of many existing high-dimensional indexing structures is limited to specific types of queries and workloads. For example, the Pyramid technique and iMinMax are efficient for window queries, while iDistance is superior for kNN queries. We present a new structure, called the P+-tree, that efficiently supports both window queries and kNN queries under different workloads. The P+-tree uses a B+-tree to index the data points as follows. The data space is partitioned into subspaces by clustering. The points in each subspace are mapped onto a one-dimensional space using the Pyramid technique and stored in the B+-tree. The crux of the scheme lies in a data transformation that has two crucial properties. First, it maps each subspace into a hypercube so that the Pyramid technique can be applied. Second, it shifts the cluster center to the top of the pyramid, which is the case in which the Pyramid technique works most efficiently. We present window and kNN query processing algorithms for the P+-tree. Through an extensive performance study, we show that the P+-tree achieves considerable speedup over the Pyramid technique and iMinMax for window queries and outperforms iDistance for kNN queries.
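The P+-tree builds on the Pyramid technique, which maps a d-dimensional point in the unit cube to a single value. The dimension farthest from the center selects one of 2d pyramids, and the distance from the center along that dimension gives the height within that pyramid. The sketch computes that value and then shows one hypothetical way to combine it with a clustered subspace. Two parts are assumptions for illustration:

* the piecewise-linear `shift_to_center` map, which moves a cluster center to 0.5 (the top of the pyramids);
* the key layout `subspace_id * 2d + pyramid value`.

The paper's actual transformation may differ.

```python
from typing import Sequence


def pyramid_value(v: Sequence[float]) -> float:
    """Pyramid technique: map a point of [0, 1]^d to one value in [0, 2d)."""
    d = len(v)
    # The dimension farthest from the centre decides which of the 2d pyramids holds v.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    # The height inside that pyramid is the distance from the centre along j_max.
    return pyramid + abs(0.5 - v[j_max])


def shift_to_center(x: float, lo: float, center: float, hi: float) -> float:
    """Hypothetical piecewise-linear map sending lo -> 0, center -> 0.5, hi -> 1."""
    if x == center:
        return 0.5
    if x < center:
        return 0.5 * (x - lo) / (center - lo)
    return 0.5 + 0.5 * (x - center) / (hi - center)


def p_plus_key(point: Sequence[float], subspace_id: int, lows: Sequence[float],
               centers: Sequence[float], highs: Sequence[float]) -> float:
    """Hypothetical B+-tree key: a per-subspace offset plus the pyramid value."""
    d = len(point)
    unit = [shift_to_center(point[k], lows[k], centers[k], highs[k]) for k in range(d)]
    # Pyramid values lie in [0, 2d), so offsets of 2d keep subspaces disjoint.
    return subspace_id * 2 * d + pyramid_value(unit)
```

Because each subspace occupies its own key interval, a query would translate into one or more key ranges per intersected subspace. That query logic is not sketched here.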
Citations: 76
Multi-scale histograms for answering queries over time series data
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320068
Lei Chen, M. Tamer Özsu
Similarity-based time series retrieval has been used in many real-world applications, such as stock data or weather data analysis. Two types of queries on time series data are generally studied: pattern existence queries and exact match queries. Here, we describe a technique that answers both types. A typical application that needs answers to both types of queries is interactive analysis of time series data. We propose a histogram-based representation to approximate time series data.
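The following is a rough illustration of a multi-scale histogram representation. At level 0 it builds one equal-width histogram over the whole series. At level l it builds one histogram per segment, with 2^l segments. It then compares two series level by level using an L1 distance. The bin layout, the segmentation scheme and the distance function are all assumptions, because the abstract does not give the paper's definitions.

```python
from typing import List, Sequence

Histogram = List[float]


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> Histogram:
    """Equal-width histogram over [lo, hi], normalised so the bin heights sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in values:
        idx = int((x - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values to the edge bins
    return [c / len(values) for c in counts]


def multi_scale(series: Sequence[float], levels: int, bins: int,
                lo: float, hi: float) -> List[List[Histogram]]:
    """Level l splits the series into 2**l equal segments with one histogram each."""
    if len(series) < 2 ** (levels - 1):
        raise ValueError("series too short for the requested number of levels")
    result = []
    for level in range(levels):
        parts = 2 ** level
        seg = len(series) // parts
        result.append([histogram(series[p * seg:(p + 1) * seg], bins, lo, hi)
                       for p in range(parts)])
    return result


def level_distance(a: List[List[Histogram]], b: List[List[Histogram]], level: int) -> float:
    """L1 distance between two multi-scale representations at a single level."""
    return sum(abs(x - y) for ha, hb in zip(a[level], b[level]) for x, y in zip(ha, hb))
```

Consider two series that contain the same values in a different order. They are indistinguishable at level 0 and separated at level 1, which is the coarse-to-fine behavior a multi-scale representation is meant to expose.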
Citations: 11
Mining frequent labeled and partially labeled graph patterns
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319987
N. Vanetik, E. Gudes
Whereas data mining in structured data focuses on frequent data values, in semistructured and graph data the emphasis is on frequent labels and common topologies. Here, the structure of the data is just as important as its content. When the data contains a large number of different labels, both fully labeled and partially labeled data may be useful. More informative patterns can be found in the database if some of the pattern nodes can be regarded as 'unlabeled'. We study the problem of discovering typical fully and partially labeled patterns in graph data. The discovered patterns are useful in many applications. Examples include compact representation of source information and serving as a road map for browsing and querying information sources.
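To make 'unlabeled' pattern nodes concrete, the sketch restricts patterns to a single edge. It treats `None` as a wildcard that matches any node label. Support is the fraction of graphs in the database that contain at least one matching edge. This is a deliberately simplified illustration under those assumptions, not the authors' mining algorithm, which handles general graph patterns.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]  # node labels, undirected edges
Pattern = Tuple[Optional[str], Optional[str]]         # None marks an unlabeled node


def matches(pattern: Pattern, a: str, b: str) -> bool:
    """An edge (a, b) matches if each pattern node is unlabeled or has the same label."""
    def ok(p: Optional[str], label: str) -> bool:
        return p is None or p == label
    return (ok(pattern[0], a) and ok(pattern[1], b)) or (ok(pattern[0], b) and ok(pattern[1], a))


def support(pattern: Pattern, db: List[Graph]) -> float:
    """Fraction of graphs containing at least one edge that matches the pattern."""
    hits = sum(any(matches(pattern, labels[u], labels[v]) for u, v in edges)
               for labels, edges in db)
    return hits / len(db)


def canonical(p: Pattern) -> Pattern:
    """Order the two endpoints so (x, y) and (y, x) count once; None sorts last."""
    return tuple(sorted(p, key=lambda t: (t is None, t or "")))  # type: ignore[return-value]


def frequent_edge_patterns(db: List[Graph], min_support: float) -> Set[Pattern]:
    """All fully or partially labeled single-edge patterns meeting min_support."""
    labels = sorted({lab for node_labels, _ in db for lab in node_labels.values()})
    candidates = {canonical(p) for p in product(labels + [None], repeat=2)}
    return {p for p in candidates if support(p, db) >= min_support}
```

In the usage below, the fully labeled pattern ("A", "B") is infrequent. The partially labeled pattern ("A", None) passes the threshold because it also covers the A-C edge.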
Citations: 20
A succinct physical storage scheme for efficient evaluation of path queries in XML
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319984
Ning Zhang, V. Kacholia, M. Tamer Özsu
Path expressions are ubiquitous in XML processing languages. Existing approaches evaluate a path expression in two steps. They first select the nodes that satisfy the tag-name and value constraints, then join them according to the structural constraints. We propose a novel approach, next-of-kin (NoK) pattern matching, that speeds up the node-selection step and significantly reduces the join size in the second step. To perform NoK pattern matching efficiently, we also propose a succinct XML physical storage scheme that adapts to updates and to streaming XML as well. Our performance results demonstrate that the proposed storage scheme and path evaluation algorithm are highly efficient and outperform the other tested systems in most cases.
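A next-of-kin pattern can be checked by local navigation from a candidate node rather than by structural joins. The sketch illustrates this with Python's standard `xml.etree.ElementTree`. It assumes a hypothetical pattern format made only of parent-child edges, and it omits the paper's succinct storage scheme entirely.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# A pattern node is (tag, children); "*" matches any tag. Only parent-child edges are modelled.
Pattern = Tuple[str, List["Pattern"]]


def nok_match(node: ET.Element, pattern: Pattern) -> bool:
    """Check the pattern by navigating from node to its children, without structural joins."""
    tag, children = pattern
    if tag != "*" and node.tag != tag:
        return False
    return all(any(nok_match(child, sub) for child in node) for sub in children)


def find_matches(root: ET.Element, pattern: Pattern) -> List[ET.Element]:
    """Every element of the document at which the pattern root can be matched."""
    return [el for el in root.iter() if nok_match(el, pattern)]
```

Each candidate is verified by walking down its own subtree. No intermediate lists of `title` or `author` nodes are materialized and joined.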
Citations: 106
Multiresolution indexing of XML for frequent queries
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320037
Hao He, Jun Yang
XML and other types of semistructured data are typically represented by a labeled directed graph. To speed up path expression queries over the graph, a variety of structural indexes have been proposed. They usually work by partitioning the nodes of the data graph into equivalence classes and storing the equivalence classes as index nodes. The A(k)-index introduces the concept of local bisimilarity for partitioning, which allows a trade-off between index size and query-answering power. However, all index nodes in an A(k)-index have the same local similarity k. This uniformity wastes two opportunities. A workload may contain path expressions of different lengths, and different parts of the data graph may have different local similarity requirements. To overcome these limitations, we propose the M(k)- and M*(k)-indexes. The basic M(k)-index is workload-aware. Like the previously proposed D(k)-index, it allows different index nodes to have different local similarity requirements, providing finer partitioning only for the parts of the data graph targeted by longer path expressions. Unlike the D(k)-index, the M(k)-index is never over-refined for irrelevant index or data nodes. However, the workload-aware feature still incurs over-refinement due to over-qualified parent index nodes. Moreover, fine partitions penalize the performance of short path expressions. To solve these problems, we further propose the M*(k)-index. An M*(k)-index consists of a collection of indexes whose nodes are organized in a partition hierarchy, allowing successively coarser partitioning information to coexist with the finest partitioning information required. Experiments show that our indexes are superior to previously proposed indexes in terms of index size and query performance.
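Local bisimilarity can be computed by iterative partition refinement. At level 0, nodes are grouped by label. At each further level, a node's group is split according to the groups of its parents. After k rounds, two nodes share a group exactly when they are k-bisimilar, and this is the partition an A(k)-index stores as index nodes. The sketch assumes a small data graph given as node labels plus directed edges. It uses one global k, so it illustrates A(k) rather than the per-node resolution that distinguishes the M(k)- and M*(k)-indexes.

```python
from typing import Dict, Hashable, List, Tuple


def _renumber(signature: Dict[str, Hashable]) -> Dict[str, int]:
    """Give nodes with equal signatures the same small integer block id."""
    ids: Dict[Hashable, int] = {}
    return {node: ids.setdefault(sig, len(ids)) for node, sig in sorted(signature.items())}


def ak_partition(labels: Dict[str, str], edges: List[Tuple[str, str]], k: int) -> Dict[str, int]:
    """Block id of every node under k-bisimilarity (incoming paths of length <= k)."""
    parents: Dict[str, List[str]] = {n: [] for n in labels}
    for src, dst in edges:
        parents[dst].append(src)
    # Level 0: two nodes are 0-bisimilar iff they carry the same label.
    block = _renumber(dict(labels))
    for _ in range(k):
        # Level i: keep the level i-1 block and add the set of the parents' level i-1 blocks.
        block = _renumber({n: (block[n], frozenset(block[p] for p in parents[n]))
                           for n in labels})
    return block
```

In the usage below, the two `c` nodes share one index node at k = 0. They split at k = 1 because their parents carry different labels. This is the index-size versus query-power trade-off the abstract describes.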
Citations: 84
ItCompress: an iterative semantic compression algorithm
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320034
H. Jagadish, R. Ng, B. Ooi, A. Tung
Real datasets are often large enough to necessitate data compression. Traditional 'syntactic' data compression methods treat the table as a large byte string and operate at the byte level. The trade-off in such cases is usually between ease of retrieval and the effectiveness of the compression. Ease of retrieval means how easily a single tuple or attribute value can be retrieved without decompressing a much larger unit. In this regard, semantic compression has generated considerable interest and motivated several recent works. We propose a semantic compression algorithm called ItCompress (ITerative Compression). It achieves good compression while permitting access even at the attribute level, without requiring decompression of a larger unit. ItCompress improves the compression ratio of its output with each scan of the table, so the amount of compression can be tuned through the number of iterations. Moreover, the initial iterations already provide significant compression, which makes ItCompress a cost-effective technique. Extensive experiments indicate that ItCompress is superior to previously known techniques such as SPARTAN and fascicles.
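The sketch shows the general shape of iterative, representative-based semantic compression under these simplifying assumptions:

* Each row is assigned to the representative row it agrees with on the most attributes.
* Only the disagreeing attribute values ('outliers') need to be stored.
* Each pass replaces every representative attribute with the most frequent value among its assigned rows.

The cost model and the function name `iterative_compress` are hypothetical and much simpler than ItCompress's actual storage format.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def assign(rows: Sequence[Row], reps: Sequence[Row]) -> List[int]:
    """For each row, the representative sharing the most attribute values (first on ties)."""
    return [max(range(len(reps)), key=lambda r: sum(a == b for a, b in zip(row, reps[r])))
            for row in rows]


def outlier_count(rows: Sequence[Row], reps: Sequence[Row], assignment: List[int]) -> int:
    """Attribute values that differ from the assigned representative and must be stored."""
    return sum(a != b for row, r in zip(rows, assignment) for a, b in zip(row, reps[r]))


def iterative_compress(rows: Sequence[Row], reps: Sequence[Row],
                       iterations: int) -> Tuple[List[Row], List[int]]:
    """Hypothetical k-modes-style loop; returns final representatives and per-pass cost."""
    current = [tuple(r) for r in reps]
    history: List[int] = []
    for _ in range(iterations):
        assignment = assign(rows, current)
        history.append(outlier_count(rows, current, assignment))
        updated: List[Row] = []
        for r, old in enumerate(current):
            members = [row for row, a in zip(rows, assignment) if a == r]
            # Each attribute of a representative becomes the most frequent value among its rows.
            updated.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
                           if members else old)
        current = updated
    return current, history
```

Neither the assignment step nor the update step can raise the outlier count. The recorded cost therefore never increases from one pass to the next in this sketch.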
Citations: 55
Incorporating updates in domain indexes: experiences with Oracle spatial R-trees
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320042
K. Kanth, S. Ravada, N. An
Much research has been devoted to scalable storage and retrieval techniques for domain databases such as spatial, text, XML and gene sequence data, and many efficient indexing techniques have been developed in this context. Given improvements in the underlying technology, database applications increasingly use domain data with transactional semantics. We examine at what point during a transaction's lifetime it is better to incorporate updates into domain indexes, and we present our experiences with R-tree indexes in Oracle. We examine two approaches for incorporating updates into spatial R-tree indexes. The first acts at update time and the second at commit time. The first approach uses system transactions to incorporate changes into the index immediately and makes them visible to other transactions at commit time. The second, called the deferred-incorporate approach, holds the updates in a secondary table and incorporates them into the index only at commit time. We compare the performance of the two approaches in experiments on real data sets. For most transactions with a reasonable number of update operations, the deferred approach significantly outperforms the immediate-incorporate approach for update operations. With appropriate optimizations, it also achieves comparable query performance.
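The deferred-incorporate approach can be sketched as a transaction that buffers its changes in a side table and applies them to the index only at commit. The linear-scan `SpatialIndex` stands in for an R-tree, and all class and method names are hypothetical. The sketch shows only the visibility rule. The transaction's own queries merge index results with its pending changes, while other readers see the committed index.

```python
from typing import Dict, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class SpatialIndex:
    """Hypothetical stand-in for a spatial R-tree: a dict scanned linearly."""

    def __init__(self) -> None:
        self.entries: Dict[int, Rect] = {}

    def query(self, window: Rect) -> Set[int]:
        return {rid for rid, r in self.entries.items() if intersects(r, window)}


class DeferredTransaction:
    """Buffers index changes in a side table and applies them only at commit."""

    def __init__(self, index: SpatialIndex) -> None:
        self.index = index
        self.pending: Dict[int, Optional[Rect]] = {}  # rid -> new rect, or None for a delete

    def insert(self, rid: int, rect: Rect) -> None:
        self.pending[rid] = rect

    def delete(self, rid: int) -> None:
        self.pending[rid] = None

    def query(self, window: Rect) -> Set[int]:
        # Committed hits, minus rows this transaction changed, plus its own pending rows.
        hits = {rid for rid in self.index.query(window) if rid not in self.pending}
        hits |= {rid for rid, r in self.pending.items()
                 if r is not None and intersects(r, window)}
        return hits

    def commit(self) -> None:
        for rid, rect in self.pending.items():
            if rect is None:
                self.index.entries.pop(rid, None)
            else:
                self.index.entries[rid] = rect
        self.pending.clear()
```

The merge in `query` is the extra read-time work that deferral introduces. The abstract reports that, with appropriate optimizations, query performance remains comparable.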
Citations: 1
XBench benchmark and performance testing of XML DBMSs
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320032
B. Yao, M. Tamer Özsu, Nitin Khandelwal
XML support is being added to existing database management systems (DBMSs), and native XML systems are being developed both in industry and in academia. The individual performance characteristics of these approaches, as well as the relative performance of the various systems, are an ongoing concern. In this paper we discuss the XBench XML benchmark and report on the relative performance of various DBMSs. XBench is a family of XML benchmarks. It recognizes that the XML data managed by DBMSs are quite varied and that no single database schema and workload can properly capture this variety. The members of the benchmark family have therefore been defined to capture diverse application domains.
Citations: 148
Web service composition through declarative queries: the case of conjunctive queries with union and negation
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320070
Bertram Ludäscher, Alan Nash
A Web service operation can be seen as a function $op: X_1, \ldots, X_n \rightarrow Y_1, \ldots, Y_m$ that has an input message (request) with n arguments (parts) and an output message (response) with m parts. We study the problem of deciding whether a query Q is feasible. Q is feasible if there exists a logically equivalent query Q' that can be executed while observing the limited access patterns given by the Web service (source) relations. Executability depends on the specific syntactic form of a query. Feasibility is a more "robust" semantic notion that involves all equivalent queries (e.g., reorderings and minimized queries). We show that deciding query feasibility (called "stability") is NP-complete for conjunctive queries (CQ) and for conjunctive queries with union (UCQ).
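The abstract separates executability, where a specific ordering of subgoals respects the access patterns, from feasibility, where some equivalent query is executable. The sketch checks only executability. It greedily picks any subgoal whose input ('i') positions are bound, either by constants or by variables from earlier subgoals. Greedy choice suffices because binding a variable never makes another subgoal harder to call. The term convention (variables start with '?') and the relation names are hypothetical. Deciding feasibility would also require reasoning about equivalent queries, such as minimized ones, and the abstract shows that problem is NP-complete.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation, terms); terms starting with '?' are variables


def executable_order(body: List[Atom], patterns: Dict[str, str]) -> Optional[List[Atom]]:
    """Order subgoals so every 'i' (input) position is bound when called; None if impossible."""
    bound: Set[str] = set()
    remaining = list(body)
    order: List[Atom] = []
    while remaining:
        for atom in remaining:
            rel, terms = atom
            inputs = [t for t, mode in zip(terms, patterns[rel]) if mode == "i"]
            if all(not t.startswith("?") or t in bound for t in inputs):
                order.append(atom)
                bound.update(t for t in terms if t.startswith("?"))
                remaining.remove(atom)
                break
        else:
            return None  # no remaining subgoal can be called with its inputs bound
    return order
```

In the usage below, `books` has no inputs, so it runs first and binds `?t`. That makes `isbn` callable, which binds `?i` for `price`.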
Citations: 12