Proceedings. 20th International Conference on Data Engineering: Latest Articles

Querying the past, the present, and the future

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320094

D. Gawlick

Database technology has done an excellent job of managing data. SQL92/99 and XML are generally considered to be powerful building blocks; these building blocks are complemented by support for Text, Images, Audio, Video, Spatial, Expressions, and other complex data structures. Database technology can also transparently manage access to data in other (remote) databases, in file systems, and in applications. Furthermore, database technology has achieved impressive operational characteristics with respect to, e.g., performance, scalability, reliability, component and site tolerance, and security.

Citations: 6

Database research for the current millennium

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320093

D. Florescu

The database world today (or better, the information world) is totally different from the peaceful days when the database research field was created. Moreover, it is in constant movement. Let's list some of the changing factors. First, the Internet forever changed our lives. Then came XML as an innocent character-by-character UNICODE syntax, and that changed all the rules. Then Web Services arrived, invented by marketing departments in the middle of the boom, and only later taken seriously by vendor capitals and technologists. Now mobile computing and messaging are pervasive. And, finally, we see a shift in perspective due to the dramatic reduction of hardware costs.

Citations: 3

LDC: enabling search by partial distance in a hyper-dimensional space

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319980

Nick Koudas, B. Ooi, Heng Tao Shen, A. Tung

Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases which can contain hundreds or even thousands of dimensions. Such hyper-dimensional databases pose significant problems to existing high-dimensional indexing techniques, which have been developed for indexing databases with (commonly) fewer than a hundred dimensions. To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called local digital coding (LDC) which can support k-nearest neighbors (KNN) queries on hyper-dimensional databases and yet co-exist with ubiquitous indices, such as B+-trees. LDC extracts a simple bitmap representation called digital code (DC) for each point in the database. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the DC, on which subsequent comparisons are based. In doing so, expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided. Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real-life and synthetic hyper-dimensional datasets.
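
The abstract does not give the details of LDC, so the following is only a loose illustration of the general idea of pruning by partial distance, not the authors' method. It is a minimal Python sketch that assumes squared Euclidean distance and a brute-force scan with no digital codes; the names `partial_distance` and `knn_partial` are hypothetical.

```python
import heapq


def partial_distance(a, b, bound):
    """Accumulate the squared Euclidean distance dimension by dimension.

    Returns the full squared distance, or None as soon as the running
    sum exceeds `bound` (the candidate cannot be among the best k).
    """
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > bound:
            return None  # pruned without touching the remaining dimensions
    return total


def knn_partial(points, query, k):
    """Brute-force k-nearest-neighbour scan with partial-distance pruning.

    Returns a list of (squared_distance, index) sorted by distance.
    """
    heap = []  # max-heap via negated distances: (-squared_distance, index)
    for i, p in enumerate(points):
        # Once k candidates are held, the worst of them is the pruning bound.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        d = partial_distance(p, query, bound)
        if d is None:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        else:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap)
```

The saving grows with dimensionality: a far-away point is usually rejected after a few dimensions instead of all of them.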

Citations: 50

Simple, robust and highly concurrent b-trees with node deletion

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319981

D. Lomet

Why might B-tree concurrency control still be interesting? For two reasons: (i) currently exploited "real world" approaches are complicated; (ii) simpler proposals are not used because they are not sufficiently robust. In the "real world", systems need to deal robustly with node deletion, and this is an important reason why the currently exploited techniques are complicated. In our effort to simplify the world of robust and highly concurrent B-tree methods, we focus on exactly where B-tree concurrency control needs information about node deletes, and describe mechanisms that provide that information. We exploit the B^link-tree property of being "well-formed" even when index term posting for a node split has not been completed to greatly simplify our algorithms. Our goal is to describe a very simple but nonetheless robust method.

Citations: 29

A prime number labeling scheme for dynamic ordered XML trees

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319985

Xiaodong Wu, M. Lee, W. Hsu

Efficient evaluation of XML queries requires the determination of whether a relationship exists between two elements. A number of labeling schemes have been designed to label the element nodes such that the relationships between nodes can be easily determined by comparing their labels. With the increased popularity of XML on the Web, finding a labeling scheme that is able to support order-sensitive queries in the presence of dynamic updates becomes urgent. We propose a new labeling scheme that takes advantage of the unique property of prime numbers to meet this need. The global order of the nodes can be captured by generating simultaneous congruence values from the prime number node labels. Theoretical analysis of the label size requirements for the various labeling schemes is given. Experimental results indicate that the prime number labeling scheme is compact compared to existing dynamic labeling schemes, and provides efficient support to order-sensitive queries and updates.
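
One way to read the two ideas in this abstract (prime labels for relationships, simultaneous congruences for order) is sketched below in Python. This is an illustrative reconstruction under stated assumptions, not the paper's exact scheme: each non-root node is assumed to get a distinct prime as its self-label, a node's label is assumed to be its parent's label times its self-label, the root is assumed to have label 1, and the function names are hypothetical.

```python
from math import prod


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small demo)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def label_tree(tree, root):
    """Label a tree given as {node: [children in document order]}.

    Returns (label, self_label, order): the product label of every node,
    plus the prime self-label and preorder position of every non-root node.
    """
    gen = primes()
    label = {root: 1}  # assumption: the root carries label 1
    self_label, order = {}, {}

    def visit(node):
        for child in tree.get(node, []):
            p = next(gen)
            self_label[child] = p
            label[child] = label[node] * p
            order[child] = len(order) + 1  # k-th node gets the k-th prime, so order < prime
            visit(child)

    visit(root)
    return label, self_label, order


def is_ancestor(label, x, y):
    """x is a proper ancestor of y iff label[x] divides label[y]."""
    return label[x] != label[y] and label[y] % label[x] == 0


def simultaneous_congruence(self_label, order):
    """Chinese-remainder value sc with sc % self_label[n] == order[n] for all n."""
    modulus = prod(self_label.values())
    sc = 0
    for node, p in self_label.items():
        rest = modulus // p
        sc += order[node] * rest * pow(rest, -1, p)  # modular inverse, Python 3.8+
    return sc % modulus
```

In this sketch the document order of a node is recovered as `sc % self_label[node]`, so the ordering lives in the single value `sc` rather than in the node labels themselves.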

Citations: 238

Unordered tree mining with applications to phylogeny

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320039

D. Shasha, J. Wang, Sen Zhang

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. We present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time, where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).
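
To make the notion of a cousin pair concrete, here is a small Python sketch. It uses a simplified definition that is an assumption of this example: two nodes at equal depth form a pair at level 1 if they share a parent, level 2 if they first meet at a grandparent, and so on. The function `cousin_pairs` is hypothetical and is a naive enumeration, not the paper's algorithm; it does not attempt to match the stated time bound.

```python
from itertools import combinations


def cousin_pairs(parent, max_levels):
    """Enumerate cousin pairs in a rooted tree.

    `parent` maps every node to its parent (the root maps to None).
    Returns {(u, v): levels} for node pairs at equal depth whose nearest
    common ancestor is `levels` steps above them, 1 <= levels <= max_levels.
    """
    def ancestors(n):
        chain = []
        while parent[n] is not None:
            n = parent[n]
            chain.append(n)
        return chain  # nearest ancestor first

    chains = {n: ancestors(n) for n in parent}
    result = {}
    for u, v in combinations(sorted(parent), 2):
        cu, cv = chains[u], chains[v]
        if len(cu) != len(cv):
            continue  # different depths: not considered in this simplified sketch
        for levels, (a, b) in enumerate(zip(cu, cv), start=1):
            if a == b:  # first shared ancestor found
                if levels <= max_levels:
                    result[(u, v)] = levels
                break
    return result
```

For example, with `parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}`, nodes `c` and `d` share a parent, while `c` and `e` first meet at the grandparent `r`.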

Citations: 78

Substructure clustering on sequential 3d object datasets

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320033

Zhenqiang Tan, A. Tung

We look at substructure clustering of sequential 3d objects. A sequential 3d object is a set of points located in a three-dimensional space that are linked up to form a sequence. Given a set of sequential 3d objects, our aim is to find significantly large substructures which are present in many of the sequential 3d objects. Unlike traditional subspace clustering methods in which objects are compared based on values in the same dimension, the matching dimensions between two 3d sequential objects are affected by both the translation and rotation of the objects and are thus not well defined. Instead, similarity between the objects is judged by computing a structural distance measurement called rmsd (Root Mean Square Distance), which requires proper alignment (including translation and rotation) of the objects. As the computation of rmsd is expensive, we propose a new measure called ald (Angle Length Distance), which is shown experimentally to approximate rmsd. Based on ald, we define a new clustering model called sCluster and devise an algorithm for discovering all maximum sClusters in a 3d sequential dataset. Experiments are conducted to illustrate the efficiency and effectiveness of our algorithm.
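
For readers unfamiliar with rmsd, the following Python sketch shows the final step of the measure only. It assumes the two point sequences have the same length and are already aligned; the expensive part mentioned in the abstract, finding the best translation and rotation, is deliberately left out, and the function name `rmsd` is chosen for this example.

```python
import math


def rmsd(a, b):
    """Root mean square distance between two pre-aligned 3d point sequences.

    `a` and `b` are equal-length sequences of (x, y, z) tuples; point i of
    `a` is compared with point i of `b`. No alignment is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))
```

Because every point contributes to the sum and the alignment must be optimised first, a cheaper approximation such as the ald measure proposed in the abstract is attractive when many substructure pairs have to be compared.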

Citations: 6

Querying about the past, the present, and the future in spatio-temporal databases

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319997

Jimeng Sun, D. Papadias, Yufei Tao, B. Liu

Moving objects (e.g., vehicles in road networks) continuously generate large amounts of spatio-temporal information in the form of data streams. Efficient management of such streams is a challenging goal due to the highly dynamic nature of the data and the need for fast, online computations. We present a novel approach for approximate query processing about the present, past, or the future in spatio-temporal databases. In particular, we first propose an incrementally updateable, multidimensional histogram for present-time queries. Second, we develop a general architecture for maintaining and querying historical data. Third, we implement a stochastic approach for predicting the results of queries that refer to the future. Finally, we experimentally prove the effectiveness and efficiency of our techniques using a realistic simulation.
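
As a rough illustration of what an incrementally updateable histogram for present-time queries might look like, here is a Python sketch. It substitutes a fixed uniform grid for whatever bucket structure the paper actually uses, and it assumes objects are uniformly spread inside each cell; the class `GridHistogram` and its methods are hypothetical names for this example.

```python
import math


class GridHistogram:
    """Fixed-grid 2d histogram of current object positions (illustrative only)."""

    def __init__(self, width, height, cell):
        self.cell = cell
        self.cols = math.ceil(width / cell)
        self.rows = math.ceil(height / cell)
        self.counts = [[0] * self.cols for _ in range(self.rows)]
        self.where = {}  # object id -> (row, col) of its last reported position

    def _cell_of(self, x, y):
        col = min(int(x // self.cell), self.cols - 1)
        row = min(int(y // self.cell), self.rows - 1)
        return row, col

    def update(self, obj, x, y):
        """Apply one location report: touch at most two counters."""
        new = self._cell_of(x, y)
        old = self.where.get(obj)
        if old == new:
            return
        if old is not None:
            self.counts[old[0]][old[1]] -= 1
        self.counts[new[0]][new[1]] += 1
        self.where[obj] = new

    def estimate(self, x1, y1, x2, y2):
        """Approximate the number of objects in [x1, x2) x [y1, y2)."""
        total = 0.0
        for r in range(self.rows):
            for c in range(self.cols):
                n = self.counts[r][c]
                if not n:
                    continue
                cx, cy = c * self.cell, r * self.cell
                ox = max(0.0, min(x2, cx + self.cell) - max(x1, cx))
                oy = max(0.0, min(y2, cy + self.cell) - max(y1, cy))
                # Uniformity assumption: count scales with the overlapped area.
                total += n * (ox * oy) / (self.cell ** 2)
        return total
```

Each stream update costs constant time, and a range query returns an estimate rather than an exact count, which is the trade-off that approximate query processing accepts.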

Citations: 110

Improving logging and recovery performance in Phoenix/App

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320021

R. Barga, Shimin Chen, D. Lomet

Phoenix/App supports software components whose states are made persistent across a system crash via redo recovery, replaying logged interactions. Our initial prototype force logged all request/reply events resulting from intercomponent method calls and returns. We describe an enhanced prototype that implements: (i) log optimizations to improve normal execution performance; and (ii) checkpointing to improve recovery performance. Logging is reduced in two ways: (1) we only log information required to remove nondeterminism, and we only force the log when an event "commits" the state of the component to other parts of the system; (2) we introduce new component types that provide our enhanced system with more information, enabling further reduction in logging. To improve recovery performance, we save the values of the fields of a component to the log in an application "checkpoint". We describe the system elements that we exploit for these optimizations, and characterize the performance gains that result.

Citations: 24

From sipping on a straw to drinking from a fire hose: data integration in a public genome database

Proceedings. 20th International Conference on Data Engineering

Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320050

J. Richardson, J. Kadin, J. Blake, C. Bult, J. Eppig, M. Ringwald

Biology is a vast domain. The Mouse Genome Informatics (MGI) system, which focuses on the biology of the laboratory mouse, covers only a small, carefully chosen slice. Nevertheless, we deal with data of immense variety, deep complexity, and exponentially growing volume. Our role as an integration nexus is to add value by combining data sets of diverse types and origins, eliminating redundancy and resolving conflicts. We briefly describe some of the issues we face and approaches we have adopted to the integration problem.

Citations: 1
