
Proceedings. 20th International Conference on Data Engineering: Latest Papers

Querying the past, the present, and the future
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320094
D. Gawlick
Database technology has done an excellent job of managing data. SQL92/99 and XML are generally considered to be powerful building blocks; these building blocks are complemented by support for Text, Images, Audio, Video, Spatial, Expressions, and other complex data structures. Database technology can also transparently manage access to data in other (remote) databases, in file systems, and in applications. Furthermore, database technology has achieved impressive operational characteristics with respect to, e.g., performance, scalability, reliability, component and site tolerance, and security.
Citations: 6
Database research for the current millennium
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320093
D. Florescu
The database world today (or, better, the information world) is totally different from the peaceful days when the database research field was created. Moreover, it is in constant movement. Let's list some of the changing factors. First, the Internet forever changed our lives. Then came XML as an innocent character-by-character UNICODE syntax, and that changed all the rules. Then Web Services arrived, invented by marketing departments in the middle of the boom, and only later taken seriously by vendor capitals and technologists. Now mobile computing and messaging are pervasive. And, finally, we see a shift in perspective due to the dramatic reduction of hardware costs.
Citations: 3
LDC: enabling search by partial distance in a hyper-dimensional space
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319980
Nick Koudas, B. Ooi, Heng Tao Shen, A. Tung
Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases which can contain hundreds or even thousands of dimensions. Such hyper-dimensional databases pose significant problems to existing high-dimensional indexing techniques, which were developed for databases with (commonly) fewer than a hundred dimensions. To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called local digital coding (LDC), which supports k-nearest neighbor (KNN) queries on hyper-dimensional databases and yet co-exists with ubiquitous indices such as B+-trees. LDC extracts a simple bitmap representation called a digital code (DC) for each point in the database. During KNN search, pruning is performed by dynamically selecting only a subset of the bits from the DC and basing subsequent comparisons on that subset. In doing so, the expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided. Extensive experiments show that our methodology offers significant performance advantages over other existing indexing methods on both real-life and synthetic hyper-dimensional datasets.
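LDC's own pruning works on digital-code bits, which the abstract does not describe in enough detail to reproduce. As a hedged illustration of the more generic idea of search by partial distance, the sketch below runs an exact KNN scan that abandons a squared-Euclidean distance computation as soon as the running sum over the dimensions seen so far reaches the current k-th best distance. The function name `knn_partial_distance` and the list-of-tuples data layout are assumptions for illustration; this is not the LDC algorithm.

```python
import heapq
import math


def knn_partial_distance(points, query, k):
    """Exact k-NN (squared Euclidean) with partial-distance early abandonment.

    Hypothetical helper: illustrates generic partial-distance pruning,
    not LDC's digital-code scheme. Returns point indices, nearest first.
    """
    if k <= 0:
        return []
    heap = []  # max-heap of the k best so far, stored as (-dist2, index)
    for idx, p in enumerate(points):
        # Pruning bound: squared distance of the current k-th nearest point.
        bound = -heap[0][0] if len(heap) == k else math.inf
        acc = 0.0
        for a, b in zip(p, query):
            acc += (a - b) ** 2
            if acc >= bound:
                break  # partial sum already too large: skip remaining dimensions
        else:
            # Loop finished without abandoning: p is among the k best so far.
            if len(heap) < k:
                heapq.heappush(heap, (-acc, idx))
            else:
                heapq.heapreplace(heap, (-acc, idx))  # evict current k-th best
    return [idx for _, idx in sorted(heap, key=lambda t: -t[0])]
```

For example, `knn_partial_distance([(0, 0), (1, 1), (5, 5), (0.5, 0)], (0, 0), 2)` returns `[0, 3]`; the point `(5, 5)` is rejected after its first dimension, which is where the savings come from when there are thousands of dimensions.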
Citations: 50
Unordered tree mining with applications to phylogeny
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320039
D. Shasha, J. Wang, Sen Zhang
Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. We present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time, where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).
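The abstract defines cousin pairs but not the mining algorithm. The sketch below is a naive reading of that definition, restricted to nodes at the same depth: it enumerates every same-depth pair, walks both nodes up to their lowest common ancestor, and reports distance 0 for siblings (shared parent), 1 for first cousins (shared grandparent), and so on, counting `(label, label, distance)` triples. The `parent`/`label` dictionaries and the `max_dist` cutoff are assumptions; the upward walk makes it slower than the O(|T|^2) bound stated for the authors' algorithm, and it ignores pairs at different depths and any occurrence threshold the paper may apply.

```python
from collections import Counter
from itertools import combinations


def cousin_pairs(parent, label, max_dist=1):
    """Count labeled cousin pairs in a single rooted, unordered tree.

    parent: dict node -> parent node (the root maps to None).
    label:  dict node -> label.
    Returns a Counter of (label_a, label_b, distance); each label pair is
    sorted, so the result does not depend on child order.
    Distance 0 = siblings, 1 = first cousins (shared grandparent), etc.
    """
    depth = {}

    def get_depth(n):
        if n not in depth:
            p = parent[n]
            depth[n] = 0 if p is None else get_depth(p) + 1
        return depth[n]

    for n in parent:
        get_depth(n)

    counts = Counter()
    for u, v in combinations(parent, 2):
        if depth[u] != depth[v] or depth[u] == 0:
            continue  # only same-depth, non-root pairs in this sketch
        # Climb in lockstep until both walks reach the lowest common ancestor.
        a, b, steps = u, v, 0
        while a != b:
            a, b, steps = parent[a], parent[b], steps + 1
        dist = steps - 1
        if dist <= max_dist:
            la, lb = sorted((label[u], label[v]))
            counts[(la, lb, dist)] += 1
    return counts
```

Because labels rather than node identities are counted, the same triples can then be compared across several trees, which is the kind of cross-tree pattern search the abstract describes.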
Citations: 78
A prime number labeling scheme for dynamic ordered XML trees
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319985
Xiaodong Wu, M. Lee, W. Hsu
Efficient evaluation of XML queries requires determining whether a relationship exists between two elements. A number of labeling schemes have been designed to label the element nodes such that the relationships between nodes can be easily determined by comparing their labels. With the increased popularity of XML on the Web, finding a labeling scheme that is able to support order-sensitive queries in the presence of dynamic updates becomes urgent. We propose a new labeling scheme that takes advantage of the unique properties of prime numbers to meet this need. The global order of the nodes can be captured by generating simultaneous congruence values from the prime number node labels. A theoretical analysis of the label size requirements of the various labeling schemes is given. Experimental results indicate that the prime number labeling scheme is compact compared to existing dynamic labeling schemes, and provides efficient support for order-sensitive queries and updates.
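The two ideas in the abstract can be sketched as follows, under stated assumptions. Each node receives its own prime ("self-label"), and its label is the product of the self-labels on its root-to-node path, so x is an ancestor of y exactly when label(x) divides label(y). Document order is folded into one simultaneous congruence (SC) value via the Chinese Remainder Theorem, so that SC mod a node's self-label gives its order number. The `(name, [children])` tree format and all function names are hypothetical, and this sketch assigns primes in document order for simplicity; it does not model how the paper handles updates.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for small trees)."""
    primes, cand = [], 2
    while len(primes) < n:
        if all(cand % p for p in primes if p * p <= cand):
            primes.append(cand)
        cand += 1
    return primes


def crt(residues, moduli):
    """Chinese Remainder Theorem: smallest x >= 0 with x % m == r for each pair."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return x % total


def label_tree(tree):
    """Label a (name, [children]) tree given in document order.

    Returns (label, self_label, sc): label[n] is the product of self-label
    primes on the root-to-n path; sc encodes every node's document order.
    """
    nodes = []  # (name, parent) pairs in preorder, i.e. document order

    def walk(node, parent):
        name, children = node
        nodes.append((name, parent))
        for child in children:
            walk(child, name)

    walk(tree, None)
    # The i-th node (0-based) gets the i-th prime; since primes[i] >= i + 2,
    # its order number i + 1 is always smaller than its own prime.
    primes = first_primes(len(nodes))
    self_label = {name: primes[i] for i, (name, _) in enumerate(nodes)}
    label = {}
    for name, parent in nodes:  # preorder: parents are labeled before children
        parent_label = 1 if parent is None else label[parent]
        label[name] = parent_label * self_label[name]
    orders = [i + 1 for i in range(len(nodes))]
    sc = crt(orders, [self_label[name] for name, _ in nodes])
    return label, self_label, sc


def is_ancestor(label, x, y):
    """x is a proper ancestor of y iff label[x] divides label[y]."""
    return x != y and label[y] % label[x] == 0


def doc_order(sc, self_label, name):
    """Recover a node's 1-based document order from the SC value."""
    return sc % self_label[name]
```

For a small `book` document with children `title`, `chapter` (containing `section`), and `appendix`, the labels are 2, 6, 10, 70, and 22; `is_ancestor(label, "chapter", "section")` holds because 70 is divisible by 10, while `title` (6) is not an ancestor of `section`.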
Citations: 238
Querying about the past, the present, and the future in spatio-temporal databases
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319997
Jimeng Sun, D. Papadias, Yufei Tao, B. Liu
Moving objects (e.g., vehicles in road networks) continuously generate large amounts of spatio-temporal information in the form of data streams. Efficient management of such streams is a challenging goal due to the highly dynamic nature of the data and the need for fast, online computations. We present a novel approach for approximate query processing about the present, past, or the future in spatio-temporal databases. In particular, we first propose an incrementally updateable, multidimensional histogram for present-time queries. Second, we develop a general architecture for maintaining and querying historical data. Third, we implement a stochastic approach for predicting the results of queries that refer to the future. Finally, we experimentally prove the effectiveness and efficiency of our techniques using a realistic simulation.
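To make "incrementally updateable histogram for present-time queries" concrete, here is a minimal sketch assuming a plain uniform 2D grid; it is a swapped-in textbook structure, not the authors' multidimensional histogram. Each position report moves one object between cells in O(1), and a range query is answered approximately by assuming objects are spread uniformly within each cell. The class name `GridHistogram` and its parameters are hypothetical.

```python
from collections import defaultdict


class GridHistogram:
    """Uniform-grid histogram over the square [0, extent) x [0, extent).

    Keeps an object count per cell plus each object's current cell, so a
    position update costs O(1): decrement the old cell, increment the new.
    """

    def __init__(self, extent, cells_per_side):
        self.n = cells_per_side
        self.size = extent / cells_per_side  # side length of one cell
        self.counts = defaultdict(int)       # (cx, cy) -> object count
        self.cell_of = {}                    # object id -> current cell

    def _index(self, v):
        # Clamp so points on or beyond the border fall into an edge cell.
        return min(max(int(v / self.size), 0), self.n - 1)

    def update(self, oid, x, y):
        """Insert object oid at (x, y), or move it there if already present."""
        new = (self._index(x), self._index(y))
        old = self.cell_of.get(oid)
        if old is not None:
            self.counts[old] -= 1
        self.counts[new] += 1
        self.cell_of[oid] = new

    def range_count(self, x1, y1, x2, y2):
        """Estimate the number of objects in [x1, x2] x [y1, y2], assuming
        objects are spread uniformly inside each cell."""
        estimate = 0.0
        for (cx, cy), count in self.counts.items():
            if count == 0:
                continue
            lx, ly = cx * self.size, cy * self.size
            # Overlap of the query rectangle with this cell along each axis.
            ox = max(0.0, min(x2, lx + self.size) - max(x1, lx))
            oy = max(0.0, min(y2, ly + self.size) - max(y1, ly))
            estimate += count * (ox * oy) / (self.size ** 2)
        return estimate
```

The trade-off is the usual one for histograms: a finer grid gives more accurate estimates at the cost of more cells to scan per query, while the per-update cost stays constant, which matters when positions arrive as a continuous stream.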
Citations: 110
Simple, robust and highly concurrent B-trees with node deletion
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319981
D. Lomet
Why might B-tree concurrency control still be interesting? For two reasons: (i) currently exploited "real world" approaches are complicated; (ii) simpler proposals are not used because they are not sufficiently robust. In the "real world", systems need to deal robustly with node deletion, and this is an important reason why the currently exploited techniques are complicated. In our effort to simplify the world of robust and highly concurrent B-tree methods, we focus on exactly where B-tree concurrency control needs information about node deletes, and describe mechanisms that provide that information. We exploit the B-link-tree property of being "well-formed" even when index term posting for a node split has not been completed, which greatly simplifies our algorithms. Our goal is to describe a very simple but nonetheless robust method.
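The "well-formed even before the index term is posted" property comes from the move-right rule of Lehman and Yao's B-link tree, sketched below. Every node carries an exclusive upper bound (`high_key`) and a pointer to its right sibling, so a search that reaches a node whose key range the search key has outgrown simply follows the side link. The `Node` layout is an assumption for illustration; the sketch omits latching, insertion, and the node-deletion mechanisms that are the paper's actual subject.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list                                     # separators (internal) or stored keys (leaf)
    children: list = field(default_factory=list)   # empty list means leaf
    high_key: Optional[int] = None                 # exclusive upper bound; None = +infinity
    right: Optional["Node"] = None                 # side link to the right sibling


def search(root, key):
    """Return True if key is stored in the B-link tree rooted at root."""
    node = root
    while True:
        # Move right while the key lies beyond this node's range. This is what
        # keeps searches correct after a split whose separator has not yet been
        # posted in the parent: the new sibling is reachable via the side link.
        while node.high_key is not None and key >= node.high_key and node.right:
            node = node.right
        if not node.children:
            return key in node.keys
        # children[i] covers keys in [keys[i-1], keys[i]).
        node = node.children[bisect_right(node.keys, key)]
```

For example, if a leaf holding `[1, 5, 7]` has just split into `[1, 5]` (high key 7) and a new right sibling `[7]`, but the parent still only knows the old leaf, a search for 7 descends to the old leaf, sees that 7 is at or above its high key, and moves right to find the key.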
Citations: 29
From sipping on a straw to drinking from a fire hose: data integration in a public genome database
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320050
J. Richardson, J. Kadin, J. Blake, C. Bult, J. Eppig, M. Ringwald
Biology is a vast domain. The Mouse Genome Informatics (MGI) system, which focuses on the biology of the laboratory mouse, covers only a small, carefully chosen slice. Nevertheless, we deal with data of immense variety, deep complexity, and exponentially growing volume. Our role as an integration nexus is to add value by combining data sets of diverse types and origins, eliminating redundancy and resolving conflicts. We briefly describe some of the issues we face and approaches we have adopted to the integration problem.
Citations: 1
Substructure clustering on sequential 3D object datasets
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320033
Zhenqiang Tan, A. Tung
We look at substructure clustering of sequential 3D objects. A sequential 3D object is a set of points located in three-dimensional space that are linked up to form a sequence. Given a set of sequential 3D objects, our aim is to find significantly large substructures that are present in many of them. Unlike traditional subspace clustering methods, in which objects are compared based on values in the same dimension, the matching dimensions between two sequential 3D objects are affected by both the translation and the rotation of the objects and are thus not well defined. Instead, similarity between the objects is judged by computing a structural distance measure called RMSD (Root Mean Square Distance), which requires proper alignment (including translation and rotation) of the objects. As computing RMSD is expensive, we propose a new measure called ALD (Angle Length Distance), which is shown experimentally to approximate RMSD. Based on ALD, we define a new clustering model called sCluster and devise an algorithm for discovering all maximal sClusters in a sequential 3D dataset. Experiments are conducted to illustrate the efficiency and effectiveness of our algorithm.
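The abstract contrasts RMSD, which needs the two objects to be superimposed first, with the authors' cheaper ALD, whose formula it does not give. As a hedged stand-in, the sketch below computes a different, well-known alignment-free measure, distance-matrix RMSD (dRMSD): it compares the internal pairwise distances of two equal-length point sequences, so rotating or translating either object leaves the score unchanged. It is neither the paper's RMSD nor ALD, and `drmsd` is a hypothetical helper name.

```python
import math
from itertools import combinations


def drmsd(a, b):
    """Distance-matrix RMSD between two equal-length sequences of 3D points.

    Compares every intra-object pairwise distance of a with the matching
    one in b, so no rotation or translation alignment is needed.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    pairs = list(combinations(range(len(a)), 2))
    squared = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2  # math.dist: Python 3.8+
        for i, j in pairs
    )
    return math.sqrt(squared / len(pairs))
```

A copy of a chain rotated 90 degrees about the z-axis and shifted scores 0.0, whereas a copy scaled by 2 does not, because its internal distances change. Note that dRMSD costs O(n^2) per comparison for n points, so it trades the alignment step for more distance computations.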
Citations: 6
A machine learning approach to rapid development of XML mapping queries
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320004
Atsuyuki Morishima, H. Kitagawa, Akira Matsumoto
We present XLearner, a novel tool that supports the rapid development of XML mapping queries written in XQuery. XLearner is novel in that it learns XQuery queries consistent with given examples (fragments) of intended query results. XLearner combines known learning techniques, incorporates mechanisms to cope with issues specific to the XQuery learning context, and provides a systematic way to develop queries semiautomatically. The paper describes the XLearner system: it presents algorithms for learning various classes of XQuery, shows that a minor extension gives the system practical expressive power, and reports experimental results demonstrating that XLearner outputs reasonably complicated queries with only a small number of interactions with the user.
Citations: 17