21st International Conference on Data Engineering (ICDE'05): Latest Papers

Knowledge discovery from transportation network data
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.82
Wei Jiang, Jaideep Vaidya, Zahir Balaporia, Chris Clifton, Brett Banich
Transportation and logistics are a major sector of the economy; however, data analysis in this domain has remained largely in the province of optimization. The potential of data mining and knowledge discovery techniques is largely untapped. Transportation networks are naturally represented as graphs. This paper explores the problems in mining transportation network graphs: we hope to find how current techniques both succeed and fail on this problem, and from the failures, we hope to present new challenges for data mining. Experimental results from applying both existing graph mining and conventional data mining techniques to real transportation network data are provided, including new approaches to making these techniques applicable to the problems. Reasons why these techniques are not appropriate are discussed. We also suggest several challenging problems to precipitate research and galvanize future work in this area.
Citations: 28
Dynamic load management for distributed continuous query systems
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.54
Yongluan Zhou, B. Ooi, K. Tan
A distributed stream processing system must adapt to changes in environment parameters and servers' load. We believe a dynamic load management scheme is indispensable for the system to be scalable. In particular, we expect aggressive methods such as query operator migration during runtime to bring long-term benefit (especially for long-running continuous queries) even though they may incur some short-term overhead. However, to date few complete and practical solutions have been proposed for this problem. In this paper, we offer our solution to the problem. More specifically, we make the following contributions: We formally define a new metric, performance ratio (PR), to measure the relative performance of each query and the objective for the whole system. By building a new cost model, we identify the heuristics that can be used to approach the objective. We propose a complete and practical distributed load management scheme, which includes a static initial placement scheme for newly initiated queries as well as a runtime dynamic scheme. We conducted an extensive experimental study that shows the effectiveness of our technique.
Citations: 26
A unified framework for monitoring data streams in real time
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.13
A. Bulut, Ambuj K. Singh
Online monitoring of data streams poses a challenge in many data-centric applications, such as telecommunications networks, traffic management, trend-related analysis, Web-click streams, intrusion detection, and sensor networks. Mining techniques employed in these applications have to be efficient in terms of space usage and per-item processing time while providing a high quality of answers to (1) aggregate monitoring queries, such as finding surprising levels of a data stream, detecting bursts, and to (2) similarity queries, such as detecting correlations and finding interesting patterns. The most important aspect of these tasks is their need for flexible query lengths, i.e., it is difficult to set the appropriate lengths a priori. For example, bursts of events can occur at variable temporal modalities from hours to days to weeks. Correlated trends can occur at various temporal scales. The system has to discover "interesting" behavior online and monitor over flexible window sizes. In this paper, we propose a multi-resolution indexing scheme, which handles variable length queries efficiently. We demonstrate the effectiveness of our framework over existing techniques through an extensive set of experiments.
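To make the idea of answering variable-length window queries from a multi-resolution structure more concrete, here is a minimal sketch. It is not the indexing scheme of the paper: it is a static, in-memory pyramid of block sums, and the names `build_pyramid` and `range_sum` are hypothetical. Level j stores sums over aligned blocks of 2^j items, so the sum over any window can be assembled from a small number of precomputed blocks.

```python
from typing import List


def build_pyramid(values: List[float]) -> List[List[float]]:
    """Level j holds sums over aligned blocks of 2**j consecutive items."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Pair up adjacent blocks; a trailing unpaired block is simply
        # not represented at the coarser level.
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev) - 1, 2)])
    return levels


def range_sum(levels: List[List[float]], lo: int, hi: int) -> float:
    """Sum of values[lo:hi], assembled greedily from aligned blocks."""
    total = 0
    pos = lo
    while pos < hi:
        level = 0
        # Grow the block while it stays aligned at pos, fits inside
        # [pos, hi), and exists at the coarser level.
        while (level + 1 < len(levels)
               and pos % (1 << (level + 1)) == 0
               and pos + (1 << (level + 1)) <= hi
               and (pos >> (level + 1)) < len(levels[level + 1])):
            level += 1
        total += levels[level][pos >> level]
        pos += 1 << level
    return total


if __name__ == "__main__":
    stream = [3, 1, 4, 1, 5, 9, 2, 6]
    pyramid = build_pyramid(stream)
    # Monitor the most recent items at several window lengths.
    for window in (2, 4, 8):
        print(window, range_sum(pyramid, len(stream) - window, len(stream)))
```

An online monitor would have to update such a structure incrementally as items arrive and would typically keep only a bounded summary per level; the sketch leaves that out and only illustrates why one structure can serve many window lengths.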
Citations: 79
Top-down specialization for information and privacy preservation
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.143
B. Fung, Ke Wang, Philip S. Yu
Releasing person-specific data in its most specific state poses a threat to individual privacy. This paper presents a practical and efficient algorithm for determining a generalized version of data that masks sensitive information and remains useful for modelling classification. The generalization of data is implemented by specializing or detailing the level of information in a top-down manner until a minimum privacy requirement is violated. This top-down specialization is natural and efficient for handling both categorical and continuous attributes. Our approach exploits the fact that data usually contains redundant structures for classification. While generalization may eliminate some structures, other structures emerge to help. Our results show that quality of classification can be preserved even for highly restrictive privacy requirements. This work has great applicability to both public and private sectors that share information for mutual benefits and productivity.
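The top-down loop described in the abstract can be illustrated with a small sketch. It makes several assumptions that are not taken from the paper: a single categorical attribute, a hypothetical occupation taxonomy, a privacy requirement expressed as "every released value must cover at least k records", and a greedy choice of the first specialization that stays valid. Any scoring of candidate specializations by their usefulness for classification is left out.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def ancestor_in_cut(value: str, parent: Dict[str, str], cut: Set[str]) -> str:
    """Walk up the taxonomy until reaching a node of the current cut."""
    while value not in cut:
        value = parent[value]
    return value


def top_down_specialize(records: List[str],
                        children: Dict[str, List[str]],
                        k: int) -> Tuple[List[str], Set[str]]:
    """Start fully generalized; specialize while every group keeps >= k records."""
    parent = {c: p for p, kids in children.items() for c in kids}
    root = next(node for node in children if node not in parent)
    cut = {root}
    changed = True
    while changed:
        changed = False
        for node in sorted(cut):
            kids = children.get(node)
            if not kids:
                continue  # leaf value, cannot be specialized further
            trial = (cut - {node}) | set(kids)
            counts = Counter(ancestor_in_cut(r, parent, trial) for r in records)
            if all(n >= k for n in counts.values()):
                cut = trial  # specialization keeps the requirement satisfied
                changed = True
                break
    return [ancestor_in_cut(r, parent, cut) for r in records], cut


if __name__ == "__main__":
    # Hypothetical taxonomy and records, for illustration only.
    taxonomy = {
        "Any": ["Blue-collar", "White-collar"],
        "Blue-collar": ["Welder", "Painter"],
        "White-collar": ["Manager", "Engineer"],
    }
    jobs = ["Welder", "Welder", "Painter",
            "Manager", "Manager", "Engineer", "Engineer"]
    released, cut = top_down_specialize(jobs, taxonomy, k=2)
    print(released)
```

With k=2 the white-collar values are released in full detail, while "Welder" and "Painter" stay generalized to "Blue-collar" because specializing them would leave a group with a single record.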
Citations: 681
Evaluation of spatio-temporal predicates on moving objects
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.62
Markus Schneider
Moving objects databases, which manage spatial objects whose position and extent change continuously over time, have recently attracted considerable interest in the database community. Queries about moving objects become particularly interesting when they ask for temporal changes in the topological relationships between evolving spatial objects. A concept of spatio-temporal predicates has been proposed to describe these relationships. The goal of this paper is to design efficient algorithms for them so that they can be used in spatio-temporal joins and selections. This paper proposes not to design an algorithm for each new predicate individually but to employ a generic algorithmic scheme, which is able to cover present and future predicate definitions.
Citations: 13
Mining closed relational graphs with connectivity constraints
Pub Date : 2005-04-05 DOI: 10.1145/1081870.1081908
Xifeng Yan, X. Zhou, Jiawei Han
Relational graphs are widely used in modeling large-scale networks such as biological networks and social networks. In a relational graph, each node represents a distinct entity while each edge represents a relationship between entities. Various algorithms were developed to discover interesting patterns from a single relational graph (Z. Wu et al., 1993). However, little attention has been paid to the patterns that are hidden in multiple relational graphs. One interesting pattern in relational graphs is the frequent highly connected subgraph, which can identify recurrent groups and clusters. In social networks, this kind of pattern corresponds to communities where people are strongly associated. For example, if several researchers co-author some papers, attend the same conferences, and cite each other's work, it strongly indicates that they are studying the same research theme.
Citations: 163
Representing and querying data transformations
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.123
Yannis Velegrakis, Renée J. Miller, J. Mylopoulos
Modern information systems often store data that has been transformed and integrated from a variety of sources. This integration may obscure the original source semantics of data items. For many tasks, it is important to be able to determine not only where data items originated, but also why they appear in the integration as they do and through what transformation they were derived. This problem is known as data provenance. In this work, we consider data provenance at the schema and mapping level. In particular, we consider how to answer questions such as "what schema elements in the source(s) contributed to this value", or "through what transformations or mappings was this value derived?" Towards this end, we elevate schemas and mappings to first-class citizens that are stored in a repository and are associated with the actual data values. An extended query language, called MXQL, is also developed that allows meta-data to be queried as regular data and we describe its implementation scenario.
Citations: 51
Distributed/heterogeneous query processing in Microsoft SQL Server
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.51
J. Blakeley, Conor Cunningham, Nigel Ellis, Balaji Rathakrishnan, Ming Wu
This paper presents an architecture overview of the distributed, heterogeneous query processor (DHQP) in the Microsoft SQL Server database system, which enables queries over a large collection of diverse data sources. The paper highlights three salient aspects of the architecture. First, the system introduces well-defined abstractions such as connections, commands, and rowsets that enable sources to plug into the system. These abstractions are formalized by the OLE DB data access interfaces. The generality of OLE DB and its broad industry adoption enable our system to reach a very large collection of diverse data sources, ranging from personal productivity tools, to database management systems, to file system data. Second, the DHQP is built into the relational optimizer and execution engine of the system. This enables DH queries and updates to benefit from the cost-based algebraic transformations and execution strategies available in the system. Finally, the architecture is inherently extensible to support new data sources as they emerge, and it serves as a key extensibility point for the relational engine to add new features such as full-text search and distributed partitioned views.
Citations: 15
Compressing bitmap indices by data reorganization
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.35
Ali Pinar, Tao Tao, H. Ferhatosmanoğlu
Many scientific applications generate massive volumes of data through observations or computer simulations, bringing up the need for effective indexing methods for efficient storage and retrieval of scientific data. Unlike conventional databases, scientific data is mostly read-only and its volume can reach the order of petabytes, making a compact index structure vital. Bitmap indexing has been successfully applied to scientific databases by exploiting the fact that scientific data are enumerated or numerical. Bitmap indices can be compressed with variants of run-length encoding for a compact index structure. However, even this may not be enough for the enormous data generated in some applications such as high energy physics. In this paper, we study how to reorganize bitmap tables for improved compression rates. Our algorithms are used just as a preprocessing step, thus there is no need to revise the current indexing techniques and the query processing algorithms. We introduce the tuple reordering problem, which aims to reorganize database tuples for optimal compression rates. For this NP-complete problem, we propose a Gray code ordering algorithm, which is an in-place algorithm and runs in time linear in the size of the database. We also discuss how the tuple reordering problem can be reduced to the traveling salesperson problem. Our experimental results on real data sets show that the compression ratio can be improved by a factor of 2 to 10.
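The effect of Gray code ordering on run-length compression can be seen on a toy bitmap table. The sketch below is an illustration under simplifying assumptions, not the in-place linear-time algorithm of the paper: it sorts rows with Python's built-in `sorted` by their position in the reflected binary Gray code, and it uses the total number of runs per column as a stand-in for compressed size. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def gray_rank(bits: Sequence[int]) -> int:
    """Position of a bit row in the reflected binary Gray code sequence."""
    g = 0
    for bit in bits:  # pack the row into an integer, first column = MSB
        g = (g << 1) | bit
    rank = 0
    while g:  # inverse Gray code: XOR of all right shifts of g
        rank ^= g
        g >>= 1
    return rank


def gray_order(rows: List[Row]) -> List[Row]:
    """Reorder tuples so that neighbouring rows tend to differ in few bits."""
    return sorted(rows, key=gray_rank)


def total_runs(rows: List[Row]) -> int:
    """Number of runs summed over all bitmap columns (fewer is better for RLE)."""
    if not rows:
        return 0
    runs = 0
    for col in zip(*rows):
        runs += 1 + sum(1 for a, b in zip(col, col[1:]) if a != b)
    return runs


if __name__ == "__main__":
    table = [(0, 0), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1)]
    print(total_runs(table))              # 10 runs in the original order
    print(total_runs(gray_order(table)))  # 5 runs after Gray code ordering
```

Because consecutive Gray codewords differ in a single bit, identical rows end up adjacent and neighbouring distinct rows tend to share most columns, which is what lengthens the runs in each bitmap column.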
Citations: 53
Rank-aware query processing and optimization
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.119
I. Ilyas, Walid G. Aref
Efficient execution of ranking queries is increasingly becoming a major challenge for database technology. DBMSs provide efficient update, indexing, concurrency and recovery. On the other hand, IR on text and multimedia requires techniques involving uncertainty and ranking for effective retrieval. The main goal of this paper is to give an in-depth look at supporting ranking queries as an increasingly interesting area of research. We cover the state-of-the-art techniques in research prototypes and industry-strength database engines for efficient handling of ranking and queries. We focus primarily on how to integrate ranking as a new query processing and optimization dimension, with the aim of supporting ranking queries as a basic and core functionality. The paper identifies several challenges that need to be addressed to achieve true support for ranking and effective retrieval in database management systems.
Citations: 9