
Proceedings. 20th International Conference on Data Engineering: Latest Publications

Algebraic signatures for scalable distributed data structures
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320015
W. Litwin, T. Schwarz
Signatures detect changes to data objects. Numerous schemes are in use, notably the cryptographically secure standard SHA-1. We propose a novel signature scheme which we call algebraic signatures. The scheme uses Galois field calculations. Its major property is the certain detection of any change up to a parameterized size: an n-symbol algebraic signature detects for sure any change that does not exceed n symbols. This property is new among known signature schemes. For larger changes, the collision probability is typically negligible, as for the other known schemes. We apply algebraic signatures to scalable distributed data structures (SDDS). We filter at the SDDS client node the updates that do not actually change the records. We also manage concurrent updates to data stored in the SDDS RAM buckets at the server nodes. We further use the scheme for fast disk backup of these buckets. We sign our objects with 4-byte signatures instead of 20-byte standard SHA-1 signatures, and our algebraic calculus is also about twice as fast.
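A toy sketch of the idea in Galois field arithmetic (not the paper's exact construction): each of the n signature symbols evaluates the data, viewed as a sequence of GF(2^8) coefficients, at a distinct power of a primitive element α. Here α = 0x03 and the AES polynomial 0x11B are assumed choices for illustration.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two elements of GF(2^8) modulo the given irreducible polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def algebraic_signature(data: bytes, n: int, alpha=0x03) -> bytes:
    """n-symbol algebraic signature sketch: component i is the data
    polynomial evaluated at alpha^i.  A change touching at most n
    symbols perturbs at least one component, which is the paper's
    'sure detection' property (for data shorter than the order of alpha)."""
    sig = []
    point = 1
    for _ in range(n):
        point = gf_mul(point, alpha)          # point = alpha^(i+1)
        acc, power = 0, 1
        for sym in data:
            power = gf_mul(power, point)      # power = point^(j+1)
            acc ^= gf_mul(sym, power)         # accumulate sym * point^(j+1)
        sig.append(acc)
        # re-derive point for next component from current one
    return bytes(sig)
```

A 4-byte signature as in the paper is then `algebraic_signature(record, 4)`; flipping a single byte of the record changes the first component with certainty.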
Citations: 53
Continuously maintaining quantile summaries of the most recent N elements over a data stream
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320011
Xuemin Lin, Hongjun Lu, Jian Xu, J. Yu
Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, Web log mining for access prediction, and user click stream mining for personalization. Among these statistics, the quantile summary is probably the most challenging to compute because of its complexity. We study the problem of continuously maintaining a quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of εN. We developed a space-efficient algorithm for predefined N that requires only one scan of the input data stream and O(log(ε²N)/ε + 1/ε²) space in the worst case. We also developed an algorithm that maintains quantile summaries for the most recent N elements so that quantile queries on any most recent n elements (n ≤ N) can be answered with a guaranteed precision of εn. The worst-case space requirement for this algorithm is only O(log²(εN)/ε²). Our performance study indicated that not only is the actual quantile estimation error far below the guaranteed precision, but the space requirement is also much less than the given theoretical bound.
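For contrast with the paper's sublinear-space summaries, a naive exact baseline makes the query semantics concrete: keep the last N elements verbatim and answer a φ-quantile query by sorting. This uses O(N) space, exactly what the summary-based algorithms avoid; the class and method names are illustrative, not from the paper.

```python
import math
from collections import deque

class SlidingWindowQuantile:
    """Exact baseline over the most recent N elements: O(N) space,
    O(N log N) per query.  The paper's summaries answer the same
    queries with precision εN in only O(log(ε²N)/ε + 1/ε²) space."""

    def __init__(self, N: int):
        self.window = deque(maxlen=N)   # deque drops the oldest element itself

    def insert(self, x) -> None:
        self.window.append(x)

    def quantile(self, phi: float):
        """Return the ceil(phi * n)-th smallest of the n buffered elements."""
        s = sorted(self.window)
        return s[max(0, math.ceil(phi * len(s)) - 1)]
```

Feeding the stream 1..200 into a window of N = 100 leaves 101..200 buffered, so the median query returns 150.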
Citations: 91
Database kernel research: what, if anything, is left to do?
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320095
D. Lomet
Data Engineering deals with the use of engineering techniques and methodologies in the design, development and assessment of information systems for different computing platforms and application environments. The 20th International Conference on Data Engineering will be held in Boston, Massachusetts, USA, an academic and technological center with a variety of historical and cultural attractions of international prominence within walking distance.
Citations: 0
BEA liquid data for WebLogic: XML-based enterprise information integration
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320051
M. Carey
This presentation provides a technical overview of BEA Liquid Data for WebLogic, a relatively new product from BEA Systems that provides enterprise information integration capabilities to enterprise applications that are built and deployed using the BEA WebLogic Platform. Liquid Data takes an XML-centric approach to tackling the long-standing problem of integrating data from disparate data sources and making that information easily accessible to applications. In particular, Liquid Data uses the forthcoming XQuery language standard as the basis for defining integrated views of enterprise data and querying over those views. We provide a brief overview of the Liquid Data product architecture and then discuss some of the query processing technology that lies at the heart of the product.
Citations: 10
Using stream semantics for continuous queries in media stream processors
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320083
Amarnath Gupta, B. Liu, Pilho Kim, R. Jain
In the case of media and feature streams, explicit inter-stream constraints exist and can be exploited in the evaluation of continuous queries, in the spirit of semantic query optimization. We express these properties using a media stream declaration language, MSDL. In the demonstration, we present IMMERSI-MEET, an application built around an immersive environment. The IMMERSI-MEET system distinguishes between continuous streams, where values of different types arrive at a specified data rate, and discrete streams, where sources push values intermittently. In MSDL, any dependence declaration must have at least one dependency-specifying predicate in its body. As stream declarations are registered, the stream constraints are interpreted to construct a set of evaluation directives.
Citations: 4
Meta data management
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320101
Philip A. Bernstein, Sergey Melnik
By meta data management, we mean techniques for manipulating schemas and schema-like objects (such as interface definitions and web site maps) and mappings between them. Work on meta data problems goes back to at least the early 1970s, when data translation was the hot database research topic, even before relational databases caught on. Many popular research problems in the past five years are primarily meta data problems, such as data warehouse tools (e.g., ETL, to extract, transform and load), data integration, the semantic web, generation of XML or object-oriented wrappers for SQL databases, and generation of wrappers for web sites. Other classical meta data problems are information resource management, design tool support and integration, and schema evolution and data migration. Despite its longevity and continued importance, the meta data field has no widely accepted conceptual framework, as there is for many other database topics such as access methods, query processing, and transaction management. In this seminar, we propose such a conceptual framework. It consists of three layers: applications, design patterns, and basic operators. Applications are the end-user problems to be solved, like those listed above. Design patterns are generic problems that need to be solved in support of many different applications, such as meta modeling (for all meta data problems), answering queries using views (for data integration and the semantic web), and change propagation (for data translation, schema evolution, and round-trip engineering). Basic operators are procedures that are needed to support multiple design patterns and applications, such as matching schemas to produce a mapping, merging schemas based on a mapping, and composing mappings. We will describe several meta data management problems and, for each, explain the design patterns and operators needed to solve it. We will summarize the main approaches to each design pattern and operator (the main choices of language, data structures, and algorithms) and will highlight the relevant papers that address them. This seminar is targeted at both practicing engineers and researchers. The former will learn about the latest solutions to important meta data problems and the many difficult unsolved problems that are best avoided. Database researchers, especially professors, will benefit from considering the conceptual framework we propose, since as far as we know no database textbook treats meta data management as a separate topic.
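The basic operators can be made concrete with a drastically simplified model in which a mapping is just a dictionary from source attribute names to target attribute names; real mapping languages in this literature are far richer, so the functions below are illustrative stand-ins, not the seminar's definitions.

```python
def match(schema_a, schema_b):
    """Trivial 'match' operator: pair up attributes with identical names.
    Real matchers use linguistic, structural, and instance-level evidence."""
    return {a: a for a in schema_a if a in schema_b}

def compose(m1, m2):
    """'Compose' operator: chain a mapping A->B with a mapping B->C
    into a mapping A->C, dropping attributes that B->C does not cover."""
    return {src: m2[mid] for src, mid in m1.items() if mid in m2}
```

For example, composing a legacy-to-staging mapping with a staging-to-warehouse mapping yields the direct legacy-to-warehouse mapping, which is the shape of the change-propagation problems the seminar discusses.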
Citations: 91
Scalable multimedia disk scheduling
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320022
M. Mokbel, Walid G. Aref, Khaled M. Elbassioni, I. Kamel
A new multimedia disk-scheduling algorithm, termed Cascaded-SFC, is presented. The Cascaded-SFC multimedia disk scheduler is applicable in environments where multimedia data requests arrive with different quality of service (QoS) requirements such as real-time deadline and user priority. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. The Cascaded-SFC disk scheduler provides a unified framework for multimedia disk scheduling that scales with the number of scheduling parameters. The general idea is based on modeling the multimedia disk requests as points in multiple multidimensional subspaces, where each of the dimensions represents one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request, etc.). Each multidimensional subspace represents a subset of the QoS parameters that share some common scheduling characteristics. Then the multimedia disk scheduling problem reduces to the problem of finding a linear order to traverse the multidimensional points in each subspace. Multiple space-filling curves are selected to fit the scheduling needs of the QoS parameters in each subspace. The orders in each subspace are integrated in a cascaded way to provide a total order for the whole space. Comprehensive experiments demonstrate the efficiency and scalability of the Cascaded-SFC disk scheduling algorithm over other disk schedulers.
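The core mechanism, mapping a multidimensional request point onto a one-dimensional curve position and sorting by it, can be sketched with a Z-order (Morton) curve over two QoS dimensions. The real scheduler cascades several curves over several subspaces; this single-curve, two-dimensional version with assumed field names (`deadline`, `cylinder`) is only an illustration.

```python
def morton2(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Z-order (Morton) key,
    a simple example of a space-filling curve index."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key

def schedule(requests):
    """Order disk requests by their curve position over (deadline, cylinder),
    so that requests close in both dimensions are served close together."""
    return sorted(requests, key=lambda r: morton2(r["deadline"], r["cylinder"]))
```

A request with a tight deadline on a nearby cylinder thus sorts ahead of one that is far away in either dimension, which is the intuition behind letting one linear order respect several QoS parameters at once.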
Citations: 12
NEXSORT: sorting XML in external memory
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320038
Adam Silberstein, Jun Yang
XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially one as fundamental as sort. We present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, the children of every nonleaf element are ordered according to a given sorting criterion. Among NEXSORT's uses is its combination with structural merge as the XML analogue of sort-merge join, which allows us to merge large XML documents in a single pass once they are sorted. The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally "easier" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is Θ(max{n, n log_m(k/B)}), where n is the number of blocks in the input XML document, m is the number of main-memory blocks available for sorting, B is the number of elements that fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice we demonstrate that, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT degenerates essentially to external merge sort.
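The end result NEXSORT computes, a fully sorted document in which every element's children are ordered by the sort key, is easy to show in miniature. This in-memory toy assumes the document fits in RAM, which is exactly the assumption the external-memory algorithm removes.

```python
import xml.etree.ElementTree as ET

def sort_tree(elem, key=lambda e: e.tag):
    """Recursively put the children of every non-leaf element into
    key order (here: by tag name).  This is the 'fully sorted XML
    document' NEXSORT produces, minus all the external-memory work."""
    for child in list(elem):
        sort_tree(child, key)
    ordered = sorted(list(elem), key=key)
    for child in list(elem):
        elem.remove(child)
    elem.extend(ordered)

doc = ET.fromstring("<r><b/><a><d/><c/></a></r>")
sort_tree(doc)
# children at every level are now in tag order: a before b, c before d
```

Note that only siblings are reordered; the parent-child structure, and hence the set of legal orderings, is fixed by the document hierarchy, which is why sorting XML is "easier" than sorting a flat file.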
Citations: 7
Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1319983
R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton
We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and queries. This algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in literature. Some of the salient features of this algorithm are: (i) It translates a path expression query into a single SQL query, irrespective of how complex the XML schema is, (ii) It uses the "with" clause in SQL99 to handle recursive queries even over nonrecursive schemas, (iii) It reconstructs recursive XML subtrees with a single SQL query and (iv) It shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schema.
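The flavor of the translation, a recursive path expression becoming a single SQL query built on SQL99's recursive "with" clause, can be sketched against an assumed edge-style shredding (table `elem(id, parent, tag)`); this layout and the query are illustrations, not the paper's mapping language or its translation output. SQLite's `WITH RECURSIVE` stands in for the SQL99 construct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE elem(id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT);
-- a recursive document: part/part/part/name
INSERT INTO elem VALUES (1, NULL, 'part'), (2, 1, 'part'),
                        (3, 2, 'part'), (4, 3, 'name');
""")
# Translate the recursive path //part (over a recursive schema) into
# ONE SQL query: seed with root-level 'part' elements, then follow
# child edges transitively, keeping only 'part' elements.
rows = conn.execute("""
WITH RECURSIVE parts(id) AS (
    SELECT id FROM elem WHERE tag = 'part' AND parent IS NULL
    UNION
    SELECT e.id FROM elem e JOIN parts p ON e.parent = p.id
    WHERE e.tag = 'part'
)
SELECT id FROM parts ORDER BY id;
""").fetchall()
```

However deeply the `part` elements nest, the single recursive query above retrieves all of them, which is the point of the paper's observation that linear recursion suffices for path expressions over arbitrarily recursive schemas.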
{"title":"Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation","authors":"R. Krishnamurthy, Venkatesan T. Chakaravarthy, R. Kaushik, J. Naughton","doi":"10.1109/ICDE.2004.1319983","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1319983","url":null,"abstract":"We consider the problem of translating XML queries into SQL when XML documents have been stored in an RDBMS using a schema-based relational decomposition. Surprisingly, there is no published XML-to-SQL query translation algorithm for this scenario that handles recursive XML schemas. We present a generic algorithm to translate path expression queries into SQL in the presence of recursion in the schema and queries. This algorithm handles a general class of XML-to-relational mappings, which includes all techniques proposed in literature. Some of the salient features of this algorithm are: (i) It translates a path expression query into a single SQL query, irrespective of how complex the XML schema is, (ii) It uses the \"with\" clause in SQL99 to handle recursive queries even over nonrecursive schemas, (iii) It reconstructs recursive XML subtrees with a single SQL query and (iv) It shows that the support for linear recursion in SQL99 is sufficient for handling path expression queries over arbitrarily complex recursive XML schema.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134573817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 72
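The abstract above points out that SQL99's "with" clause (linear recursion) suffices for path expressions over recursive schemas. A hedged sketch of that idea, using SQLite's `WITH RECURSIVE` over a hypothetical edge-table mapping (`node(id, parent, tag)`) of our own devising — not the paper's translation algorithm — to evaluate the descendant step `a//b` in a single query:

```python
# Sketch: a recursive path step evaluated with one WITH RECURSIVE query.
# The edge-table decomposition here is an illustrative assumption.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node(id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT);
INSERT INTO node VALUES (1, NULL, 'a'), (2, 1, 'c'), (3, 2, 'b'),
                        (4, 1, 'b'), (5, NULL, 'b');
""")
# All descendants of any 'a' element, filtered to those tagged 'b'.
rows = con.execute("""
WITH RECURSIVE under_a(id) AS (
    SELECT id FROM node WHERE parent IN (SELECT id FROM node WHERE tag = 'a')
  UNION ALL
    SELECT n.id FROM node n JOIN under_a u ON n.parent = u.id
)
SELECT DISTINCT n.id FROM node n JOIN under_a u ON n.id = u.id
WHERE n.tag = 'b' ORDER BY n.id
""").fetchall()
print(rows)  # [(3,), (4,)]
```

Node 5 is tagged 'b' but sits outside every 'a' subtree, so it is correctly excluded; the recursion handles descendants at arbitrary depth, which is what makes a single SQL statement possible even when the schema is recursive.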
An efficient algorithm for mining frequent sequences by a new strategy without support counting
Pub Date : 2004-03-30 DOI: 10.1109/ICDE.2004.1320012
D. Chiu, Yi-Hung Wu, Arbee L. P. Chen
Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of nonfrequent sequences. The main difference between the DISC strategy and the previous works is the way to prune nonfrequent sequences. The previous works are based on the antimonotone property, which prunes nonfrequent sequences according to frequent sequences of shorter lengths. On the contrary, the DISC strategy prunes nonfrequent sequences according to other sequences of the same length. Moreover, we summarize three strategies used in previous works and design an efficient algorithm called DISC-all to take advantage of all four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design the dynamic version of our algorithm, which achieves a much better performance.
{"title":"An efficient algorithm for mining frequent sequences by a new strategy without support counting","authors":"D. Chiu, Yi-Hung Wu, Arbee L. P. Chen","doi":"10.1109/ICDE.2004.1320012","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320012","url":null,"abstract":"Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of nonfrequent sequences. The main difference between the DISC strategy and the previous works is the way to prune nonfrequent sequences. The previous works are based on the antimonotone property, which prune the nonfrequent sequences according to the frequent sequences with shorter lengths. On the contrary, the DISC strategy prunes the nonfrequent sequences according to the other sequences with the same length. Moreover, we summarize three strategies used in the previous works and design an efficient algorithm called DISC-all to take advantages of all the four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design the dynamic version of our algorithm, which achieves a much better performance.","PeriodicalId":358862,"journal":{"name":"Proceedings. 
20th International Conference on Data Engineering","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133138604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 90
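DISC's comparisons ultimately rest on two primitives over ordered item sequences: subsequence containment and support counting. A minimal sketch of both (our own illustration of the primitives; the DISC pruning itself, which avoids counting support for nonfrequent candidates, is more involved than this):

```python
# Sketch: the basic primitives behind sequential-pattern mining.
def is_subsequence(pat, seq):
    """True iff pat occurs in seq in order (not necessarily contiguously)."""
    it = iter(seq)
    return all(item in it for item in pat)  # each 'in' consumes the iterator

def support(pat, db):
    """Plain support counting over a sequence database -- the per-candidate
    cost that DISC is designed to avoid for nonfrequent sequences."""
    return sum(is_subsequence(pat, s) for s in db)

db = [list("abcd"), list("acbd"), list("abd")]
print(support(list("abd"), db))  # 3
print(support(list("cb"), db))   # 1
```

With a minimum support of 2, an antimonotone (Apriori-style) miner would prune `cb` only after counting it; DISC instead compares same-length sequences directly to skip such counts.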