A Web-services architecture for efficient XML data exchange
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320024
S. Amer-Yahia, Y. Kotidis
Business applications often exchange large amounts of enterprise data stored in legacy systems. The advent of XML as a standard specification format has improved application interoperability. However, optimizing the performance of XML data exchange, in particular when data volumes are large, is still in its infancy. Quite often, the target system has to undo some of the work the source did to assemble documents in order to map XML elements into its own data structures. This publish&map process is both resource- and time-consuming. In this paper, we develop a middle-tier Web services architecture to optimize the exchange of large XML data volumes. The key idea is to allow systems to negotiate the data exchange process using an extension to WSDL. The source (target) can specify document fragments that it is willing to produce (consume). Given these fragmentations, the middleware instruments the data exchange process between the two systems to minimize the number of necessary operations and to optimize the distributed processing between the source and the target systems. We show that our new exchange paradigm outperforms publish&map and enables more flexible scenarios without requiring substantial modifications to the underlying systems.
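To make the negotiation step concrete, here is a minimal Python sketch of how a middleware could reconcile the fragment sets advertised by the two sides; the fragment names and the planning function are illustrative assumptions, not the paper's WSDL extension.

```python
# Toy sketch (not the paper's algorithm): the middleware intersects the
# fragments a source can produce with the fragments a target can consume,
# so mutually supported fragments are shipped directly and the rest fall
# back to whole-document publish&map. All names here are illustrative.

def plan_exchange(source_fragments, target_fragments):
    """Return (fragments to ship directly, fragments needing fallback mapping)."""
    producible = set(source_fragments)
    consumable = set(target_fragments)
    direct = producible & consumable       # shipped as-is, no re-assembly
    fallback = producible - consumable     # must still be mapped by the target
    return sorted(direct), sorted(fallback)

if __name__ == "__main__":
    source = ["order", "order/lineitem", "customer"]
    target = ["order/lineitem", "customer", "invoice"]
    direct, fallback = plan_exchange(source, target)
    print("ship directly:", direct)    # ['customer', 'order/lineitem']
    print("needs mapping:", fallback)  # ['order']
```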
{"title":"A Web-services architecture for efficient XML data exchange","authors":"S. Amer-Yahia, Y. Kotidis","doi":"10.1109/ICDE.2004.1320024","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320024","url":null,"abstract":"Business applications often exchange large amounts of enterprise data stored in legacy systems. The advent of XML as a standard specification format has improved applications interoperability. However, optimizing the performance of XML data exchange, in particular, when data volumes are large, is still in its infancy. Quite often, the target system has to undo some of the work the source did to assemble documents in order to map XML elements into its own data structures. This publish&map process is both resource and time consuming. In this paper, we develop a middle-tier Web services architecture to optimize the exchange of large XML data volumes. The key idea is to allow systems to negotiate the data exchange process using an extension to WSDL. The source (target) can specify document fragments that it is willing to produce (consume). Given these fragmentations, the middleware instruments the data exchange process between the two systems to minimize the number of necessary operations and optimize the distributed processing between the source and the target systems. We show that our new exchange paradigm outperforms publish&map and enables more flexible scenarios without necessitating substantial modifications to the underlying systems.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131584361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bitmap-tree indexing for set operations on free text
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320067
Ilias Nitsos, Georgios Evangelidis, D. Dervos
Here we report on our implementation of a hybrid indexing scheme (the bitmap-tree) that combines the advantages of bitmap indexing and file inversion. The results we obtained are compared to those of the compressed inverted file index. Both storage overhead and query processing efficiency are taken into consideration. The proposed new method is shown to excel in handling queries involving set operations. For general-purpose user queries, the bitmap-tree is shown to perform as well as the compressed inverted file index.
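As background for why bitmaps suit set-operation queries, here is a toy Python sketch that uses plain integers as uncompressed per-term bitmaps over document ids; it illustrates the operations a bitmap index accelerates, not the compressed bitmap-tree itself, and the postings are made up.

```python
# Illustrative only: each term's postings list becomes a bitmap, and
# set-operation queries (AND/OR) reduce to cheap bitwise operations.

def make_bitmap(doc_ids):
    bm = 0
    for d in doc_ids:
        bm |= 1 << d
    return bm

def to_doc_ids(bm):
    return [i for i in range(bm.bit_length()) if bm >> i & 1]

# hypothetical postings for two terms over documents 0..7
term_a = make_bitmap([0, 2, 3, 5])
term_b = make_bitmap([2, 3, 6])

print(to_doc_ids(term_a & term_b))  # documents containing both terms -> [2, 3]
print(to_doc_ids(term_a | term_b))  # documents containing either term
```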
{"title":"Bitmap-tree indexing for set operations on free text","authors":"Ilias Nitsos, Georgios Evangelidis, D. Dervos","doi":"10.1109/ICDE.2004.1320067","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320067","url":null,"abstract":"Here we report on our implementation of a hybrid-indexing scheme (bitmap-tree) that combines the advantages of bitmap indexing and file inversion. The results we obtained are compared to those of the compressed inverted file index. Both storage overhead and query processing efficiency are taken into consideration. The proposed new method is shown to excel in handling queries involving set operations. For general-purpose user queries, the bitmap-tree is shown to perform as good as the compressed inverted file index.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132218395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Probe, cluster, and discover: focused extraction of QA-Pagelets from the deep Web
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1319988
James Caverlee, Ling Liu, David J. Buttler
We introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient mining system for discovering and extracting QA-Pagelets from the deep Web. A unique feature of THOR is its two-phase extraction framework. In the first phase, pages from a deep Web site are grouped into distinct clusters of structurally similar pages. In the second phase, pages from each cluster are examined through a subtree filtering algorithm that exploits structural and content similarity at the subtree level to identify the QA-Pagelets.
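The first, clustering phase can be pictured with a small Python sketch, assuming each page is abstracted as the multiset of its root-to-leaf tag paths; the signature, sample pages, and grouping are illustrative assumptions, and the second (subtree-filtering) phase is not shown.

```python
# Minimal sketch of phase one only (not THOR's actual algorithm): group
# sampled pages whose structural signatures coincide. The signature here
# is the multiset of root-to-leaf tag paths, an invented simplification.

from collections import Counter, defaultdict

def structure_signature(tag_paths):
    """Abstract a page as the multiset of its root-to-leaf tag paths."""
    return frozenset(Counter(tag_paths).items())

def cluster_pages(pages):
    clusters = defaultdict(list)
    for page_id, tag_paths in pages.items():
        clusters[structure_signature(tag_paths)].append(page_id)
    return list(clusters.values())

pages = {
    "p1": ["html/body/table/tr", "html/body/table/tr", "html/body/div"],
    "p2": ["html/body/table/tr", "html/body/table/tr", "html/body/div"],
    "p3": ["html/body/form/input"],
}
print(cluster_pages(pages))  # [['p1', 'p2'], ['p3']]
```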
{"title":"Probe, cluster, and discover: focused extraction of QA-Pagelets from the deep Web","authors":"James Caverlee, Ling Liu, David J. Buttler","doi":"10.1109/ICDE.2004.1319988","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1319988","url":null,"abstract":"We introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient mining system for discovering and extracting QA-Pagelets from the deep Web. A unique feature of THOR is its two-phase extraction framework. In the first phase, pages from a deep Web site are grouped into distinct clusters of structurally-similar pages. In the second phase, pages from each page cluster are examined through a subtree filtering algorithm that exploits the structural and content similarity at subtree level to identify the QA-Pagelets.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131813972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lazy database replication with ordering guarantees
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320016
Khuzaima S. Daudjee, K. Salem
Lazy replication is a popular technique for improving the performance and availability of database systems. Although there are concurrency control techniques that guarantee serializability in lazy replication systems, these techniques result in undesirable transaction orderings. Since transactions may see stale data, they may be serialized in an order different from the one in which they were submitted. Strong serializability avoids such problems, but it is very costly to implement. We propose a generalized form of strong serializability that is suitable for use with lazy replication. In addition to having many of the advantages of strong serializability, it can be implemented more efficiently. We show how generalized strong serializability can be implemented in a lazy replication system, and we present the results of a simulation study that quantifies the strengths and limitations of the approach.
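As a rough illustration of one ordering guarantee in this spirit, here is a Python sketch of a per-client freshness check that routes a read-only transaction only to a replica that has already applied that client's latest update; this is a simplification for intuition, not the paper's protocol, and all class and method names are assumptions.

```python
# Hedged sketch: clients tag transactions with sequence numbers; a replica
# is eligible to serve a client's read only if lazy propagation has already
# delivered that client's most recent committed update to it.

class Replica:
    def __init__(self):
        self.applied = {}  # client_id -> highest applied sequence number

    def apply(self, client_id, seq):
        self.applied[client_id] = max(self.applied.get(client_id, 0), seq)

    def fresh_enough_for(self, client_id, last_commit_seq):
        return self.applied.get(client_id, 0) >= last_commit_seq

def route_read(replicas, client_id, last_commit_seq):
    for r in replicas:
        if r.fresh_enough_for(client_id, last_commit_seq):
            return r
    return None  # e.g. fall back to the primary if no replica is fresh enough

r1, r2 = Replica(), Replica()
r2.apply("alice", 3)                              # propagation reached r2 only
print(route_read([r1, r2], "alice", 3) is r2)     # True
```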
{"title":"Lazy database replication with ordering guarantees","authors":"Khuzaima S. Daudjee, K. Salem","doi":"10.1109/ICDE.2004.1320016","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320016","url":null,"abstract":"Lazy replication is a popular technique for improving the performance and availability of database systems. Although there are concurrency control techniques, which guarantee serializability in lazy replication systems, these techniques result in undesirable transaction orderings. Since transactions may see stale data, they may be serialized in an order different from the one in which they were submitted. Strong serializability avoids such problems, but it is very costly to implement. We propose a generalized form of strong serializability that is suitable for use with lazy replication. In addition to having many of the advantages of strong serializability, it can be implemented more efficiently. We show how generalized strong serializability can be implemented in a lazy replication system, and we present the results of a simulation study that quantifies the strengths and limitations of the approach.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130912794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Range cube: efficient cube computation by exploiting data correlation
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320035
Ying Feng, D. Agrawal, A. E. Abbadi, Ahmed A. Metwally
Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. We introduce range cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, the range trie, is used to identify and exploit correlation in attribute values and to compress the input dataset, effectively reducing the computational cost. The range cubing algorithm generates a compressed cube, called a range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value and is stored as a tuple with the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-cubing, experiments on a real dataset show a running time of less than one thirtieth of H-cubing's, while generating a range cube that occupies less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
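A toy Python sketch of the compression intuition follows, under the simplifying assumption that consecutive cells carrying the same aggregation value are stored once as a single representative; the sample cells are invented, and the range trie itself is not modeled.

```python
# Illustration only: cells whose dimension values are correlated can roll up
# to identical aggregation values, so they compress into one stored entry.

from itertools import groupby

cells = [                        # (dimension tuple, aggregation value)
    (("2004", "Q1", "NY"), 10),
    (("2004", "Q1", "*"),  10),  # rolls up to the same value
    (("2004", "*",  "*"),  10),
    (("2005", "Q1", "NY"),  7),
]

compressed = [(value, [dims for dims, _ in group])
              for value, group in groupby(cells, key=lambda c: c[1])]

for value, members in compressed:
    print("aggregate", value, "covers", len(members), "cells")
```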
{"title":"Range cube: efficient cube computation by exploiting data correlation","authors":"Ying Feng, D. Agrawal, A. E. Abbadi, Ahmed A. Metwally","doi":"10.1109/ICDE.2004.1320035","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320035","url":null,"abstract":"Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. We introduce range cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, range trie, is used to compress and identify correlation in attribute values, and compress the input dataset to effectively reduce the computational cost. The range cubing algorithm generates a compressed cube, called range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value, as a tuple which has the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-cubing, experiments on real dataset show a running time of less than one thirtieth, still generating a range cube of less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130125345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FLYINGDOC: an architecture for distributed, user-friendly, and personalized information systems
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320078
I. Bruder, A. Zeitz, Holger Meyer, B. Hänsel, A. Heuer
The need for personal information management using distributed, user-friendly, and personalized document management systems is obvious. State-of-the-art document management systems such as digital libraries provide support for the whole document lifecycle. To turn such document management systems into personalized, distributed, and user-friendly information systems, we present techniques for the simple import of collections, documents, and data; for generic and concrete data modeling; for replication; and for personalization. These techniques were employed in the implementation of a personal conference assistant, which was used for the first time at the 2003 VLDB conference in Berlin, Germany. Our client-server architecture provides an information server with different services and different kinds of clients. These services comprise a distribution and replication service, a collection integration service, a data management unit, and a query processing service.
{"title":"FLYINGDOC: an architecture for distributed, user-friendly, and personalized information systems","authors":"I. Bruder, A. Zeitz, Holger Meyer, B. Hänsel, A. Heuer","doi":"10.1109/ICDE.2004.1320078","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320078","url":null,"abstract":"The need for personal information management using distributed, user-friendly, and personalized document management systems is obvious. State of the art document management systems such as digital libraries provide support for the whole document lifecycle. To enhance such document management systems to get a personalized, distributed and user-friendly information system we present techniques for a simple import of collections, documents, and data, for generic and concrete data modeling, replication, and, personalization. These techniques were employed for the implementation of a personal conference assistant, which was used for the first time at the VLDB conference 2003 in Berlin, Germany. Our client-server architecture provides an information server with different services and different kinds of clients. These services comprise a distribution and replication service, a collection integration service, a data management unit, and, a query processing service.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116933605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SQLCM: a continuous monitoring framework for relational database engines
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320020
S. Chaudhuri, A. König, Vivek R. Narasayya
The ability to monitor a database server is crucial for effective database administration. Today's commercial database systems support two basic mechanisms for monitoring: (a) obtaining a snapshot of counters to capture current state, and (b) logging events in the server to a table/file to capture history. We show that for a large class of important database administration tasks the above mechanisms are inadequate in functionality or performance. We present an infrastructure called SQLCM that enables continuous monitoring inside the database server and that has the ability to automatically take actions based on monitoring. We describe the implementation of SQLCM in Microsoft SQL Server and show how several common and important monitoring tasks can be easily specified in SQLCM. Our experimental evaluation indicates that SQLCM imposes low overhead on normal server execution and enables monitoring tasks on a production server that would be too expensive using today's monitoring mechanisms.
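To illustrate the kind of rule such a framework evaluates continuously, here is a hedged Python sketch of a condition-action monitoring rule over server counters; the counter names, thresholds, and polling loop are hypothetical and are not SQLCM's interface.

```python
# Illustrative sketch only: a monitoring task expressed as (condition, action)
# over a dictionary of server counters, evaluated in a simple loop.

import time

def blocking_rule(counters):
    """Fire when more than 10 sessions are blocked for over 30 seconds."""
    return (counters.get("blocked_sessions", 0) > 10
            and counters.get("max_block_time_sec", 0) > 30)

def monitor(get_counters, rules, interval_sec=5, iterations=3):
    for _ in range(iterations):
        counters = get_counters()
        for rule, action in rules:
            if rule(counters):
                action(counters)
        time.sleep(interval_sec)

if __name__ == "__main__":
    fake_counters = lambda: {"blocked_sessions": 12, "max_block_time_sec": 45}
    alert = lambda c: print("ALERT: blocking detected:", c)
    monitor(fake_counters, [(blocking_rule, alert)], interval_sec=0, iterations=1)
```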
{"title":"SQLCM: a continuous monitoring framework for relational database engines","authors":"S. Chaudhuri, A. König, Vivek R. Narasayya","doi":"10.1109/ICDE.2004.1320020","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320020","url":null,"abstract":"The ability to monitor a database server is crucial for effective database administration. Today's commercial database systems support two basic mechanisms for monitoring: (a) obtaining a snapshot of counters to capture current state, and (b) logging events in the server to a table/file to capture history. We show that for a large class of important database administration tasks the above mechanisms are inadequate in functionality or performance. We present an infrastructure called SQLCM that enables continuous monitoring inside the database server and that has the ability to automatically take actions based on monitoring. We describe the implementation of SQLCM in Microsoft SQL Server and show how several common and important monitoring tasks can be easily specified in SQLCM. Our experimental evaluation indicates that SQLCM imposes low overhead on normal server execution end enables monitoring tasks on a production server that would be too expensive using today's monitoring mechanisms.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117069682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient incremental validation of XML documents
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320036
Denilson Barbosa, A. Mendelzon, L. Libkin, L. Mignet, M. Arenas
We discuss incremental validation of XML documents with respect to DTDs and XML Schema definitions. We consider insertions and deletions of subtrees, as opposed to leaf nodes only, and we also consider the validation of ID and IDREF attributes. For arbitrary schemas, we give a worst-case n log n time and linear space algorithm, and show that it is often far superior to revalidation from scratch. We present two classes of schemas, which capture most real-life DTDs, and show that they admit a logarithmic-time incremental validation algorithm that, in many cases, requires only constant auxiliary space. We then discuss an implementation of these algorithms that is independent of, and can be customized for, different storage mechanisms for XML. Finally, we present extensive experimental results showing that our approach is highly efficient and scalable.
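As a simplified picture of why a subtree update only needs local rechecking, here is a Python sketch that revalidates a single node's children sequence against its DTD content model expressed as a regular expression; the DTD fragment and encoding are assumptions, and this is not the paper's logarithmic-time algorithm.

```python
# Simplified sketch: after inserting or deleting a subtree, only the parent's
# children sequence must be rechecked against that element's content model.

import re

# hypothetical DTD fragment: <!ELEMENT book (title, author+, chapter*)>
content_models = {"book": re.compile(r"title(,author)+(,chapter)*")}

def revalidate_parent(parent_tag, children_tags):
    """Recheck one node's children sequence after a subtree insert/delete."""
    model = content_models[parent_tag]
    return model.fullmatch(",".join(children_tags)) is not None

print(revalidate_parent("book", ["title", "author", "chapter"]))  # True
print(revalidate_parent("book", ["title", "chapter"]))            # False: author missing
```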
{"title":"Efficient incremental validation of XML documents","authors":"Denilson Barbosa, A. Mendelzon, L. Libkin, L. Mignet, M. Arenas","doi":"10.1109/ICDE.2004.1320036","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320036","url":null,"abstract":"We discuss incremental validation of XML documents with respect to DTDs and XML schema definitions. We consider insertions and deletions of subtrees, as opposed to leaf nodes only, and we also consider the validation of ID and IDREF attributes. For arbitrary schemas, we give a worst-case n log n time and linear space algorithm, and show that it often is far superior to revalidation from scratch. We present two classes of schemas, which capture most real-life DTDs, and show that they admit a logarithmic time incremental validation algorithm that, in many cases, requires only constant auxiliary space. We then discuss an implementation of these algorithms that is independent of, and can be customized for different storage mechanisms for XML. Finally, we present extensive experimental results showing that our approach is highly efficient and scalable.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115739248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proving ownership over categorical data
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320029
R. Sion
This paper introduces a novel method of rights protection for categorical data through watermarking. We discover new watermark embedding channels for relational data with categorical types. We design novel watermark encoding algorithms and analyze important theoretical bounds including mark vulnerability. While fully preserving data quality requirements, our solution survives important attacks, such as subset selection and random alterations. Mark detection is fully "blind" in that it doesn't require the original data, an important characteristic especially in the case of massive data. We propose various improvements and alternative encoding methods. We perform validation experiments by watermarking the outsourced Wal-Mart sales data available at our institute. We prove (experimentally and by analysis) our solution to be extremely resilient to both alteration and data loss attacks, for example tolerating up to 80% data loss with a watermark alteration of only 25%.
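For intuition about key-based, blind mark embedding in categorical attributes, here is a toy Python sketch; the keyed-hash selection rule, the embedding rule, and the category values are invented for illustration and differ from the paper's encoding algorithms.

```python
# Hedged toy: a keyed hash of each tuple's primary key decides whether the
# tuple carries a mark, and which admissible categorical value it takes.
# Detection can recompute the same hashes from the key alone, so it does not
# need the original (unwatermarked) data.

import hashlib
import hmac

KEY = b"secret-watermark-key"

def selected(pk, fraction=4):
    """Pseudo-randomly select roughly 1/fraction of the tuples for marking."""
    digest = hmac.new(KEY, str(pk).encode(), hashlib.sha256).digest()
    return digest[0] % fraction == 0

def marked_value(pk, categories):
    """Deterministically pick the marked category value for this tuple."""
    digest = hmac.new(KEY, str(pk).encode(), hashlib.sha256).digest()
    return categories[digest[1] % len(categories)]

for pk in range(8):
    if selected(pk):
        print(pk, "->", marked_value(pk, ["ground", "air", "express"]))
```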
{"title":"Proving ownership over categorical data","authors":"R. Sion","doi":"10.1109/ICDE.2004.1320029","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320029","url":null,"abstract":"This paper introduces a novel method of rights protection for categorical data through watermarking. We discover new watermark embedding channels for relational data with categorical types. We design novel watermark encoding algorithms and analyze important theoretical bounds including mark vulnerability. While fully preserving data quality requirements, our solution survives important attacks, such as subset selection and random alterations. Mark detection is fully \"blind\" in that it doesn't require the original data, an important characteristic especially in the case of massive data. We propose various improvements and alternative encoding methods. We perform validation experiments by watermarking the outsourced Wal-Mart sales data available at our institute. We prove (experimentally and by analysis) our solution to be extremely resilient to both alteration and data loss attacks, for example tolerating up to 80% data loss with a watermark alteration of only 25%.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125917393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GODIVA: lightweight data management for scientific visualization applications
Pub Date: 2004-03-30 | DOI: 10.1109/ICDE.2004.1320041
Xiaosong Ma, M. Winslett, Johnny Norris, X. Jiao, R. Fiedler
Scientific visualization applications are very data-intensive, with high demands for I/O and data management. Developers of many visualization tools hesitate to use traditional DBMSs, due to the lack of support for these DBMSs on parallel platforms and the risk of reducing the portability of their tools and the user data. We propose the GODIVA framework, which provides simple database-like interfaces to help visualization tool developers manage their in-memory data, and I/O optimizations such as prefetching and caching to improve input performance at run time. We implemented the GODIVA interfaces in a stand-alone, portable user library, which can be used by all types of visualization codes: interactive and batch-mode, sequential and parallel. Performance results from running a visualization tool using the GODIVA library on multiple platforms show that the GODIVA framework is easy to use, alleviates developers' data management burden, and can bring substantial I/O performance improvement.
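To suggest what a database-like interface with prefetching over in-memory data can look like, here is a minimal Python sketch; the class name, the loader callback, and the read-ahead and eviction policies are assumptions, not GODIVA's actual API.

```python
# Illustrative sketch, not the GODIVA interfaces: a simple store over
# per-timestep arrays that reads ahead and caches recent timesteps.

from collections import OrderedDict

class TimestepStore:
    def __init__(self, loader, cache_size=2, prefetch=1):
        self.loader = loader          # callback: timestep -> array-like data
        self.cache = OrderedDict()
        self.cache_size = cache_size
        self.prefetch = prefetch

    def get(self, step):
        # load the requested timestep plus a small read-ahead window
        for s in range(step, step + self.prefetch + 1):
            if s not in self.cache:
                self.cache[s] = self.loader(s)
                if len(self.cache) > self.cache_size:
                    self.cache.popitem(last=False)  # evict the oldest entry
        return self.cache[step]

store = TimestepStore(loader=lambda s: [s] * 4)
print(store.get(0))  # loads timesteps 0 and 1; returns the data for step 0
```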
{"title":"GODIVA: lightweight data management for scientific visualization applications","authors":"Xiaosong Ma, M. Winslett, Johnny Norris, X. Jiao, R. Fiedler","doi":"10.1109/ICDE.2004.1320041","DOIUrl":"https://doi.org/10.1109/ICDE.2004.1320041","url":null,"abstract":"Scientific visualization applications are very data-intensive, with high demands for I/O and data management. Developers of many visualization tools hesitate to use traditional DBMSs, due to the lack of support for these DBMSs on parallel platforms and the risk of reducing the portability of their tools and the user data. We propose the GODIVA framework, which provides simple database-like interfaces to help visualization tool developers manage their in-memory data, and I/O optimizations such as prefetching and caching to improve input performance at run time. We implemented the GODIVA interfaces in a stand-alone, portable user library, which can be used by all types of visualization codes: interactive and batch-mode, sequential and parallel. Performance results from running a visualization tool using the GODIVA library on multiple platforms show that the GODIVA framework is easy to use, alleviates developers' data management burden, and can bring substantial I/O performance improvement.","PeriodicalId":358862,"journal":{"name":"Proceedings. 20th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121400512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}