Proceedings 14th International Conference on Data Engineering: Latest Articles

Industry applications of data mining: challenges and opportunities

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655765

Evangelos Simoudis

Summary form only given, as follows. Data mining applications deployed in industry aim to address two problems that organizations face: customer intimacy and better utilization of data assets. These applications can be divided into those that use micro-mining, i.e. single-mining-component desktop systems, and those that use macro-mining, i.e. multi-component server-based systems. The macro-mining applications are usually coupled with data warehouses. The interesting result of this coupling for the data mining community is that the data warehouses cannot be supported by the current data mining offerings, which delays the deployment of applications in production environments. The data volumes are too large, the data types too diverse and the data characteristics too incompatible for the existing data mining algorithms. Furthermore, the pure mining operation is a very small part of the entire application life-cycle. The author presents the issues related to the coupling of macro-mining with data warehouses, and identifies issues that must be resolved for large-scale data mining applications to continue being deployed successfully.

Citations: 1

DB-MAN: a distributed database system based on database migration in ATM networks

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655815

T. Hara, K. Harumoto, M. Tsukamoto, S. Nishio

Because of the recent development of network technologies such as ATM (Asynchronous Transfer Mode), broader channel bandwidth is becoming available across worldwide networks. As one of the new technologies that make good use of such broadband channels, dynamic relocation of databases through networks, which we call database migration, will soon become a powerful and basic database operation of practical use. We discuss our proposal of a distributed database system, DB-MAN (distributed database system based on DataBase Migration in ATM Networks), which takes advantage of database migration in virtual LANs (local area networks) of ATM networks. DB-MAN has two notable mechanisms: a mechanism for selecting the transaction processing method and a mechanism for concurrency control with database migration. The former is a mechanism that chooses the more efficient of two transaction processing methods: the conventional method based on the two-phase commit protocol and our method employing database migration. The latter is a mechanism to prevent the transaction processing throughput from deteriorating in environments where data contention is a significant factor. We then present simulation results comparing the performance of our proposed system with that of the conventional distributed database system based on the two-phase commit protocol. The results demonstrate that effective use of database migration gives higher performance than the conventional system.

Citations: 15

Persistent applications using generalized redo recovery

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655771

D. Lomet

We describe how to recover applications after system crashes using database recovery. Earlier efforts, based on frequent application checkpoints and/or logging values read, are very expensive. We treat application state as a cached object and log application execution as operations in the recovery framework of D. Lomet and M. Tuttle (1995). Logging application execution does not require logging the application state. Further, logged application reads are mostly logical operations in which only the data source identity is logged. We describe a cache manager that handles the flush order dependencies introduced by these log operations and a recovery process that restores application state by replaying the application.
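
The replay idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the `App` class, the operation tuples and the `sources` dictionary are hypothetical names, execution steps are assumed to be deterministic, and the data sources are assumed to hold the same values at replay time. The cache manager and flush order dependencies are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical application whose state is never written to the log."""
    state: dict = field(default_factory=dict)

    def read(self, sources, name):
        # Logical read: the state absorbs the current value of a named source.
        self.state[name] = sources[name]

    def step(self):
        # Deterministic execution step: add all values read so far to a total.
        inputs = sum(v for k, v in self.state.items() if k != "total")
        self.state["total"] = self.state.get("total", 0) + inputs


def apply_op(app, sources, op):
    if op[0] == "read":
        app.read(sources, op[1])
    else:
        app.step()


def run(app, sources, log, ops):
    for op in ops:
        log.append(op)  # log the operation (source identity only), not the state
        apply_op(app, sources, op)


def recover(sources, log):
    # Rebuild the application state by replaying the logged operations.
    app = App()
    for op in log:
        apply_op(app, sources, op)
    return app


sources = {"a": 2, "b": 3}
log = []
app = App()
run(app, sources, log, [("read", "a"), ("step",), ("read", "b"), ("step",)])
recovered = recover(sources, log)
```

Here the log holds only operation names and source identities, yet `recovered.state` equals `app.state` after replay.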

Citations: 20

On querying spreadsheets

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655769

L. Lakshmanan, Iyer N. Subramanian, N. Goyal, R. Krishnamurthy

Considers the problem of querying the data in applications such as spreadsheets and word processors. This problem has several motivations from the perspective of data integration, interoperability and OLAP. We provide an architecture for realizing interoperability among such diverse applications and address the challenges that arise specifically in the context of querying data stored in spreadsheet applications. A fundamental challenge is the lack of a well-defined schema. We propose a framework in which the user can specify the layout of data in a spreadsheet, based on his perception of the important concepts underlying that data. Layout specifications can be viewed as the "physical schema" of a spreadsheet. We motivate the concept of an abstract database machine (ADM) that uses the layout specifications to provide a relational view of the data in spreadsheet applications and, similar to a DBMS, supports efficient querying of the spreadsheet data. We develop a methodology for building ADMs for spreadsheets and describe our implementation of an ADM for Microsoft Excel applications, based on the above methodology. Our implementation platform is IBM PCs running Windows NT, Microsoft Office and OLE 2.0. We demonstrate the generality and practicality of our approach by developing a formal characterization of the class of spreadsheets that can be handled in our framework. Our results show that the approach is capable of handling a broad class of naturally occurring spreadsheet applications. This work is part of an office tool integration project.
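
As a rough illustration of how a layout specification can yield a relational view, consider the Python sketch below. The `layout` dictionary keys, the `relational_view` function and the sample grid are assumptions made for this example; they are not the paper's specification language or its ADM implementation for Microsoft Excel.

```python
def relational_view(grid, layout):
    """Yield one dict per data row, using the layout to locate headers and data."""
    c0, c1 = layout["first_col"], layout["last_col"]
    headers = grid[layout["header_row"]][c0:c1 + 1]
    for row in grid[layout["first_data_row"]:layout["last_data_row"] + 1]:
        yield dict(zip(headers, row[c0:c1 + 1]))


def select(rows, predicate):
    """Relational selection over the view."""
    return [r for r in rows if predicate(r)]


# A sheet with a title row and a totals row that are not part of the relation.
grid = [
    ["Sales report", None, None],
    ["region", "quarter", "sales"],
    ["east", "Q1", 120],
    ["west", "Q1", 95],
    ["Total", None, 215],
]

# Hypothetical layout specification: where the header and the data rows live.
layout = {"header_row": 1, "first_col": 0, "last_col": 2,
          "first_data_row": 2, "last_data_row": 3}

rows = list(relational_view(grid, layout))
big = select(rows, lambda r: r["sales"] > 100)
```

The layout plays the role of a "physical schema": without it, the title and totals rows cannot be told apart from the data.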

Citations: 12

The Alps at your fingertips: virtual reality and geoinformation systems

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655818

R. Pajarola, T. Ohler, P. Stucki, Kornel Szabo, P. Widmayer

Advocates a desktop virtual reality (VR) interface to a geographic information system (GIS). The navigational capability to explore large topographic scenes is a powerful metaphor and a natural way of interacting with a GIS. VR systems succeed in providing visual realism and real-time navigation and interaction, but fail to cope with very large amounts of data and to provide the general functionality of information systems. We suggest a way to overcome these problems. We describe a prototype system, called ViRGIS (Virtual Reality GIS), that integrates two system platforms: a client that runs the VR component interacts via a (local or wide area) network with a server that runs an object-oriented database containing geographic data. For the purpose of accessing data efficiently, we describe how to integrate a geometric index into the database, and how to perform the operations that are requested in a real-time trip through the virtual world.

Citations: 23

Point- versus interval-based temporal data models

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655777

Michael H. Böhlen, Renato Busatto, Christian S. Jensen

The association of timestamps with various data items such as tuples or attribute values is fundamental to the management of time-varying information. Using intervals in timestamps, as do most data models, leaves a data model with a variety of choices for giving a meaning to timestamps. Specifically, some such data models claim to be point-based while other data models claim to be interval-based. The meaning chosen for timestamps is important: it has a pervasive effect on most aspects of a data model, including database design, a variety of query language properties, and query processing techniques, e.g., the availability of query optimization opportunities. The paper precisely defines the notions of point-based and interval-based temporal data models, thus providing a new formal basis for characterizing temporal data models and obtaining new insights into the properties of their query languages. Queries in point-based models treat snapshot-equivalent argument relations identically. This renders point-based models insensitive to coalescing. In contrast, queries in interval-based models give significance to the actual intervals used in the timestamps, thus generally treating non-identical, but possibly snapshot-equivalent, relations differently. The paper identifies the notion of time fragment preservation as the essential defining property of an interval-based data model.
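
The distinction between snapshot equivalence and identity can be made concrete with a small Python sketch. The encoding is an assumption for illustration only: a relation is a non-empty set of `(value, start, end)` tuples over integer time with half-open intervals `[start, end)`, and the function names are not taken from the paper.

```python
def snapshot(rel, t):
    """Values valid at time point t."""
    return {v for (v, s, e) in rel if s <= t < e}


def snapshot_equivalent(r1, r2):
    """True if both relations have the same snapshot at every time point."""
    both = r1 | r2
    points = range(min(s for _, s, _ in both), max(e for _, _, e in both))
    return all(snapshot(r1, t) == snapshot(r2, t) for t in points)


def coalesce(rel):
    """Merge overlapping or adjacent intervals that carry the same value."""
    out = []
    for v, s, e in sorted(rel):
        if out and out[-1][0] == v and s <= out[-1][2]:
            out[-1] = (v, out[-1][1], max(out[-1][2], e))
        else:
            out.append((v, s, e))
    return set(out)


def durations(rel):
    """An interval-sensitive query: the length of each stored interval."""
    return sorted(e - s for _, s, e in rel)


r1 = {("ann", 1, 5), ("ann", 5, 9)}   # two fragments
r2 = {("ann", 1, 9)}                  # one coalesced interval
```

`r1` and `r2` are snapshot-equivalent and `coalesce(r1)` equals `r2`, so a query built only on `snapshot` cannot tell them apart. `durations` can, which is the kind of sensitivity to the stored intervals that the abstract attributes to interval-based models.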

Citations: 112

Performance analysis of parallel hash join algorithms on a distributed shared memory machine: implementation and evaluation on HP Exemplar SPP 1600

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655761

M. Nakano, H. Imai, M. Kitsuregawa

The distributed shared memory (DSM) architecture is considered to be one of the most likely parallel computing environment candidates for the near future because it scales easily and facilitates parallel programming. However, a naive program written for shared memory execution often performs poorly on a DSM machine, because of the overhead of maintaining cache coherency, particularly with frequent remote memory accesses. We show that careful buffer management of parallel join processing on DSM can produce considerable performance improvements in comparison with a naive implementation. We propose four buffer management strategies for parallel hash join processing on the DSM architecture and implement them on the HP Exemplar SPP 1600. The basic strategy is to begin with the hash join algorithm for the shared-everything architecture and then to consider the memory locality of DSM by distributing the hash table and data pool buffers among the nodes. The results of the four buffering strategies are analyzed in detail. We conclude that, in order to achieve high performance on a DSM machine, our buffer management strategy, in which the memory access pattern is extracted and buffers are allocated in the local memory of nodes to minimize memory access cost, is very efficient.
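
The idea of distributing the hash table among nodes can be sketched in Python as follows. This is a single-process illustration under stated assumptions, not one of the paper's four buffer management strategies: `n_nodes` and the per-node tables stand in for buffers placed in node-local memory, and Python itself cannot control or measure memory placement.

```python
from collections import defaultdict


def partitioned_hash_join(build, probe, n_nodes):
    """Join two lists of (key, payload) pairs using one hash table per node."""
    # One table per node: a stand-in for buffers allocated in node-local memory.
    tables = [defaultdict(list) for _ in range(n_nodes)]

    # Build phase: each tuple is routed to the node that owns its key.
    for key, payload in build:
        tables[hash(key) % n_nodes][key].append(payload)

    # Probe phase: each probe tuple touches only the owning node's table.
    result = []
    for key, payload in probe:
        local = tables[hash(key) % n_nodes]
        for match in local.get(key, []):
            result.append((key, match, payload))
    return result


build = [(1, "a"), (2, "b"), (1, "c")]
probe = [(1, "x"), (3, "y"), (2, "z")]
joined = sorted(partitioned_hash_join(build, probe, n_nodes=4))
```

Routing both phases by the same hash of the join key means a probe never needs a table owned by another node, which is the locality property the abstract emphasizes.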

Citations: 3

Methodical restructuring of complex workflow activities

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655797

Ling Liu, C. Pu

We describe a family of activity-split and activity-join operations with a notion of validity. The key idea of introducing the set of activity-split and activity-join operations is to allow users to restructure ongoing activities in anticipation of uncertainty, so that any significant performance loss due to unexpected unavailability or delay of shared resources can be avoided or reduced by releasing early committed resources or transferring ownership of uncommitted resources. To guarantee the correctness of new activities generated by activity-split or activity-join operations, we define the notion of validity of activity restructuring operations and identify the cases where correctness is ensured and the cases where activity-split or activity-join is illegal because of the inconsistency incurred.

Citations: 39

Representing and querying changes in semistructured data

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655752

S. Chawathe, S. Abiteboul, J. Widom

Semistructured data may be irregular and incomplete and does not necessarily conform to a fixed schema. As with structured data, it is often desirable to maintain a history of changes to data, and to query over both the data and the changes. Representing and querying changes in semistructured data is more difficult than in structured data due to the irregularity and lack of schema. We present a model for representing changes in semistructured data and a language for querying over these changes. An important feature of our approach is that we represent and query changes directly as annotations on the affected data, instead of indirectly as the difference between database states. We describe the implementation of our model and query language. We also describe the design and implementation of a query subscription service that permits users to subscribe to changes in semistructured information sources.
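
Representing changes as annotations on the affected data, rather than as differences between database states, might look like the Python sketch below. The `Node` class, the `"cre"` and `"upd"` annotation tags and the `changed_since` query are illustrative assumptions, not the model or query language defined in the paper.

```python
class Node:
    """A labeled-tree node that carries its own change annotations."""

    def __init__(self, value, t):
        self.value = value
        self.children = {}
        # Each annotation records what happened to this node and when.
        self.annotations = [("cre", t, value)]

    def update(self, new_value, t):
        self.annotations.append(("upd", t, self.value, new_value))
        self.value = new_value


def changed_since(node, t, path=""):
    """Yield (path, annotation) for every change recorded after time t."""
    for ann in node.annotations:
        if ann[1] > t:
            yield (path or "/", ann)
    for label, child in node.children.items():
        yield from changed_since(child, t, f"{path}/{label}")


root = Node(None, 0)
root.children["price"] = Node(10, 1)
root.children["price"].update(12, 5)
recent = list(changed_since(root, 2))
```

Because the history sits on the node itself, a query over changes is an ordinary traversal of the data and needs no comparison of snapshots.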

Citations: 147

Employing intelligent agents for knowledge discovery

Proceedings 14th International Conference on Data Engineering

Pub Date : 1998-02-23 DOI: 10.1109/ICDE.1998.655764

Earl Stahl

Summary form only given, substantially as follows. Compares and contrasts alternative approaches currently in use as methodologies for 'data mining' and knowledge discovery. Specifically, the author focuses on the use of intelligent agents for mining important relationships and patterns, as well as the value of knowledge induction as a basis for quick and accurate discovery of vital information. Beginning with a quick tutorial on how intelligent agent architectures fit within the current n-tier network infrastructure, the author discusses in detail how to build a network or topology of agents representing the important relationships that support the research and discovery efforts.

Citations: 3
