Efficient maintenance of common keys in archives of continuous query results from deep websites

2011 IEEE 27th International Conference on Data Engineering. Pub Date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767891

Fajar Ardian, S. Bhowmick


Citations: 6

Abstract

In many real-world applications, it is important to create a local archive containing versions of the structured results of continuous queries (queries that are evaluated periodically) submitted to autonomous database-driven Web sites (e.g., the deep Web). Such a history of digital information is a potential gold mine for all kinds of scientific, media, and business analysts. An important task in this context is to maintain the set of common keys of the underlying archived results, as they play a pivotal role in data modeling and analysis, query processing, and entity tracking. A set of attributes in structured data is a common key iff it is a key for all versions of the data in the archive. Because keys are discovered from the archived data itself, common keys, unlike traditional keys, are not temporally invariant: keys identified in one version may differ from those in another. Hence, in this paper, we propose a novel technique to maintain common keys in an archive containing a sequence of versions of evolutionary continuous query results. Given the current common key set of the existing versions and a new snapshot, we propose an algorithm called COKE (COmmon KEy maintenancE), which incrementally maintains the common key set without undertaking an expensive computation of minimal keys from the new snapshot. Furthermore, it exploits certain interesting evolutionary features of real-world data to further reduce the computation cost. Our exhaustive empirical study demonstrates that COKE has excellent performance and is orders of magnitude faster than a baseline approach for maintaining common keys.
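The definition of a common key can be made concrete with a small sketch. The Python below is a brute-force illustration only, not the COKE algorithm: it assumes each snapshot is a list of dicts over the same attribute names, and every function name (`is_key`, `minimal_keys`, `update_common_keys`) is hypothetical. The update step recomputes the minimal keys of the new snapshot, which is precisely the cost the abstract says COKE avoids; it is shown only to clarify what "maintaining the common key set" means.

```python
from itertools import combinations


def is_key(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def minimal_keys(rows, attributes):
    """Enumerate minimal keys of one snapshot by brute force (exponential)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            # Skip supersets of a key already found: they are not minimal.
            if any(k <= candidate for k in keys):
                continue
            if is_key(rows, combo):
                keys.append(candidate)
    return keys


def update_common_keys(common_keys, new_snapshot, attributes):
    """Naive baseline (assumption, not COKE): combine the old minimal common
    keys with the minimal keys of the new snapshot, then keep minimal sets."""
    new_keys = minimal_keys(new_snapshot, attributes)
    candidates = {k | m for k in common_keys for m in new_keys}
    return [c for c in candidates if not any(o < c for o in candidates)]


attributes = ["id", "name", "city"]
v1 = [
    {"id": 1, "name": "a", "city": "x"},
    {"id": 2, "name": "b", "city": "x"},
]
v2 = [
    {"id": 1, "name": "a", "city": "x"},
    {"id": 3, "name": "a", "city": "y"},
]

common = minimal_keys(v1, attributes)          # keys of the first version
common = update_common_keys(common, v2, attributes)
```

In this toy archive, `name` is a key of `v1` but not of `v2`, so it stops being a common key once `v2` arrives, while `{name, city}` becomes one. This is the sense in which common keys are not temporally invariant.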
