
Latest publications from the 21st International Conference on Data Engineering (ICDE'05)

Adaptive overlapped declustering: a highly available data-placement method balancing access load and space utilization
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.16
Akitsugu Watanabe, H. Yokota
This paper proposes a new data-placement method named adaptive overlapped declustering, which can be applied to a parallel storage system that uses a value-range-partitioning-based distributed directory and primary-backup data replication, to improve space utilization by balancing access loads. The proposed method reduces the data skew generated by data migration for access-load balancing. While data-placement methods capable of balancing access load or of reducing data skew have been proposed, none satisfies both requirements simultaneously. The proposed method also improves the reliability and availability of the system because it reduces the recovery time for damaged backups after a disk failure. It achieves this acceleration by greatly reducing network communication and disk I/O. Mathematical analysis shows the efficiency of space utilization under skewed access workloads, and queuing simulations demonstrate that the proposed method halves backup restoration time compared with the traditional chained declustering method.
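As a point of reference for the comparison above, the baseline chained declustering layout can be sketched in a few lines. The function and range labels here are illustrative, not from the paper:

```python
# Minimal sketch of chained declustering (the baseline the paper
# compares against): each node's primary data is backed up on the
# next node in the chain, so any single failure is survivable.

def place_chained(num_nodes, key_ranges):
    """Map each key range to (primary_node, backup_node)."""
    placement = {}
    for i, key_range in enumerate(key_ranges):
        primary = i % num_nodes
        backup = (primary + 1) % num_nodes  # backup chained to the next node
        placement[key_range] = (primary, backup)
    return placement

ranges = ["a-f", "g-m", "n-s", "t-z"]
print(place_chained(4, ranges))
# each range's backup sits on the node after its primary
```

Under chained declustering, a failed node's backup load falls entirely on its neighbors in the chain; spreading that recovery load while keeping space utilization high is the bottleneck the adaptive overlapped scheme targets.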
Citations: 18
Corpus-based schema matching
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.39
J. Madhavan, P. Bernstein, A. Doan, A. Halevy
Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have proposed a principled combination of multiple algorithms. However, these solutions sometimes perform rather poorly due to the lack of sufficient evidence in the schemas being matched. In this paper we show how a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so that they can be matched better. Such a corpus typically contains multiple schemas that model similar concepts and hence enables us to learn variations in the elements and their properties. We exploit such a corpus in two ways. First, we increase the evidence about each element being matched by including evidence from similar elements in the corpus. Second, we learn statistics about elements and their relationships and use them to infer constraints that we use to prune candidate mappings. We also describe how to use known mappings to learn the importance of domain and generic constraints. We present experimental results that demonstrate that corpus-based matching outperforms direct matching (without the benefit of a corpus) in multiple domains.
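The first way of exploiting the corpus can be illustrated with a toy sketch: an element's evidence is augmented with tokens from corpus elements that resemble it, after which a plain similarity measure suffices. The corpus, token sets, and Jaccard measure below are stand-ins, not the paper's learned matchers:

```python
# Hedged sketch: augment each schema element's evidence with tokens
# from similar corpus elements, then match by set overlap.

def augment(element_tokens, corpus):
    """Add tokens from any corpus element that shares a token."""
    evidence = set(element_tokens)
    for corpus_tokens in corpus:
        if evidence & set(corpus_tokens):
            evidence |= set(corpus_tokens)
    return evidence

def jaccard(a, b):
    return len(a & b) / len(a | b)

corpus = [["phone", "telephone"], ["zip", "postcode"]]
s1 = augment(["phone"], corpus)      # gains "telephone" from the corpus
s2 = augment(["telephone"], corpus)  # gains "phone"
print(jaccard(s1, s2))  # 1.0: corpus evidence makes the match obvious
```

Without augmentation, "phone" and "telephone" share no tokens and the match score would be 0; the corpus supplies the missing evidence.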
Citations: 435
Personalized queries under a generalized preference model
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.106
G. Koutrika, Y. Ioannidis
Query personalization is the process of dynamically enhancing a query with related user preferences stored in a user profile with the aim of providing personalized answers. The underlying idea is that different users may find different things relevant to a search due to different preferences. Essential ingredients of query personalization are: (a) a model for representing and storing preferences in user profiles, and (b) algorithms for the generation of personalized answers using stored preferences. Modeling the plethora of preference types is a challenge. In this paper, we present a preference model that combines expressivity and concision. In addition, we provide efficient algorithms for the selection of preferences related to a query, and an algorithm for the progressive generation of personalized results, which are ranked based on user interest. Several classes of ranking functions are provided for this purpose. We present results of experiments both synthetic and with real users (a) demonstrating the efficiency of our algorithms, (b) showing the benefits of query personalization, and (c) providing insight as to the appropriateness of the proposed ranking functions.
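A minimal sketch of ingredient (b), under the assumption that a profile is simply a map from topics to interest weights; the paper's preference model is far richer than this:

```python
# Toy query personalization: results are scored against preference
# weights stored in a user profile and returned in descending
# interest order. The profile and result formats are assumptions.

def personalize(results, profile):
    """Rank result rows by the summed weight of their tags."""
    def interest(row):
        return sum(profile.get(tag, 0.0) for tag in row["tags"])
    return sorted(results, key=interest, reverse=True)

profile = {"comedy": 0.9, "drama": 0.2}
results = [{"title": "A", "tags": ["drama"]},
           {"title": "B", "tags": ["comedy", "drama"]}]
print([r["title"] for r in personalize(results, profile)])  # ['B', 'A']
```

This corresponds to one fixed ranking function; the paper instead provides several classes of ranking functions and generates results progressively.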
Citations: 133
IMAX: incremental maintenance of schema-based XML statistics
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.75
Maya Ramanath, L. Zhang, J. Freire, J. Haritsa
Current approaches for estimating the cardinality of XML queries are applicable to a static scenario wherein the underlying XML data does not change subsequent to the collection of statistics on the repository. However, in practice, many XML-based applications are dynamic and involve frequent updates to the data. In this paper, we investigate efficient strategies for incrementally maintaining statistical summaries as and when updates are applied to the data. Specifically, we propose algorithms that handle both the addition of new documents as well as random insertions in the existing document trees. We also show, through a detailed performance evaluation, that our incremental techniques are significantly faster than the naive recomputation approach; and that estimation accuracy can be maintained even with a fixed memory budget.
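The core idea of incremental maintenance can be sketched with a toy path-count summary that is updated per inserted document rather than recomputed from scratch; the nested-dict "document" and flat Counter are simplifications of the paper's schema-based summaries:

```python
# Sketch in the spirit of IMAX: keep per-path element counts and
# update them when a new document arrives, instead of rescanning
# the whole repository.

from collections import Counter

def paths_of(doc, prefix=""):
    """Yield every root-to-node path in a nested-dict 'document'."""
    for tag, child in doc.items():
        path = f"{prefix}/{tag}"
        yield path
        yield from paths_of(child, path)

summary = Counter()

def insert_document(doc):
    summary.update(paths_of(doc))  # incremental: touch only the new paths

insert_document({"book": {"title": {}, "author": {}}})
insert_document({"book": {"title": {}}})
print(summary["/book/title"])  # 2 -- usable for cardinality estimation
```

The naive alternative would recount paths over every stored document on each update, which is what the paper's performance evaluation shows to be significantly slower.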
Citations: 10
Bloom filter-based XML packets filtering for millions of path queries
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.26
Xueqing Gong, Ying Yan, Weining Qian, Aoying Zhou
The filtering of XML data is the basis of many complex applications, and many algorithms have been proposed to solve this problem. One important challenge is that the number of path queries is huge, so an efficient data structure for representing path queries is necessary. Another challenge is that these path queries usually vary with time; the maintenance of path queries determines the flexibility and capacity of a filtering system. In this paper, we introduce a novel approximate method for XML data filtering that uses Bloom filters to represent path queries. With this method, millions of path queries can be stored efficiently, and changes to these path queries are easy to handle. To improve filtering performance, we introduce a new data structure, Prefix Filters, to decrease the number of candidate paths. Experiments show that our Bloom filter-based method takes less time to build the routing table than an automaton-based method, and that it performs well, with an acceptable false-positive rate, when filtering XML packets of relatively small depth against millions of path queries.
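A minimal Bloom filter over path strings shows why the representation is compact and why false positives can arise; the size and hash-count parameters below are illustrative, not tuned values from the paper:

```python
# Minimal Bloom filter keyed by path strings: membership tests may
# return false positives but never false negatives, which is the
# trade-off that lets millions of path queries fit in memory.

import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes = size, num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # derive k independent positions from salted SHA-256 digests
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("/catalog/book/title")
print(bf.might_contain("/catalog/book/title"))  # True
print(bf.might_contain("/catalog/book/price"))  # almost surely False
```

Deleting or changing a path query only requires rebuilding the affected filter, which is the maintenance flexibility the paper emphasizes over automaton-based approaches.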
Citations: 62
Exploiting correlated attributes in acquisitional query processing
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.63
A. Deshpande, Carlos Guestrin, W. Hong, S. Madden
Sensor networks and other distributed information systems (such as the Web) must frequently access data that has a high per-attribute acquisition cost, in terms of energy, latency, or computational resources. When executing queries that contain several predicates over such expensive attributes, we observe that it can be beneficial to use correlations to automatically introduce low-cost attributes whose observation will allow the query processor to better estimate the selectivity of these expensive predicates. In particular, we show how to build conditional plans that branch into one or more sub-plans, each with a different ordering for the expensive query predicates, based on the runtime observation of low-cost attributes. We frame the construction of the optimal conditional plan for a given user query and set of candidate low-cost attributes as an optimization problem. We describe an exponential-time algorithm for finding such optimal plans, and a polynomial-time heuristic for identifying conditional plans that perform well in practice. We also show how to compactly model the conditional probability distributions needed to identify correlations and build these plans. We evaluate our algorithms against several real-world sensor-network data sets, showing several-times performance increases for a variety of queries versus traditional optimization techniques.
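A conditional plan can be sketched as a branch on a cheap attribute that selects a predicate ordering; the attribute names and selectivity numbers below are fabricated for illustration:

```python
# Sketch of a conditional plan: a cheap attribute is observed first,
# and its value selects which ordering of the expensive predicates
# to run, exploiting their correlation with the cheap attribute.

# estimated selectivity of each expensive predicate, conditioned on
# the cheap boolean attribute "is_daytime" (made-up numbers)
selectivity = {
    True:  [("light>500", 0.2), ("temp>30", 0.6)],
    False: [("temp>30", 0.1), ("light>500", 0.9)],
}

def conditional_order(is_daytime):
    """Evaluate the most selective predicate first in each branch."""
    branch = selectivity[is_daytime]
    return [name for name, _ in sorted(branch, key=lambda p: p[1])]

print(conditional_order(True))   # ['light>500', 'temp>30']
print(conditional_order(False))  # ['temp>30', 'light>500']
```

A static plan must commit to a single ordering; the conditional plan acquires the expensive attribute most likely to disqualify the tuple first, given what the cheap attribute revealed.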
Citations: 138
Schema matching using duplicates
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.126
Alexander Bilke, Felix Naumann
Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names. Discovering duplicates among data sets with unaligned schemas is more difficult than in the usual setting, because it is not clear which fields in one object should be compared with which fields in the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Now, our schema matching algorithm is able to identify corresponding attributes by comparing data values within those duplicate records. An experimental study on real-world data shows the effectiveness of this approach.
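The attribute-matching phase can be sketched as a voting scheme over known duplicate pairs: attribute pairs that repeatedly hold equal values across duplicates are likely matches. Field names and records here are invented, and real data would need fuzzy rather than exact value comparison:

```python
# Sketch of matching opaque attributes via duplicates: for each pair
# of records known to describe the same entity, every attribute pair
# with equal values casts a vote for that correspondence.

def match_by_duplicates(duplicate_pairs):
    """Count how often each (attr_a, attr_b) pair holds equal values."""
    votes = {}
    for rec_a, rec_b in duplicate_pairs:
        for attr_a, val_a in rec_a.items():
            for attr_b, val_b in rec_b.items():
                if val_a == val_b:
                    votes[(attr_a, attr_b)] = votes.get((attr_a, attr_b), 0) + 1
    return votes

pairs = [({"name": "Alice", "tel": "555-1"},
          {"fullname": "Alice", "phone": "555-1"}),
         ({"name": "Bob", "tel": "555-2"},
          {"fullname": "Bob", "phone": "555-2"})]
print(match_by_duplicates(pairs))
# {('name', 'fullname'): 2, ('tel', 'phone'): 2}
```

Note that column names never enter the comparison, which is why the approach works even when the schemas use opaque names.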
Citations: 241
An enhanced query model for soccer video retrieval using temporal relationships
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.20
Shu‐Ching Chen, M. Shyu, Na Zhao
The central goal of our research is to develop a general framework that can automatically analyze sports video, detect sports events, and ultimately offer an efficient and user-friendly system for sports video retrieval. In our earlier work, a novel multimedia data mining technique was proposed for automatic soccer event extraction based on multimodal feature analysis. So far, this framework has been applied to the detection of goal and corner-kick events, and the results are quite impressive. Building on that work, the detected video events are here modeled and effectively stored in a database. A temporal query model is designed to satisfy comprehensive temporal query requirements, and a corresponding graphical query language is developed. These characteristics make our model particularly well suited for searching events in a large-scale video database.
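A temporal query over detected events can be illustrated with the interval relation "before" (in the sense of Allen's interval algebra); the event tuples below are fabricated:

```python
# Toy temporal query over detected video events: each event carries a
# time interval, and a query asks for event pairs satisfying a
# temporal relation such as "before".

def before(a, b):
    """Allen's 'before' relation: a ends strictly before b starts."""
    return a["end"] < b["start"]

events = [{"type": "corner_kick", "start": 10, "end": 25},
          {"type": "goal", "start": 40, "end": 55}]

# query: corner kicks that happen before a goal
hits = [(a["type"], b["type"]) for a in events for b in events
        if a["type"] == "corner_kick" and b["type"] == "goal"
        and before(a, b)]
print(hits)  # [('corner_kick', 'goal')]
```

A full temporal query model would support the other interval relations (meets, overlaps, during, and so on) in the same style.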
Citations: 20
GPIVOT: efficient incremental maintenance of complex ROLAP views
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.71
Songting Chen, Elke A. Rundensteiner
Data warehousing and on-line analytical processing (OLAP) are essential for decision support applications. Common OLAP operations include, for example, drill down, roll up, pivot and unpivot. Typically, such queries are fairly complex and are often executed over huge volumes of data. The solution in practice is to use materialized views to reduce the query cost. Utilizing materialized views that incorporate not just traditional simple SELECT-PROJECT-JOIN operators but also complex OLAP operators such as pivot and unpivot is crucial for improving OLAP query performance, but has so far been an unexplored topic. In this work, we demonstrate that the efficient maintenance of views with pivot and unpivot operators requires the definition of more generalized operators, which we call GPIVOT and GUNPIVOT. We propose rewriting rules, combination rules and propagation rules for these operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Our query transformation rules thus serve the dual purposes of view maintenance and query optimization. This paves the way for the inclusion of GPIVOT and GUNPIVOT into any DBMS engine.
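The pivot and unpivot operators being generalized can be made concrete with a minimal reshaping pair; this ignores the paper's maintenance machinery entirely and just shows the data movement:

```python
# Minimal pivot/unpivot pair on row dictionaries: pivot turns
# (key, attr, value) rows into one wide record per key, and
# unpivot is its inverse.

def pivot(rows, key, attr, value):
    """Turn (key, attr, value) rows into one dict per key."""
    out = {}
    for r in rows:
        out.setdefault(r[key], {})[r[attr]] = r[value]
    return out

def unpivot(table, key, attr, value):
    """Inverse: flatten each key's dict back into rows."""
    return [{key: k, attr: a, value: v}
            for k, cols in table.items() for a, v in cols.items()]

rows = [{"store": "s1", "month": "Jan", "sales": 10},
        {"store": "s1", "month": "Feb", "sales": 12}]
wide = pivot(rows, "store", "month", "sales")
print(wide)  # {'s1': {'Jan': 10, 'Feb': 12}}
print(unpivot(wide, "store", "month", "sales") == rows)  # True
```

The maintenance difficulty the paper addresses comes from the fact that a single source-row insertion may update, rather than insert, a row of the pivoted view.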
Citations: 28
Towards exploring interactive relationship between clusters and outliers in multi-dimensional data analysis
Pub Date : 2005-04-05 DOI: 10.1109/ICDE.2005.146
Yong Shi, A. Zhang
Nowadays many data mining algorithms focus on clustering methods. There are also a lot of approaches designed for outlier detection. We observe that, in many situations, clusters and outliers are concepts whose meanings are inseparable to each other, especially for those data sets with noise. Thus, it is necessary to treat clusters and outliers as concepts of the same importance in data analysis. In this paper, we present a cluster-outlier iterative detection algorithm, tending to detect the clusters and outliers in another perspective for noisy data sets. In this algorithm, clusters are detected and adjusted according to the intra-relationship within clusters and the inter-relationship between clusters and outliers, and vice versa. The adjustment and modification of the clusters and outliers are performed iteratively until a certain termination condition is reached. This data processing algorithm can be applied in many fields such as pattern recognition, data clustering and signal processing. Experimental results demonstrate the advantages of our approach.
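The abstract describes alternating between detecting clusters and outliers and adjusting each based on the other until a termination condition holds. A hedged one-dimensional sketch of that iterative idea (toy data and a simple distance threshold of my own choosing — the paper's actual intra/inter-relationship measures and termination test are not reproduced here):

```python
# Sketch of cluster-outlier iteration: assign points near a center to a
# cluster, flag distant points as outliers, recompute centers from cluster
# members only, and repeat until the centers stop moving.

def iterate_clusters_outliers(points, centers, threshold, max_iter=10):
    for _ in range(max_iter):
        clusters = {i: [] for i in range(len(centers))}
        outliers = []
        for p in points:
            dists = [abs(p - c) for c in centers]
            i = min(range(len(centers)), key=lambda k: dists[k])
            if dists[i] <= threshold:
                clusters[i].append(p)   # close enough: cluster member
            else:
                outliers.append(p)      # too far from every center: outlier
        # Recompute each center from its members; outliers no longer pull it.
        new_centers = [sum(m) / len(m) if m else centers[i]
                       for i, m in clusters.items()]
        if new_centers == centers:      # termination: centers are stable
            break
        centers = new_centers
    return clusters, outliers, centers

clusters, outliers, _ = iterate_clusters_outliers(
    [1.0, 1.2, 0.9, 5.0, 5.1, 20.0], centers=[1.0, 5.0], threshold=2.0)
# outliers == [20.0]
```

The key point mirrored from the abstract is the mutual adjustment: removing outliers changes the clusters' centers, and the updated clusters can in turn reclassify borderline points on the next pass.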
{"title":"Towards exploring interactive relationship between clusters and outliers in multi-dimensional data analysis","authors":"Yong Shi, A. Zhang","doi":"10.1109/ICDE.2005.146","DOIUrl":"https://doi.org/10.1109/ICDE.2005.146","url":null,"abstract":"Nowadays many data mining algorithms focus on clustering methods. There are also a lot of approaches designed for outlier detection. We observe that, in many situations, clusters and outliers are concepts whose meanings are inseparable to each other, especially for those data sets with noise. Thus, it is necessary to treat clusters and outliers as concepts of the same importance in data analysis. In this paper, we present a cluster-outlier iterative detection algorithm, tending to detect the clusters and outliers in another perspective for noisy data sets. In this algorithm, clusters are detected and adjusted according to the intra-relationship within clusters and the inter-relationship between clusters and outliers, and vice versa. The adjustment and modification of the clusters and outliers are performed iteratively until a certain termination condition is reached. This data processing algorithm can be applied in many fields such as pattern recognition, data clustering and signal processing. Experimental results demonstrate the advantages of our approach.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128443485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 10
Journal: 21st International Conference on Data Engineering (ICDE'05)