
Proceedings. International Database Engineering and Applications Symposium: Latest Publications

A constrained frequent pattern mining system for handling aggregate constraints
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351479
C. Leung, Fan Jiang, Lijing Sun, Yan Wang
Frequent pattern mining searches data for sets of items that frequently co-occur. Most algorithms find all the frequent patterns. However, there are many real-life situations in which users are interested in only a small portion of the entire collection of frequent patterns. To mine patterns that satisfy user-specified aggregate constraints of the form agg(X.attr) θ const, properties of the constraints are exploited. When agg is sum, the mining can be complicated. Existing mining systems or algorithms usually make assumptions about the value or range of X.attr and/or const. In this paper, we propose a frequent pattern mining system that avoids making these assumptions and that effectively handles sum constraints as well as other aggregate constraints.
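To make the constraint form concrete, here is a minimal sketch with hypothetical attribute values; it only evaluates a sum constraint on a candidate itemset and illustrates why mixed-sign attribute values complicate pruning. It is not the proposed mining system.

```python
# Minimal sketch (hypothetical attribute values) of evaluating a constraint
# of the form sum(X.attr) <= const on a candidate itemset X.
attr = {"a": 4.0, "b": -2.5, "c": 1.0, "d": 7.5}   # item -> X.attr, invented values

def satisfies_sum_constraint(itemset, const):
    """Check sum(X.attr) <= const for a candidate itemset X."""
    return sum(attr[item] for item in itemset) <= const

print(satisfies_sum_constraint({"a", "c"}, 6.0))   # True:  4.0 + 1.0 = 5.0
print(satisfies_sum_constraint({"d"}, 6.0))        # False: 7.5
print(satisfies_sum_constraint({"d", "b"}, 6.0))   # True:  7.5 - 2.5 = 5.0
# The last two lines show why pruning is hard: adding an item with a negative
# attribute value turned a violating itemset into a satisfying one, so the
# constraint is neither monotone nor anti-monotone without range assumptions.
```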
Citations: 12
Efficient graph management based on bitmap indices
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351489
Norbert Martínez-Bazan, Miquel Angel Aguila-Lorente, V. Muntés-Mulero, David Dominguez-Sal, S. Gómez-Villamor, J. Larriba-Pey
The increasing amount of graph-like data from social networks, science, and the web has spurred interest in analyzing the relationships between different entities. New specialized solutions in the form of graph databases, which are generic and able to adapt to any schema as an alternative to RDBMSs, have appeared to manage attributed multigraphs efficiently. In this paper, we describe the internals of the DEX graph database, which is based on a representation of the graph and its attributes as maps and bitmap structures that can be loaded and unloaded efficiently from memory. We also present the internal operations used in DEX to manipulate these structures. We show that by using these structures, DEX scales to graphs with billions of vertices and edges with very limited memory requirements. Finally, we compare our graph-oriented approach to other approaches, showing that our system is better suited for typical out-of-core graph operations.
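As a rough illustration of the general idea (not DEX's actual internal layout), adjacency can be kept as one bitmap per vertex so that neighborhood and set operations become bitwise operations; Python's arbitrary-precision integers serve as bitmaps here.

```python
# Rough sketch of bitmap-based adjacency (illustrative only, not DEX's layout).
# Each vertex keeps a bitmap (an arbitrary-precision int) with bit i set
# when there is an edge to vertex i.
class BitmapGraph:
    def __init__(self, num_vertices):
        self.adj = [0] * num_vertices          # one bitmap per vertex

    def add_edge(self, u, v):
        self.adj[u] |= 1 << v
        self.adj[v] |= 1 << u                  # undirected for simplicity

    def neighbors(self, u):
        bm, i, out = self.adj[u], 0, []
        while bm:
            if bm & 1:
                out.append(i)
            bm >>= 1
            i += 1
        return out

    def common_neighbor_count(self, u, v):
        return bin(self.adj[u] & self.adj[v]).count("1")  # intersection = one AND

g = BitmapGraph(5)
g.add_edge(0, 1); g.add_edge(0, 2); g.add_edge(1, 2); g.add_edge(2, 3)
print(g.neighbors(2))                 # [0, 1, 3]
print(g.common_neighbor_count(0, 1))  # 1 (vertex 2)
```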
Citations: 41
Sample-based forecasting exploiting hierarchical time series
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351490
Ulrike Fischer, Frank Rosenthal, Wolfgang Lehner
Time series forecasting is challenging because sophisticated forecast models are computationally expensive to build. Recent research has addressed the integration of forecasting inside a DBMS. One main benefit is that models can be created once and then repeatedly used to answer forecast queries. Forecast queries are often submitted at higher aggregation levels, e.g., forecasts of sales over all locations. To answer such a forecast query, we have two possibilities. First, we can aggregate all base time series (sales in Austria, sales in Belgium...) and create only one model for the aggregate time series. Second, we can create models for all base time series and aggregate the base forecast values. The second possibility might lead to higher accuracy, but it is usually too expensive due to the large number of base time series. However, we do not actually need all base models to achieve high accuracy; a sample of base models is enough. With this approach, we still achieve better accuracy than an aggregate model, very similar to using all models, but fewer models need to be created and maintained in the database. We further improve this approach when new actual values of the base time series arrive at different points in time. With each new actual value we can refine the aggregate forecast and eventually converge towards the real actual value. Our experimental evaluation, using several real-world data sets, shows a high accuracy of our approaches and a fast convergence towards the optimal value with increasing sample sizes and increasing numbers of actual values, respectively.
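A toy sketch of the two options described above, on synthetic data and with a deliberately naive per-series "model" (the mean of the last few values, a placeholder for the paper's real forecast models): forecast the aggregate series directly, or forecast a sample of base series and scale the summed forecasts up to the full population.

```python
import random

# Toy sketch of the two alternatives for answering an aggregate forecast query.
# Synthetic data; the "model" is a placeholder, not the paper's forecast models.
random.seed(0)
base_series = [[10 + random.gauss(0, 1) for _ in range(50)] for _ in range(100)]

def forecast(series, w=10):
    """Placeholder forecast model: mean of the last w observations."""
    return sum(series[-w:]) / w

# Option 1: one model over the aggregated series.
aggregate_series = [sum(vals) for vals in zip(*base_series)]
f_aggregate = forecast(aggregate_series)

# Option 2: models over a sample of base series, scaled to the full population.
sample = random.sample(base_series, 20)
f_sample = sum(forecast(s) for s in sample) * (len(base_series) / len(sample))

print(round(f_aggregate, 1), round(f_sample, 1))
```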
Citations: 5
A mediator-based system for distributed semantic provenance management systems
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351499
Mohamed Amin Sakka, Bruno Defude
Today, most applications exchanging and processing documents on the web or in clouds are becoming provenance-aware and provide heterogeneous, decentralized, and non-interoperable provenance data. Provenance is becoming key metadata for assessing the trustworthiness of electronic documents and should be considered first-class data. Most provenance management systems are dedicated either to a specific application (workflow, database) or to a specific data type. Moreover, in modern infrastructures such as the cloud, applications can be deployed and executed on several providers' infrastructures. So there is a need to track the provenance of applications across different provenance providers. For these reasons, modeling, collecting, and querying provenance across heterogeneous distributed sources is considered a challenging task. In this paper, we introduce a framework based on semantic web models that supports syntactic and semantic heterogeneity of provenance sources and correlations between multiple sources. This framework is implemented as a provenance management system (or PMS). We focus on the design of a mediator-based system that federates distributed PMSs, and we present optimization issues related to distributed query processing.
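A purely illustrative sketch of the mediator idea: a federated provenance query is forwarded to several provenance sources and the partial answers are merged. The source classes and record fields below are invented for illustration; the paper's PMSs expose semantic web (RDF-based) models instead.

```python
# Purely illustrative mediator sketch: fan a provenance query out to several
# sources and merge the partial answers. Source classes and fields are hypothetical.
class ProvenanceSource:
    def __init__(self, name, records):
        self.name = name
        self.records = records  # list of dicts: {"document", "event", "time"}

    def query(self, document_id):
        return [r for r in self.records if r["document"] == document_id]

class Mediator:
    def __init__(self, sources):
        self.sources = sources

    def provenance_of(self, document_id):
        # Merge partial provenance chains from all sources, ordered by time.
        merged = []
        for src in self.sources:
            for rec in src.query(document_id):
                merged.append({**rec, "source": src.name})
        return sorted(merged, key=lambda r: r["time"])

s1 = ProvenanceSource("pms-a", [{"document": "d1", "event": "created", "time": 1}])
s2 = ProvenanceSource("pms-b", [{"document": "d1", "event": "signed", "time": 2}])
print(Mediator([s1, s2]).provenance_of("d1"))
```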
Citations: 4
UBIQUEST, for rapid prototyping of networking applications
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351498
A. Ahmad-Kassem, Christophe Bobineau, C. Collet, Etienne Dublé, S. Grumbach, Fuda Ma, L. Martínez, S. Ubéda
A UBIQUEST system provides a high-level programming abstraction for rapid prototyping of heterogeneous and distributed applications in a dynamic environment. Such a system is perceived as a distributed database, and the applications interact through declarative queries, including declarative networking programs (e.g. routing) and/or specific data-oriented distributed algorithms (e.g. distributed join). Case-Based Reasoning is used for the optimization of distributed queries, because there is no prior knowledge about the data (sources) in networking applications, and certainly no related metadata such as data statistics.
Citations: 4
Mining GPS traces to recommend common meeting points
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351497
Sonia Khetarpaul, S. K. Gupta, L. V. Subramaniam, Ullas Nambiar
Scheduling a meeting is a difficult task for people who have overbooked calendars and many constraints. The complexity increases when the meeting is to be scheduled between parties who are situated in geographically distant locations of a city and have varying travel patterns. In this paper, we present a solution that identifies a common meeting point for a group of users who have temporal and spatial locality constraints that vary over time. The problem entails answering an Optimal Meeting Point (OMP) query in spatial databases. In Euclidean space, answering an OMP query reduces to determining the geometric median of a set of points, a problem for which no exact closed-form solution exists. The OMP problem does not consider any constraints on the availability of users, whereas that is a key constraint in our setting. We therefore focus on finding a solution that uses daily movement information obtained from each user's GPS traces to compute stay points during various times of the day. We then determine interesting locations by analyzing the stay points across multiple users. The novelty of our solution is that the computations are done within the database by using various relational algebra operations in combination with statistical operations on the GPS trajectory data. This makes our solution scalable to larger groups of users and to multiple such requests. Once the list of stay points and interesting locations is obtained, we show that this data can be utilized to construct spatio-temporal graphs for the users that allow us to efficiently decide on a meeting place. We perform experiments on a real-world dataset and show that our method is effective in finding an optimal meeting point between two users.
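Since the unconstrained Euclidean OMP reduces to the geometric median, a standard way to approximate it is Weiszfeld's iteration. The sketch below is generic and independent of the paper's constrained, stay-point-based, in-database solution; the input points are hypothetical user locations.

```python
import math

# Weiszfeld's iteration: a standard approximation of the geometric median,
# i.e. the unconstrained Euclidean optimal meeting point.
def geometric_median(points, tol=1e-7, max_iter=1000):
    x = sum(p[0] for p in points) / len(points)   # start at the centroid
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:                          # iterate sits on a data point
                return (px, py)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        nx, ny = num_x / denom, num_y / denom
        if math.hypot(nx - x, ny - y) < tol:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)

# Hypothetical locations of three users (e.g. their frequent stay points).
print(geometric_median([(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]))
```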
Citations: 7
Mining probabilistic datasets vertically
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351500
C. Leung, S. Tanbeer, Bhavek P. Budhia, Lauren C. Zacharias
As frequent pattern mining plays an important role in various real-life applications, it has been the subject of numerous studies. Most of these studies mine transactional datasets of precise data. However, there are situations in which data are uncertain. Over the past few years, Apriori-based, tree-based, and hyperlinked-array-structure-based mining algorithms have been proposed to mine frequent patterns from these probabilistic datasets of uncertain data. These algorithms view the datasets "horizontally" as collections of transactions, each of which records a set of items contained in that transaction. In this paper, we consider an alternative representation in which probabilistic datasets of uncertain data are viewed "vertically" as collections of vectors. The vector for each item indicates which transactions contain that item. We also propose an algorithm called U-VIPER to mine these probabilistic datasets "vertically" for frequent patterns.
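A small sketch of the vertical view: each item maps to a vector of (transaction id, existence probability) entries. Assuming the common independence-based expected support model (our assumption for illustration, not a statement about U-VIPER's internals), the expected support of an itemset is the sum over transactions of the product of its items' probabilities.

```python
# Vertical view of a probabilistic dataset: item -> {tid: existence probability}.
# Expected support is computed under the usual independence assumption
# (illustrative only, not a description of U-VIPER's internals).
vertical = {
    "a": {1: 0.9, 2: 0.5, 3: 0.7},
    "b": {1: 0.8, 3: 0.4},
    "c": {2: 0.6, 3: 0.9},
}

def expected_support(itemset):
    vectors = [vertical[item] for item in itemset]
    common_tids = set.intersection(*(set(v) for v in vectors))
    total = 0.0
    for tid in common_tids:
        prob = 1.0
        for v in vectors:
            prob *= v[tid]          # product of item probabilities in this transaction
        total += prob
    return total

print(expected_support({"a"}))       # 0.9 + 0.5 + 0.7 = 2.1
print(expected_support({"a", "b"}))  # 0.9*0.8 + 0.7*0.4 = 1.0
```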
Citations: 19
Differential evolution versus genetic algorithms: towards symbolic aggregate approximation of non-normalized time series
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351501
Muhammad Marwan Muhammad Fuad
Differential evolution (DE) is a very powerful search method for solving many optimization problems. In this paper we present a new scheme (DESAX), based on differential evolution, to localize the breakpoints utilized by the symbolic aggregate approximation method, one of the most important symbolic representation techniques for time series data. We compare the new scheme with a previous one (GASAX), which is based on genetic algorithms, and we show how the new scheme outperforms the original one. We also show how DESAX can be used for the symbolic aggregate approximation of non-normalized time series.
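For context, this is what the breakpoints feed into: SAX first reduces a series by piecewise aggregate approximation (PAA) and then maps each segment mean to a symbol according to the breakpoints. The sketch below uses hypothetical breakpoints on a non-normalized series, not values produced by DESAX or GASAX.

```python
import bisect

# Sketch of the SAX pipeline the breakpoints feed into: PAA + symbol assignment.
# Breakpoints below are hypothetical placeholders, not optimized by DE or GA.
def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each (roughly) equal segment."""
    seg_len = len(series) / n_segments
    return [
        sum(series[int(i * seg_len):int((i + 1) * seg_len)])
        / (int((i + 1) * seg_len) - int(i * seg_len))
        for i in range(n_segments)
    ]

def sax(series, n_segments, breakpoints, alphabet="abcd"):
    """Map each PAA mean to a symbol; breakpoints must be sorted ascending."""
    return "".join(alphabet[bisect.bisect_left(breakpoints, m)]
                   for m in paa(series, n_segments))

series = [2.1, 2.3, 5.0, 5.2, 9.8, 9.5, 1.0, 1.2]   # non-normalized toy series
breakpoints = [2.0, 5.1, 8.0]                        # hypothetical, 4-symbol alphabet
print(sax(series, 4, breakpoints))                   # "bbda"
```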
Citations: 18
Partitioning XML documents for iterative queries
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351483
N. Bidoit, Dario Colazzo, Noor Malla, C. Sartiani
This paper presents an XML partitioning technique that allows main-memory query engines to process a class of XQuery queries, that we dub iterative queries, on arbitrarily large input documents. We provide a static analysis technique to recognize these queries. The static analysis is based on paths extracted from queries and does not need additional schema information. We then provide an algorithm using path information for partitioning the input documents of iterative queries. This algorithm admits a streaming implementation, whose effectiveness is experimentally validated.
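A rough illustration of the partition-then-query idea, under our own simplifying assumption that the document splits on a repeating element and the query can be applied independently to each chunk; deciding when such splitting is safe is exactly what the paper's path-based static analysis does, and this sketch does not implement that analysis.

```python
import xml.etree.ElementTree as ET

# Rough illustration of partition-then-query for an "iterative" query,
# assuming (for illustration) the document splits on a repeating element.
doc = "<orders>" + "".join(
    f"<order id='{i}'><total>{i * 10}</total></order>" for i in range(1, 7)
) + "</orders>"

def partition(xml_text, split_tag, chunk_size):
    """Yield smaller documents, each holding chunk_size elements of split_tag."""
    root = ET.fromstring(xml_text)
    items = root.findall(split_tag)
    for i in range(0, len(items), chunk_size):
        chunk = ET.Element(root.tag)
        chunk.extend(items[i:i + chunk_size])
        yield chunk

def query(chunk):
    # Iterative query: one result per order, independent of the other orders.
    return [int(o.find("total").text)
            for o in chunk.findall("order") if int(o.find("total").text) > 25]

results = [r for chunk in partition(doc, "order", 2) for r in query(chunk)]
print(results)  # [30, 40, 50, 60]
```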
Citations: 10
Efficient MD5 hash reversing using D.E.A. framework for sharing computational resources
Pub Date : 2012-08-08 DOI: 10.1145/2351476.2351502
Nunzio Cassavia, E. Masciari
Recent advances in computing technology have led to the availability of a huge number of computational resources that can be easily connected through network infrastructures. However, only a small fraction of the available computing power is actually exploited for performing effective computation of user tasks. On the other hand, there are several research projects that require a lot of computing power to reach their goals but usually lack adequate resources, making the project activities quite hard to complete. In this paper we describe D.E.A. (Distributed Execution Agent), a framework for sharing computational resources. We exploit the D.E.A. framework to tame the computationally demanding problem of MD5 hash reversing. We performed several experiments that confirmed the validity of our approach.
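As background on the workload itself (not on D.E.A.'s protocol), reversing an MD5 hash amounts to searching a keyspace for a preimage, and the search splits naturally into independent shards that could be farmed out to different machines. The sketch below runs the shards sequentially over a tiny hypothetical keyspace.

```python
import hashlib
import itertools
import string

# Background sketch of the workload, not of D.E.A. itself: reversing MD5 means
# searching a keyspace for a preimage. The search splits into independent
# shards (here, by first character) that a coordinator could hand to workers.
TARGET = hashlib.md5(b"dbx").hexdigest()   # pretend this is the hash to reverse
ALPHABET = string.ascii_lowercase
LENGTH = 3                                  # tiny keyspace, for illustration only

def search_shard(first_char, target):
    """One worker's shard: all candidates starting with first_char."""
    for tail in itertools.product(ALPHABET, repeat=LENGTH - 1):
        candidate = first_char + "".join(tail)
        if hashlib.md5(candidate.encode()).hexdigest() == target:
            return candidate
    return None

# A coordinator would distribute these shards; here we just loop over them.
for c in ALPHABET:
    found = search_shard(c, TARGET)
    if found:
        print("preimage:", found)
        break
```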
Citations: 0