
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010): Latest Publications

Profiling linked open data with ProLOD
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452762
Christoph Böhm, Felix Naumann, Ziawasch Abedjan, D. Fenz, Toni Grütze, Daniel Hefenbrock, M. Pohl, David Sonnabend
Linked open data (LOD), as provided by a quickly growing number of sources, constitutes a wealth of easily accessible information. However, this data is not easy to understand. It is usually provided as a set of (RDF) triples, often enough in the form of enormous files covering many domains. What is more, the data usually has a loose structure when it is derived from end-user generated sources, such as Wikipedia. Finally, the quality of the actual data is also worrisome, because it may be incomplete, poorly formatted, inconsistent, etc. To understand and profile such linked open data, traditional data profiling methods do not suffice. With ProLOD, we propose a suite of methods ranging from the domain level (clustering, labeling), via the schema level (matching, disambiguation), to the data level (data type detection, pattern detection, value distribution). Packaged into an interactive, web-based tool, they allow iterative exploration and discovery of new LOD sources. Thus, users can quickly gauge the relevance of a source for the problem at hand (e.g., some integration task), and focus on and explore the relevant subset.
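The data-level methods listed in the abstract (data type detection, pattern detection, value distribution) can be illustrated with a small sketch. The Python snippet below is not the ProLOD implementation; it is a hypothetical example that profiles the object values of RDF triples grouped by predicate, using made-up DBpedia-style data.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample of (subject, predicate, object) triples; ProLOD itself
# targets large LOD sources such as DBpedia, not a toy list like this.
triples = [
    ("dbr:Berlin", "dbo:populationTotal", "3769495"),
    ("dbr:Berlin", "dbo:foundingDate", "1237-01-01"),
    ("dbr:Hamburg", "dbo:populationTotal", "1841179"),
    ("dbr:Hamburg", "dbo:foundingDate", "0808-01-01"),
    ("dbr:Berlin", "rdfs:label", "Berlin"),
]

def detect_type(value: str) -> str:
    """Very rough data type detection for a literal value."""
    if re.fullmatch(r"-?\d+", value):
        return "integer"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "string"

# Group object values by predicate and profile each group.
values_by_predicate = defaultdict(list)
for _, predicate, obj in triples:
    values_by_predicate[predicate].append(obj)

for predicate, values in values_by_predicate.items():
    types = Counter(detect_type(v) for v in values)            # data type detection
    patterns = Counter(re.sub(r"\d", "9", v) for v in values)  # pattern detection
    distribution = Counter(values)                              # value distribution
    print(predicate, dict(types), dict(patterns), dict(distribution))
```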
Citations: 68
On the use of query-driven XML auto-indexing
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452741
Karsten Schmidt, T. Härder
Autonomous index management in native XML DBMSs has to address XML's flexibility and storage mapping features, which provide a rich set of indexing options. Changes in workload characteristics, indexes selected by the query optimizer's "magic", subtle differences in the expressiveness of indexes, and tailor-made index properties call, in addition to (long-range) manual index selection, for rapid autonomic reactions and self-tuning options in the DBMS. Hence, when managing an existing set of indexes (i.e., a configuration), its cost trade-off has to be steadily controlled by observing query runtimes, index creation and maintenance, and space constraints.
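The cost trade-off mentioned in the last sentence can be made concrete with a rough sketch. This is not the authors' algorithm; it is a hypothetical greedy decision over candidate indexes that balances observed query savings against creation, maintenance, and space costs.

```python
from dataclasses import dataclass

@dataclass
class IndexStats:
    """Observed statistics for one candidate index (all numbers hypothetical)."""
    saved_query_cost: float   # total runtime saved by queries that used the index
    creation_cost: float      # one-time cost of building the index
    maintenance_cost: float   # cost of keeping the index up to date under updates
    size_pages: int           # space the index occupies

def net_benefit(stats: IndexStats) -> float:
    """Benefit of an index: query savings minus creation and maintenance costs."""
    return stats.saved_query_cost - stats.creation_cost - stats.maintenance_cost

def decide(config: dict[str, IndexStats], space_budget_pages: int) -> list[str]:
    """Keep the most beneficial indexes that still fit into the space budget.
    A real self-tuning component would re-evaluate this continuously as the
    workload shifts; this greedy pass is only an illustration."""
    kept, used = [], 0
    for name, stats in sorted(config.items(), key=lambda kv: net_benefit(kv[1]), reverse=True):
        if net_benefit(stats) > 0 and used + stats.size_pages <= space_budget_pages:
            kept.append(name)
            used += stats.size_pages
    return kept

config = {
    "path_index_/book/title": IndexStats(120.0, 10.0, 25.0, 400),
    "cas_index_//price":      IndexStats(15.0, 8.0, 30.0, 250),
}
print(decide(config, space_budget_pages=1000))  # -> ['path_index_/book/title']
```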
Citations: 10
Adaptive indexing for relational keys
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452743
G. Graefe, Harumi A. Kuno
Adaptive indexing schemes such as database cracking and adaptive merging have to date been investigated only in the context of range queries. These are typical for non-key columns in relational databases. For complete self-managing indexing, adaptive indexing must also apply to key columns. The present paper proposes a design and offers a first performance evaluation in the context of keys. Adaptive merging for keys also enables further improvements in B-tree indexes. First, partitions can be matched to levels in the memory hierarchy, such as a CPU cache and an in-memory buffer pool. Second, adaptive merging in merged B-trees enables automatic master-detail clustering.
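As a rough illustration of the adaptive idea (not of the design proposed in the paper), the sketch below shows database cracking on a single column: each range query physically partitions the data a little further, so an index emerges as a side effect of query processing.

```python
import bisect

class CrackedColumn:
    """Toy database cracking: the column gets progressively partitioned by the
    query bounds it has seen. Purely illustrative, not the paper's design."""

    def __init__(self, values):
        self.values = list(values)  # physically reordered over time
        self.cracks = []            # sorted list of (bound, split position)

    def _crack(self, bound: int) -> int:
        """Partition the piece of self.values containing `bound` so values < bound
        precede values >= bound, and remember the split position."""
        idx = bisect.bisect_left([c[0] for c in self.cracks], bound)
        lo = self.cracks[idx - 1][1] if idx > 0 else 0
        hi = self.cracks[idx][1] if idx < len(self.cracks) else len(self.values)
        piece = self.values[lo:hi]
        left = [v for v in piece if v < bound]
        right = [v for v in piece if v >= bound]
        self.values[lo:hi] = left + right
        self.cracks.insert(idx, (bound, lo + len(left)))
        return lo + len(left)

    def range_query(self, low: int, high: int):
        """Answer low <= v < high; cracking refines the partitioning as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]

col = CrackedColumn([42, 7, 99, 13, 58, 23, 77, 4])
print(sorted(col.range_query(10, 60)))  # -> [13, 23, 42, 58]
print(col.values)                       # now partially ordered around the bounds 10 and 60
```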
Citations: 52
Privacy-preserving data publishing
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452722
Ruilin Liu, Wendy Hui Wang
Data publishing has generated much concern about individual privacy. Recent work has focused on different kinds of background knowledge and their various threats to the privacy of published data. However, there still exist a few types of adversary knowledge waiting to be investigated. In this paper, I explain my research on privacy-preserving data publishing (PPDP) using full functional dependencies (FFDs) as part of the adversary's knowledge. I also briefly explain my research plan.
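As a hypothetical illustration of why functional dependencies matter as adversary knowledge (this is not the author's construction): if an adversary knows the dependency ZipCode → City, a suppressed City value can be restored from the published ZipCode, which sharpens linking attacks on the sensitive attribute. The table and dependency below are made up.

```python
# Hypothetical published table with the City attribute suppressed for privacy.
published = [
    {"zip": "10001", "city": "*", "disease": "flu"},
    {"zip": "10001", "city": "*", "disease": "cancer"},
    {"zip": "94105", "city": "*", "disease": "cold"},
]

# Adversary background knowledge: the functional dependency ZipCode -> City.
fd_zip_to_city = {"10001": "New York", "94105": "San Francisco"}

# The dependency lets the adversary restore the suppressed attribute exactly,
# which in turn sharpens any linking attack against the sensitive column.
for row in published:
    inferred_city = fd_zip_to_city.get(row["zip"], "*")
    print(row["zip"], inferred_city, row["disease"])
```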
Citations: 19
Towards a task-based search and recommender systems
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452726
Gabriele Tolomei, S. Orlando, F. Silvestri
Nowadays, people are increasingly interested in exploiting Web Search Engines (WSEs) not only to access simple Web pages, but mainly to carry out complex activities, namely Web-mediated processes (or taskflows). Therefore, users' information needs will become more complex, and (Web) search and recommender systems should change accordingly to deal with this shift. We claim that such taskflows and their composing tasks are implicitly present in users' minds when they interact with a WSE to access the Web. Our first research challenge is thus to evaluate this belief by analyzing a very large, long-term log of queries submitted to a WSE, and associating meaningful semantic labels with the extracted tasks (i.e., clusters of task-related queries) and taskflows. This large knowledge base constitutes a good starting point for building a model of users' behaviors. The second research challenge is to devise a novel recommender system that goes beyond the simple query suggestion of modern WSEs. Our system has to exploit the knowledge base of Web-mediated processes and the learned model of users' behaviors to generate complex insights and task-based suggestions for incoming users while they interact with a WSE.
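A minimal sketch of the first step described above, grouping queries into task-related clusters. The similarity measure, threshold, and sample session are assumptions for illustration, not the authors' method, which mines long-term logs at much larger scale.

```python
def jaccard(a: set, b: set) -> float:
    """Term-overlap similarity between two queries."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_session(queries: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Greedy single-pass clustering of a user session into task-related groups."""
    clusters: list[list[str]] = []
    for q in queries:
        terms = set(q.lower().split())
        for cluster in clusters:
            rep = set(cluster[0].lower().split())
            if jaccard(terms, rep) >= threshold:
                cluster.append(q)
                break
        else:
            clusters.append([q])
    return clusters

session = [
    "cheap flights rome",
    "rome flights april",
    "hotels near colosseum",
    "colosseum opening hours",
]
# Two task clusters emerge: booking flights to Rome, and planning a Colosseum visit.
print(cluster_session(session))
```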
Citations: 7
Ranking for data repairs
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452767
M. Yakout, A. Elmagarmid, Jennifer Neville
Improving data quality is a time-consuming, labor-intensive and often domain-specific operation. A recent principled approach for repairing a dirty database is to use data quality rules in the form of database constraints to identify dirty tuples and then use the rules to derive data repairs. Most existing data repair approaches focus on providing fully automated solutions, which could be risky to depend upon, especially for critical data. To guarantee that optimal quality repairs are applied to the database, users should be involved to confirm each repair. This highlights the need for an interactive approach that combines the best of both: automatically generating repairs, while efficiently employing the user's effort to verify them. In such an approach, the user guides an online repairing process that incrementally generates repairs. A key challenge in this approach is the response time within the user's interactive sessions, because generating repairs is time consuming due to the large search space of possible repairs. To this end, we present in this paper a mechanism to continuously generate repairs only for the current top-k most important violated data quality rules. Moreover, the repairs are grouped and ranked such that the most beneficial in terms of improving data quality come first, to consult the user for verification and feedback. Our experiments on a real-world dataset demonstrate the effectiveness of our ranking mechanism in providing a fast response time for the user while improving the data quality as quickly as possible.
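The ranking idea can be sketched as follows. The benefit scores, rule names, and selection policy below are hypothetical; the snippet only illustrates restricting candidate repairs to the currently most violated rules and ordering them by estimated benefit before consulting the user.

```python
from dataclasses import dataclass

@dataclass
class Repair:
    tuple_id: int
    attribute: str
    new_value: str
    benefit: float     # estimated data quality improvement (hypothetical score)
    rule: str          # the violated quality rule this repair addresses

def top_repairs(candidates: list[Repair], violation_counts: dict[str, int],
                k_rules: int = 2, n_repairs: int = 3) -> list[Repair]:
    """Keep only repairs for the k most violated rules, then rank by benefit,
    so the user verifies the most useful repairs first."""
    top_rules = set(sorted(violation_counts, key=violation_counts.get, reverse=True)[:k_rules])
    relevant = [r for r in candidates if r.rule in top_rules]
    return sorted(relevant, key=lambda r: r.benefit, reverse=True)[:n_repairs]

candidates = [
    Repair(1, "zip", "47907", 0.9, "CFD: city -> zip"),
    Repair(2, "city", "Lafayette", 0.4, "CFD: zip -> city"),
    Repair(3, "phone", "+1-765-0000", 0.2, "pattern: phone format"),
]
violations = {"CFD: city -> zip": 12, "CFD: zip -> city": 7, "pattern: phone format": 1}
for repair in top_repairs(candidates, violations):
    print(repair.tuple_id, repair.attribute, repair.new_value, repair.benefit)
```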
Citations: 10
Summarizing ontology-based schemas in PDMS
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452706
Carlos Eduardo S. Pires, Paulo Orlando Queiroz-Sousa, Zoubida Kedad, A. Salgado
Quickly understanding the content of a data source is very useful in several contexts. In a Peer Data Management System (PDMS), peers can be semantically clustered, each cluster being represented by a schema obtained by merging the local schemas of the peers in this cluster. In this paper, we present a process for summarizing schemas of peers participating in a PDMS. We assume that all the schemas are represented by ontologies and we propose a summarization algorithm which produces a summary containing the maximum number of relevant concepts and the minimum number of non-relevant concepts of the initial ontology. The relevance of a concept is determined using the notions of centrality and frequency. Since several possible candidate summaries can be identified during the summarization process, classical Information Retrieval metrics are employed to determine the best summary.
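The relevance notion based on centrality and frequency can be sketched as below. The weighting, the toy ontology, and the instance counts are assumptions for illustration; the paper combines the two notions in its own way and further compares candidate summaries with Information Retrieval metrics.

```python
# Toy ontology as an undirected adjacency list over concepts (hypothetical data).
ontology = {
    "Person":      ["Student", "Professor", "Publication"],
    "Student":     ["Person"],
    "Professor":   ["Person", "Publication"],
    "Publication": ["Person", "Professor"],
}

# How often each concept is instantiated in the peer's data (hypothetical counts).
instance_frequency = {"Person": 120, "Student": 80, "Professor": 15, "Publication": 300}

def relevance(concept: str, alpha: float = 0.5) -> float:
    """Hypothetical relevance score: a weighted mix of degree centrality
    (how connected the concept is) and normalized instance frequency."""
    centrality = len(ontology[concept]) / (len(ontology) - 1)
    frequency = instance_frequency[concept] / max(instance_frequency.values())
    return alpha * centrality + (1 - alpha) * frequency

# A summary of size k keeps the k most relevant concepts of the initial ontology.
summary_size = 2
summary = sorted(ontology, key=relevance, reverse=True)[:summary_size]
print(summary)  # -> ['Publication', 'Person']
```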
Citations: 28
Graph indexing for reachability queries
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452724
Hilmi Yildirim, Mohammed J. Zaki
Reachability queries appear very frequently in many important applications that work with graph-structured data. In some of them, testing reachability between two nodes corresponds to an important problem. For example, in protein-protein interaction networks one can use it to answer whether two proteins are related, whereas in ontological databases such queries might correspond to the question of whether a concept subsumes another one. Given the huge databases that are often queried for reachability, it is an important problem to come up with a scalable indexing scheme that has almost constant query time. In this paper, we bring a new dimension to the well-known interval labeling approach. Our approach labels each node with multiple intervals instead of a single interval, so that each labeling represents a hyper-rectangle. Our new approach, BOX, can index DAGs in linear time and space while keeping the query time admissible. In experiments, we show that BOX is not vulnerable to increasing edge-to-node ratios, which are a problem for existing approaches.
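The multi-interval idea can be sketched as follows. This is a simplification rather than the BOX index itself: each node receives a post-order interval from several randomized traversals, containment of v's label in u's label in every labeling is a necessary condition for u reaching v, and a containment miss prunes the query without any graph traversal.

```python
import random

def interval_label(dag: dict[str, list[str]], seed: int) -> dict[str, tuple[int, int]]:
    """One randomized post-order interval labeling of a DAG. Each node gets
    [low, post], where low is the minimum post-order rank in its subtree;
    if u reaches v, then v's interval is contained in u's interval."""
    random.seed(seed)
    label, counter = {}, [0]

    def visit(node: str) -> int:
        if node in label:
            return label[node][0]
        low = float("inf")
        children = dag.get(node, [])[:]
        random.shuffle(children)            # different traversal order per labeling
        for child in children:
            low = min(low, visit(child))
        counter[0] += 1
        low = min(low, counter[0])
        label[node] = (low, counter[0])
        return low

    roots = set(dag) - {c for kids in dag.values() for c in kids}
    for root in sorted(roots):
        visit(root)
    return label

def maybe_reaches(labels: list[dict[str, tuple[int, int]]], u: str, v: str) -> bool:
    """Necessary condition: v's interval lies inside u's interval in every labeling.
    A 'False' answer is exact; a 'True' answer would still need a fallback search."""
    return all(lab[u][0] <= lab[v][0] and lab[v][1] <= lab[u][1] for lab in labels)

# Hypothetical DAG: a -> b -> d, a -> c, and an unrelated node e.
dag = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": []}
labels = [interval_label(dag, seed) for seed in range(3)]
print(maybe_reaches(labels, "a", "d"))  # True (a can indeed reach d)
print(maybe_reaches(labels, "c", "d"))  # False here: pruned without traversal
```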
Citations: 5
Mining and representing recommendations in actively evolving recommender systems
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452714
I. Assent
Recommender systems provide an automatic means of filtering out interesting items, usually based on past similarity of user ratings. In previous work, we have suggested a model that allows users to actively build a recommender network. Users express trust, obtain transparency, and grow (anonymous) recommender connections. In this work, we propose mining such active systems to generate easily understandable representations of the recommender network. Users may review these representations to provide active feedback. This approach further enhances the quality of recommendations, especially as topics of interest change over time. Most notably, it extends the amount of control users have over the model that the recommender network builds of their interests.
Citations: 0
Towards workload-aware self-management: Predicting significant workload shifts
Pub Date: 2010-03-01, DOI: 10.1109/ICDEW.2010.5452738
M. Holze, A. Haschimi, N. Ritter
The workloads of enterprise DBSs often show periodic patterns, e.g. because there are mainly OLTP transactions during the day and analysis operations at night. However, current DBS self-management functions do not consider these periodic patterns in their analysis. Instead, they either adapt the DBS configuration to an overall "average" workload, or they reactively try to adapt the DBS configuration after every periodic change as if the workload had never been observed before. In this paper we propose a periodicity detection approach, which allows the prediction of workload changes for DBS self-management functions. For this purpose, we first describe how recurring DBS workloads, i.e. workloads that are similar to workloads observed in the past, can be identified. We then propose two different approaches for detecting periodic patterns in the history of recurring DBS workloads. Finally, we show how this knowledge of periodic patterns can be used to predict workload changes, and how it can be adapted to changes in the periodic patterns over time.
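A minimal, hypothetical sketch of the periodicity idea (not the paper's approach): given a sequence of recurring-workload class IDs observed per time slot, the lag at which the sequence best matches a shifted copy of itself gives the dominant period, which can then be used to predict the workload of the next slot.

```python
def best_period(labels: list[int], max_lag: int | None = None) -> int:
    """Find the lag at which the label sequence repeats most often.
    A label is the ID of the recurring workload class observed in one time slot."""
    max_lag = max_lag or len(labels) // 2

    def match_ratio(lag: int) -> float:
        pairs = list(zip(labels, labels[lag:]))
        return sum(a == b for a, b in pairs) / len(pairs)

    return max(range(1, max_lag + 1), key=match_ratio)

def predict_next(labels: list[int]) -> int:
    """Predict the workload class of the next time slot from the detected period."""
    period = best_period(labels)
    return labels[len(labels) - period]

# Hypothetical history: 0 = daytime OLTP workload, 1 = nightly analysis workload,
# observed in 3-hour slots, i.e. 8 slots per day, over three days.
history = [0, 0, 0, 0, 0, 0, 1, 1] * 3
print(best_period(history))   # -> 8 (the daily cycle)
print(predict_next(history))  # -> 0 (the next slot is expected to be OLTP again)
```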
Citations: 20