Summarizing ontology-based schemas in PDMS
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452706
Carlos Eduardo S. Pires, Paulo Orlando Queiroz-Sousa, Zoubida Kedad, A. Salgado
Quickly understanding the content of a data source is very useful in several contexts. In a Peer Data Management System (PDMS), peers can be semantically clustered, each cluster being represented by a schema obtained by merging the local schemas of the peers in this cluster. In this paper, we present a process for summarizing schemas of peers participating in a PDMS. We assume that all the schemas are represented by ontologies and we propose a summarization algorithm which produces a summary containing the maximum number of relevant concepts and the minimum number of non-relevant concepts of the initial ontology. The relevance of a concept is determined using the notions of centrality and frequency. Since several possible candidate summaries can be identified during the summarization process, classical Information Retrieval metrics are employed to determine the best summary.
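As an illustration of how centrality and frequency might be combined into a concept relevance score, and how IR metrics can pick among candidate summaries, here is a minimal Python sketch. It is not the authors' algorithm: the weighted scoring, the candidate generation over a few weights `alpha`, and the assumption that a reference set of relevant concepts is available for the F-measure are all illustrative choices.

```python
def summarize_ontology(concepts, edges, frequency, size, relevant):
    """Score concepts by degree centrality and frequency, build candidate
    summaries for several weightings, and keep the one with the best
    F-measure against a reference set `relevant` (a set of concept names).
    Illustrative only, not the paper's algorithm."""
    degree = {c: 0 for c in concepts}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_deg = max(degree.values(), default=0) or 1
    max_freq = max(frequency.values(), default=0) or 1

    def relevance(c, alpha):
        # alpha weights centrality against frequency; both normalized to [0, 1]
        return alpha * degree[c] / max_deg + (1 - alpha) * frequency.get(c, 0) / max_freq

    def f_measure(summary):
        hits = len(set(summary) & relevant)
        if hits == 0:
            return 0.0
        precision, recall = hits / len(summary), hits / len(relevant)
        return 2 * precision * recall / (precision + recall)

    candidates = [
        sorted(concepts, key=lambda c: relevance(c, a), reverse=True)[:size]
        for a in (0.25, 0.5, 0.75)          # several candidate summaries
    ]
    return max(candidates, key=f_measure)

# hypothetical usage on a tiny ontology
print(summarize_ontology(
    concepts=["Person", "Author", "Paper", "Venue", "Keyword"],
    edges=[("Author", "Person"), ("Author", "Paper"),
           ("Paper", "Venue"), ("Paper", "Keyword")],
    frequency={"Person": 5, "Author": 4, "Paper": 4, "Venue": 2, "Keyword": 1},
    size=3,
    relevant={"Person", "Author", "Paper"},
))
```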
{"title":"Summarizing ontology-based schemas in PDMS","authors":"Carlos Eduardo S. Pires, Paulo Orlando Queiroz-Sousa, Zoubida Kedad, A. Salgado","doi":"10.1109/ICDEW.2010.5452706","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452706","url":null,"abstract":"Quickly understanding the content of a data source is very useful in several contexts. In a Peer Data Management System (PDMS), peers can be semantically clustered, each cluster being represented by a schema obtained by merging the local schemas of the peers in this cluster. In this paper, we present a process for summarizing schemas of peers participating in a PDMS. We assume that all the schemas are represented by ontologies and we propose a summarization algorithm which produces a summary containing the maximum number of relevant concepts and the minimum number of non-relevant concepts of the initial ontology. The relevance of a concept is determined using the notions of centrality and frequency. Since several possible candidate summaries can be identified during the summarization process, classical Information Retrieval metrics are employed to determine the best summary.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121095924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph indexing for reachability queries
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452724
Hilmi Yildirim, Mohammed J. Zaki
Reachability queries appear very frequently in many important applications that work with graph-structured data, and in several of them testing reachability between two nodes is a central problem. For example, in protein-protein interaction networks such a query can answer whether two proteins are related, whereas in ontological databases it might correspond to asking whether one concept subsumes another. Given the huge databases against which reachability queries are often run, it is important to devise a scalable indexing scheme with almost constant query time. In this paper, we bring a new dimension to the well-known interval labeling approach: we label each node with multiple intervals instead of a single interval, so that each labeling represents a hyper-rectangle. Our new approach, BOX, can index DAGs in linear time and space while keeping the query time admissible. In experiments, we show that BOX is not vulnerable to increasing edge-to-node ratios, which are a problem for existing approaches.
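A minimal Python sketch of the multi-interval idea: each node gets one interval per randomized post-order pass, non-containment of intervals rejects a query immediately, and a guided DFS resolves the remaining (possibly false-positive) cases. This follows the general multi-interval recipe; the exact labeling and pruning rules of BOX may differ, and `label_dag` / `reachable` are illustrative names.

```python
import random
from collections import defaultdict

def label_dag(nodes, edges, k=3, seed=0):
    """Give every node k interval labels, one per randomized post-order pass.
    A label (low, post) spans the post-order ranks of everything reachable
    from the node in that pass."""
    rng = random.Random(seed)
    children, indeg = defaultdict(list), defaultdict(int)
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    roots = [n for n in nodes if indeg[n] == 0]
    labels = {n: [] for n in nodes}
    for _ in range(k):
        rank, visited = [0], set()

        def dfs(u):
            visited.add(u)
            low = float("inf")
            order = children[u][:]
            rng.shuffle(order)                 # randomization diversifies the labels
            for v in order:
                if v not in visited:
                    low = min(low, dfs(v))
                else:
                    low = min(low, labels[v][-1][0])   # v already finished this pass
            rank[0] += 1
            low = min(low, rank[0])
            labels[u].append((low, rank[0]))
            return low

        for r in roots:
            if r not in visited:
                dfs(r)
    return labels, children

def reachable(u, v, labels, children):
    """u can reach v only if every interval of v is contained in the matching
    interval of u; containment can be a false positive, so verify with a DFS
    that only descends into nodes whose intervals still contain v's."""
    def contains(a, b):
        return all(bl >= al and bh <= ah
                   for (al, ah), (bl, bh) in zip(labels[a], labels[b]))
    if u == v:
        return True
    if not contains(u, v):
        return False                           # pruned without touching the graph
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in children[x]:
            if y not in seen and contains(y, v):
                seen.add(y)
                stack.append(y)
    return False
```

Containment of v's intervals in u's is a necessary condition for reachability, so the pruned DFS never misses a true path; the multiple randomized passes shrink the hyper-rectangles and cut down the false positives that force a DFS.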
{"title":"Graph indexing for reachability queries","authors":"Hilmi Yildirim, Mohammed J. Zaki","doi":"10.1109/ICDEW.2010.5452724","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452724","url":null,"abstract":"Reachability queries appear very frequently in many important applications that work with graph structured data. In some of them, testing reachability between two nodes corresponds to an important problem. For example, in proteinprotein interaction networks one can use it to answer whether two proteins are related, whereas in ontological databases such queries might correspond to the question of whether a concept subsumes another one. Given the huge databases that are often tested with reachability queries, it is important problem to come up with a scalable indexing scheme that has almost constant query time. In this paper, we bring a new dimension to the well-known interval labeling approach. Our approach labels each node with multiple intervals instead of a single interval so that each labeling represents a hyper-rectangle. Our new approach BOX can index dags in linear time and space while retaining the querying time admissible. In experiments, we show that BOX is not vulnerable to increasing edge to node ratios which is a problem for the existing approaches.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116410351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DIVERSUM: Towards diversified summarisation of entities in knowledge graphs
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452707
M. Sydow, Mariusz Pikula, Ralf Schenkel
A problem of diversified entity summarisation in RDF-like knowledge graphs, with a limited "presentation budget", is formulated and studied. A greedy algorithm that adapts previous ideas from IR is proposed, and preliminary but promising experimental results on a real dataset extracted from the IMDB database are presented.
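One possible reading of a greedy, budget-limited selection is sketched below in Python: at each step pick the fact whose predicate is least represented in the summary so far. The representation of facts as (predicate, object, weight) triples and the tie-breaking by weight are assumptions, not the paper's exact algorithm.

```python
def diversified_summary(facts, budget):
    """Greedily pick up to `budget` facts about one entity, preferring
    predicates not yet represented in the summary (diversity first),
    then higher-weighted facts. `facts` are (predicate, object, weight)."""
    summary, used, remaining = [], {}, list(facts)
    while remaining and len(summary) < budget:
        best = min(remaining, key=lambda f: (used.get(f[0], 0), -f[2]))
        summary.append(best)
        used[best[0]] = used.get(best[0], 0) + 1
        remaining.remove(best)
    return summary

# hypothetical example: facts about a movie entity
facts = [("directedBy", "R. Scott", 0.9), ("actedIn", "S. Weaver", 0.8),
         ("actedIn", "J. Hurt", 0.7), ("genre", "sci-fi", 0.6)]
print(diversified_summary(facts, budget=3))
```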
{"title":"DIVERSUM: Towards diversified summarisation of entities in knowledge graphs","authors":"M. Sydow, Mariusz Pikula, Ralf Schenkel","doi":"10.1109/ICDEW.2010.5452707","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452707","url":null,"abstract":"A problem of diversified entity summarisation in RDF-like knowledge graphs, with limited ¿presentation budget¿, is formulated and studied. A greedy algorithm that adapts previous ideas from IR is proposed and preliminary but promising experimental results on real dataset extracted from IMDB database are presented.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127497090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Streaming data integration: Challenges and opportunities
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452751
Nesime Tatbul
In this position paper, we motivate the need for streaming data integration in three main forms: across multiple streaming data sources, over multiple stream processing engine instances, and between stream processing engines and traditional database systems. We argue that this need presents a broad range of challenges and opportunities for new research. We provide an overview of the young state of the art in this area and further discuss a selected set of concrete research topics that are currently under investigation within the scope of our MaxStream federated stream processing project at ETH Zurich.
{"title":"Streaming data integration: Challenges and opportunities","authors":"Nesime Tatbul","doi":"10.1109/ICDEW.2010.5452751","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452751","url":null,"abstract":"In this position paper, we motivate the need for streaming data integration in three main forms including across multiple streaming data sources, over multiple stream processing engine instances, and between stream processing engines and traditional database systems. We argue that this need presents a broad range of challenges and opportunities for new research. We provide an overview of the young state of the art in this area and further discuss a selected set of concrete research topics that are currently under investigation within the scope of our MaxStream federated stream processing project at ETH Zurich.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125537137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Subspace similarity search using the ideas of ranking and top-k retrieval
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452771
T. Bernecker, Tobias Emrich, Franz Graf, H. Kriegel, Peer Kröger, M. Renz, Erich Schubert, A. Zimek
There are abundant application scenarios for similarity search in databases where the similarity of objects is defined only for a subset of attributes, i.e., in a subspace. While much research has been done on efficiently supporting single-column similarity queries or similarity queries in the full space, scarcely any support for similarity search in subspaces has been provided so far. The three existing approaches are variations of the sequential scan. Here, we propose the first index-based solution to subspace similarity search in arbitrary subspaces, which is based on the concepts of nearest-neighbor ranking and top-k retrieval.
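One way to combine per-dimension rankings into subspace k-NN is Fagin-style threshold aggregation; the sketch below illustrates that general idea and is not the authors' index structure. Squared Euclidean distance, in-memory per-dimension sorted lists, and the names `subspace_knn`/`rankings` are simplifying assumptions.

```python
import heapq

def subspace_knn(data, query, dims, k):
    """Threshold-algorithm-style sketch for k-NN restricted to `dims`:
    one ranking per dimension (ids sorted by per-dimension distance),
    merged by round-robin sorted access with a monotone stopping bound.
    `data` maps id -> point tuple; distance is squared Euclidean over `dims`."""
    rankings = {
        d: sorted(data, key=lambda i: (data[i][d] - query[d]) ** 2)
        for d in dims
    }
    positions = {d: 0 for d in dims}
    seen, best = set(), []                     # best: max-heap of (-dist, id), size <= k
    while any(positions[d] < len(rankings[d]) for d in dims):
        threshold = 0.0
        for d in dims:                         # one round of sorted access per dimension
            if positions[d] >= len(rankings[d]):
                continue
            i = rankings[d][positions[d]]
            positions[d] += 1
            threshold += (data[i][d] - query[d]) ** 2
            if i not in seen:
                seen.add(i)
                dist = sum((data[i][e] - query[e]) ** 2 for e in dims)
                heapq.heappush(best, (-dist, i))
                if len(best) > k:
                    heapq.heappop(best)
        # stop once no unseen object can beat the current k-th distance
        if len(best) == k and -best[0][0] <= threshold:
            break
    return sorted((-d, i) for d, i in best)

# usage: 3-NN of the query in the subspace spanned by dimensions 0 and 2
data = {0: (1.0, 5.0, 2.0), 1: (2.0, 1.0, 2.5), 2: (9.0, 0.0, 9.0), 3: (1.5, 7.0, 1.0)}
print(subspace_knn(data, query=(1.0, 0.0, 2.0), dims=(0, 2), k=3))
```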
{"title":"Subspace similarity search using the ideas of ranking and top-k retrieval","authors":"T. Bernecker, Tobias Emrich, Franz Graf, H. Kriegel, Peer Kröger, M. Renz, Erich Schubert, A. Zimek","doi":"10.1109/ICDEW.2010.5452771","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452771","url":null,"abstract":"There are abundant scenarios for applications of similarity search in databases where the similarity of objects is defined for a subset of attributes, i.e., in a subspace, only. While much research has been done in efficient support of single column similarity queries or of similarity queries in the full space, scarcely any support of similarity search in subspaces has been provided so far. The three existing approaches are variations of the sequential scan. Here, we propose the first index-based solution to subspace similarity search in arbitrary subspaces which is based on the concepts of nearest neighbor ranking and top-k retrieval.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122270723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On novelty in publish/subscribe delivery
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452770
D. Souravlias, Marina Drosou, K. Stefanidis, E. Pitoura
In publish/subscribe systems, users express their interests in specific items of information and get notified when relevant data items are produced. Such systems allow users to stay informed without having to go through huge amounts of data. However, as the volume of data being created increases, some form of ranking of matched events is needed to avoid overwhelming the users. In this work-in-progress paper, we explore novelty as a ranking criterion: an event is considered novel if it matches a subscription that has rarely been matched in the past.
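A toy sketch of such a novelty score in Python: a matched event is ranked by the inverse match frequency of the subscriptions it satisfies, in the spirit of IDF. The scoring function, the `deliver` threshold, and the class name are illustrative assumptions, not the paper's definition.

```python
from collections import defaultdict

class NoveltyRanker:
    """Score a published event by how rarely its matched subscriptions have
    been matched before (rare subscription -> more novel event)."""
    def __init__(self):
        self.match_count = defaultdict(int)

    def score(self, matched_subscriptions):
        # inverse-frequency novelty; +1 avoids division by zero for fresh subscriptions
        return max(1.0 / (1 + self.match_count[s]) for s in matched_subscriptions)

    def deliver(self, event, matched_subscriptions, threshold=0.2):
        novelty = self.score(matched_subscriptions)
        for s in matched_subscriptions:
            self.match_count[s] += 1
        return novelty >= threshold            # e.g., notify only for novel events

ranker = NoveltyRanker()
print(ranker.deliver("event-1", ["sub-sports", "sub-finance"]))
```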
{"title":"On novelty in publish/subscribe delivery","authors":"D. Souravlias, Marina Drosou, K. Stefanidis, E. Pitoura","doi":"10.1109/ICDEW.2010.5452770","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452770","url":null,"abstract":"In publish/subscribe systems, users express their interests in specific items of information and get notified when relevant data items are produced. Such systems allow users to stay informed without the need of going through huge amounts of data. However, as the volume of data being created increases, some form of ranking of matched events is needed to avoid overwhelming the users. In this work-in-progress paper, we explore novelty as a ranking criterion. An event is considered novel, if it matches a subscription that has rarely been matched in the past.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128363350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward large scale data-aware search: Ranking, indexing, resolution and beyond
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452729
Tao Cheng, K. Chang
As the Web has evolved into a data-rich repository, current search engines, with their standard “page view,” are becoming increasingly inadequate. To realize data-aware search, i.e., searching for data entities on the Web, we have been developing the various aspects of an entity search system, including entity ranking, entity indexing and parallelization, entity resolution, as well as generalization and customization. Preliminary results show the promise of our proposals, achieving high accuracy, efficiency and scalability. We also summarize our contributions and point out interesting future directions along the lines of enabling data-aware search on the Web.
{"title":"Toward large scale data-aware search: Ranking, indexing, resolution and beyond","authors":"Tao Cheng, K. Chang","doi":"10.1109/ICDEW.2010.5452729","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452729","url":null,"abstract":"As the Web has evolved into a data-rich repository, with the standard “page view,” current search engines are becoming increasingly inadequate. To realize data-aware search, toward searching for data entities on the Web, we have been developing the various aspects of an entity search system, including: entity ranking, entity indexing and parallelization, entity resolution, as well as generalization and customization. Preliminary results show the promise of our proposals, achieving high accuracy, efficiency and scalability. We will also summarize our contributions and point out interesting future directions along the line of enabling data-aware search on the Web.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129898981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Constrained frequent itemset mining from uncertain data streams
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452736
C. Leung, Boyu Hao, Fan Jiang
Frequent itemset mining is a common data mining task in many real-life applications. The mined frequent itemsets can serve as building blocks for various patterns, including association rules and frequent sequences. Many existing algorithms mine frequent itemsets from traditional static transaction databases, in which the contents of each transaction (namely, its items) are definitely known and precise. However, there are many situations in which one is uncertain about the contents of transactions; this calls for the mining of uncertain data. Moreover, there are also situations in which users are interested in only some portions of the mined frequent itemsets (i.e., itemsets satisfying user-specified constraints, which express the user's interest); this leads to constrained mining. Furthermore, due to advances in technology, a flood of data can be produced in many situations; this calls for the mining of data streams. To deal with all these situations, we propose tree-based algorithms that efficiently mine streams of uncertain data for frequent itemsets satisfying user-specified constraints.
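The standard notion behind mining uncertain data is expected support under item independence; the Python sketch below computes it over a sliding window and filters itemsets with a user-supplied constraint. The naive enumeration is only a stand-in for the paper's tree-based algorithms, and `mine_window`, `max_size`, and the window length are illustrative choices.

```python
from itertools import combinations
from collections import deque

def expected_support(itemset, window):
    """Expected support of an itemset over a window of uncertain transactions.
    Each transaction maps item -> existence probability; assuming item
    independence, the itemset's probability per transaction is the product."""
    total = 0.0
    for txn in window:
        p = 1.0
        for item in itemset:
            p *= txn.get(item, 0.0)
        total += p
    return total

def mine_window(window, minsup, constraint, max_size=3):
    """Naive enumeration sketch: report itemsets that satisfy the user
    constraint and reach `minsup` expected support in the current window."""
    items = sorted({i for txn in window for i in txn})
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            if not constraint(itemset):
                continue
            sup = expected_support(itemset, window)
            if sup >= minsup:
                result[itemset] = sup
    return result

# usage: keep the last N uncertain transactions of the stream in a bounded window
window = deque(maxlen=4)
window.extend([{"a": 0.9, "b": 0.6}, {"a": 0.5, "c": 1.0}, {"a": 0.8, "b": 0.7}])
print(mine_window(window, minsup=1.0, constraint=lambda s: "a" in s))
```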
{"title":"Constrained frequent itemset mining from uncertain data streams","authors":"C. Leung, Boyu Hao, Fan Jiang","doi":"10.1109/ICDEW.2010.5452736","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452736","url":null,"abstract":"Frequent itemset mining is a common data mining task for many real-life applications. The mined frequent itemsets can be served as building blocks for various patterns including association rules and frequent sequences. Many existing algorithms mine for frequent itemsets from traditional static transaction databases, in which the contents of each transaction (namely, items) are definitely known and precise. However, there are many situations in which ones are uncertain about the contents of transactions. This calls for the mining of uncertain data. Moreover, there are also situations in which users are interested in only some portions of the mined frequent itemsets (i.e., itemsets satisfying user-specified constraints, which express the user interest). This leads to constrained mining. Furthermore, due to advances in technology, a flood of data can be produced in many situations. This calls for the mining of data streams. To deal with all these situations, we propose tree-based algorithms to efficiently mine streams of uncertain data for frequent itemsets that satisfy user-specified constraints.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132485477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Privometer: Privacy protection in social networks
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452715
N. Talukder, M. Ouzzani, A. Elmagarmid, Hazem Elmeleegy, M. Yakout
The increasing popularity of social networks, such as Facebook and Orkut, has raised several privacy concerns. Traditional ways of safeguarding the privacy of personal information by hiding sensitive attributes are no longer adequate. Research shows that probabilistic classification techniques can effectively infer such private information. The sensitive information disclosed by friends, group affiliations, and even participation in activities such as tagging and commenting are considered background knowledge in this process. In this paper, we present a privacy protection tool, called Privometer, that measures the amount of sensitive information leakage in a user profile and suggests self-sanitization actions to regulate the amount of leakage. In contrast to previous research, where inference techniques use publicly available profile information, we consider an augmented model in which a potentially malicious application installed in the profiles of the user's friends can access substantially more information. In our model, merely hiding the sensitive information is not sufficient to protect the user's privacy. We present an implementation of Privometer in Facebook.
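A toy sketch of what measuring leakage could look like: the hidden attribute is "inferred" from friends' disclosed values with a smoothed relative frequency, and the confidence of the most likely value is reported as the leakage. Privometer itself relies on probabilistic classifiers over richer background knowledge; the functions and the prior below are purely illustrative.

```python
from collections import Counter

def inferred_distribution(friend_values, prior):
    """Estimate a hidden attribute (e.g., political view) from friends'
    disclosed values via smoothed relative frequency -- a stand-in for the
    probabilistic classifiers used in the paper."""
    counts = Counter(friend_values)
    denom = sum(counts.values()) + sum(prior.values())
    return {v: (counts[v] + prior.get(v, 0.0)) / denom
            for v in set(prior) | set(counts)}

def leakage(friend_values, prior):
    """Leakage = confidence of the most likely inferred value; a high value
    suggests sanitization actions (hide friends, leave groups, etc.)."""
    return max(inferred_distribution(friend_values, prior).values())

# hypothetical example: most friends disclose the same affiliation
print(leakage(["left", "left", "left", "right"], prior={"left": 1.0, "right": 1.0}))
```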
{"title":"Privometer: Privacy protection in social networks","authors":"N. Talukder, M. Ouzzani, A. Elmagarmid, Hazem Elmeleegy, M. Yakout","doi":"10.1109/ICDEW.2010.5452715","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452715","url":null,"abstract":"The increasing popularity of social networks, such as Facebook and Orkut, has raised several privacy concerns. Traditional ways of safeguarding privacy of personal information by hiding sensitive attributes are no longer adequate. Research shows that probabilistic classification techniques can effectively infer such private information. The disclosed sensitive information of friends, group affiliations and even participation in activities, such as tagging and commenting, are considered background knowledge in this process. In this paper, we present a privacy protection tool, called Privometer, that measures the amount of sensitive information leakage in a user profile and suggests self-sanitization actions to regulate the amount of leakage. In contrast to previous research, where inference techniques use publicly available profile information, we consider an augmented model where a potentially malicious application installed in the user's friend profiles can access substantially more information. In our model, merely hiding the sensitive information is not sufficient to protect the user privacy. We present an implementation of Privometer in Facebook.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130015689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards enterprise software as a service in the cloud
Pub Date: 2010-03-01 | DOI: 10.1109/ICDEW.2010.5452748
J. Schaffner, D. Jacobs, B. Eckart, Jan Brunnert, A. Zeier
Traditional data warehouses mostly run on large and expensive server and storage systems. For small- and medium-sized companies in particular, it is often too expensive to run or rent such systems, and these companies might need analytical services only from time to time, for example at the end of a billing period. A solution to these problems is to use cloud computing. In this paper, we report on work in progress towards building an OLAP cluster of multi-tenant main-memory column databases on the Amazon EC2 cloud computing environment, for which purpose we ported SAP's in-memory column database TREX to run in the Amazon cloud. We discuss early findings on cost/performance tradeoffs between reliably storing the data of a tenant on a single node using highly available network-attached storage, such as Amazon EBS, and replicating tenant data to a secondary node where the data resides on less resilient storage. We also describe a mechanism to support historical queries across older snapshots of tenant data, which are lazy-loaded from Amazon's S3 near-line archiving storage and cached on the local VM disks.
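A small sketch of the lazy-loading-with-local-cache pattern described for historical queries: a snapshot is fetched from the archive store on first access and then served from local disk. `fetch_fn` is a hypothetical stand-in for the actual S3 download, and the pickle-based on-disk format is an assumption, not the paper's mechanism.

```python
import os
import pickle

class SnapshotCache:
    """Lazily load historical snapshots of a tenant's data from an archive
    store and cache them on the local VM disk (illustrative sketch)."""
    def __init__(self, cache_dir, fetch_fn):
        self.cache_dir = cache_dir
        self.fetch_fn = fetch_fn               # e.g., a function that downloads from S3
        os.makedirs(cache_dir, exist_ok=True)

    def load(self, tenant_id, snapshot_date):
        path = os.path.join(self.cache_dir, f"{tenant_id}_{snapshot_date}.snap")
        if os.path.exists(path):               # cache hit on the local disk
            with open(path, "rb") as f:
                return pickle.load(f)
        snapshot = self.fetch_fn(tenant_id, snapshot_date)   # cache miss -> archive
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)
        return snapshot

# usage with a dummy fetch function standing in for the archive download
cache = SnapshotCache("/tmp/snapshots", lambda t, d: {"tenant": t, "date": d, "rows": []})
print(cache.load("tenant42", "2009-12-31"))
```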
{"title":"Towards enterprise software as a service in the cloud","authors":"J. Schaffner, D. Jacobs, B. Eckart, Jan Brunnert, A. Zeier","doi":"10.1109/ICDEW.2010.5452748","DOIUrl":"https://doi.org/10.1109/ICDEW.2010.5452748","url":null,"abstract":"For traditional data warehouses, mostly large and expensive server and storage systems are used. In particular, for small- and medium size companies, it is often too expensive to run or rent such systems. These companies might need analytical services only from time to time, for example at the end of a billing period. A solution to overcome these problems is to use Cloud Computing. In this paper, we report on work-in-progress towards building an OLAP cluster of multi-tenant main memory column databases on the Amazon EC2 cloud computing environment, for which purpose we ported SAP's in-memory column database TREX to run in the Amazon cloud. We discuss early findings on cost/performance tradeoffs between reliably storing the data of a tenant on a single node using a highly-available network attached storage, such as Amazon EBS, vs. replication of tenant data to a secondary node where the data resides on less resilient storage. We also describe a mechanism to provide support for historical queries across older snapshots of tenant data which is lazy-loaded from Amazon's S3 near-line archiving storage and cached on the local VM disks.","PeriodicalId":442345,"journal":{"name":"2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114800566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}