2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010): Latest Papers

Summarizing ontology-based schemas in PDMS
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452706
Carlos Eduardo S. Pires, Paulo Orlando Queiroz-Sousa, Zoubida Kedad, A. Salgado
Quickly understanding the content of a data source is very useful in several contexts. In a Peer Data Management System (PDMS), peers can be semantically clustered, each cluster being represented by a schema obtained by merging the local schemas of the peers in this cluster. In this paper, we present a process for summarizing schemas of peers participating in a PDMS. We assume that all the schemas are represented by ontologies and we propose a summarization algorithm which produces a summary containing the maximum number of relevant concepts and the minimum number of non-relevant concepts of the initial ontology. The relevance of a concept is determined using the notions of centrality and frequency. Since several possible candidate summaries can be identified during the summarization process, classical Information Retrieval metrics are employed to determine the best summary.
Citations: 28
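The last step of the abstract above, choosing the best of several candidate summaries with classical Information Retrieval metrics, can be illustrated with a small sketch. The abstract does not say which metrics are used or how the reference set of relevant concepts is obtained, so the F-measure, the function names `f_measure` and `best_summary`, and the concept names below are all assumptions made for illustration.

```python
def f_measure(summary, relevant):
    """Harmonic mean of precision and recall of a candidate summary.

    Assumption: `relevant` is a reference set of relevant concepts; the
    abstract does not specify how such a set is built.
    """
    summary, relevant = set(summary), set(relevant)
    hits = len(summary & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(summary)   # share of summary concepts that are relevant
    recall = hits / len(relevant)     # share of relevant concepts that were kept
    return 2 * precision * recall / (precision + recall)


def best_summary(candidates, relevant):
    """Return the candidate summary with the highest F-measure."""
    return max(candidates, key=lambda c: f_measure(c, relevant))


# Hypothetical concepts, for illustration only.
relevant = {"Person", "Paper", "Venue"}
candidates = [
    {"Person", "Paper", "Topic"},
    {"Person", "Paper", "Venue", "Topic"},
    {"Person"},
]
chosen = best_summary(candidates, relevant)
```

A summary that keeps many relevant concepts raises recall, while one that avoids non-relevant concepts raises precision, which matches the two goals the abstract states for the summarization algorithm.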
Graph indexing for reachability queries
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452724
Hilmi Yildirim, Mohammed J. Zaki
Reachability queries appear very frequently in many important applications that work with graph-structured data. In some of them, testing reachability between two nodes corresponds to an important problem. For example, in protein-protein interaction networks one can use it to answer whether two proteins are related, whereas in ontological databases such queries might correspond to the question of whether one concept subsumes another. Given the huge databases that are often tested with reachability queries, it is an important problem to come up with a scalable indexing scheme that has almost constant query time. In this paper, we bring a new dimension to the well-known interval labeling approach. Our approach labels each node with multiple intervals instead of a single interval so that each labeling represents a hyper-rectangle. Our new approach, BOX, can index DAGs in linear time and space while keeping the query time admissible. In experiments, we show that BOX is not vulnerable to increasing edge-to-node ratios, which is a problem for the existing approaches.
Citations: 5
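The interval labeling approach that the abstract above builds on can be sketched for the simplest case, a tree with one interval per node. This is not BOX: the abstract says BOX assigns multiple intervals per node and handles DAGs, and those details are not given here. The function names and the example tree are assumptions for illustration.

```python
def interval_labels(children, root):
    """Label each tree node with (low, high) from a post-order numbering.

    `high` is the node's own post-order rank; `low` is the smallest rank
    found in its subtree. Single-interval, tree-only sketch.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        low = None
        for child in children.get(node, []):
            child_low, _ = visit(child)
            if low is None or child_low < low:
                low = child_low
        counter += 1                  # post-order rank of this node
        if low is None:               # leaf: interval collapses to one point
            low = counter
        labels[node] = (low, counter)
        return labels[node]

    visit(root)
    return labels


def reaches(labels, u, v):
    """u reaches v exactly when v's interval lies inside u's interval."""
    low_u, high_u = labels[u]
    low_v, high_v = labels[v]
    return low_u <= low_v and high_v <= high_u


# Hypothetical tree: a -> b, a -> c, b -> d
tree = {"a": ["b", "c"], "b": ["d"]}
labels = interval_labels(tree, "a")
```

Once the labels are built in a single traversal, each query is two integer comparisons, which is the constant-time behaviour that makes interval labeling attractive as a starting point.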
DIVERSUM: Towards diversified summarisation of entities in knowledge graphs
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452707
M. Sydow, Mariusz Pikula, Ralf Schenkel
A problem of diversified entity summarisation in RDF-like knowledge graphs, with a limited "presentation budget", is formulated and studied. A greedy algorithm that adapts previous ideas from IR is proposed, and preliminary but promising experimental results on a real dataset extracted from the IMDB database are presented.
Citations: 23
Streaming data integration: Challenges and opportunities
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452751
Nesime Tatbul
In this position paper, we motivate the need for streaming data integration in three main forms including across multiple streaming data sources, over multiple stream processing engine instances, and between stream processing engines and traditional database systems. We argue that this need presents a broad range of challenges and opportunities for new research. We provide an overview of the young state of the art in this area and further discuss a selected set of concrete research topics that are currently under investigation within the scope of our MaxStream federated stream processing project at ETH Zurich.
Citations: 59
Subspace similarity search using the ideas of ranking and top-k retrieval
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452771
T. Bernecker, Tobias Emrich, Franz Graf, H. Kriegel, Peer Kröger, M. Renz, Erich Schubert, A. Zimek
There are abundant scenarios for applications of similarity search in databases where the similarity of objects is defined for a subset of attributes, i.e., in a subspace, only. While much research has been done in efficient support of single column similarity queries or of similarity queries in the full space, scarcely any support of similarity search in subspaces has been provided so far. The three existing approaches are variations of the sequential scan. Here, we propose the first index-based solution to subspace similarity search in arbitrary subspaces which is based on the concepts of nearest neighbor ranking and top-k retrieval.
Citations: 12
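To make the problem in the abstract above concrete, here is a sketch of the sequential-scan baseline for subspace similarity search: distance is computed only over the attributes chosen at query time, and the k closest objects are returned. This is the kind of scan the abstract contrasts against, not the index-based solution the paper proposes. The Euclidean distance, the function names, and the sample data are assumptions.

```python
import heapq
import math


def subspace_distance(p, q, dims):
    """Euclidean distance restricted to the attribute positions in `dims`."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def subspace_knn_scan(data, query, dims, k):
    """Return the indices of the k objects closest to `query` in the subspace.

    Scans every object once, so cost grows linearly with the data size.
    """
    return heapq.nsmallest(
        k,
        range(len(data)),
        key=lambda i: subspace_distance(data[i], query, dims),
    )


# Hypothetical 3-attribute objects.
data = [(0, 0, 9), (1, 1, 0), (5, 5, 5)]
query = (0, 0, 0)
nearest_in_xy = subspace_knn_scan(data, query, dims=(0, 1), k=2)
nearest_in_z = subspace_knn_scan(data, query, dims=(2,), k=1)
```

The same query point yields different neighbours depending on the chosen subspace, which is why an index built for the full space does not directly answer such queries.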
On novelty in publish/subscribe delivery
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452770
D. Souravlias, Marina Drosou, K. Stefanidis, E. Pitoura
In publish/subscribe systems, users express their interests in specific items of information and get notified when relevant data items are produced. Such systems allow users to stay informed without having to go through huge amounts of data. However, as the volume of data being created increases, some form of ranking of matched events is needed to avoid overwhelming the users. In this work-in-progress paper, we explore novelty as a ranking criterion. An event is considered novel if it matches a subscription that has rarely been matched in the past.
Citations: 3
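The novelty criterion described in the abstract above, where an event is novel if it matches a rarely matched subscription, can be sketched with simple match counters. The scoring formula (one minus the subscription's share of all past matches), taking the maximum over matched subscriptions, and the class and subscription names are assumptions for illustration; the abstract does not define the actual measure.

```python
from collections import Counter


class NoveltyRanker:
    """Tracks how often each subscription has matched and scores new events."""

    def __init__(self):
        self.match_counts = Counter()
        self.total_matches = 0

    def novelty(self, subscription_id):
        """Higher when the subscription accounts for few of the past matches."""
        if self.total_matches == 0:
            return 1.0
        return 1.0 - self.match_counts[subscription_id] / self.total_matches

    def score_event(self, matched_subscriptions):
        """Score an event by its most novel matching subscription, then record it."""
        matched = list(matched_subscriptions)
        if not matched:
            return 0.0
        score = max(self.novelty(s) for s in matched)
        for s in matched:             # update history after scoring
            self.match_counts[s] += 1
            self.total_matches += 1
        return score


# Hypothetical history: "sports" matched often, "opera" rarely.
ranker = NoveltyRanker()
for _ in range(9):
    ranker.score_event(["sports"])
ranker.score_event(["opera"])
opera_score = ranker.score_event(["opera"])
sports_score = ranker.score_event(["sports"])
```

Events can then be delivered in decreasing score order, so that an event for the rarely matched subscription is shown ahead of one for the frequently matched subscription.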
Toward large scale data-aware search: Ranking, indexing, resolution and beyond
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452729
Tao Cheng, K. Chang
As the Web has evolved into a data-rich repository, with the standard “page view,” current search engines are becoming increasingly inadequate. To realize data-aware search, toward searching for data entities on the Web, we have been developing the various aspects of an entity search system, including: entity ranking, entity indexing and parallelization, entity resolution, as well as generalization and customization. Preliminary results show the promise of our proposals, achieving high accuracy, efficiency and scalability. We will also summarize our contributions and point out interesting future directions along the line of enabling data-aware search on the Web.
Citations: 0
Constrained frequent itemset mining from uncertain data streams
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452736
C. Leung, Boyu Hao, Fan Jiang
Frequent itemset mining is a common data mining task for many real-life applications. The mined frequent itemsets can serve as building blocks for various patterns including association rules and frequent sequences. Many existing algorithms mine for frequent itemsets from traditional static transaction databases, in which the contents of each transaction (namely, items) are definitely known and precise. However, there are many situations in which one is uncertain about the contents of transactions. This calls for the mining of uncertain data. Moreover, there are also situations in which users are interested in only some portions of the mined frequent itemsets (i.e., itemsets satisfying user-specified constraints, which express the user interest). This leads to constrained mining. Furthermore, due to advances in technology, a flood of data can be produced in many situations. This calls for the mining of data streams. To deal with all these situations, we propose tree-based algorithms to efficiently mine streams of uncertain data for frequent itemsets that satisfy user-specified constraints.
Citations: 16
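The combination of uncertain data and user constraints in the abstract above can be illustrated with a brute-force sketch over a single batch of transactions. It assumes the common expected-support model, in which each item has an independent existence probability and an itemset's expected support is the sum over transactions of the product of its item probabilities. It is not the tree-based streaming algorithm the paper proposes, and the function names, the `max_size` cap, and the sample data are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, transactions):
    """Sum over transactions of the product of the itemset's item probabilities.

    Assumption: items in a transaction exist independently of each other.
    """
    return sum(
        prod(t.get(item, 0.0) for item in itemset) for t in transactions
    )


def constrained_frequent_itemsets(transactions, min_exp_sup, constraint, max_size=2):
    """Enumerate itemsets that satisfy `constraint` and reach `min_exp_sup`."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            if not constraint(itemset):   # user-specified constraint
                continue
            support = expected_support(itemset, transactions)
            if support >= min_exp_sup:
                result[itemset] = support
    return result


# Hypothetical batch: each transaction maps item -> existence probability.
batch = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.8, "c": 0.4},
    {"a": 0.5, "b": 1.0},
]
found = constrained_frequent_itemsets(
    batch, min_exp_sup=0.9, constraint=lambda s: "c" not in s
)
```

Checking the constraint before computing support avoids work on itemsets the user does not care about; exhaustive enumeration like this does not scale, which is the motivation for more compact structures such as the trees mentioned in the abstract.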
Privometer: Privacy protection in social networks
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452715
N. Talukder, M. Ouzzani, A. Elmagarmid, Hazem Elmeleegy, M. Yakout
The increasing popularity of social networks, such as Facebook and Orkut, has raised several privacy concerns. Traditional ways of safeguarding privacy of personal information by hiding sensitive attributes are no longer adequate. Research shows that probabilistic classification techniques can effectively infer such private information. The disclosed sensitive information of friends, group affiliations and even participation in activities, such as tagging and commenting, are considered background knowledge in this process. In this paper, we present a privacy protection tool, called Privometer, that measures the amount of sensitive information leakage in a user profile and suggests self-sanitization actions to regulate the amount of leakage. In contrast to previous research, where inference techniques use publicly available profile information, we consider an augmented model where a potentially malicious application installed in the profiles of the user's friends can access substantially more information. In our model, merely hiding the sensitive information is not sufficient to protect user privacy. We present an implementation of Privometer in Facebook.
Citations: 75
Towards enterprise software as a service in the cloud
Pub Date : 2010-03-01 DOI: 10.1109/ICDEW.2010.5452748
J. Schaffner, D. Jacobs, B. Eckart, Jan Brunnert, A. Zeier
For traditional data warehouses, mostly large and expensive server and storage systems are used. In particular, for small and medium-sized companies, it is often too expensive to run or rent such systems. These companies might need analytical services only from time to time, for example at the end of a billing period. A solution to overcome these problems is to use cloud computing. In this paper, we report on work-in-progress towards building an OLAP cluster of multi-tenant main memory column databases on the Amazon EC2 cloud computing environment, for which purpose we ported SAP's in-memory column database TREX to run in the Amazon cloud. We discuss early findings on cost/performance tradeoffs between reliably storing the data of a tenant on a single node using highly available network-attached storage, such as Amazon EBS, vs. replication of tenant data to a secondary node where the data resides on less resilient storage. We also describe a mechanism to provide support for historical queries across older snapshots of tenant data, which is lazy-loaded from Amazon's S3 near-line archiving storage and cached on the local VM disks.
Citations: 10