Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: Latest Articles

Diverse retrieval via greedy optimization of expected 1-call@k in a latent subtopic relevance model

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063869

S. Sanner, Shengbo Guo, T. Graepel, S. Kharazmi, Sarvnaz Karimi

It has been previously observed that optimization of the 1-call@k relevance objective (i.e., a set-based objective that is 1 if at least one document is relevant, otherwise 0) empirically correlates with diverse retrieval. In this paper, we proceed one step further and show theoretically that greedily optimizing expected 1-call@k w.r.t. a latent subtopic model of binary relevance leads to a diverse retrieval algorithm sharing many features of existing diversification approaches. This new result is complementary to a variety of diverse retrieval algorithms derived from alternate rank-based relevance criteria such as average precision and reciprocal rank. As such, the derivation presented here for expected 1-call@k provides a novel theoretical perspective on the emergence of diversity via a latent subtopic model of relevance --- an idea underlying both ambiguous and faceted subtopic retrieval that have been used to motivate diverse retrieval.
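To make the objective concrete, here is a minimal Python sketch of greedy selection under a toy latent subtopic model. It is an illustration under stated assumptions, not the derivation from the paper: it assumes that document relevance is conditionally independent given the subtopic, and the names `expected_one_call`, `greedy_one_call`, `topic_prior`, and `rel` are hypothetical.

```python
def expected_one_call(selected, topic_prior, rel):
    """Expected 1-call of a document set under the assumed model:
    sum over subtopics t of P(t) * (1 - prod over d of (1 - P(relevant | d, t)))."""
    total = 0.0
    for topic, p_topic in topic_prior.items():
        miss = 1.0  # probability that no selected document is relevant for this subtopic
        for doc in selected:
            miss *= 1.0 - rel[doc].get(topic, 0.0)
        total += p_topic * (1.0 - miss)
    return total


def greedy_one_call(docs, topic_prior, rel, k):
    """Repeatedly add the document that most increases expected 1-call."""
    selected, remaining = [], list(docs)
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda d: expected_one_call(selected + [d], topic_prior, rel),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration): an ambiguous query with two subtopics.
topic_prior = {"jaguar_car": 0.6, "jaguar_cat": 0.4}
rel = {
    "d1": {"jaguar_car": 0.9},
    "d2": {"jaguar_car": 0.8},
    "d3": {"jaguar_cat": 0.7},
}
ranking = greedy_one_call(["d1", "d2", "d3"], topic_prior, rel, k=2)
print(ranking)
```

In this toy example the second pick is `d3` rather than the individually stronger `d2`, because `d2` covers a subtopic that `d1` already covers well. This is the sense in which diversity can emerge from the objective rather than being imposed separately.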

Citations: 20

Coupled nominal similarity in unsupervised learning

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063715

Can Wang, Longbing Cao, Mingchun Wang, Jinjiu Li, Wei Wei, Yuming Ou

The similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects, which consider not only intra-coupled similarity within an attribute (i.e., value frequency distribution) but also inter-coupled similarity between attributes (i.e., feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. Theoretical analysis shows that the intersection-based metric matches the others in accuracy while being more efficient, in particular for large-scale data. Extensive experiments on UCI data sets verify the theoretical conclusions. In addition, clustering experiments based on the derived dissimilarity metrics show a significant performance improvement.
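The two ingredients can be sketched in Python as follows. This is one plausible reading of the abstract, not the definitions from the paper: the frequency-based formula in `intra_similarity` and the intersection-of-conditional-distributions formula in `inter_similarity` are assumptions chosen for illustration, and both function names are hypothetical.

```python
from collections import Counter


def intra_similarity(column, x, y):
    """Assumed frequency-based similarity within one attribute:
    two values that both occur often score higher. Both values must occur."""
    counts = Counter(column)
    fx, fy = counts[x], counts[y]
    return (fx * fy) / (fx + fy + fx * fy)


def inter_similarity(column, other, x, y):
    """Assumed intersection-based similarity across attributes:
    overlap of the distributions of `other` conditioned on x and on y."""
    cond_x = Counter(o for v, o in zip(column, other) if v == x)
    cond_y = Counter(o for v, o in zip(column, other) if v == y)
    nx, ny = sum(cond_x.values()), sum(cond_y.values())
    shared = cond_x.keys() & cond_y.keys()  # only co-occurring values contribute
    return sum(min(cond_x[w] / nx, cond_y[w] / ny) for w in shared)


# Toy categorical data (invented): colour and size of five objects.
colour = ["red", "red", "blue", "blue", "green"]
size = ["S", "M", "S", "M", "L"]

print(intra_similarity(colour, "red", "blue"))
print(inter_similarity(colour, size, "red", "blue"))
print(inter_similarity(colour, size, "red", "green"))
```

Here `red` and `blue` co-occur with the same sizes in the same proportions, so their inter-attribute similarity is maximal, while `red` and `green` share no sizes at all. Restricting the sum to the shared values is what keeps the intersection-based variant cheap.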

Citations: 86

A peer's-eye view: network term clouds in a peer-to-peer system

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063852

Raynor Vliegendhart, M. Larson, Christoph Kofler, J. Pouwelse

We investigate term clouds that represent the content available in a peer-to-peer (P2P) network. Such network term clouds are non-trivial to generate in distributed settings. Our term cloud generator was implemented and released in Tribler, a widely used, server-free P2P system, to support users in understanding the sorts of content available. Our evaluation and analysis focus on three aspects of the clouds: coverage, usefulness and accumulation speed. A live experiment demonstrates that individual peers accumulate substantial network-level information, indicating good coverage of the overall content of the system. The results of a user study carried out on a crowdsourcing platform confirm the usefulness of clouds, showing that they succeed in conveying to users information on the type of content available in the network. An analysis of five example peers reveals that accumulation speeds of terms at new peers can support the development of a semantically diverse term set quickly after a cold start. This work represents the first investigation of term clouds in a live, 100% server-free P2P setting.

Pages: 1909-1912

Citations: 0

TEXplorer: keyword-based object search and exploration in multidimensional text databases

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063822

Bo Zhao, C. Lin, Bolin Ding, Jiawei Han

We propose a novel system TEXplorer that integrates keyword-based object ranking with the aggregation and exploration power of OLAP in a text database with rich structured attributes available, e.g., a product review database. TEXplorer can be implemented within a multi-dimensional text database, where each row is associated with structural dimensions (attributes) and text data (e.g., a document). The system utilizes the text cube data model, where a cell aggregates a set of documents with matching values in a subset of dimensions. Cells in a text cube capture different levels of summarization of the documents, and can represent objects at different conceptual levels. Users query the system by submitting a set of keywords. Instead of returning a ranked list of all the cells, we propose a keyword-based interactive exploration framework that could offer flexible OLAP navigational guides and help users identify the levels and objects they are interested in. A novel significance measure of dimensions is proposed based on the distribution of IR relevance of cells. During each interaction stage, dimensions are ranked according to their significance scores to guide drilling down; and cells in the same cuboids are ranked according to their relevance to guide exploration. We propose efficient algorithms and materialization strategies for ranking top-k dimensions and cells. Finally, extensive experiments on real datasets demonstrate the efficiency and effectiveness of our approach.
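The text cube idea can be illustrated with a small Python sketch. It assumes a toy relevance score (average keyword hits per document) in place of the IR relevance used by the system, and the names `build_cuboid`, `cell_relevance`, and `rank_cells` are hypothetical.

```python
from collections import defaultdict


def build_cuboid(rows, dims):
    """Group document texts into cells keyed by the values of the chosen dimensions."""
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[d] for d in dims)].append(row["text"])
    return dict(cells)


def cell_relevance(docs, keywords):
    """Toy relevance (an assumption): average number of keyword hits per document."""
    wanted = {k.lower() for k in keywords}
    hits = sum(1 for doc in docs for tok in doc.lower().split() if tok in wanted)
    return hits / len(docs)


def rank_cells(cells, keywords):
    """Rank the cells of one cuboid by descending relevance to the keywords."""
    scored = [(key, cell_relevance(docs, keywords)) for key, docs in cells.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)


# Toy product-review rows (invented): two structural dimensions plus text.
rows = [
    {"brand": "Acme", "type": "laptop", "text": "long battery life"},
    {"brand": "Acme", "type": "phone", "text": "battery drains fast"},
    {"brand": "Zeta", "type": "laptop", "text": "bright screen"},
]

by_brand = build_cuboid(rows, ["brand"])               # coarse cuboid
by_brand_type = build_cuboid(rows, ["brand", "type"])  # drill-down cuboid
ranked = rank_cells(by_brand, ["battery"])
print(ranked)
```

Choosing a larger subset of dimensions corresponds to drilling down: the same documents are split into finer cells, each of which can be ranked against the same keywords.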

Pages: 1709-1718

Citations: 23

Structural link analysis and prediction in microblogs

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063743

Dawei Yin, Liangjie Hong, Brian D. Davison

With hundreds of millions of participants, social media services have become commonplace. Unlike a traditional social network service, a microblogging network like Twitter is a hybrid network, combining aspects of both social networks and information networks. Understanding the structure of such hybrid networks and predicting new links are important for many tasks such as friend recommendation, community detection, and modeling network growth. We note that the link prediction problem in a hybrid network differs from that in previously studied networks. Unlike information networks and traditional online social networks, the structures in a hybrid network are more complicated and informative. We compare the most popular and recent methods and principles for link prediction and recommendation. Finally, we propose a novel structure-based personalized link prediction model and compare its predictive performance against many fundamental and popular link prediction methods on real-world data from the Twitter microblogging network. Our experiments on both static and dynamic data sets show that our methods noticeably outperform the state-of-the-art.
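For orientation, two widely used neighbourhood-based link prediction scores, common neighbours and Adamic-Adar, can be sketched in Python as below. These are generic baselines of the kind the abstract refers to, not the structure-based model the paper proposes. The sketch treats follow edges as undirected for simplicity, which is an assumption, and the toy graph is invented.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of nodes linked to both u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over common neighbours; rare neighbours count more."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


# Toy graph: a and b are not linked but share neighbours c and d.
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = build_adjacency(edges)

cn_score = common_neighbors(adj, "a", "b")
aa_score = adamic_adar(adj, "a", "b")
print(cn_score, aa_score)
```

A common neighbour of two distinct nodes always has degree at least 2 in an undirected graph without self-loops, so the logarithm in `adamic_adar` is strictly positive.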

Pages: 1163-1168

Citations: 82

Exploring categorization property of social annotations for information retrieval

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063659

Peng Li, Bin Wang, Wei Jin, Jian-Yun Nie, Zhiwei Shi, Ben He

User-generated social annotations provide extra information for describing document contents. In this paper, we propose an effective method to model the categorization property of social annotations and explore the potential of combining it with classical language models for improving retrieval performance. Specifically, a novel TR-LDA model is presented to take annotations as an additional source for generating document contents apart from the document itself. We provide strategies for representing and weighting the categorization property and develop an efficient inference algorithm, where space saving is taken into account. Experiments are carried out on synthetic datasets, where documents and queries come from the standard evaluation conference TREC and annotations come from the website Delicious.com. Our results demonstrate the effectiveness of the proposed method on the ad-hoc retrieval task, where it significantly outperforms state-of-the-art baselines.

Citations: 4

Generating links to background knowledge: a case study using narrative radiology reports

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063845

Jiyin He, M. de Rijke, M. Sevenster, R. V. Ommering, Y. Qian

Automatically annotating texts with background information has recently received much attention. We conduct a case study in automatically generating links from narrative radiology reports to Wikipedia. Such links help users understand the medical terminology and thereby increase the value of the reports. Direct applications of existing automatic link generation systems trained on Wikipedia to our radiology data do not yield satisfactory results. Our analysis reveals that medical phrases are often syntactically regular but semantically complicated, e.g., containing multiple concepts or concepts with multiple modifiers. The latter property is the main reason for the failure of existing systems. Based on this observation, we propose an automatic link generation approach that takes into account these properties. We use a sequential labeling approach with syntactic features for anchor text identification in order to exploit syntactic regularities in medical terminology. We combine this with a sub-anchor based approach to target finding, which is aimed at coping with the complex semantic structure of medical phrases. Empirical results show that the proposed system effectively improves the performance over existing systems.

Pages: 1867-1876

Citations: 36

An algorithm for axiom pinpointing in EL+ and its incremental variant

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063985

Xiaojun Cheng, G. Qi

Axiom pinpointing plays an important role in the development and maintenance of ontologies. It helps the user to comprehend an unwanted entailment of an ontology by presenting all minimal subsets of the ontology which are responsible for the entailment (called MinAs). In this paper, we consider the problem of axiom pinpointing in the description logic EL+, which underpins OWL 2 EL, a profile of the latest version of the Web Ontology Language (OWL). We propose a novel method to compute all MinAs that utilizes the hierarchy information obtained from the classification of an EL+ ontology. The advantage of our method over an existing labeled classification-based method is that we do not attach labels to entailed subsumptions, which can exhaust memory for large-scale ontologies. We further consider axiom pinpointing in EL+ when ontologies change. An incremental algorithm is given to compute all MinAs by reusing previously computed MinAs.
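What a MinA is can be shown with a brute-force Python sketch on a toy ontology. This is deliberately not the hierarchy-based algorithm from the paper: it assumes an ontology that contains only atomic subsumption axioms of the form A ⊑ B (full EL+ has more constructors), decides entailment by reachability, and enumerates subsets exhaustively, which is exponential. The names `entails` and `all_minas` are hypothetical.

```python
from itertools import combinations


def entails(axioms, sub, sup):
    """True if sub ⊑ sup follows from atomic axioms (a, b) meaning a ⊑ b,
    using reachability over the told subsumptions."""
    reached, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for a, b in axioms:
            if a == current and b not in reached:
                reached.add(b)
                frontier.append(b)
    return sup in reached


def all_minas(ontology, sub, sup):
    """All minimal axiom subsets that entail sub ⊑ sup (brute force)."""
    minas = []
    # Visiting subsets by increasing size guarantees that any entailing subset
    # without a previously found MinA inside it is itself minimal.
    for size in range(1, len(ontology) + 1):
        for subset in combinations(ontology, size):
            candidate = frozenset(subset)
            if any(m <= candidate for m in minas):
                continue  # contains a smaller MinA, so not minimal
            if entails(subset, sub, sup):
                minas.append(candidate)
    return minas


# Toy ontology (invented): two independent derivations of A ⊑ C.
ontology = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
minas = all_minas(ontology, "A", "C")
print(minas)
```

The toy ontology yields two MinAs for A ⊑ C, one through B and one through D. Removing a single axiom from only one of them would not remove the unwanted entailment, which is why all MinAs are needed for repair.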

Citations: 7

Privacy preservation by independent component analysis and variance control

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063709

Chih-Ming Hsu, Ming-Syan Chen

The primary objective of privacy preservation is to protect an individual's confidential information in released data sets. In recent years, several simulation-based approaches for privacy preservation have been proposed. The idea is to generate a synthetic data set with the constraint that the probability distribution is as close as possible to that of the original set. In this paper, we propose two frameworks for simulation-based privacy preservation of multivariate numerical data. The first framework, called PRIMP (PRivacy preserving by Independent coMPonents), is based on independent component analysis (ICA). It is shown empirically that PRIMP outperforms other simulation-based approaches in terms of Spearman's rank correlation and Kendall's tau correlation. The second approach proposed is a hybrid method that combines PRIMP and Cholesky's decomposition technique. It is shown empirically that the hybrid method preserves the covariance matrix of the original data exactly. The method also resolves the problem of generating good seeds for the Cholesky-based approach. Although the empirical results show that the hybrid approach is not always better than the PRIMP in terms of Spearman's rank correlation and Kendall's tau correlation, in theory, the risk of information leakage under the hybrid approach is much less than that under PRIMP.

Pages: 925-930

Citations: 1

Topic modeling for named entity queries

Pub Date : 2011-10-24 DOI: 10.1145/2063576.2063877

Xiaobing Xue, Xiaoxin Yin

Named entities are observed in a large portion of web search queries (named entity queries), where each entity can be associated with many different query terms that refer to various aspects of this entity. Organizing these query terms into topics helps understand major search intents about entities and the discovered topics are useful for applications such as query suggestion. Furthermore, we notice that named entities can often be organized into categories and those from the same category share many generic topics. Therefore, working on a category of named entities instead of individual ones helps avoid the problems caused by the sparsity and noise in the data. In this paper, Named Entity Topic Model (NETM) is proposed to discover generic topics for a category of named entities, where the quality of the generic topics is improved through the model design and the parameter initialization. Experiments based on query log data show that NETM discovers high-quality topics and outperforms the state-of-the-art techniques by 12.8% based on F1 measure.

Citations: 11
