
Proceedings of the 22nd ACM international conference on Information & Knowledge Management: Latest Publications

Fast parameterless density-based clustering via random projections
Johannes Schneider, M. Vlachos
Clustering offers significant insights in data analysis. Density-based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate a one to two orders of magnitude speedup compared to equivalent state-of-the-art density-based techniques, even for modest-size datasets. We give a comprehensive analysis of both algorithms and show a runtime of O(dN log² N) for a d-dimensional dataset of N points. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but uses a softer definition of density combined with sampling. The second algorithm is parameter-less and identifies the areas separating clusters.
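A minimal sketch of the core idea, under assumptions made only for illustration (the function names, window size and number of projections are not from the paper, and this is not the authors' algorithm or its O(dN log² N) analysis): random 1-D projections give cheap neighbor candidates, from which a soft, sampling-style density estimate per point can be derived.

    import numpy as np

    def candidate_neighbors(X, n_projections=8, window=5, seed=0):
        """For each point, collect points that land nearby in at least one
        random 1-D projection (within `window` positions in sorted order)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        cand = [set() for _ in range(n)]
        for _ in range(n_projections):
            direction = rng.normal(size=d)
            direction /= np.linalg.norm(direction)
            order = np.argsort(X @ direction)
            for pos, i in enumerate(order):
                for j in order[max(0, pos - window):pos + window + 1]:
                    if j != i:
                        cand[i].add(j)
        return cand

    def soft_density(X, cand):
        """Soft density estimate: inverse of the mean distance to candidate neighbors."""
        dens = np.empty(len(X))
        for i, cs in enumerate(cand):
            idx = np.fromiter(cs, dtype=int)
            dens[i] = 1.0 / (np.linalg.norm(X[idx] - X[i], axis=1).mean() + 1e-12)
        return dens

    if __name__ == "__main__":
        X = np.vstack([np.random.randn(200, 5), np.random.randn(200, 5) + 6])
        print("median soft density:", np.median(soft_density(X, candidate_neighbors(X))))

Points whose soft density is much lower than that of their candidate neighbors would then hint at the cluster-separating areas the second algorithm looks for.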
{"title":"Fast parameterless density-based clustering via random projections","authors":"Johannes Schneider, M. Vlachos","doi":"10.1145/2505515.2505590","DOIUrl":"https://doi.org/10.1145/2505515.2505590","url":null,"abstract":"Clustering offers significant insights in data analysis. Density based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped- clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate one to two orders of magnitude speedup compared to equivalent state-of-art density based techniques, even for modest-size datasets. We give a comprehensive analysis of both our algorithms and show runtime of O(dNlog2 N), for a d-dimensional dataset. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but using a softer definition of density combined with sampling. The second algorithm is parameter-less, and identifies areas separating clusters.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75102743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
ImG-complex: graph data model for topology of unstructured meshes
Alireza Rezaei Mahdiraji, P. Baumann, G. Berti
Although many applications use unstructured meshes, there is no specialized mesh database that supports storing and querying mesh data. Existing mesh libraries do not support declarative querying and are expensive to maintain. A mesh database can benefit these domains in several ways, such as a declarative query language and ease of maintenance. In this paper, we propose the Incidence multi-Graph Complex (ImG-Complex) data model for storing the topological aspects of meshes in a database. ImG-Complex extends the incidence graph (IG) model with multi-incidence information to represent a new object class which we call ImG-Complexes. We introduce optional and application-specific constraints to limit the ImG model to smaller object classes and to validate mesh structures based on the modeled object class properties. We show how the Neo4j graph database can be used to query mesh topology based on the (possibly constrained) ImG model. Finally, we compare the performance of Neo4j and PostgreSQL on executing topological mesh queries.
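As a hedged illustration of the kind of declarative topology query the paper argues for, the sketch below uses the Neo4j Python driver to run a Cypher query over a hypothetical mesh graph. The node labels, relationship type (Cell, Face, BOUNDED_BY), connection URI and credentials are assumptions for the example, not the paper's ImG-Complex schema.

    from neo4j import GraphDatabase

    URI = "bolt://localhost:7687"   # assumed local Neo4j instance
    AUTH = ("neo4j", "password")    # assumed credentials

    # Hypothetical schema: (:Cell)-[:BOUNDED_BY]->(:Face)
    FACES_OF_CELL = """
    MATCH (c:Cell {id: $cell_id})-[:BOUNDED_BY]->(f:Face)
    RETURN f.id AS face_id
    """

    def faces_of_cell(cell_id):
        # Run the Cypher query and collect the ids of the bounding faces.
        driver = GraphDatabase.driver(URI, auth=AUTH)
        try:
            with driver.session() as session:
                result = session.run(FACES_OF_CELL, cell_id=cell_id)
                return [record["face_id"] for record in result]
        finally:
            driver.close()

    if __name__ == "__main__":
        print(faces_of_cell(42))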
{"title":"ImG-complex: graph data model for topology of unstructured meshes","authors":"Alireza Rezaei Mahdiraji, P. Baumann, G. Berti","doi":"10.1145/2505515.2505733","DOIUrl":"https://doi.org/10.1145/2505515.2505733","url":null,"abstract":"Although, many applications use unstructured meshes, there is no specialized mesh database which supports storing and querying mesh data. Existing mesh libraries do not support declarative querying and are expensive to maintain. A mesh database can benefit the domains in several ways such as: declarative query language, ease of maintenance, etc. In this paper, we propose the Incidence multi-Graph Complex (ImG-Complex) data model for storing topological aspects of meshes in a database. ImG-Complex extends incidence graph (IG) model with multi-incidence information to represent a new object class which we call ImG-Complexes. We introduce optional and application-specific constraints to limit the ImG model to smaller object classes and validate mesh structures based on the modeled object class properties. We show how Neo4j graph database can be used to query mesh topology based on the (possibly constrained) ImG model. Finally, we experiment Neo4j and PostgreSQL performance on executing topological mesh queries.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75561708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
PIDGIN: ontology alignment using web text as interlingua
D. Wijaya, P. Talukdar, Tom Michael Mitchell
The problem of aligning ontologies and database schemas across different knowledge bases and databases is fundamental to knowledge management problems, including the problem of integrating the disparate knowledge sources that form the Semantic Web's Linked Data [5]. We present a novel approach to this ontology alignment problem that employs a very large natural language text corpus as an interlingua to relate different knowledge bases (KBs). The result is a scalable and robust method (PIDGIN) that aligns relations and categories across different KBs by analyzing both (1) shared relation instances across these KBs, and (2) the verb phrases in the text instantiations of these relation instances. Experiments with PIDGIN demonstrate its superior performance when aligning ontologies across large existing KBs including NELL, Yago and Freebase. Furthermore, we show that in addition to aligning ontologies, PIDGIN can automatically learn, from text, the verb phrases that identify relations, and can also type the arguments of relations in different KBs.
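A toy sketch of one of the two signals PIDGIN analyzes, shared relation instances across KBs: relations from different KBs that share many subject-object pairs are candidates for alignment. The Jaccard scoring and the tiny example KBs are assumptions for illustration; the actual method additionally exploits verb phrases from the interlingua text corpus.

    def relation_alignment_scores(kb1, kb2):
        """kb1, kb2: dict mapping relation name -> set of (subject, object) pairs.
        Returns a Jaccard overlap score for every cross-KB relation pair."""
        scores = {}
        for r1, pairs1 in kb1.items():
            for r2, pairs2 in kb2.items():
                union = len(pairs1 | pairs2)
                scores[(r1, r2)] = len(pairs1 & pairs2) / union if union else 0.0
        return scores

    kb_a = {"worksFor": {("alice", "acme"), ("bob", "globex")}}
    kb_b = {"employedBy": {("alice", "acme"), ("carol", "initech")},
            "bornIn": {("alice", "paris")}}
    print(relation_alignment_scores(kb_a, kb_b))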
{"title":"PIDGIN: ontology alignment using web text as interlingua","authors":"D. Wijaya, P. Talukdar, Tom Michael Mitchell","doi":"10.1145/2505515.2505559","DOIUrl":"https://doi.org/10.1145/2505515.2505559","url":null,"abstract":"The problem of aligning ontologies and database schemas across different knowledge bases and databases is fundamental to knowledge management problems, including the problem of integrating the disparate knowledge sources that form the semantic web's Linked Data [5]. We present a novel approach to this ontology alignment problem that employs a very large natural language text corpus as an interlingua to relate different knowledge bases (KBs). The result is a scalable and robust method (PIDGIN) that aligns relations and categories across different KBs by analyzing both (1) shared relation instances across these KBs, and (2) the verb phrases in the text instantiations of these relation instances. Experiments with PIDGIN demonstrate its superior performance when aligning ontologies across large existing KBs including NELL, Yago and Freebase. Furthermore, we show that in addition to aligning ontologies, PIDGIN can automatically learn from text, the verb phrases to identify relations, and can also type the arguments of relations of different KBs.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73082021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Causality and responsibility: probabilistic queries revisited in uncertain databases
Xiang Lian, Lei Chen
Recently, due to ubiquitous data uncertainty in many real-life applications, it has become increasingly important to study efficient and effective processing of various probabilistic queries over uncertain data, which usually retrieve uncertain objects that satisfy query predicates with high probability. However, one annoying yet challenging problem is that some probabilistic queries are very sensitive to low-quality objects in uncertain databases, and the returned query answers might miss important results (due to low data quality). To identify both accurate query answers and those potentially low-quality objects, in this paper we investigate the causes of query answers/non-answers from the novel angle of causality and responsibility (CR), and propose a new interpretation of probabilistic queries. In particular, we focus on the problem of the CR-based probabilistic nearest neighbor (CR-PNN) query, and design a general framework for answering CR-based queries (including CR-PNN) that can return both query answers with high confidence and the low-quality objects that may potentially affect query results (for data cleaning purposes). To process CR-PNN queries efficiently, we propose effective pruning strategies to quickly filter out false alarms and design efficient algorithms to obtain CR-PNN answers. Extensive experiments have been conducted to verify the efficiency and effectiveness of our proposed approaches.
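To make the underlying query semantics concrete, here is a Monte-Carlo sketch of a probabilistic nearest-neighbor query over uncertain objects, where each object has alternative instances with attached probabilities. It illustrates only the semantics; it is not the paper's CR-based interpretation, framework or pruning strategies, and the example objects and trial count are assumptions.

    import math
    import random

    def sample_instance(instances):
        """instances: list of ((x, y), prob) with probs summing to <= 1;
        returns None (object absent) with the remaining probability."""
        r, acc = random.random(), 0.0
        for point, prob in instances:
            acc += prob
            if r < acc:
                return point
        return None

    def pnn_probabilities(objects, q, trials=20000):
        """Estimate Pr[object is the nearest neighbor of q] by sampling possible worlds."""
        counts = {name: 0 for name in objects}
        for _ in range(trials):
            best, best_dist = None, float("inf")
            for name, instances in objects.items():
                p = sample_instance(instances)
                if p is None:
                    continue
                d = math.dist(p, q)
                if d < best_dist:
                    best, best_dist = name, d
            if best is not None:
                counts[best] += 1
        return {name: c / trials for name, c in counts.items()}

    objects = {
        "A": [((1.0, 1.0), 0.6), ((5.0, 5.0), 0.4)],
        "B": [((2.0, 2.0), 0.9)],   # absent with probability 0.1
    }
    print(pnn_probabilities(objects, (0.0, 0.0)))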
{"title":"Causality and responsibility: probabilistic queries revisited in uncertain databases","authors":"Xiang Lian, Lei Chen","doi":"10.1145/2505515.2505754","DOIUrl":"https://doi.org/10.1145/2505515.2505754","url":null,"abstract":"Recently, due to ubiquitous data uncertainty in many real-life applications, it has become increasingly important to study efficient and effective processing of various probabilistic queries over uncertain data, which usually retrieve uncertain objects that satisfy query predicates with high probabilities. However, one annoying, yet challenging, problem is that, some probabilistic queries are very sensitive to low-quality objects in uncertain databases, and the returned query answers might miss some important results (due to low data quality). To identify both accurate query answers and those potentially low-quality objects, in this paper, we investigate the causes of query answers/non-answers from a novel angle of causality and responsibility (CR), and propose a new interpretation of probabilistic queries. Particularly, we focus on the problem of CR-based probabilistic nearest neighbor (CR-PNN) query, and design a general framework for answering CR-based queries (including CR-PNN), which can return both query answers with high confidences and low-quality objects that may potentially affect query results (for data cleaning purposes). To efficiently process CR-PNN queries, we propose effective pruning strategies to quickly filter out false alarms, and design efficient algorithms to obtain CR-PNN answers. Extensive experiments have been conducted to verify the efficiency and effectiveness of our proposed approaches.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74964086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Flexible and dynamic compromises for effective recommendations
Saurabh Gupta, Sutanu Chakraborti
Conversational Recommendation mimics the kind of dialog that takes place between a customer and a shopkeeper: it involves multiple interactions, with the user able to give feedback at every interaction, as opposed to Single Shot Retrieval, where the system retrieves a set of items in response to a user query in a single interaction. A compromise refers to a particular user preference which the recommender system failed to satisfy. But in the context of conversational systems, where the user's preferences keep evolving as she interacts with the system, what constitutes a compromise for her also keeps changing. Typically, in Single Shot Retrieval, the notion of compromise is characterized by the assignment of a particular feature to a particular dominance group, such as MIB (higher value is better) or LIB (lower value is better), and this assignment remains the same for all users of the system. In this paper, we propose a way to realize the notion of compromise in a conversational setting. Our approach, Flexi-Comp, introduces the notion of dynamically assigning a feature to two dominance groups simultaneously, which is then used to redefine the notion of compromise. We show experimentally that a utility function based on this notion of compromise outperforms existing conversational recommenders in terms of recommendation efficiency.
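A loose sketch of the dual-assignment idea, assuming a simple additive utility that is not the paper's Flexi-Comp function: each feature carries one weight for a "more is better" (MIB) reading and one for a "less is better" (LIB) reading, and an item's utility blends the two, so feedback can shift a feature between the groups without a hard reassignment.

    def utility(item, weights, feature_ranges):
        """weights: feature -> (w_mib, w_lib); both readings contribute to the score."""
        score = 0.0
        for feat, (w_mib, w_lib) in weights.items():
            lo, hi = feature_ranges[feat]
            norm = (item[feat] - lo) / (hi - lo)          # 0 = lowest value, 1 = highest
            score += w_mib * norm + w_lib * (1.0 - norm)
        return score

    cameras = [{"price": 300, "zoom": 10}, {"price": 700, "zoom": 24}]
    ranges = {"price": (200, 1000), "zoom": (5, 30)}
    # After some feedback, price reads mostly as LIB but partly as MIB (price as a quality proxy).
    weights = {"price": (0.2, 0.8), "zoom": (1.0, 0.0)}
    print(max(cameras, key=lambda c: utility(c, weights, ranges)))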
{"title":"Flexible and dynamic compromises for effective recommendations","authors":"Saurabh Gupta, Sutanu Chakraborti","doi":"10.1145/2505515.2507893","DOIUrl":"https://doi.org/10.1145/2505515.2507893","url":null,"abstract":"Conversational Recommendation mimics the kind of dialog that takes between a customer and a shopkeeper involving multiple interactions where the user can give feedback at every interaction as opposed to Single Shot Retrieval, which corresponds to a scheme where the system retrieves a set of items in response to a user query in a single interaction. Compromise refers to a particular user preference which the recommender system failed to satisfy. But in the context of conversational systems, where the user's preferences keep on evolving as she interacts with the system, what constitutes as a compromise for her also keeps on changing. Typically, in Single Shot retrieval, the notion of compromise is characterized by the assignment of a particular feature to a particular dominance group such as MIB (higher value is better) or LIB (lower value is better) and this assignment remains true for all the users who use the system. In this paper, we propose a way to realize the notion of compromise in a conversational setting. Our approach, Flexi-Comp, introduces the notion of dynamically assigning a feature to two dominance groups simultaneously which is then used to redefine the notion of compromise. We show experimentally that a utility function based on this notion of compromise outperforms the existing conversational recommenders in terms of recommendation efficiency.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72803530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Network-aware search in social tagging applications: instance optimality versus efficiency
S. Maniu, Bogdan Cautis
In this paper we consider top-k query answering in social applications, with a focus on social tagging. This problem requires a significant departure from socially agnostic techniques. In a network-aware context, one can (and should) exploit the social links, which can indicate how users relate to the seeker and how much weight their tagging actions should have in the result build-up. We propose algorithms that have the potential to scale to current applications. While the problem has already been considered in previous literature, this was done either under strong simplifying assumptions or under choices that cannot scale to even moderate-size real-world applications. We first revisit a key aspect of the problem, which is accessing the closest or most relevant users for a given seeker. We describe how this can be done on the fly (without any pre-computations) for several possible choices -- arguably the most natural ones -- of proximity computation in a user network. Based on this, our top-k algorithm is sound and complete, addressing the applicability issues of the existing ones. Moreover, it performs significantly better in general and is instance optimal in the case where the search relies exclusively on the social weight of tagging actions. To further address the efficiency needs of online applications, for which the exact search, albeit optimal, may still be expensive, we then consider approximate algorithms. Specifically, these rely on concise statistics about the social network or on approximate shortest-path computations. Extensive experiments on real-world data from Twitter show that our techniques can drastically improve response time without sacrificing precision.
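A small sketch of network-aware scoring under simple assumptions: an item's score for a (seeker, tag) query sums the seeker's proximity to every user who tagged the item with that tag, with proximity taken here as 1/(1 + BFS distance). Both the proximity choice and the exhaustive scoring loop are illustrative; the paper studies several proximity definitions and exact as well as approximate top-k algorithms.

    from collections import deque

    def bfs_distances(graph, source):
        # graph: dict user -> list of neighbor users (the social network).
        dist, queue = {source: 0}, deque([source])
        while queue:
            u = queue.popleft()
            for v in graph.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    def topk(seeker, tag, graph, taggings, k=3):
        # taggings: list of (user, item, tag) triples.
        dist = bfs_distances(graph, seeker)
        scores = {}
        for user, item, t in taggings:
            if t == tag and user in dist:
                scores[item] = scores.get(item, 0.0) + 1.0 / (1 + dist[user])
        return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

    graph = {"seeker": ["u1", "u2"], "u1": ["seeker", "u3"], "u2": ["seeker"], "u3": ["u1"]}
    taggings = [("u1", "doc_a", "ir"), ("u3", "doc_b", "ir"), ("u2", "doc_a", "ir")]
    print(topk("seeker", "ir", graph, taggings))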
{"title":"Network-aware search in social tagging applications: instance optimality versus efficiency","authors":"S. Maniu, Bogdan Cautis","doi":"10.1145/2505515.2505760","DOIUrl":"https://doi.org/10.1145/2505515.2505760","url":null,"abstract":"We consider in this paper top-k query answering in social applications, with a focus on social tagging. This problem requires a significant departure from socially agnostic techniques. In a network- aware context, one can (and should) exploit the social links, which can indicate how users relate to the seeker and how much weight their tagging actions should have in the result build-up. We propose algorithms that have the potential to scale to current applications. While the problem has already been considered in previous literature, this was done either under strong simplifying assumptions or under choices that cannot scale to even moderate-size real-world applications. We first revisit a key aspect of the problem, which is accessing the closest or most relevant users for a given seeker. We describe how this can be done on the fly (without any pre- computations) for several possible choices -- arguably the most natural ones -- of proximity computation in a user network. Based on this, our top-k algorithm is sound and complete, addressing the applicability issues of the existing ones. Moreover, it performs significantly better in general and is instance optimal in the case when the search relies exclusively on the social weight of tagging actions. To further address the efficiency needs of online applications, for which the exact search, albeit optimal, may still be expensive, we then consider approximate algorithms. Specifically, these rely on concise statistics about the social network or on approximate shortest-paths computations. Extensive experiments on real-world data from Twitter show that our techniques can drastically improve response time, without sacrificing precision.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73961550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Recommending tags with a model of human categorization
Paul Seitlinger, Dominik Kowald, C. Trattner, Tobias Ley
When interacting with social tagging systems, humans exercise complex processes of categorization that have been the topic of much research in cognitive science. In this paper we present a recommender approach for social tags derived from ALCOVE, a model of human category learning. The basic architecture is a simple three-layer connectionist model. The input layer encodes patterns of semantic features of a user-specific resource, such as latent topics elicited through Latent Dirichlet Allocation (LDA) or available external categories. The hidden layer categorizes the resource by matching the encoded pattern against already learned exemplar patterns. The latter are composed of unique feature patterns and associated tag distributions. Finally, the output layer samples tags from the associated tag distributions to verbalize the preceding categorization process. We have evaluated this approach on a real-world folksonomy gathered from Wikipedia bookmarks in Delicious. In the experiment our approach outperformed LDA, a well-established algorithm. We attribute this to the fact that our approach processes semantic information (either latent topics or external categories) across the three different layers. With this paper, we demonstrate that a theoretically guided design of algorithms not only holds potential for improving existing recommendation mechanisms, but also allows us to derive more generalizable insights about how human information interaction on the Web is determined by both semantic and verbal processes.
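A minimal sketch of the three-layer flow under assumed parameters: an input topic vector activates stored exemplars through a similarity kernel, and their associated tag distributions are blended to rank candidate tags. The Gaussian kernel, its width and the toy exemplars are assumptions for illustration, not the paper's exact ALCOVE-based parameterization.

    import numpy as np

    def recommend_tags(resource_topics, exemplars, width=2.0, top_n=3):
        """exemplars: list of (topic_vector, {tag: prob}) pairs learned from past resources."""
        scores = {}
        for pattern, tag_dist in exemplars:
            # Hidden layer: similarity-based activation of the exemplar.
            activation = np.exp(-width * np.sum((resource_topics - pattern) ** 2))
            # Output layer: blend the exemplar's tag distribution into the scores.
            for tag, prob in tag_dist.items():
                scores[tag] = scores.get(tag, 0.0) + activation * prob
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

    exemplars = [
        (np.array([0.9, 0.1]), {"python": 0.7, "programming": 0.3}),
        (np.array([0.1, 0.9]), {"history": 0.8, "europe": 0.2}),
    ]
    print(recommend_tags(np.array([0.8, 0.2]), exemplars))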
{"title":"Recommending tags with a model of human categorization","authors":"Paul Seitlinger, Dominik Kowald, C. Trattner, Tobias Ley","doi":"10.1145/2505515.2505625","DOIUrl":"https://doi.org/10.1145/2505515.2505625","url":null,"abstract":"When interacting with social tagging systems, humans exercise complex processes of categorization that have been the topic of much research in cognitive science. In this paper we present a recommender approach for social tags derived from ALCOVE, a model of human category learning. The basic architecture is a simple three-layers connectionist model. The input layer encodes patterns of semantic features of a user-specific resource, such as latent topics elicited through Latent Dirichlet Allocation (LDA) or available external categories. The hidden layer categorizes the resource by matching the encoded pattern against already learned exemplar patterns. The latter are composed of unique feature patterns and associated tag distributions. Finally, the output layer samples tags from the associated tag distributions to verbalize the preceding categorization process. We have evaluated this approach on a real-world folksonomy gathered from Wikipedia bookmarks in Delicious. In the experiment our approach outperformed LDA, a well-established algorithm. We attribute this to the fact that our approach processes semantic information (either latent topics or external categories) across the three different layers. With this paper, we demonstrate that a theoretically guided design of algorithms not only holds potential for improving existing recommendation mechanisms, but it also allows us to derive more generalizable insights about how human information interaction on the Web is determined by both semantic and verbal processes.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84434952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Modeling clicks beyond the first result page
A. Chuklin, P. Serdyukov, M. de Rijke
Most modern web search engines yield a list of documents of a fixed length (usually 10) in response to a user query. The next ten search results are usually available in one click, and these documents either replace the current result page or are appended to its end. Hence, in order to examine more than the first 10 documents, the user needs to explicitly express her intention. Although clickthrough numbers are lower for documents on the second and later result pages, they still represent a noticeable amount of traffic. We propose a modification of the Dynamic Bayesian Network (DBN) click model that explicitly includes the probability of transition between result pages in the model. We show that our new click model captures user behavior on the second and later result pages significantly better, while giving the same performance on the first result page.
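The sketch below isolates just the added ingredient in a deliberately simplified examination model: after each block of ten results, the user continues to the next page only with some probability tau. The full DBN model also has per-document attractiveness and satisfaction variables, which are omitted here, and gamma, tau and the page size are assumed values.

    def examination_probabilities(n_results=30, gamma=0.9, tau=0.3, page_size=10):
        """Probability that each rank is examined, with a page-transition penalty."""
        probs = [1.0]                        # rank 1 is always examined
        for rank in range(1, n_results):
            p = probs[-1] * gamma            # persist to the next result
            if rank % page_size == 0:        # crossing a result-page boundary
                p *= tau
            probs.append(p)
        return probs

    probs = examination_probabilities()
    print([round(probs[r], 3) for r in (0, 9, 10, 19, 20)])  # note the drop at ranks 11 and 21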
{"title":"Modeling clicks beyond the first result page","authors":"A. Chuklin, P. Serdyukov, M. de Rijke","doi":"10.1145/2505515.2507859","DOIUrl":"https://doi.org/10.1145/2505515.2507859","url":null,"abstract":"Most modern web search engines yield a list of documents of a fixed length (usually 10) in response to a user query. The next ten search results are usually available in one click. These documents either replace the current result page or are appended to the end. Hence, in order to examine more documents than the first 10 the user needs to explicitly express her intention. Although clickthrough numbers are lower for documents on the second and later result pages, they still represent a noticeable amount of traffic. We propose a modification of the Dynamic Bayesian Network (DBN) click model by explicitly including into the model the probability of transition between result pages. We show that our new click model can significantly better capture user behavior on the second and later result pages while giving the same performance on the first result page.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84776334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
iNewsBox: modeling and exploiting implicit feedback for building personalized news radio
Yanan Xie, Liang Chen, Kunyang Jia, Lichuan Ji, Jian Wu
Online news reading has become the major way to learn about the world, as the web provides more information than other media such as TV and radio. However, the traditional online news reading interface is inconvenient for many groups of people, especially for those who are disabled or riding a bus. This paper presents iNewsBox, a mobile application that enables users to listen to news collected from the Internet. To simplify the interactions needed to obtain valuable news, we also propose a framework that uses implicit feedback to recommend news. Experiments show that our algorithms in iNewsBox are effective.
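A toy sketch of one way listening behaviour could be turned into implicit feedback, stated purely as an assumption for illustration (it is not the paper's model): the fraction of an item the user listened to becomes a positive or negative weight on its topics, and candidate news items are scored against the accumulated profile.

    from collections import defaultdict

    def build_profile(listen_log):
        """listen_log: list of (topics, fraction_listened) for items already played."""
        profile = defaultdict(float)
        for topics, fraction in listen_log:
            weight = fraction - 0.5          # listening past the halfway point counts as positive
            for topic in topics:
                profile[topic] += weight
        return profile

    def rank_news(candidates, profile):
        # candidates: list of (item_id, topics); higher profile overlap ranks first.
        score = lambda topics: sum(profile.get(t, 0.0) for t in topics)
        return sorted(candidates, key=lambda c: score(c[1]), reverse=True)

    log = [(["sports"], 0.9), (["politics"], 0.1), (["tech", "ai"], 0.8)]
    candidates = [("item1", ["politics"]), ("item2", ["tech"]), ("item3", ["sports", "ai"])]
    print(rank_news(candidates, build_profile(log)))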
{"title":"iNewsBox: modeling and exploiting implicit feedback for building personalized news radio","authors":"Yanan Xie, Liang Chen, Kunyang Jia, Lichuan Ji, Jian Wu","doi":"10.1145/2505515.2508199","DOIUrl":"https://doi.org/10.1145/2505515.2508199","url":null,"abstract":"Online news reading has become the major method to know about the world as web provide more information than other media like TV and radio. However, traditional online news reading interface is inconvenient for many types of people, especially for those who are disabled or taking a bus. This paper presents a mobile application iNewsBox enabling users to listen to news collected from the Internet. In order to simplify necessary interactions of getting valuable news, we also propose a framework for using implicit feedback to recommend news in this paper. Experiment shows our algorithms in iNewsBox are effective.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85027029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Usability in machine learning at scale with graphlab
Carlos Guestrin
Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle "Big Data." In this talk, we will focus on: examining common algorithmic patterns in distributed ML methods; qualifying the challenges of implementing these algorithms in real distributed systems; describing computational frameworks for implementing these algorithms at scale; and addressing a significant core challenge to large-scale ML -- enabling the widespread adoption of machine learning beyond experts. In the latter part, we will focus mainly on the GraphLab framework, which naturally expresses the asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in our higher-level abstraction, GraphLab effectively addresses many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received many tens of thousands of downloads, and is being actively used by a number of startups, companies, research labs and universities.
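For readers unfamiliar with the style of computation such frameworks express, below is a generic vertex-centric PageRank in plain Python. It is not the GraphLab API (which additionally handles data distribution, scheduling and consistency); it only illustrates the kind of per-vertex update these abstractions capture.

    def pagerank(adj, damping=0.85, iters=20):
        """adj: dict vertex -> list of out-neighbors."""
        n = len(adj)
        rank = {v: 1.0 / n for v in adj}
        out_deg = {v: len(nbrs) or 1 for v, nbrs in adj.items()}
        for _ in range(iters):
            incoming = {v: 0.0 for v in adj}
            for v, nbrs in adj.items():          # each vertex "scatters" its rank share
                share = rank[v] / out_deg[v]
                for u in nbrs:
                    incoming[u] += share
            rank = {v: (1 - damping) / n + damping * incoming[v] for v in adj}
        return rank

    adj = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
    print(pagerank(adj))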
{"title":"Usability in machine learning at scale with graphlab","authors":"Carlos Guestrin","doi":"10.1145/2505515.2527108","DOIUrl":"https://doi.org/10.1145/2505515.2527108","url":null,"abstract":"Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle \"Big Data.\" In this talk, we will focus on: Examining common algorithmic patterns in distributed ML methods. Qualifying the challenges of implementing these algorithms in real distributed systems. Describing computational frameworks for implementing these algorithms at scale. Addressing a significant core challenge to large-scale ML -- enabling the widespread adoption of machine learning beyond experts. In the latter part, we will focus mainly on the GraphLab framework, which naturally expresses asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in our higher-level abstraction, GraphLab will effectively address many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received many tens of thousands of downloads, and is being actively used by a number of startups, companies, research labs and universities.","PeriodicalId":20528,"journal":{"name":"Proceedings of the 22nd ACM international conference on Information & Knowledge Management","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2013-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85047185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2