
Proceedings of the ... ACM International Conference on Information & Knowledge Management. ACM International Conference on Information and Knowledge Management: latest articles

Finding redundant and complementary communities in multidimensional networks
M. Berlingerio, M. Coscia, F. Giannotti
Community discovery in networks is the problem of detecting, for each node, its membership in one or more groups of nodes, the communities, that are densely connected or highly interactive. We define the community discovery problem in multidimensional networks, where more than one connection may exist between any two nodes. We also introduce two measures that characterize the communities found. Our experiments on real-world multidimensional networks support the methodology proposed in this paper and open the way for a new class of algorithms aimed at capturing the multifaceted complexity of connections among nodes in a network.
DOI: https://doi.org/10.1145/2063576.2063921 | Pages: 2181-2184 | Published: 2011-10-24
Citations: 34
SISP: a new framework for searching the informative subgraph based on PSO
Chen Chen, Guoren Wang, Huilin Liu, Junchang Xin, Ye Yuan
Many graph applications need to identify the key relations among a group of query nodes. Given a relational graph such as a social network or a biochemical interaction network, an informative subgraph is needed that best explains the relationships among the given query nodes. Based on Particle Swarm Optimization (PSO), a new framework, SISP (Searching the Informative Subgraph based on PSO), is proposed. SISP contains three key stages. In the initialization stage, a random spreading method is proposed, which can effectively guarantee the connectivity of the nodes in each particle. In the fitness calculation stage, a fitness function is designed by incorporating a sign function with the goodness score. In the update stage, an intersection-based particle extension method and a rule-based particle compression method are proposed. To evaluate the quality of the returned subgraphs, the appropriate calculation of the goodness score is studied. Considering the importance and relevance of a node together, we present the PNR method, which makes the definition of informativeness more reliable and the returned subgraph more satisfying. Finally, we present experiments on a real dataset and on a synthetic dataset. The experimental results confirm that the proposed methods achieve increased accuracy and are efficient for any query set.
DOI: https://doi.org/10.1145/2063576.2063645 | Pages: 453-462 | Published: 2011-10-24
Citations: 12
Probabilistic model for discovering topic based communities in social networks
Mrinmaya Sachan, Danish Contractor, T. Faruquie, L. V. Subramaniam
Social graphs have received renewed interest as a research topic with the advent of social networking websites. These online networks provide a rich source of data to study user relationships and interaction patterns on a large scale. In this paper, we propose a generative Bayesian model for extracting latent communities from a social graph. We assume that community memberships depend on topics of interest between users and the link relationships between them in the social graph topology. In addition, we make use of the nature of interaction to gauge user interests. Our model allows communities to be related to multiple topics and each user in the graph can be a member of multiple communities. This gives an insight into user interests and topical distribution in communities. We show the effectiveness of our model using a real world data set and also compare our model with existing community discovery methods.
DOI: https://doi.org/10.1145/2063576.2063963 | Pages: 2349-2352 | Published: 2011-10-24
Citations: 15
Building a generic debugger for information extraction pipelines
A. Sarma, Alpa Jain, P. Bohannon
Complex information extraction (IE) pipelines are becoming an integral component of most text processing frameworks. We introduce a first system to help IE users analyze extraction pipeline semantics and operator transformations interactively while debugging. This allows the effort to be proportional to the need, and to focus on the portions of the pipeline under the greatest suspicion. We present a generic debugger for running post-execution analysis of any IE pipeline consisting of arbitrary types of operators. For this, we propose an effective provenance model for IE pipelines which captures a variety of operator types, ranging from those for which full to no specifications are available. We have evaluated our proposed algorithms and provenance model on large-scale real-world extraction pipelines.
DOI: https://doi.org/10.1145/2063576.2063933 | Pages: 2229-2232 | Published: 2011-10-24
Citations: 5
Using games with a purpose and bootstrapping to create domain-specific sentiment lexicons
A. Weichselbraun, Stefan Gindl, A. Scharl
Sentiment detection analyzes the positive or negative polarity of text. The field has received considerable attention in recent years, since it plays an important role in providing means to assess user opinions regarding an organization's products, services, or actions. Approaches towards sentiment detection include machine learning techniques as well as computationally less expensive methods. Both approaches rely on the use of language-specific sentiment lexicons, which are lists of sentiment terms with their corresponding sentiment values. The effort involved in creating, customizing, and extending sentiment lexicons is considerable, particularly if less common languages and domains are targeted without access to appropriate language resources. This paper proposes a semi-automatic approach for the creation of sentiment lexicons which assigns sentiment values to sentiment terms via crowd-sourcing. Furthermore, it introduces a bootstrapping process operating on unlabeled domain documents to extend the created lexicons and to customize them according to the particular use case. This process considers sentiment terms as well as sentiment indicators occurring in the discourse surrounding a particular topic. Such indicators are associated with a positive or negative context in a particular domain, but might have a neutral connotation in other domains. A formal evaluation shows that bootstrapping considerably improves the method's recall. Automatically created lexicons yield a performance comparable to professionally created language resources such as the General Inquirer.
DOI: https://doi.org/10.1145/2063576.2063729 | Pages: 1053-1060 | Published: 2011-10-24
Citations: 26
Content based social behavior prediction: a multi-task learning approach
Hongliang Fei, R. Jiang, Yuhao Yang, Bo Luo, Jun Huan
Information flow studies analyze the principles and mechanisms of social information distribution and are an essential research topic in social networks. Traditional approaches are primarily based on the social network graph topology. However, topology alone cannot accurately reflect user interests or activities. In this paper, we adopt a "microeconomics" approach to study social information diffusion and aim to answer the question of how social information flow and socialization behaviors are related to content similarity and user interests. In particular, we study content-based social activity prediction, i.e., predicting a user's response (e.g. comment or like) to their friends' postings (e.g. blogs) w.r.t. message content. In our solution, we cast the social behavior prediction problem as a multi-task learning problem, in which each task corresponds to a user. We have designed a novel multi-task learning algorithm specifically for learning information flow in social networks. In our model, we apply l1 and Tikhonov regularization to obtain a sparse and smooth model in a linear multi-task learning framework. Through a comprehensive experimental study, we have demonstrated the effectiveness of the proposed learning method.
DOI: https://doi.org/10.1145/2063576.2063719 | Pages: 995-1000 | Published: 2011-10-24
Citations: 23
Information diffusion in social networks: observing and affecting what society cares about
D. Agrawal, Ceren Budak, A. E. Abbadi
Information diffusion in social networks provides great opportunities for political and social change as well as societal education. Therefore, understanding information diffusion in social networks is a critical research goal. This greater understanding can be achieved through data analysis, the development of reliable models that can predict outcomes of social processes, and ultimately the creation of applications that can shape the outcome of these processes. In this tutorial, we aim to provide an overview of such recent research based on a wide variety of techniques, such as optimization algorithms, data mining, and data streams, covering a large number of problems such as influence spread maximization, misinformation limitation, and the study of trends in online social networks.
DOI: https://doi.org/10.1145/2063576.2064036 | Pages: 2609-2610 | Published: 2011-10-24
Citations: 17
On bias problem in relevance feedback
Qianli Xing, Yi Zhang, Lanbo Zhang
Relevance feedback is an effective approach to improve retrieval quality over the initial query. Typical relevance feedback methods select top-ranked documents for relevance judgments, then carry out query expansion or model updating based on the feedback documents. However, the number of feedback documents is usually limited because human labeling is expensive. Thus the relevant documents in the feedback set are hardly representative of all relevant documents, and the feedback set is actually biased. As a result, the performance of relevance feedback suffers. In this paper, we first show through experiments how and where the bias problem exists. Then we study how the bias can be reduced by utilizing the unlabeled documents. After analyzing the usefulness of a document to relevance feedback, we propose an approach that extends the feedback set with unlabeled documents carefully selected by heuristics. Our experimental results show that the extended feedback set has less bias than the original feedback set and that better performance can be achieved when the extended feedback set is used for relevance feedback.
DOI: https://doi.org/10.1145/2063576.2063866 | Pages: 1965-1968 | Published: 2011-10-24
Citations: 7
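The relevance feedback entry above mentions query expansion driven by a small set of judged documents. As a rough illustration of that general setting, the sketch below uses the classic Rocchio update, which is a standard textbook method and not necessarily the one studied in the paper. The function name, the weights `alpha`, `beta`, `gamma`, and the toy term vectors are all assumptions made for this example.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector (dict term -> weight) via Rocchio.

    query, and each document in relevant / nonrelevant, are sparse term
    vectors stored as dicts. Terms whose updated weight is not positive
    are dropped, a common convention.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)

    def centroid_weight(docs, term):
        # Mean weight of `term` over docs; 0.0 when no documents were judged.
        if not docs:
            return 0.0
        return sum(doc.get(term, 0.0) for doc in docs) / len(docs)

    updated = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid_weight(relevant, term)
                  - gamma * centroid_weight(nonrelevant, term))
        if weight > 0:
            updated[term] = weight
    return updated


# Hypothetical toy data: one query, two judged-relevant and one
# judged-nonrelevant document.
QUERY = {"jaguar": 1.0}
RELEVANT = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "wildlife": 1.0}]
NONRELEVANT = [{"jaguar": 1.0, "car": 1.0}]

UPDATED = rocchio(QUERY, RELEVANT, NONRELEVANT)
print(sorted(UPDATED))  # terms that survive the update
```

Because the centroids are averaged over only a handful of judged documents, the expanded query inherits whatever is peculiar to those few documents, which is one way to picture the bias the abstract describes.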
Exploiting longer cycles for link prediction in signed networks
Kai-Yang Chiang, Nagarajan Natarajan, Ambuj Tewari, I. Dhillon
We consider the problem of link prediction in signed networks. Such networks arise on the web in a variety of ways when users can implicitly or explicitly tag their relationship with other users as positive or negative. The signed links thus created reflect social attitudes of the users towards each other in terms of friendship or trust. Our first contribution is to show how any quantitative measure of social imbalance in a network can be used to derive a link prediction algorithm. Our framework allows us to reinterpret some existing algorithms as well as derive new ones. Second, we extend the approach of Leskovec et al. (2010) by presenting a supervised machine learning based link prediction method that uses features derived from longer cycles in the network. The supervised method outperforms all previous approaches on 3 networks drawn from sources such as Epinions, Slashdot and Wikipedia. The supervised approach easily scales to these networks, the largest of which has 132k nodes and 841k edges. Most real-world networks have an overwhelmingly large proportion of positive edges and it is therefore easy to get a high overall accuracy at the cost of a high false positive rate. We see that our supervised method not only achieves good accuracy for sign prediction but is also especially effective in lowering the false positive rate.
研究了签名网络中的链路预测问题。当用户可以隐式或显式地将他们与其他用户的关系标记为积极或消极时,这种网络以各种方式出现在网络上。由此创建的签名链接反映了用户在友谊或信任方面对彼此的社会态度。我们的第一个贡献是展示了如何使用网络中社会不平衡的任何定量测量来推导链接预测算法。我们的框架允许我们重新解释一些现有的算法,并派生出新的算法。其次,我们扩展了Leskovec等人(2010)的方法,提出了一种基于监督机器学习的链接预测方法,该方法使用了来自网络中较长周期的特征。在Epinions, Slashdot和Wikipedia等来源的3个网络上,监督方法优于之前的所有方法。监督方法很容易扩展到这些网络,其中最大的网络有132k个节点和841k条边。大多数现实世界的网络都有绝大多数的正边,因此很容易以高误报率为代价获得高的整体精度。我们发现,我们的监督方法不仅在符号预测方面取得了很好的准确性,而且在降低误报率方面也特别有效。
DOI: https://doi.org/10.1145/2063576.2063742 · Published: 2011-10-24 · Pages: 1157-1162
Citations: 160
Learning kernels with upper bounds of leave-one-out error
Yong Liu, Shizhong Liao, Yuexian Hou
We propose a new learning method for Multiple Kernel Learning (MKL) based on upper bounds of the leave-one-out error, which is an almost unbiased estimate of the expected generalization error. Specifically, we first present two new formulations for MKL that minimize upper bounds of the leave-one-out error. Then, we compute the derivatives of these bounds and design an efficient iterative algorithm for solving these formulations. Experimental results show that the proposed method achieves better accuracy than both SVM with a uniform combination of basis kernels and other state-of-the-art kernel learning approaches.
DOI: https://doi.org/10.1145/2063576.2063927 · Published: 2011-10-24 · Pages: 2205-2208
Citations: 23