
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval: latest papers

A mutual information-based framework for the analysis of information retrieval systems
Peter B. Golbus, J. Aslam
We consider the problem of information retrieval evaluation and the methods and metrics used for such evaluations. We propose a probabilistic framework for evaluation which we use to develop new information-theoretic evaluation metrics. We demonstrate that these new metrics are powerful and generalizable, enabling evaluations heretofore not possible. We introduce four preliminary uses of our framework: (1) a measure of conditional rank correlation, information tau, a powerful meta-evaluation tool whose use we demonstrate on understanding novelty and diversity evaluation; (2) a new evaluation measure, relevance information correlation, which is correlated with traditional evaluation measures and can be used to (3) evaluate a collection of systems simultaneously, which provides a natural upper bound on metasearch performance; and (4) a measure of the similarity between rankers on judged documents, information difference, which allows us to determine whether systems with similar performance are in fact different.
DOI: https://doi.org/10.1145/2484028.2484073 (published 2013-07-28)
Citations: 2
An adaptive evidence weighting method for medical record search
Dongqing Zhu, Ben Carterette
In this paper, we present a medical record search system which is useful for identifying cohorts required in clinical studies. In particular, we propose a query-adaptive weighting method that can dynamically aggregate and score evidence in multiple medical reports (from different hospital departments or from different tests within the same department) of a patient. Furthermore, we explore several informative features for learning our retrieval model.
DOI: https://doi.org/10.1145/2484028.2484175 (published 2013-07-28)
Citations: 16
A weakly-supervised detection of entity central documents in a stream
L. Bonnefoy, Vincent Bouvier, P. Bellot
Filtering a time-ordered corpus for documents that are highly relevant to an entity is a task that has received increasing attention over the years. One application is to reduce the delay between the moment information about an entity is first observed and the moment the entity's entry in a knowledge base is updated. Current state-of-the-art approaches are highly supervised and require training examples for each entity monitored. We propose an approach which does not require new training data when processing a new entity. To capture intrinsic characteristics of highly relevant documents, our approach relies on three types of features: document-centric features, entity-profile-related features, and time features. Evaluated within the framework of the "Knowledge Base Acceleration" track at TREC 2012, it outperforms current state-of-the-art approaches.
DOI: https://doi.org/10.1145/2484028.2484180 (published 2013-07-28)
Citations: 27
How far will you go?: characterizing and predicting online search stopping behavior using information scent and need for cognition
Wan-Ching Wu
Predicting when online searchers terminate search for information tasks without obvious end-points is a challenging task. Previous research concludes that people stop based on intuitions of enough [4], yet few studies have systematically examined online search stopping behavior. For open-ended search tasks, searchers often have to reformulate their queries in order to obtain a sufficient amount of information, which means that before searchers quit searching for a task entirely (stopping at the task level), they also stop result evaluation for different queries during the search (stopping at the query level).
DOI: https://doi.org/10.1145/2484028.2484232 (published 2013-07-28)
Citations: 4
A framework for specific term recommendation systems
Thomas Lüke, Philipp Schaer, Philipp Mayr
In this paper we present the IRSA framework that enables the automatic creation of search term suggestion or recommendation systems (TS). Such TS are used to operationalize interactive query expansion and help users in refining their information need in the query formulation phase. Our recent research has shown TS to be more effective when specific to a certain domain. The presented technical framework allows owners of Digital Libraries to create their own specific TS constructed via OAI-harvested metadata with very little effort.
DOI: https://doi.org/10.1145/2484028.2484207 (published 2013-07-28)
Citations: 10
Extracting query facets from search results
Weize Kong, James Allan
Web search queries are often ambiguous or multi-faceted, which makes a simple ranked list of results inadequate. To assist information finding for such faceted queries, we explore a technique that explicitly represents interesting facets of a query using groups of semantically related terms extracted from search results. As an example, for the query "baggage allowance", these groups might be different airlines, different flight types (domestic, international), or different travel classes (first, business, economy). We name these groups query facets and the terms in these groups facet terms. We develop a supervised approach based on a graphical model to recognize query facets from the noisy candidates found. The graphical model learns how likely a candidate term is to be a facet term as well as how likely two terms are to be grouped together in a query facet, and captures the dependencies between the two factors. We propose two algorithms for approximate inference on the graphical model since exact inference is intractable. Our evaluation combines recall and precision of the facet terms with the grouping quality. Experimental results on a sample of web queries show that the supervised method significantly outperforms existing approaches, which are mostly unsupervised, suggesting that query facet extraction can be effectively learned.
DOI: https://doi.org/10.1145/2484028.2484097 (published 2013-07-28)
Citations: 69
Time-aware point-of-interest recommendation
Quan Yuan, G. Cong, Zongyang Ma, Aixin Sun, N. Magnenat-Thalmann
The availability of large volumes of user check-in data from rapidly growing location-based social networks (LBSNs) enables many important location-aware services for users. Point-of-interest (POI) recommendation is one such service: it recommends places that users have not visited before. Several techniques have recently been proposed for this recommendation service. However, no existing work has considered temporal information for POI recommendation in LBSNs. We believe that time plays an important role in POI recommendation because most users tend to visit different places at different times of day, e.g., visiting a restaurant at noon and a bar at night. In this paper, we define a new problem, time-aware POI recommendation: recommending POIs for a given user at a specified time of day. To solve the problem, we develop a collaborative recommendation model that is able to incorporate temporal information. Moreover, based on the observation that users tend to visit nearby POIs, we further enhance the recommendation model by considering geographical information. Our experimental results on two real-world datasets show that the proposed approach substantially outperforms the state-of-the-art POI recommendation methods.
DOI: https://doi.org/10.1145/2484028.2484030 (published 2013-07-28)
Citations: 740
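The time-aware collaborative idea described in the abstract above can be illustrated with a small sketch. This is not the authors' model: it is a plain user-based collaborative filter in which each check-in is keyed by a (time slot, POI) pair, and it omits the geographical enhancement entirely. The function names, the slot labels, and the toy check-in data are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(profile_a, profile_b):
    """Cosine similarity between two sparse {(slot, poi): count} profiles."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[key] * profile_b[key] for key in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(checkins, user, slot, top_n=3):
    """Rank POIs the user has never visited, using only check-ins that
    similar users made in the requested time slot (assumed scheme)."""
    visited = {poi for (_, poi) in checkins[user]}
    scores = defaultdict(float)
    for other, profile in checkins.items():
        if other == user:
            continue
        sim = cosine(checkins[user], profile)
        if sim <= 0:
            continue
        for (other_slot, poi), count in profile.items():
            if other_slot == slot and poi not in visited:
                scores[poi] += sim * count
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda poi: (-scores[poi], poi))[:top_n]


# Hypothetical toy data: user -> {(time slot, POI): number of check-ins}
checkins = {
    "alice": {("noon", "cafe"): 1, ("night", "bar_a"): 1},
    "bob": {("noon", "cafe"): 1, ("noon", "diner"): 1, ("night", "bar_b"): 1},
    "carol": {("noon", "museum"): 1},
}
print(recommend(checkins, "alice", "night"))  # ['bar_b']
print(recommend(checkins, "alice", "noon"))   # ['diner']
```

Keying the profile on (slot, POI) rather than POI alone is what makes the similarity and the scores depend on time of day; the same user can receive different recommendations at noon and at night.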
Faster upper bounding of intersection sizes
Daisuke Takuma, H. Yanagisawa
There is a long history of developing efficient algorithms for set intersection, which is a fundamental operation in information retrieval and databases. In this paper, we describe a new data structure, a Cardinality Filter, to quickly compute an upper bound on the size of a set intersection. Knowing an upper bound of the size can be used to accelerate many applications such as top-k query processing in text mining. Given finite sets A and B, the expected computation time for the upper bound of the size of the intersection |A ∩ B| is O((|A| + |B|) / w), where w is the machine word length. This is much faster than the current best algorithm for the exact intersection, which runs in O((|A| + |B|) / √w + |A ∩ B|) expected time. Our performance studies show that our implementations of Cardinality Filters are from 2 to 10 times faster than existing set intersection algorithms, and the time for a top-k query in a text mining application can be reduced by half.
DOI: https://doi.org/10.1145/2484028.2484065 (published 2013-07-28)
Citations: 9
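To make the notion of an intersection-size upper bound concrete, here is a minimal sketch of one simple way to obtain such a bound. It is not the Cardinality Filter from the paper, whose construction the abstract does not describe, and it makes no attempt at the word-level speedup. It only relies on the fact that an element common to A and B lands in the same hash bucket on both sides, so each bucket can contribute at most the smaller of its two counts. The function names and the bucket count are assumptions.

```python
def bucket_counts(items, num_buckets=64):
    """Count how many distinct elements fall into each hash bucket."""
    counts = [0] * num_buckets
    for item in set(items):
        counts[hash(item) % num_buckets] += 1
    return counts


def intersection_upper_bound(counts_a, counts_b):
    """Upper bound on |A ∩ B| from two per-bucket count summaries.

    Every common element sits in the same bucket for both sets, so a
    bucket holds at most min(count_a, count_b) common elements.
    """
    return sum(min(a, b) for a, b in zip(counts_a, counts_b))


set_a = set(range(0, 100))
set_b = set(range(50, 150))
bound = intersection_upper_bound(bucket_counts(set_a), bucket_counts(set_b))
exact = len(set_a & set_b)
print(exact <= bound <= min(len(set_a), len(set_b)))  # True
```

In a top-k setting, a bound like this can be used to skip the exact intersection whenever the bound is already below the current k-th best score. More buckets tighten the bound at the cost of a larger summary.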
Exploiting user feedback to learn to rank answers in Q&A forums: a case study with Stack Overflow
D. H. Dalip, Marcos André Gonçalves, Marco Cristo, P. Calado
Collaborative web sites, such as collaborative encyclopedias, blogs, and forums, are characterized by a loose edit control, which allows anyone to freely edit their content. As a consequence, the quality of this content raises much concern. To deal with this, many sites adopt manual quality control mechanisms. However, given their size and change rate, manual assessment strategies do not scale, and content that is new or unpopular is seldom reviewed. This has a negative impact on the many services provided, such as ranking and recommendation. To tackle this problem, we propose a learning to rank (L2R) approach for ranking answers in Q&A forums. In particular, we adopt an approach based on Random Forests and represent query and answer pairs using eight different groups of features. Some of these features are used in the Q&A domain for the first time. Our L2R method was trained to learn the answer rating, based on the feedback users give to answers in Q&A forums. Using the proposed method, we were able (i) to outperform a state-of-the-art baseline with gains of up to 21% in NDCG, a metric used to evaluate rankings; we also conducted a comprehensive study of the features, showing that (ii) review and user features are the most important in the Q&A domain, although text features are useful for assessing the quality of new answers; and (iii) the best set of new features we proposed was able to yield the best quality rankings.
DOI: https://doi.org/10.1145/2484028.2484072 (published 2013-07-28)
Citations: 97
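Since the abstract above reports its gains in NDCG, a short sketch of how that metric is commonly computed may help. The abstract does not say which gain function or cutoff the authors used, so the linear gain and the optional cutoff `k` below are assumptions; another widespread variant uses a gain of 2^rel - 1.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains, k=None):
    """DCG of the ranking as given, normalised by the DCG of the ideal ordering.

    `gains` lists the graded relevance (e.g. answer rating) of each item
    in ranked order; `k` optionally truncates the evaluation depth.
    """
    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


print(ndcg([3, 2, 1]))         # 1.0, already in ideal order
print(ndcg([1, 2, 3]) < 1.0)   # True, best answer ranked last
```

A value of 1.0 means the answers are ordered exactly by their ratings; lower values indicate that highly rated answers appear further down the list.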
Term level search result diversification
Van Dang, W. Bruce Croft
Current approaches for search result diversification have been categorized as either implicit or explicit. The implicit approach assumes each document represents its own topic, and promotes diversity by selecting documents for different topics based on the difference of their vocabulary. On the other hand, the explicit approach models the set of query topics, or aspects. While the former approach is generally less effective, the latter usually depends on a manually created description of the query aspects, the automatic construction of which has proven difficult. This paper introduces a new approach: term-level diversification. Instead of modeling the set of query aspects, which are typically represented as coherent groups of terms, our approach uses terms without the grouping. Our results on the ClueWeb collection show that the grouping of topic terms provides very little benefit to diversification compared to simply using the terms themselves. Consequently, we demonstrate that term-level diversification, with topic terms identified automatically from the search results using a simple greedy algorithm, significantly outperforms methods that attempt to create a full topic structure for diversification.
DOI: https://doi.org/10.1145/2484028.2484095 (published 2013-07-28)
Citations: 64
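The idea of diversifying against ungrouped topic terms, as described in the abstract above, can be sketched with a simple greedy re-ranker. This is an illustrative coverage-based scheme, not the diversification framework evaluated in the paper. Each step picks the document that best balances its relevance score against the share of topic terms it covers that no earlier pick has covered. The function name, the `trade_off` parameter, and the toy data are hypothetical.

```python
def diversify(docs, topic_terms, k, trade_off=0.5):
    """Greedy term-level re-ranking.

    docs: list of (doc_id, relevance, set_of_terms) tuples.
    topic_terms: flat set of topic terms, with no grouping into aspects.
    trade_off: 0 ranks by relevance only, 1 by new term coverage only.
    """
    uncovered = set(topic_terms)
    remaining = list(docs)
    ranking = []

    def score(doc):
        _, relevance, terms = doc
        novelty = len(terms & uncovered) / len(topic_terms) if topic_terms else 0.0
        return (1 - trade_off) * relevance + trade_off * novelty

    while remaining and len(ranking) < k:
        best = max(remaining, key=score)
        ranking.append(best[0])
        uncovered -= best[2]      # terms now covered stop earning credit
        remaining.remove(best)
    return ranking


# Hypothetical toy data for the query "baggage allowance"
docs = [
    ("d1", 0.90, {"price", "fees"}),
    ("d2", 0.85, {"price", "fees"}),
    ("d3", 0.50, {"carry-on", "weight"}),
]
terms = {"price", "fees", "carry-on", "weight"}
print(diversify(docs, terms, k=2))                 # ['d1', 'd3']
print(diversify(docs, terms, k=2, trade_off=0.0))  # ['d1', 'd2']
```

With `trade_off=0.5` the second slot goes to `d3`, which covers terms that `d1` missed, even though `d2` has the higher relevance score.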