
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval: Latest Publications

Is relevance hard work?: evaluating the effort of making relevant assessments
R. Villa, Martin Halvey
The judging of relevance has been a subject of study in information retrieval for a long time, especially in the creation of relevance judgments for test collections. While the criteria by which assessors judge relevance have been intensively studied, little work has investigated the process individual assessors go through to judge the relevance of a document. In this paper, we focus on the process by which relevance is judged, and in particular, the degree of effort a user must expend to judge relevance. By better understanding this effort in isolation, we may provide data which can be used to create better models of search. We present the results of an empirical evaluation of the effort users must exert to judge the relevance of a document, investigating the effect of relevance level and document size. Results suggest that 'relevant' documents require more effort to judge when compared to highly relevant and not relevant documents, and that effort increases as document size increases.
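As a minimal illustration of how the measured effort could be summarized, the sketch below averages hypothetical judgment times per relevance level; the records, field names, and numbers are invented stand-ins for the logged assessment data the abstract describes.

```python
from collections import defaultdict

# Hypothetical log entries: (relevance_level, doc_length_in_words, seconds_spent).
# These values are invented for illustration only.
assessments = [
    ("highly_relevant", 350, 22.0),
    ("relevant", 340, 41.5),
    ("relevant", 900, 63.0),
    ("not_relevant", 320, 18.0),
    ("not_relevant", 950, 30.5),
]

def mean_seconds_by_level(records):
    """Average judgment time per relevance level."""
    totals = defaultdict(lambda: [0.0, 0])
    for level, _, seconds in records:
        totals[level][0] += seconds
        totals[level][1] += 1
    return {level: total / count for level, (total, count) in totals.items()}

print(mean_seconds_by_level(assessments))
# e.g. {'highly_relevant': 22.0, 'relevant': 52.25, 'not_relevant': 24.25}
```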
Citations: 36
A multilingual and multiplatform application for medicinal plants prescription from medical symptoms
Fernando Ruiz-Rico, D. Tomás, J. González, María-Consuelo Rubio-Sánchez
This paper presents an application for medicinal plants prescription based on text classification techniques. The system receives as an input a free text describing the symptoms of a user, and retrieves a ranked list of medicinal plants related to those symptoms. In addition, a set of links to Wikipedia are also provided, enriching the information about every medicinal plant presented to the user. In order to improve the accessibility to the application, the input can be written in six different languages, adapting the results accordingly. The application interface can be accessed from different devices and platforms.
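A minimal sketch of the kind of symptom-to-plant ranking the abstract describes, assuming a simple term-vector cosine similarity; the plant descriptions and the similarity method are assumptions for illustration, not the authors' classifier.

```python
import math
import re
from collections import Counter

# Hypothetical plant descriptions; the real system works over a classified
# corpus and enriches results with Wikipedia links.
plants = {
    "Chamomile": "calming herb used for insomnia, anxiety and mild stomach upset",
    "Peppermint": "relieves nausea, indigestion and headache",
    "Echinacea": "traditionally taken for cold symptoms such as sore throat and cough",
}

def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_plants(symptoms):
    """Rank plants by textual similarity to a free-text symptom description."""
    query = vectorize(symptoms)
    scored = sorted(((cosine(query, vectorize(desc)), name)
                     for name, desc in plants.items()), reverse=True)
    return [name for score, name in scored if score > 0]

print(rank_plants("I have a headache and some nausea"))  # Peppermint ranks first
```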
Citations: 0
Interpretation of coordinations, compound generation, and result fusion for query variants
Johannes Leveling
We investigate interpreting coordinations (e.g. word sequences connected with coordinating conjunctions such as "and" and "or") as logical disjunctions of terms to generate a set of disjunction-free query variants for information retrieval (IR) queries. In addition, so-called hyphen coordinations are resolved by generating full compound forms and rephrasing the original query, e.g. "rice im-and export" is transformed into "rice import and export". Query variants are then processed separately and retrieval results are merged using a standard data fusion technique. We evaluate the approach on German standard IR benchmarking data. The results show that: i) Our proposed approach to generate compounds from hyphen coordinations produces the correct results for all test topics. ii) Our proposed heuristics to identify coordinations and generate query variants based on shallow natural language processing (NLP) techniques is highly accurate on the topics and does not rely on parsing or part-of-speech tagging. iii) Using query variants to produce multiple retrieval results and merging the results decreases precision at top ranks. However, in combination with blind relevance feedback (BRF), this approach can show significant improvement over the standard BRF baseline using the original queries.
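As a rough illustration (not the authors' heuristics), the sketch below expands a single "and"/"or" coordination into disjunction-free query variants and merges per-variant results with CombSUM, a standard data fusion technique; hyphen coordinations are not handled here.

```python
import re
from collections import defaultdict

def query_variants(query):
    """Expand the first "X and Y" / "X or Y" coordination into variants."""
    match = re.search(r"\b(\S+)\s+(?:and|or)\s+(\S+)\b", query)
    if not match:
        return [query]
    left, right = match.group(1), match.group(2)
    return [query.replace(match.group(0), left),
            query.replace(match.group(0), right)]

def combsum(result_lists):
    """Merge per-variant result lists by summing retrieval scores (CombSUM)."""
    fused = defaultdict(float)
    for results in result_lists:
        for doc_id, score in results:
            fused[doc_id] += score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

print(query_variants("rice import or export statistics"))
# -> ['rice import statistics', 'rice export statistics']

# Hypothetical per-variant retrieval results as (doc_id, score) pairs.
runs = [[("d1", 2.1), ("d2", 1.4)], [("d2", 1.9), ("d3", 0.8)]]
print(combsum(runs))  # d2 is boosted because both variants retrieve it
```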
Citations: 4
Estimating topical context by diverging from external resources
Romain Deveaud, E. SanJuan, P. Bellot
Improving query understanding is crucial for providing the user with information that suits her needs. To this end, the retrieval system must be able to deal with several sources of knowledge from which it could infer a topical context. The use of external sources of information for improving document retrieval has been extensively studied. Improvements with either structured or large sets of data have been reported. However, in these studies resources are often used separately and rarely combined together. In this paper, we experiment with a method that discounts documents based on their weighted divergence from a set of external resources. We present an evaluation of the combination of four resources on two standard TREC test collections. Our proposed method significantly outperforms a state-of-the-art Mixture of Relevance Models on one test collection, while no significant differences are detected on the other one.
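One plausible, much-simplified reading of the discounting idea is sketched below: a document's score is reduced by a weighted sum of KL divergences between smoothed unigram language models of the external resources and of the document. The formulation, smoothing, and weights are assumptions, not the paper's exact model.

```python
import math
from collections import Counter

def language_model(text, vocab, mu=1.0):
    """Unigram language model with additive smoothing over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) for distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def discounted_score(base_score, doc_text, resources, weights, alpha=0.1):
    """Penalize the retrieval score by the weighted divergence of the document
    from the external resources (hypothetical formulation)."""
    vocab = set(doc_text.lower().split())
    for text in resources:
        vocab |= set(text.lower().split())
    doc_lm = language_model(doc_text, vocab)
    penalty = sum(w * kl_divergence(language_model(r, vocab), doc_lm)
                  for r, w in zip(resources, weights))
    return base_score - alpha * penalty

resources = ["solar energy panels photovoltaic power",
             "renewable energy wind solar policy"]
print(discounted_score(10.0, "solar power panels for home energy",
                       resources, weights=[0.6, 0.4]))
```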
Citations: 7
Incorporating popularity in topic models for social network analysis
Youngchul Cha, Bin Bi, Chu-Cheng Hsieh, Junghoo Cho
Topic models are used to group words in a text dataset into a set of relevant topics. Unfortunately, when a few words frequently appear in a dataset, the topic groups identified by topic models become noisy because these frequent words repeatedly appear in "irrelevant" topic groups. This noise has not been a serious problem in a text dataset because the frequent words (e.g., "the" and "is") do not have much meaning and have been simply removed before a topic model analysis. However, in a social network dataset we are interested in, they correspond to popular persons (e.g., Barack Obama and Justin Bieber) and cannot be simply removed because most people are interested in them. To solve this "popularity problem", we explicitly model the popularity of nodes (words) in topic models. For this purpose, we first introduce a notion of a "popularity component" and propose topic model extensions that effectively accommodate the popularity component. We evaluate the effectiveness of our models with a real-world Twitter dataset. Our proposed models achieve significantly lower perplexity (i.e., better prediction power) compared to the state-of-the-art baselines. In addition to the popularity problem caused by the nodes with high incoming edge degree, we also investigate the effect of the outgoing edge degree with another topic model extension. We show that considering outgoing edge degree does not help much in achieving lower perplexity.
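The sketch below illustrates the modeling assumption rather than the full topic model: each observed "word" (here, a followed account) is drawn either from a topic or, with some probability, from a separate popularity component, which is what lets very popular nodes be absorbed instead of polluting every topic. The accounts, weights, and mixing scheme are hypothetical.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical data: each "document" is the list of accounts a user follows.
topics = {
    "politics": ["whitehouse", "senator_a", "senator_b", "obama"],
    "music": ["gaga", "indie_band", "drummer", "bieber"],
}
popular = ["obama", "bieber"]  # popularity component: followed by almost everyone

def generate_document(topic, length=20, popularity_weight=0.4):
    """Draw each followed account from the topic or, with probability
    popularity_weight, from the popularity component."""
    doc = []
    for _ in range(length):
        if random.random() < popularity_weight:
            doc.append(random.choice(popular))
        else:
            doc.append(random.choice(topics[topic]))
    return doc

docs = ([generate_document("politics") for _ in range(50)]
        + [generate_document("music") for _ in range(50)])
counts = Counter(account for doc in docs for account in doc)
print(counts.most_common(4))  # the popular accounts dominate both topics
```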
Citations: 41
Exploiting hybrid contexts for Tweet segmentation
Chenliang Li, Aixin Sun, J. Weng, Qi He
Twitter has attracted hundreds of millions of users to share and disseminate the most up-to-date information. However, the noisy and short nature of tweets makes many applications in information retrieval (IR) and natural language processing (NLP) challenging. Recently, segment-based tweet representation has demonstrated effectiveness in named entity recognition (NER) and event detection from tweet streams. To split tweets into meaningful phrases or segments, the previous work is purely based on external knowledge bases, which ignores the rich local context information embedded in the tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg. HybridSeg incorporates local context knowledge with global knowledge bases for better tweet segmentation. HybridSeg consists of two steps: learning from off-the-shelf weak NERs and learning from pseudo feedback. In the first step, the existing NER tools are applied to a batch of tweets. The named entities recognized by these NERs are then employed to guide the tweet segmentation process. In the second step, HybridSeg adjusts the tweet segmentation results iteratively by exploiting all segments in the batch of tweets in a collective manner. Experiments on two tweet datasets show that HybridSeg significantly improves tweet segmentation quality compared with the state-of-the-art algorithm. We also conduct a case study by using tweet segments for the task of named entity recognition from tweets. The experimental results demonstrate that HybridSeg significantly benefits the downstream applications.
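A minimal sketch of the segmentation step under assumed inputs: a dynamic program chooses the highest-scoring split of a tweet, where hypothetical phrase scores stand in for knowledge-base "stickiness" and segments confirmed by an NER pass receive a boost, mirroring HybridSeg's first step at a toy scale.

```python
from functools import lru_cache

# Hypothetical phrase scores standing in for knowledge-base stickiness, plus
# the set of segments an off-the-shelf NER tool recognized in the batch.
phrase_score = {"new york": 2.0, "new york city": 2.5, "city marathon": 1.5}
ner_confirmed = {"new york city"}

def segment_score(words):
    phrase = " ".join(words)
    score = phrase_score.get(phrase, 0.1 if len(words) == 1 else -1.0)
    if phrase in ner_confirmed:
        score += 1.0  # entities recognized by the NER step guide segmentation
    return score

def segment(tokens):
    """Dynamic-programming search for the highest-scoring segmentation."""
    @lru_cache(maxsize=None)
    def best(i):
        if i == len(tokens):
            return 0.0, ()
        best_total, best_segs = float("-inf"), ()
        for j in range(i + 1, len(tokens) + 1):
            tail_total, tail_segs = best(j)
            total = segment_score(tokens[i:j]) + tail_total
            if total > best_total:
                best_total = total
                best_segs = (" ".join(tokens[i:j]),) + tail_segs
        return best_total, best_segs
    return best(0)[1]

print(segment(("heading", "to", "new", "york", "city", "marathon")))
# -> ('heading', 'to', 'new york city', 'marathon')
```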
Citations: 34
Incorporating vertical results into search click models
Chao Wang, Yiqun Liu, Min Zhang, Shaoping Ma, Meihong Zheng, Jing Qian, Kuo Zhang
In modern search engines, an increasing number of search result pages (SERPs) are federated from multiple specialized search engines (called verticals, such as Image or Video). As an effective approach to interpret users' click-through behavior as feedback information, most click models were designed to reduce the position bias and improve ranking performance of ordinary search results, which have homogeneous appearances. However, when vertical results are combined with ordinary ones, significant differences in presentation may lead to user behavior biases and thus failure of state-of-the-art click models. With the help of a popular commercial search engine in China, we collected a large-scale log data set which contains behavior information on both vertical and ordinary results. We also performed eye-tracking analysis to study user's real-world examining behavior. According to these analyses, we found that different result appearances may cause different behavior biases both for vertical results (local effect) and for the whole result lists (global effect). These biases include: examine bias for vertical results (especially those with multimedia components), trust bias for result lists with vertical results, and a higher probability of result revisitation for vertical results. Based on these findings, a novel click model considering these biases besides position bias was constructed to describe interaction with SERPs containing verticals. Experimental results show that the new Vertical-aware Click Model (VCM) is better at interpreting user click behavior on federated searches in terms of both log-likelihood and perplexity than existing models.
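The sketch below is an assumed, much-simplified parameterization rather than the paper's VCM: a position-based click model in which results coming from a vertical receive an extra examination boost, with a per-session log-likelihood so that competing parameter settings could be compared.

```python
import math

# Assumed parameters: examination probability per rank and a multiplicative
# boost for vertical results (capped at 1.0). Both are invented for illustration.
position_bias = [0.95, 0.75, 0.55, 0.40, 0.30]
vertical_examine_boost = 1.2

def click_probability(rank, attractiveness, is_vertical):
    """P(click) = P(examined at rank) * attractiveness, with a vertical boost."""
    examine = position_bias[rank]
    if is_vertical:
        examine = min(1.0, examine * vertical_examine_boost)
    return examine * attractiveness

def log_likelihood(session):
    """Log-likelihood of one session: (rank, attractiveness, is_vertical, clicked)."""
    ll = 0.0
    for rank, attractiveness, is_vertical, clicked in session:
        p = click_probability(rank, attractiveness, is_vertical)
        ll += math.log(p if clicked else 1.0 - p)
    return ll

session = [(0, 0.3, False, False), (1, 0.6, True, True), (2, 0.4, False, False)]
print(log_likelihood(session))
```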
Citations: 97
How query cost affects search behavior
L. Azzopardi, D. Kelly, Kathy Brennan
This paper examines how the cost of querying affects how users interact with a search system. Microeconomic theory is used to generate the cost-interaction hypothesis that states as the cost of querying increases, users will pose fewer queries and examine more documents per query. A between-subjects laboratory study with 36 undergraduate subjects was conducted, where subjects were randomly assigned to use one of three search interfaces that varied according to the amount of physical cost required to query: Structured (high cost), Standard (medium cost) and Query Suggestion (low cost). Results show that subjects who used the Structured interface submitted significantly fewer queries, spent more time on search results pages, examined significantly more documents per query, and went to greater depths in the search results list. Results also showed that these subjects spent longer generating their initial queries, saved more relevant documents and rated their queries as more successful. These findings have implications for the usefulness of microeconomic theory as a way to model and explain search interaction, as well as for the design of query facilities.
Citations: 88
Recommending personalized touristic sights using google places
Maya Sappelli, S. Verberne, Wessel Kraaij
The purpose of the Contextual Suggestion track, an evaluation task at the TREC 2012 conference, is to suggest personalized tourist activities to an individual, given a certain location and time. In our content-based approach, we collected initial recommendations using the location context as search query in Google Places. We first ranked the recommendations based on their textual similarity to the user profiles. In order to improve the ranking of popular sights, we combined the initial ranking with rankings based on Google Search, popularity and categories. Finally, we performed filtering based on the temporal context. Overall, our system performed well above average and median, and outperformed the baseline run (Google Places only).
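As a generic stand-in for the combination step described above (not the authors' method), the sketch below fuses several hypothetical rankings with reciprocal rank fusion and then applies a simple temporal filter based on assumed opening hours.

```python
from collections import defaultdict

def fuse_rankings(rankings, k=60):
    """Reciprocal rank fusion over several ranked lists of candidate places."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, place in enumerate(ranking):
            scores[place] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def open_at(place, hour, opening_hours):
    """Temporal filter: keep only places open at the suggested hour."""
    start, end = opening_hours.get(place, (0, 24))
    return start <= hour < end

# Hypothetical rankings from the profile-similarity, web-search and popularity signals.
profile_rank = ["city museum", "craft brewery", "botanic garden"]
search_rank = ["botanic garden", "city museum", "old harbour"]
popularity_rank = ["old harbour", "city museum", "craft brewery"]
opening_hours = {"city museum": (10, 17), "craft brewery": (16, 23)}

fused = fuse_rankings([profile_rank, search_rank, popularity_rank])
evening = [place for place in fused if open_at(place, 20, opening_hours)]
print(fused)
print(evening)  # the museum is filtered out for an evening suggestion
```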
Citations: 13
Workshop on health search and discovery: helping users and advancing medicine
Ryen W. White, E. Yom-Tov, E. Horvitz, Eugene Agichtein, W. Hersh
This workshop brings together researchers and practitioners from industry and academia to discuss search and discovery in the medical domain. The event focuses on ways to make medical and health information more accessible to laypeople (including enhancements to ranking algorithms and search interfaces), and how we can discover new medical facts and phenomena from information sought online, as evidenced in query streams and other sources such as social media. This domain also offers many opportunities for applications that monitor and improve quality of life of those affected by medical conditions, by providing tools to support their health-related information behavior.
Citations: 2