Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

Controversy Detection and Stance Analysis
Shiri Dori-Hacohen
Alerting users about controversial search results can encourage critical literacy, promote healthy civic discourse and counteract the "filter bubble" effect. Additionally, presenting information to the user about the different stances or sides of the debate can help her navigate the landscape of search results. Our existing work made strides in the emerging niche of controversy detection and analysis; we propose further work on automatic stance detection.
DOI: https://doi.org/10.1145/2766462.2767844 (published 2015-08-09)
Citations: 13
Using Contextual Information to Understand Searching and Browsing Behavior
Julia Kiseleva
There is a great imbalance between the richness of information on the web and the succinctness and poverty of web users' search requests, so that their queries are only a partial description of the underlying complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise of dramatically improving the search experience of users. We conducted a series of studies to discover, model and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real-world applications.
DOI: https://doi.org/10.1145/2766462.2767852 (published 2015-08-09)
Citations: 37
Reducing Hubness: A Cause of Vulnerability in Recommender Systems
Kazuo Hara, Ikumi Suzuki, Kei Kobayashi, K. Fukumizu
It is known that memory-based collaborative filtering systems are vulnerable to shilling attacks. In this paper, we demonstrate that hubness, which occurs in high-dimensional data, is exploited by the attacks. Hence we explore methods for reducing hubness in user-response data to make these systems robust against attacks. Using the MovieLens dataset, we empirically show that two methods for reducing hubness by transforming a similarity matrix, namely (i) centering and (ii) conversion to a commute time kernel, can thwart attacks without degrading the recommendation performance.
DOI: https://doi.org/10.1145/2766462.2767823 (published 2015-08-09)
Citations: 8
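To make the first of the two transformations named in the abstract above more concrete, here is a minimal sketch of centering a similarity matrix and of counting k-occurrences, a common way to observe hubness. It assumes that "centering" means double-centering the matrix (subtracting row and column means and adding back the grand mean); the abstract does not give the exact formulation, so this is an illustration rather than the authors' implementation. The function names `center_similarity` and `k_occurrence` are hypothetical.

```python
def center_similarity(S):
    """Double-center a square similarity matrix given as a list of lists.

    Assumption: centering is S_c = H S H with H = I - (1/n) * ones,
    i.e. subtract row means and column means, then add the grand mean.
    """
    n = len(S)
    row_mean = [sum(row) / n for row in S]
    col_mean = [sum(S[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [S[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def k_occurrence(S, k):
    """Count how often each item appears in the top-k neighbor lists of the others.

    A strongly skewed count distribution (a few items with very high counts)
    is the usual symptom of hubness.
    """
    n = len(S)
    counts = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        top_k = sorted(others, key=lambda j: S[i][j], reverse=True)[:k]
        for j in top_k:
            counts[j] += 1
    return counts
```

Comparing `k_occurrence(S, k)` with `k_occurrence(center_similarity(S), k)` on a user-user similarity matrix is one simple way to check whether the transformation flattens the neighbor-count distribution.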
Spoken Conversational Search: Information Retrieval over a Speech-only Communication Channel
Johanne R. Trippas
This research investigates a new interaction paradigm for Interactive Information Retrieval (IIR) in which all input and output is mediated via speech. While such information systems have been important for the visually impaired for many years, a renewed focus on speech is driven by the growing sales of internet-enabled mobile devices. Presenting search results over a speech-only communication channel involves a number of challenges for users due to cognitive limitations and the serial nature of the audio channel [2]. Other research has shown that one cannot just ‘bolt on’ speech recognizers and screen readers to an existing system [5]. Therefore the aim of this research is to develop a new framework for effective and efficient IIR over a speech-only channel: a Spoken Conversational Search System (SCSS) which provides a conversational approach to determining user information needs, presenting results and enabling search reformulations. This research will go beyond current Voice Search approaches by aiming for a greater integration between document search and conversational dialogue processes in order to provide a more efficient and effective search experience when using an SCSS. We will also investigate an information seeking model for audio and language models. Presenting a Search Engine Result Page (SERP) over a speech-only communication channel presents a number of challenges, e.g., the textual component of a standard search results list has been shown to be ineffectual [4]. The transient nature of speech poses problems due to memory constraints, and makes “skimming” back and forth over a list of results (a standard process in browsing a visual list) difficult. These issues are greatly exacerbated when the result being sought is further down the list. This research will advance the knowledge base by: (i) providing an understanding of which strategies and IIR techniques for SCSS are best for users; (ii) defining novel technologies for contextual conversational interaction with a large collection of unstructured documents that support effective search over a speech-only communication channel (audio); and (iii) determining new methods for providing summary-based result presentation for unstructured documents.
DOI: https://doi.org/10.1145/2766462.2767850 (published 2015-08-09)
Citations: 8
Modularity-Based Query Clustering for Identifying Users Sharing a Common Condition
Maayan Harel, E. Yom-Tov
We present an algorithm for identifying users who share a common condition from anonymized search engine logs. Input to the algorithm is a set of seed phrases that identify users with the condition of interest with high precision, albeit at very low recall. We expand the set of seed phrases by clustering queries according to the pages users clicked following these queries and the temporal ordering of queries within sessions, emphasizing the subgraph containing seed phrases. To this end, we extend modularity-based clustering such that it uses the information in the initial seed phrases as well as other queries of users in the population of interest. We evaluate the performance of the proposed method on two datasets, one of mood disorders and the other of anorexia, by classifying users according to the clusters in which they appeared and the phrases contained therein, and show that the area under the receiver operating characteristic curve (AUC) obtained by these methods exceeds 0.87. These results demonstrate the value of our algorithm both for identifying users for future research and for gaining a better understanding of the language associated with the condition.
DOI: https://doi.org/10.1145/2766462.2767798 (published 2015-08-09)
Citations: 0
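As background for the clustering objective mentioned in the abstract above, the following sketch computes the standard (Newman) modularity of a partition of a weighted, undirected query graph. It does not implement the authors' seed-phrase extension, whose details are not given in the abstract. The graph construction (queries as nodes, edge weights from shared clicks or session adjacency) and the function name `modularity` are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Standard modularity Q of a partition of an undirected weighted graph.

    edges:      iterable of (u, v, weight) tuples, each undirected edge listed once
    community:  dict mapping every node to its community label

    Q = sum over communities c of  L_c / m  -  (d_c / (2 m))^2
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    degree = defaultdict(float)
    internal = defaultdict(float)
    total = 0.0
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        total += w
        if community[u] == community[v]:
            internal[community[u]] += w

    community_degree = defaultdict(float)
    for node, d in degree.items():
        community_degree[community[node]] += d

    return sum(
        internal[c] / total - (community_degree[c] / (2.0 * total)) ** 2
        for c in community_degree
    )
```

A clustering procedure would search for the partition that maximizes this value; the abstract's extension additionally steers that search toward the subgraph containing the seed phrases.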
Tailoring Music Recommendations to Users by Considering Diversity, Mainstreaminess, and Novelty
M. Schedl, D. Hauger
A shortcoming of current approaches for music recommendation is that they consider user-specific characteristics only on a very simple level, typically as some kind of interaction between users and items when employing collaborative filtering. To alleviate this issue, we propose several user features that model aspects of the user's music listening behavior: diversity, mainstreaminess, and novelty of the user's music taste. To validate the proposed features, we conduct a comprehensive evaluation of a variety of music recommendation approaches (stand-alone and hybrids) on a collection of almost 200 million listening events gathered from Last.fm. We report first results and highlight cases where our diversity, mainstreaminess, and novelty features can be beneficially integrated into music recommender systems.
DOI: https://doi.org/10.1145/2766462.2767763 (published 2015-08-09)
Citations: 58
Joint Matrix Factorization and Manifold-Ranking for Topic-Focused Multi-Document Summarization
Jiwei Tan, Xiaojun Wan, Jianguo Xiao
Manifold-ranking has proved to be an effective method for topic-focused multi-document summarization. As the basic manifold-ranking-based summarization method constructs the relationships between sentences simply from bag-of-words cosine similarity, we believe a better similarity metric will further improve the effectiveness of manifold-ranking. In this paper, we propose a joint optimization framework, which integrates the manifold-ranking process with a similarity metric learning process. The joint framework aims at learning better sentence similarity scores and better sentence ranking scores simultaneously. Experiments on DUC datasets show that the proposed joint method achieves better performance than the manifold-ranking baselines and several popular methods.
DOI: https://doi.org/10.1145/2766462.2767765 (published 2015-08-09)
Citations: 6
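The baseline that the abstract above builds on can be sketched as follows. This is the widely used basic manifold-ranking iteration, f ← αSf + (1 − α)y, over a symmetric similarity matrix; it is not the joint framework proposed in the paper. The convention that node 0 represents the topic description, the parameter values, and the function name `manifold_rank` are assumptions made for illustration.

```python
import math


def manifold_rank(W, y, alpha=0.6, iterations=200):
    """Basic manifold ranking on a symmetric similarity matrix W (list of lists).

    y holds the prior scores (1.0 for the topic node, 0.0 for sentences in
    this sketch). The diagonal of W is ignored, W is symmetrically normalized
    as S = D^(-1/2) W D^(-1/2), and scores are propagated until they settle.
    """
    n = len(W)
    # Remove self-similarity so a node does not reinforce itself.
    W = [[0.0 if i == j else W[i][j] for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in W]
    S = [
        [
            W[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] > 0 and degree[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    f = list(y)
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f
```

In the basic method the entries of `W` would be bag-of-words cosine similarities between sentences; the abstract's point is that learning these similarities jointly with the ranking scores can work better than fixing them in advance.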
Improving Search using Proximity-Based Statistics
Xiaolu Lu
Modern retrieval systems often use more sophisticated ranking models. Although many new features have been added, term proximity has been studied for a long time and still plays an important role. A recent study by Huston and Croft [2] shows that many-term dependency is a better choice for a large corpus and long queries. However, utilizing proximity-based features often leads to computational overhead, and most of the existing solutions are tailored to term pairs. Fewer studies have focused on many-term proximity computation, and the plane-sweep approach proposed by Sadakane and Imai [6] is still state-of-the-art. Consider a multi-pass retrieval process where the proximity features could be an effective first-pass ranker if we can reduce the cost of the proximity calculation. In this PhD project, we consider the following questions: (i) How important are the proximity statistics in the term dependency models and what is the cost of extracting the proximity features? (ii) Although all term dependencies are considered in ranking models, can we design an early termination strategy considering only partial proximity? Moreover, instead of viewing the terms at the same level, can we utilize their locality to obtain more efficiency? (iii) How do we best organize the term proximity statistics to be more indexable, facilitating the extraction process? (iv) How do we best define the approximation form of term proximity in order to find the best trade-off between effectiveness and efficiency? In a preliminary experimental study, Lu et al. [3] compare how different term dependency components affect the entire ranking model and show that although the phrase component helps to improve the effectiveness in an overall sense, it degrades dramatically on some queries. Although the proximity part doesn’t always improve the effectiveness, it is more stable. From the computational perspective, we have found that extracting single term dependency proximity using the plane-sweep algorithm is not a bottleneck. But it is a computationally intensive job when processing each dependency feature separately. However, the extra cost of considering proximity independently can be reduced by extracting all dependencies together [4]. Further, since most retrieval systems keep both a direct file and an inverted file, it is possible to exploit both representations to maximize the efficiency. Although the cost of extracting proximity features can be reduced compared with computing them separately, it remains inefficient when processing frequent query terms in long documents. Given the characteristics of all the term dependency features, some of the information is redundant. It is therefore possible to design a rank-safe early termination strategy that computes only part of the proximity features rather than extracting all of them. Heuristically, to achieve this, we can map the proximity extraction problem to a weighted interval ranking by taking TF and IDF values into account. Beyond early termination, finding an indexable organization of the proximity statistics is also worth investigating. The plane-sweep approach used in the extraction process suggests the possibility of building an auxiliary structure to augment existing index structures. Typically, ranking models capture the exact distance between terms, although the definitions vary. Clarke et al. [1] consider the scores of covers and cover sets, while Metzler and Croft [5] use unordered windows. However, capturing the actual distance places higher demands on both space and query time. Instead of computing exact distances between terms, approximate proximity can also be considered. In particular, when proximity features are used as an effective first-pass ranker, efficiency takes higher priority, provided that not too much effectiveness is sacrificed. With a suitable approximation, efficiency can be optimized within a given effectiveness threshold.
Copyright is held by the owner/author(s). SIGIR’15, August 09-13, 2015, Santiago, Chile. ACM 978-1-4503-3621-5/15/08.
DOI: https://doi.org/10.1145/2766462.2767847 (published 2015-08-09)
Citations: 3
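The many-term proximity computation discussed in the abstract above can be illustrated with a small sweep over merged term positions. The sketch below finds the shortest window in a single document that contains every query term at least once. It is written in the spirit of a plane-sweep over sorted positions and is not a reproduction of the algorithm of Sadakane and Imai or of the author's system; the input layout (a dict from term to sorted position list) and the function name `smallest_cover` are assumptions.

```python
def smallest_cover(positions):
    """Return (start, end) of the shortest window covering all query terms.

    positions: dict mapping each query term to the sorted list of its
               positions in one document.
    Returns None if any term does not occur in the document.
    """
    if not positions or any(not plist for plist in positions.values()):
        return None

    # Merge all occurrences into one list sorted by position.
    events = sorted(
        (pos, term) for term, plist in positions.items() for pos in plist
    )
    needed = len(positions)
    in_window = {}   # term -> number of occurrences inside the current window
    best = None
    left = 0

    for pos, term in events:
        in_window[term] = in_window.get(term, 0) + 1
        # While every term is covered, record the window and shrink it from the left.
        while len(in_window) == needed:
            start = events[left][0]
            if best is None or pos - start < best[1] - best[0]:
                best = (start, pos)
            left_term = events[left][1]
            in_window[left_term] -= 1
            if in_window[left_term] == 0:
                del in_window[left_term]
            left += 1
    return best
```

Each occurrence enters and leaves the window at most once, so after the initial sort the sweep is linear in the total number of term occurrences. That is consistent with the abstract's observation that a single sweep is not the bottleneck, while repeating it separately for every dependency feature is costly.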
Session details: Session 2C: Graphs
J. Kamps
{"title":"Session details: Session 2C: Graphs","authors":"J. Kamps","doi":"10.1145/3255920","DOIUrl":"https://doi.org/10.1145/3255920","url":null,"abstract":"","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"10 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120912915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploring Opportunities to Facilitate Serendipity in Search
Ataur Rahman, Max L. Wilson
Serendipitously discovering new information can bring many benefits. Although we can design systems to highlight serendipitous information, serendipity cannot be easily orchestrated and is thus hard to study. In this paper, we deployed a working search engine that matched search results with Facebook 'Like' data, as a technology probe to examine naturally occurring serendipitous discoveries. Search logs and diary entries revealed the nature of these occasions in both leisure and work contexts. The findings support the use of the micro-serendipity model in search system design.
{"title":"Exploring Opportunities to Facilitate Serendipity in Search","authors":"Ataur Rahman, Max L. Wilson","doi":"10.1145/2766462.2767783","DOIUrl":"https://doi.org/10.1145/2766462.2767783","url":null,"abstract":"Serendipitously discovering new information can bring many benefits. Although we can design systems to highlight serendipitous information, serendipity cannot be easily orchestrated and is thus hard to study. In this paper, we deployed a working search engine that matched search results with Facebook 'Like' data, as a technology probe to examine naturally occurring serendipitous discoveries. Search logs and diary entries revealed the nature of these occasions in both leisure and work contexts. The findings support the use of the micro-serendipity model in search system design.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116229181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20