Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Articles

Session details: Session 8B: Citations
M. Sanderson
DOI: https://doi.org/10.1145/3255937 (published 2015-08-09)
Citations: 0
Session details: Session 3B: Social Media
C. Hauff
DOI: https://doi.org/10.1145/3255922 (published 2015-08-09)
Citations: 0
From Unlabelled Tweets to Twitter-specific Opinion Words
Felipe Bravo-Marquez, E. Frank, B. Pfahringer
In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: a bag-of-words vector and a semantic vector based on word-clusters. We propose a distributional representation for words by treating them as the centroids of the tweet vectors in which they appear. The lexicon generation is conducted by training a word-level classifier using these centroids to form the instance space and a seed lexicon to label the training instances. Experimental results show that the two types of tweet vectors complement each other in a statistically significant manner and that our generated lexicon produces significant improvements for tweet-level polarity classification.
DOI: https://doi.org/10.1145/2766462.2767770 (published 2015-08-09)
Citations: 22
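The centroid representation described in the abstract above (a word is represented by the centroid of the tweet vectors in which it appears) can be illustrated with a minimal sketch. The function name `word_centroids`, the toy tweets, and the two-dimensional vectors are hypothetical, and counting each tweet only once per word is an assumption; the paper's actual feature construction may differ.

```python
from collections import defaultdict


def word_centroids(tweets, vectors):
    """Average the vectors of all tweets in which each word appears.

    tweets  -- list of token lists, one per tweet
    vectors -- list of equal-length float lists, one per tweet
    """
    sums = {}
    counts = defaultdict(int)
    for tokens, vec in zip(tweets, vectors):
        # Assumption: a tweet contributes once per distinct word.
        for word in set(tokens):
            if word not in sums:
                sums[word] = [0.0] * len(vec)
            sums[word] = [s + v for s, v in zip(sums[word], vec)]
            counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


# Toy data: three tweets with made-up 2-dimensional tweet vectors.
tweets = [["good", "day"], ["good", "movie"], ["bad", "day"]]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centroids = word_centroids(tweets, vectors)
print(centroids["good"])  # [0.5, 0.5]
```

In a setup like the one the abstract describes, these centroids would form the instance space for a word-level classifier, with a seed lexicon supplying the training labels.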
Web Question Answering: Beyond Factoids: SIGIR 2015 Workshop
Eugene Agichtein, David Carmel, C. Clarke, Praveen K. Paritosh, D. Pelleg, Idan Szpektor
Automatic question answering is a central topic in information retrieval. Web search engines have made great progress at answering factoid queries, such as “how many people live in Australia?”. These can provide a succinct answer, up to a few words in length, and sometimes offer additional information such as related facts or entities. However, for deeper questions which could benefit from a longer response (e.g., “history of Australia”), current search engines resort to returning a link to a detailed web document. Alternatively, such a question might be posted on a Community Question Answering (CQA) site (“Visiting Australia in May, what should I see?”), in the hope of getting a detailed, human-authored response. In this workshop we aim to explore the boundaries of Web question answering to better understand the spectrum of approaches and possible responses that are more detailed than a short fact, yet more useful than a full document result. Is it possible to automatically answer diverse questions ranging from advice on fixing a broken sink to requests for opinions on the best basketball player of all time? In addition, questions submitted on the Web can be either short and ambiguous (such as Web queries to a search engine), or long and detailed (such as CQA questions). This workshop is particularly timely for two additional reasons: (1) there still exist many disagreements regarding the goals and nature of Web question answering services, mostly relating to the questions of “question intent” (what kind of queries benefit from question answering compared to other methods); and (2) leading search engines are eager to provide
DOI: https://doi.org/10.1145/2766462.2767861 (published 2015-08-09)
Citations: 3
NeuroIR 2015: Neuro-Physiological Methods in IR Research
J. Gwizdka, J. Jose, Javed Mostafa, Max L. Wilson
This Tutorial+Workshop will discuss opportunities and challenges involved in using neuro-physiological tools/techniques (such as fMRI, fNIRS, EEG, eye-tracking, GSR, HR, and facial expressions) and theories in information retrieval. The hybrid format will engage researchers and students at different levels of expertise, from those who are active in this area to those who are interested and want to learn more. The workshop will combine presentations, discussions and tutorial elements and consist of four segments (tutorial, completed research, work-in-progress, closing panel).
DOI: https://doi.org/10.1145/2766462.2767856 (published 2015-08-09)
Citations: 1
An Introduction to Click Models for Web Search: SIGIR 2015 Tutorial
A. Chuklin, I. Markov, M. de Rijke
In this introductory tutorial we give an overview of click models for web search. We show how the framework of probabilistic graphical models helps to explain user behavior, build new evaluation metrics and perform simulations. The tutorial is augmented with a live demo in which participants have a chance to implement a click model and test it on a publicly available dataset.
DOI: https://doi.org/10.1145/2766462.2767881 (published 2015-08-09)
Citations: 56
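To give a flavor of what implementing a click model can involve, here is a minimal sketch of the cascade model, one of the simplest click models in the literature. The tutorial abstract does not say which models are covered, so the choice of model is an assumption, and the function name and attractiveness values are hypothetical. In the cascade model the user scans results top to bottom, clicks the first result found attractive, and then stops.

```python
def cascade_click_probs(attractiveness):
    """Click probability per rank under the cascade model.

    attractiveness -- per-rank probability that an examined result is clicked
    """
    probs = []
    examine = 1.0  # probability that the user reaches the current rank
    for a in attractiveness:
        probs.append(examine * a)
        examine *= 1.0 - a  # the user continues only if there was no click
    return probs


# Hypothetical attractiveness values for a three-result page.
probs = cascade_click_probs([0.5, 0.5, 0.2])
print(probs)
```

Because the user stops at the first click, the per-rank probabilities sum to at most 1; the remainder is the probability of abandoning the page without clicking.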
CricketLinking: Linking Event Mentions from Cricket Match Reports to Ball Entities in Commentaries
Manish Gupta
The 2011 Cricket World Cup final match was watched by around 135 million people. Such a huge viewership demands a great experience for users of online cricket portals. Many portals like espncricinfo.com host a variety of content related to recent matches, including match reports and ball-by-ball commentaries. When reading a match report, reader experience can be significantly improved by augmenting (on demand) the event mentions in the report with detailed commentaries. We build an event linking system, CricketLinking, which first identifies event mentions in the reports and then links them to a set of balls. Finding linkable mentions is challenging because, unlike in entity linking problem settings, we do not have a concrete set of event entities to link to. Further, depending on the event type, event mentions could be linked to a single ball or to a set of balls. Hence, identifying mention type as well as linking becomes challenging. We use a large number of domain-specific features to learn classifiers for mention and mention type detection. Further, we leverage structured match, context similarity and sequential proximity to perform accurate linking. Finally, context-based summarization is performed to provide a concise briefing of the balls linked to each mention.
DOI: https://doi.org/10.1145/2766462.2767865 (published 2015-08-09)
Citations: 4
A Random Walk Model for Optimization of Search Impact in Web Frontier Ranking
G. Tran, Ata Turk, B. B. Cambazoglu, W. Nejdl
Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.
DOI: https://doi.org/10.1145/2766462.2767737 (published 2015-08-09)
Citations: 1
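The general idea of ordering frontier pages by a random walk over the link graph can be sketched as follows. This is a generic random walk with restart (PageRank-style power iteration), not the authors' model: the abstract does not give their transition probabilities or how search impact enters the walk. The graph, the uniform `teleport` vector, and all names here are hypothetical.

```python
def random_walk_scores(links, teleport, damping=0.85, iters=50):
    """Stationary scores of a random walk with restart.

    links    -- dict: node -> list of out-neighbours (all must be keys of teleport)
    teleport -- dict: node -> restart probability (values sum to 1)
    """
    nodes = list(teleport)
    scores = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = links.get(n, [])
            if out:
                share = damping * scores[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: spread its mass according to the teleport vector.
                for m in nodes:
                    nxt[m] += damping * scores[n] * teleport[m]
        scores = nxt
    return scores


# Toy graph: "a" and "b" are crawled pages; "x" and "y" are frontier pages.
links = {"a": ["x", "b"], "b": ["x"]}
teleport = {n: 0.25 for n in ["a", "b", "x", "y"]}
scores = random_walk_scores(links, teleport)

frontier = ["x", "y"]
ranked = sorted(frontier, key=scores.get, reverse=True)
print(ranked)  # ['x', 'y']
```

A search-centric variant would presumably bias the teleport vector or the edge weights toward pages likely to matter for user queries, but how the paper does so is not stated in the abstract.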
IR Evaluation: Designing an End-to-End Offline Evaluation Pipeline
Jin Young Kim, Emine Yilmaz
This tutorial aims to provide attendees with a detailed understanding of an end-to-end evaluation pipeline based on human judgments (offline measurement). The tutorial will give an overview of the state-of-the-art methods, techniques, and metrics necessary for each stage of the evaluation process. We will mostly focus on evaluating an information retrieval (search) system, but other tasks such as recommendation and classification will also be discussed. Practical examples will be drawn both from the literature and from real-world usage scenarios in industry.
DOI: https://doi.org/10.1145/2766462.2767875 (published 2015-08-09)
Citations: 2
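As one example of a metric computed from human judgments in an offline pipeline, the sketch below computes nDCG with the common exponential gain. The abstract does not list which metrics the tutorial covers, so choosing nDCG is an assumption, and the graded labels are made up. Note that this simplified version takes the ideal ranking from the judged labels of the ranked list itself, not from the full judgment pool.

```python
import math


def dcg(rels):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2 rank discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(rels, k=None):
    """nDCG@k for a ranked list of graded relevance labels."""
    k = len(rels) if k is None else k
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical judgments on a 0-3 scale, in ranked order.
perfect = ndcg([3, 2, 1])
swapped = ndcg([1, 3])
print(perfect, swapped)
```

A ranking already sorted by relevance scores 1.0; placing a less relevant document above a more relevant one lowers the score.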
Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search
Yingwei Pan, Ting Yao, Houqiang Li, C. Ngo, Tao Mei
Similarity search is one of the fundamental problems for large scale multimedia applications. Hashing techniques, as one popular strategy, have been intensively investigated owing to their speed and memory efficiency. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, most existing supervised methods learn hashing functions by treating each training example equally while ignoring the differing degree to which each example is semantically related to its label, i.e., its semantic confidence. In this paper, we propose a novel semi-supervised hashing framework that leverages semantic confidence. Specifically, a confidence factor is first assigned to each example by neighbor voting in scenarios with label data and by click count in scenarios with click-through data. Then, the factor is incorporated into the pairwise and triplet relationship learning for hashing. Furthermore, the two learnt relationships are seamlessly encoded into semi-supervised hashing methods with pairwise and listwise supervision respectively, which are formulated as minimizing empirical error on the labeled data while maximizing the variance of hash bits or minimizing quantization loss over both the labeled and unlabeled data. In addition, the kernelized variant of semi-supervised hashing is also presented. We have conducted experiments on both CIFAR-10 (with labels) and Clickture (with click data) image benchmarks (up to one million image examples), demonstrating that our approaches outperform state-of-the-art hashing techniques.
DOI: https://doi.org/10.1145/2766462.2767725 (published 2015-08-09)
Citations: 34
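One plausible reading of assigning a confidence factor "by neighbor voting" is sketched below: an example's confidence is the fraction of its k nearest neighbours that share its label. This is only an illustrative assumption; the abstract does not define the voting scheme, the distance measure, or k, and the function name and toy points are hypothetical.

```python
import math


def neighbor_vote_confidence(points, labels, k=2):
    """Fraction of each point's k nearest neighbours that share its label."""
    confidences = []
    for i, p in enumerate(points):
        # Euclidean distance to every other point, paired with that point's label.
        others = [(math.dist(p, q), labels[j])
                  for j, q in enumerate(points) if j != i]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:k]
        agree = sum(1 for _, lab in nearest if lab == labels[i])
        confidences.append(agree / len(nearest))
    return confidences


# Toy data: the last "dog" sits inside the "cat" cluster, so its label is suspect.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (0.5, 0.5)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
conf = neighbor_vote_confidence(points, labels)
print(conf[3], conf[5])  # 1.0 0.0
```

In a framework like the one described, such a factor could weight each example's contribution to the pairwise or triplet loss, so that examples with doubtful labels influence the learned hash functions less.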