首页 > 最新文献

Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval最新文献

英文 中文
Combining fields in known-item email search 在已知项目电子邮件搜索组合字段
C. Macdonald, I. Ounis
Emails are examples of structured documents with various fields. These fields can be exploited to enhance the retrieval effectiveness of an Information Retrieval (IR) system that mailing list archives. In recent experiments of the TREC2005 Enterprise track, various fields were applied to varying degrees of success by the participants. In his work, using a field-based weighting model, we investigate the retrieval performance attainable by each field, and examine when fields evidence should be combined or not.
电子邮件是具有各种字段的结构化文档的示例。可以利用这些字段来提高邮件列表存档的信息检索(IR)系统的检索效率。在最近的TREC2005企业跟踪实验中,参与者在不同领域的应用取得了不同程度的成功。在他的工作中,使用基于字段的加权模型,我们研究了每个字段可实现的检索性能,并检查了何时应该合并字段证据或不合并字段证据。
{"title":"Combining fields in known-item email search","authors":"C. Macdonald, I. Ounis","doi":"10.1145/1148170.1148312","DOIUrl":"https://doi.org/10.1145/1148170.1148312","url":null,"abstract":"Emails are examples of structured documents with various fields. These fields can be exploited to enhance the retrieval effectiveness of an Information Retrieval (IR) system that mailing list archives. In recent experiments of the TREC2005 Enterprise track, various fields were applied to varying degrees of success by the participants. In his work, using a field-based weighting model, we investigate the retrieval performance attainable by each field, and examine when fields evidence should be combined or not.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128811426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Regularized estimation of mixture models for robust pseudo-relevance feedback 鲁棒伪相关反馈混合模型的正则化估计
Tao Tao, ChengXiang Zhai
Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.
在各种检索模型中,伪相关反馈已被证明是提高检索精度的有效策略。然而,现有的伪反馈方法的性能往往受到一些参数的显著影响,如要使用的反馈文档的数量和原始查询词的相对权重;这些参数通常必须在没有任何指导的情况下通过反复试验来设置。本文提出了一种基于统计语言模型的伪反馈鲁棒性更好的方法。我们的主要思想是将原始查询和反馈文档集成在一个单一的概率混合模型中,并对模型中语言模型参数的估计进行正则化,从而将反馈文档中的信息逐渐添加到原始查询中。与大多数现有的反馈方法不同,我们的新方法没有参数可调。在两个代表性数据集上的实验结果表明,新方法比最先进的基线语言建模方法具有更强的鲁棒性,并且具有相当或更好的检索精度。
{"title":"Regularized estimation of mixture models for robust pseudo-relevance feedback","authors":"Tao Tao, ChengXiang Zhai","doi":"10.1145/1148170.1148201","DOIUrl":"https://doi.org/10.1145/1148170.1148201","url":null,"abstract":"Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117232533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 230
An exploratory web log study of multitasking 多任务处理的探索性网络日志研究
Nikolai Buzikashvili
The Web search multitasking study based on automatic task session detection procedure is described. The results of the study: 1) multitasking is very rare, 2) it usually covers only 2 task sessions, 3) it is frequently formed into a temporal inclusion of an interrupting task session into the interrupted session, 4) the quantitative characteristics of multitasking greatly differ from the characteristics of sequential execution of one and several tasks. A searcher minimizes task switching costs: he avoids multitasking and while multitasking he uses cheapest manner of task switching.
描述了基于任务会话自动检测的Web搜索多任务研究过程。研究结果表明:1)多任务处理是非常罕见的;2)它通常只涵盖2个任务会话;3)它经常被形成一个中断任务会话的时间包含到被中断的会话中;4)多任务处理的数量特征与顺序执行一个或几个任务的特征有很大不同。搜索者最小化任务切换成本:他避免多任务处理,而在多任务处理时,他使用最便宜的任务切换方式。
{"title":"An exploratory web log study of multitasking","authors":"Nikolai Buzikashvili","doi":"10.1145/1148170.1148286","DOIUrl":"https://doi.org/10.1145/1148170.1148286","url":null,"abstract":"The Web search multitasking study based on automatic task session detection procedure is described. The results of the study: 1) multitasking is very rare, 2) it usually covers only 2 task sessions, 3) it is frequently formed into a temporal inclusion of an interrupting task session into the interrupted session, 4) the quantitative characteristics of multitasking greatly differ from the characteristics of sequential execution of one and several tasks. A searcher minimizes task switching costs: he avoids multitasking and while multitasking he uses cheapest manner of task switching.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122634679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Unifying user-based and item-based collaborative filtering approaches by similarity fusion 通过相似性融合统一基于用户和基于项目的协同过滤方法
Jun Wang, A. D. Vries, M. Reinders
Memory-based methods for collaborative filtering predict new ratings by averaging (weighted) ratings between, respectively, pairs of similar users or items. In practice, a large number of ratings from similar users or similar items are not available, due to the sparsity inherent to rating data. Consequently, prediction quality can be poor. This paper re-formulates the memory-based collaborative filtering problem in a generative probabilistic framework, treating individual user-item ratings as predictors of missing ratings. The final rating is estimated by fusing predictions from three sources: predictions based on ratings of the same item by other users, predictions based on different item ratings made by the same user, and, third, ratings predicted based on data from other but similar users rating other but similar items. Existing user-based and item-based approaches correspond to the two simple cases of our framework. The complete model is however more robust to data sparsity, because the different types of ratings are used in concert, while additional ratings from similar users towards similar items are employed as a background model to smooth the predictions. Experiments demonstrate that the proposed methods are indeed more robust against data sparsity and give better recommendations.
基于记忆的协同过滤方法通过平均(加权)相似用户或项目对之间的评分来预测新的评分。在实践中,由于评级数据固有的稀疏性,来自类似用户或类似项目的大量评级是不可用的。因此,预测质量可能很差。本文在生成概率框架中对基于记忆的协同过滤问题进行了重新表述,将单个用户-物品评分作为缺失评分的预测因子。最终评级是通过融合来自三个来源的预测来估计的:基于其他用户对同一商品的评级的预测,基于同一用户对不同商品的评级的预测,以及基于其他但相似的用户对其他但相似的商品的评级的预测。现有的基于用户和基于项目的方法对应于我们框架的两个简单案例。然而,完整的模型对数据稀疏性更健壮,因为不同类型的评级被一致使用,而类似用户对类似项目的额外评级被用作背景模型,以平滑预测。实验表明,该方法对数据稀疏性具有更好的鲁棒性,并给出了更好的推荐。
{"title":"Unifying user-based and item-based collaborative filtering approaches by similarity fusion","authors":"Jun Wang, A. D. Vries, M. Reinders","doi":"10.1145/1148170.1148257","DOIUrl":"https://doi.org/10.1145/1148170.1148257","url":null,"abstract":"Memory-based methods for collaborative filtering predict new ratings by averaging (weighted) ratings between, respectively, pairs of similar users or items. In practice, a large number of ratings from similar users or similar items are not available, due to the sparsity inherent to rating data. Consequently, prediction quality can be poor. This paper re-formulates the memory-based collaborative filtering problem in a generative probabilistic framework, treating individual user-item ratings as predictors of missing ratings. The final rating is estimated by fusing predictions from three sources: predictions based on ratings of the same item by other users, predictions based on different item ratings made by the same user, and, third, ratings predicted based on data from other but similar users rating other but similar items. Existing user-based and item-based approaches correspond to the two simple cases of our framework. The complete model is however more robust to data sparsity, because the different types of ratings are used in concert, while additional ratings from similar users towards similar items are employed as a background model to smooth the predictions. Experiments demonstrate that the proposed methods are indeed more robust against data sparsity and give better recommendations.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117081401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 896
Near-duplicate detection by instance-level constrained clustering 通过实例级约束聚类进行近重复检测
G. Yang, Jamie Callan
For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information retrieval community are not sufficiently accurate. This is due to the fact that the characteristics of near-duplicated documents are different from that of both "almost-identical" documents in the data cleaning task and "relevant" documents in the search task. This paper presents an instance-level constrained clustering approach for near-duplicate detection. The framework incorporates information such as document attributes and content structure into the clustering process to form near-duplicate clusters. Gathered from several collections of public comments sent to U.S. government agencies on proposed new regulations, the experimental results demonstrate that our approach outperforms other near-duplicate detection algorithms and as about as effective as human assessors.
对于近重复文档检测任务,传统的数据库社区指纹识别技术和信息检索社区词袋比对方法都不够准确。这是因为近重复文档的特征不同于数据清理任务中的“几乎相同”文档和搜索任务中的“相关”文档。本文提出了一种用于近重复检测的实例级约束聚类方法。该框架将文档属性和内容结构等信息合并到集群过程中,以形成近乎重复的集群。从向美国政府机构提交的关于拟议新法规的若干公众评论中收集的实验结果表明,我们的方法优于其他近重复检测算法,并且与人类评估器一样有效。
{"title":"Near-duplicate detection by instance-level constrained clustering","authors":"G. Yang, Jamie Callan","doi":"10.1145/1148170.1148243","DOIUrl":"https://doi.org/10.1145/1148170.1148243","url":null,"abstract":"For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information retrieval community are not sufficiently accurate. This is due to the fact that the characteristics of near-duplicated documents are different from that of both \"almost-identical\" documents in the data cleaning task and \"relevant\" documents in the search task. This paper presents an instance-level constrained clustering approach for near-duplicate detection. The framework incorporates information such as document attributes and content structure into the clustering process to form near-duplicate clusters. Gathered from several collections of public comments sent to U.S. government agencies on proposed new regulations, the experimental results demonstrate that our approach outperforms other near-duplicate detection algorithms and as about as effective as human assessors.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115290384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 102
Stylistic text segmentation 文体文本分割
Paul J. Chase, S. Argamon
This paper focuses on a method for the stylistic segmentation of text documents. Our technique involves mapping the change in a feature throughout a text. We use the linguistic features of conjunction and modality, through taxonomies from Systemic Functional Linguistics. This segmentation has applications in automated summarization, particularly of large documents.
本文研究了一种文本文档的文体分词方法。我们的技术包括映射整个文本中特征的变化。通过系统功能语言学的分类,我们使用了连词和情态的语言特征。这种分割在自动摘要中有应用,特别是在大型文档中。
{"title":"Stylistic text segmentation","authors":"Paul J. Chase, S. Argamon","doi":"10.1145/1148170.1148291","DOIUrl":"https://doi.org/10.1145/1148170.1148291","url":null,"abstract":"This paper focuses on a method for the stylistic segmentation of text documents. Our technique involves mapping the change in a feature throughout a text. We use the linguistic features of conjunction and modality, through taxonomies from Systemic Functional Linguistics. This segmentation has applications in automated summarization, particularly of large documents.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"38 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130758783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Using web-graph distance for relevance feedback in web search 在网络搜索中利用网络图距离进行相关性反馈
Sergei Vassilvitskii, Eric Brill
We study the effect of user supplied relevance feedback in improving web search results. Rather than using query refinement or document similarity measures to rerank results, we show that the web-graph distance between two documents is a robust measure of their relative relevancy. We demonstrate how the use of this metric can improve the rankings of result URLs, even when the user only rates one document in the dataset. Our research suggests that such interactive systems can significantly improve search results.
我们研究了用户提供的相关反馈在改善网络搜索结果中的作用。我们表明,两个文档之间的网络图距离是它们相对相关性的稳健度量,而不是使用查询细化或文档相似度度量来重新排序结果。我们演示了如何使用这个指标来提高结果url的排名,即使用户只对数据集中的一个文档进行评级。我们的研究表明,这种交互系统可以显著改善搜索结果。
{"title":"Using web-graph distance for relevance feedback in web search","authors":"Sergei Vassilvitskii, Eric Brill","doi":"10.1145/1148170.1148199","DOIUrl":"https://doi.org/10.1145/1148170.1148199","url":null,"abstract":"We study the effect of user supplied relevance feedback in improving web search results. Rather than using query refinement or document similarity measures to rerank results, we show that the web-graph distance between two documents is a robust measure of their relative relevancy. We demonstrate how the use of this metric can improve the rankings of result URLs, even when the user only rates one document in the dataset. Our research suggests that such interactive systems can significantly improve search results.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128344571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
Towards efficient automated singer identification in large music databases 在大型音乐数据库中实现高效的自动歌手识别
Jialie Shen, B. Cui, J. Shepherd, K. Tan
Automated singer identification is important in organising, browsing and retrieving data in large music databases. In this paper, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for automated singer recognition. HSI can effectively use multiple low-level features extracted from both vocal and non-vocal music segments to enhance the identification process with a hybrid architecture and build profiles of individual singer characteristics based on statistical mixture models. Extensive experimental results conducted on a large music database demonstrate the superiority of our method over state-of-the-art approaches.
自动歌手识别对于组织、浏览和检索大型音乐数据库中的数据非常重要。在本文中,我们提出了一种新的方案,称为混合歌手标识符(HSI),用于自动识别歌手。HSI可以有效地利用从人声和非人声音乐片段中提取的多个低级特征,通过混合架构增强识别过程,并基于统计混合模型构建歌手个人特征轮廓。在大型音乐数据库上进行的大量实验结果表明,我们的方法优于最先进的方法。
{"title":"Towards efficient automated singer identification in large music databases","authors":"Jialie Shen, B. Cui, J. Shepherd, K. Tan","doi":"10.1145/1148170.1148184","DOIUrl":"https://doi.org/10.1145/1148170.1148184","url":null,"abstract":"Automated singer identification is important in organising, browsing and retrieving data in large music databases. In this paper, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for automated singer recognition. HSI can effectively use multiple low-level features extracted from both vocal and non-vocal music segments to enhance the identification process with a hybrid architecture and build profiles of individual singer characteristics based on statistical mixture models. Extensive experimental results conducted on a large music database demonstrate the superiority of our method over state-of-the-art approaches.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131571238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
Question classification with log-linear models 用对数线性模型进行问题分类
Phil Blunsom, K. Kocik, J. Curran
Question classification has become a crucial step in modern question answering systems. Previous work has demonstrated the effectiveness of statistical machine learning approaches to this problem. This paper presents a new approach to building a question classifier using log-linear models. Evidence from a rich and diverse set of syntactic and semantic features is evaluated, as well as approaches which exploit the hierarchical structure of the question classes.
问题分类已经成为现代问答系统中至关重要的一步。以前的工作已经证明了统计机器学习方法在这个问题上的有效性。本文提出了一种利用对数线性模型构建问题分类器的新方法。来自丰富多样的句法和语义特征的证据被评估,以及利用问题类的层次结构的方法。
{"title":"Question classification with log-linear models","authors":"Phil Blunsom, K. Kocik, J. Curran","doi":"10.1145/1148170.1148282","DOIUrl":"https://doi.org/10.1145/1148170.1148282","url":null,"abstract":"Question classification has become a crucial step in modern question answering systems. Previous work has demonstrated the effectiveness of statistical machine learning approaches to this problem. This paper presents a new approach to building a question classifier using log-linear models. Evidence from a rich and diverse set of syntactic and semantic features is evaluated, as well as approaches which exploit the hierarchical structure of the question classes.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126767279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 66
Capturing collection size for distributed non-cooperative retrieval 为分布式非合作检索捕获集合大小
Milad Shokouhi, J. Zobel, Falk Scholer, S. Tahaghoghi
Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy has not been thoroughly evaluated. An empirical analysis of past estimation approaches across a variety of collections demonstrates that their prediction accuracy is low. Motivated by ecological techniques for the estimation of animal populations, we propose two new approaches for the estimation of collection size. We show that our approaches are significantly more accurate that previous methods, and are more efficient in use of resources required to perform the estimation.
现代分布式信息检索技术需要对集合大小有准确的了解。在没有详细收集统计信息的非合作环境中,必须估计底层收集的大小。虽然已经提出了几种估计集合大小的方法,但它们的准确性尚未得到彻底评估。对过去各种集合的估计方法的实证分析表明,它们的预测精度很低。在动物种群估计的生态技术的激励下,我们提出了两种新的收集规模估计方法。我们表明,我们的方法明显比以前的方法更准确,并且在使用执行估计所需的资源方面更有效。
{"title":"Capturing collection size for distributed non-cooperative retrieval","authors":"Milad Shokouhi, J. Zobel, Falk Scholer, S. Tahaghoghi","doi":"10.1145/1148170.1148227","DOIUrl":"https://doi.org/10.1145/1148170.1148227","url":null,"abstract":"Modern distributed information retrieval techniques require accurate knowledge of collection size. In non-cooperative environments, where detailed collection statistics are not available, the size of the underlying collections must be estimated. While several approaches for the estimation of collection size have been proposed, their accuracy has not been thoroughly evaluated. An empirical analysis of past estimation approaches across a variety of collections demonstrates that their prediction accuracy is low. Motivated by ecological techniques for the estimation of animal populations, we propose two new approaches for the estimation of collection size. We show that our approaches are significantly more accurate that previous methods, and are more efficient in use of resources required to perform the estimation.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124630477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 73
期刊
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1