Diversifying Search Result Leveraging Aspect-based Query Expansion
Authors: Shajalal; Muhammad Anwarul Azim; Masaki Aono
Journal: International journal of new computer architectures and their applications
DOI: 10.17781/P002433 (https://doi.org/10.17781/P002433)
Abstract
Web search queries are short, ambiguous, and often have multiple underlying interpretations. Query expansion is a prominent way to reformulate such queries so that a set of relevant documents can be retrieved. In this paper, we propose an aspect-based query expansion technique for diversified document retrieval. First, query suggestions and completions are retrieved from major commercial search engines. A frequent phrase-based soft clustering algorithm is then applied to group similar candidates into clusters, each representing a different query aspect. The expansion terms for each cluster are selected from its generated cluster label. To estimate the relevance between the expanded query and the documents, we introduce several new lexical and semantic features, derived from the document content and a word-embedding model, respectively. Finally, a linear ranking approach uses the extracted features to re-rank the documents retrieved for the original query. We conduct experiments on the ClueWeb09 document collection using the TREC 2012 Web Track queries. The experimental results demonstrate that the proposed aspect-based query expansion method is effective at diversifying the retrieved documents and outperforms the baseline and several known related methods on the diversity metrics ERR-IA, α-nDCG, and NRBP at a cutoff of 20.
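The clustering and expansion steps can be illustrated with a small sketch. It is not the authors' algorithm, only one plausible reading under stated assumptions: the suggestions and completions are already available as plain strings, the original query terms are dropped from each candidate, every word n-gram (up to three words) shared by at least `min_support` candidates becomes a cluster label, and a candidate joins every cluster whose label it contains, which is what makes the clustering soft. The function names `phrase_clusters` and `expansion_queries`, the thresholds, and the "jaguar" candidates are all hypothetical.

```python
from collections import defaultdict


def ngrams(tokens, max_n):
    """Yield every contiguous word n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def phrase_clusters(query, candidates, min_support=2, max_n=3):
    """Soft-cluster candidate queries by the frequent phrases they share.

    Returns {label: set of candidate indices}. Original query terms are
    removed first so labels describe aspects rather than the query itself
    (this can join words that were not adjacent; acceptable for a sketch).
    A candidate joins every cluster whose label it contains.
    """
    query_terms = set(query.lower().split())
    support = defaultdict(set)
    for idx, cand in enumerate(candidates):
        tokens = [t for t in cand.lower().split() if t not in query_terms]
        # dict.fromkeys de-duplicates n-grams while keeping a stable order
        for gram in dict.fromkeys(ngrams(tokens, max_n)):
            support[gram].add(idx)
    return {" ".join(gram): members for gram, members in support.items()
            if len(members) >= min_support}


def expansion_queries(query, clusters, top_k=5):
    """Append each cluster label to the query, larger clusters first."""
    ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [f"{query} {label}" for label, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical suggestions/completions for the ambiguous query "jaguar"
    candidates = [
        "jaguar car price", "jaguar car dealers", "jaguar animal facts",
        "jaguar animal habitat", "jaguar car animal logo", "jaguar os x",
    ]
    clusters = phrase_clusters("jaguar", candidates)
    print(sorted(clusters["car"]), sorted(clusters["animal"]))  # [0, 1, 4] [2, 3, 4]
    print(expansion_queries("jaguar", clusters))  # ['jaguar animal', 'jaguar car']
```

Candidate 4 ("jaguar car animal logo") lands in both the "car" and "animal" clusters, and "os x" is discarded because only one candidate supports it. A fuller version would likely also merge nested labels such as "car" and "car price" when their member sets overlap heavily.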
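The re-ranking step scores each document with a linear combination of relevance features. The sketch below is an assumption-laden stand-in for the paper's feature set: one lexical feature (the fraction of expanded-query terms found in the document) and one semantic feature (cosine similarity between the averaged word vectors of the expanded query and of the document). The toy two-dimensional embeddings, the equal weights, and every function name are hypothetical; in practice the vectors would come from a trained word-embedding model and the weights would be tuned or learned.

```python
import math


def term_overlap(query_terms, doc_terms):
    """Lexical feature: fraction of query terms that occur in the document."""
    if not query_terms:
        return 0.0
    doc = set(doc_terms)
    return sum(t in doc for t in query_terms) / len(query_terms)


def mean_vector(terms, emb, dim):
    """Average the embeddings of known terms (zero vector if none are known)."""
    vecs = [emb[t] for t in terms if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(expanded_query, docs, emb, weights=(0.5, 0.5)):
    """Order doc ids by a weighted sum of one lexical and one semantic feature."""
    dim = len(next(iter(emb.values())))
    q_terms = expanded_query.lower().split()
    q_vec = mean_vector(q_terms, emb, dim)
    scored = []
    for doc_id, text in docs.items():
        d_terms = text.lower().split()
        features = (term_overlap(q_terms, d_terms),
                    cosine(q_vec, mean_vector(d_terms, emb, dim)))
        score = sum(w * f for w, f in zip(weights, features))
        scored.append((score, doc_id))
    # Highest score first; doc id breaks ties deterministically
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    # Toy 2-d "embeddings": first axis ~ cars, second axis ~ animals
    emb = {"jaguar": [1.0, 1.0], "car": [1.0, 0.0], "engine": [0.9, 0.1],
           "animal": [0.0, 1.0], "cat": [0.1, 0.9]}
    docs = {"d1": "jaguar animal cat", "d2": "jaguar car engine", "d3": "car engine"}
    print(rerank("jaguar car", docs, emb))     # ['d2', 'd3', 'd1']
    print(rerank("jaguar animal", docs, emb))  # ['d1', 'd2', 'd3']
```

Each expanded query favours the documents matching its own aspect. How the per-aspect scores are merged into one diversified list is not specified here, so the sketch scores documents against a single expanded query at a time.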
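Of the reported diversity metrics, α-nDCG is the easiest to reproduce by hand. In α-DCG@k, the document at rank r earns, for each subtopic i it is judged relevant to, a gain of (1 - α)^c, where c is the number of higher-ranked documents already covering i; the summed gain is divided by log2(1 + r). Dividing by the α-DCG@k of an ideal ranking gives α-nDCG@k. The sketch below assumes α = 0.5 and k = 20, approximates the ideal ranking greedily (finding the exact one is NP-hard, and greedy approximation is common practice), and uses a hypothetical judgments dictionary; it is not the official TREC evaluation tool.

```python
import math


def alpha_dcg(ranking, qrels, alpha=0.5, k=20):
    """alpha-DCG@k: a subtopic's gain shrinks by (1 - alpha) each time it repeats."""
    seen = {}  # subtopic -> number of higher-ranked documents covering it
    total = 0.0
    for r, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for s in qrels.get(doc, ()):
            gain += (1 - alpha) ** seen.get(s, 0)
            seen[s] = seen.get(s, 0) + 1
        total += gain / math.log2(r + 1)
    return total


def greedy_ideal(qrels, alpha=0.5, k=20):
    """Approximate the ideal ranking greedily (finding the exact one is NP-hard)."""
    seen, ideal, pool = {}, [], sorted(qrels)
    while pool and len(ideal) < k:
        # max() keeps the first of equally good documents, so ties are stable
        best = max(pool, key=lambda d: sum((1 - alpha) ** seen.get(s, 0)
                                           for s in qrels[d]))
        ideal.append(best)
        pool.remove(best)
        for s in qrels[best]:
            seen[s] = seen.get(s, 0) + 1
    return ideal


def alpha_ndcg(ranking, qrels, alpha=0.5, k=20):
    """Normalise alpha-DCG@k by the alpha-DCG@k of the greedy ideal ranking."""
    ideal = alpha_dcg(greedy_ideal(qrels, alpha, k), qrels, alpha, k)
    return alpha_dcg(ranking, qrels, alpha, k) / ideal if ideal else 0.0


if __name__ == "__main__":
    # Hypothetical judgments: document -> subtopics it is relevant to
    qrels = {"d1": {"car"}, "d2": {"car"}, "d3": {"animal"}}
    print(round(alpha_ndcg(["d1", "d2", "d3"], qrels), 4))  # 0.9652
    print(alpha_ndcg(["d1", "d3", "d2"], qrels))            # 1.0
```

Placing the second "car" document ahead of the only "animal" document costs score, which is exactly the redundancy penalty that makes the metric reward diversified rankings.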