比较网络搜索问题

Proceedings of the 13th International Conference on Web Search and Data Mining Pub Date : 2020-01-20 DOI:10.1145/3336191.3371848

Alexander Bondarenko, Pavel Braslavski, Michael Völske, Rami Aly, Maik Fröbe, Alexander Panchenko, Christian Biemann, Benno Stein, Matthias Hagen

{"title":"比较网络搜索问题","authors":"Alexander Bondarenko, Pavel Braslavski, Michael Völske, Rami Aly, Maik Fröbe, Alexander Panchenko, Christian Biemann, Benno Stein, Matthias Hagen","doi":"10.1145/3336191.3371848","DOIUrl":null,"url":null,"abstract":"\\beginabstract We analyze comparative questions, i.e., questions asking to compare different items, that were submitted to Yandex in 2012. Responses to such questions might be quite different from the simple \"ten blue links'' and could, for example, aggregate pros and cons of the different options as direct answers. However, changing the result presentation is an intricate decision such that the classification of comparative questions forms a highly precision-oriented task. From a year-long Yandex log, we annotate a random sample of 50,000~questions; 2.8%~of which are comparative. For these annotated questions, we develop a precision-oriented classifier by combining carefully hand-crafted lexico-syntactic rules with feature-based and neural approaches---achieving a recall of~0.6 at a perfect precision of~1.0. After running the classifier on the full year log (on average, there is at least one comparative question per second), we analyze 6,250~comparative questions using more fine-grained subclasses (e.g., should the answer be a \"simple'' fact or rather a more verbose argument) for which individual classifiers are trained. An important insight is that more than 65%~of the comparative questions demand argumentation and opinions, i.e., reliable direct answers to comparative questions require more than the facts from a search engine's knowledge graph. In addition, we present a qualitative analysis of the underlying comparative information needs (separated into 14~categories likeconsumer electronics orhealth ), their seasonal dynamics, and possible answers from community question answering platforms. \\endabstract","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"87 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Comparative Web Search Questions\",\"authors\":\"Alexander Bondarenko, Pavel Braslavski, Michael Völske, Rami Aly, Maik Fröbe, Alexander Panchenko, Christian Biemann, Benno Stein, Matthias Hagen\",\"doi\":\"10.1145/3336191.3371848\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\\\beginabstract We analyze comparative questions, i.e., questions asking to compare different items, that were submitted to Yandex in 2012. Responses to such questions might be quite different from the simple \\\"ten blue links'' and could, for example, aggregate pros and cons of the different options as direct answers. However, changing the result presentation is an intricate decision such that the classification of comparative questions forms a highly precision-oriented task. From a year-long Yandex log, we annotate a random sample of 50,000~questions; 2.8%~of which are comparative. For these annotated questions, we develop a precision-oriented classifier by combining carefully hand-crafted lexico-syntactic rules with feature-based and neural approaches---achieving a recall of~0.6 at a perfect precision of~1.0. After running the classifier on the full year log (on average, there is at least one comparative question per second), we analyze 6,250~comparative questions using more fine-grained subclasses (e.g., should the answer be a \\\"simple'' fact or rather a more verbose argument) for which individual classifiers are trained. An important insight is that more than 65%~of the comparative questions demand argumentation and opinions, i.e., reliable direct answers to comparative questions require more than the facts from a search engine's knowledge graph. In addition, we present a qualitative analysis of the underlying comparative information needs (separated into 14~categories likeconsumer electronics orhealth ), their seasonal dynamics, and possible answers from community question answering platforms. \\\\endabstract\",\"PeriodicalId\":319008,\"journal\":{\"name\":\"Proceedings of the 13th International Conference on Web Search and Data Mining\",\"volume\":\"87 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 13th International Conference on Web Search and Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3336191.3371848\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 13th International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3336191.3371848","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 22

摘要

我们分析比较问题，即要求比较不同项目的问题，这些问题于2012年提交给Yandex。对这些问题的回答可能与简单的“十个蓝色链接”截然不同，例如，可以将不同选项的利弊汇总为直接答案。然而，改变结果表示是一个复杂的决定，因此比较问题的分类形成了一个高度精确导向的任务。从长达一年的Yandex日志中，我们对随机抽取的5万个问题进行了注释;其中2.8%是比较的。对于这些注释问题，我们开发了一个面向精度的分类器，通过将精心制作的词典句法规则与基于特征和神经方法相结合，实现了~0.6的召回率和~1.0的完美精度。在全年日志上运行分类器之后(平均而言，每秒至少有一个比较问题)，我们使用更细粒度的子类(例如，答案应该是一个“简单”的事实还是更冗长的参数)分析了6,250~比较问题，每个分类器都为此进行了训练。一个重要的洞察是，超过65%的比较问题需要论证和观点，也就是说，对比较问题的可靠直接答案需要的不仅仅是搜索引擎知识图谱中的事实。此外，我们对潜在的比较信息需求(分为14个类别，如消费电子产品或健康)，其季节性动态以及社区问答平台的可能答案进行了定性分析。\ endabstract

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Comparative Web Search Questions

\beginabstract We analyze comparative questions, i.e., questions asking to compare different items, that were submitted to Yandex in 2012. Responses to such questions might be quite different from the simple "ten blue links'' and could, for example, aggregate pros and cons of the different options as direct answers. However, changing the result presentation is an intricate decision such that the classification of comparative questions forms a highly precision-oriented task. From a year-long Yandex log, we annotate a random sample of 50,000~questions; 2.8%~of which are comparative. For these annotated questions, we develop a precision-oriented classifier by combining carefully hand-crafted lexico-syntactic rules with feature-based and neural approaches---achieving a recall of~0.6 at a perfect precision of~1.0. After running the classifier on the full year log (on average, there is at least one comparative question per second), we analyze 6,250~comparative questions using more fine-grained subclasses (e.g., should the answer be a "simple'' fact or rather a more verbose argument) for which individual classifiers are trained. An important insight is that more than 65%~of the comparative questions demand argumentation and opinions, i.e., reliable direct answers to comparative questions require more than the facts from a search engine's knowledge graph. In addition, we present a qualitative analysis of the underlying comparative information needs (separated into 14~categories likeconsumer electronics orhealth ), their seasonal dynamics, and possible answers from community question answering platforms. \endabstract

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助