Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

Non-factoid Question Answering in the Legal Domain
Gayle McElvain, George Sanchez, Don Teo, Tonya Custis
Non-factoid question answering in the legal domain must provide legally correct, jurisdictionally relevant, and conversationally responsive answers to user-entered questions. We present work done on a QA system that is entirely based on IR and NLP, and does not rely on a structured knowledge base. Our system retrieves concise one-sentence answers for basic questions about the law. It is not restricted in scope to particular topics or jurisdictions. The corpus of potential answers contains approximately 22M documents classified to over 120K legal topics.
DOI: https://doi.org/10.1145/3331184.3331431
Published: 2019-07-18
Citations: 3
TDP: Personalized Taxi Demand Prediction Based on Heterogeneous Graph Embedding
Zhenlong Zhu, Ruixuan Li, Minghui Shan, Yuhua Li, Lu Gao, Fei Wang, Jixing Xu, X. Gu
Predicting users' irregular trips over a short period is one of the crucial tasks in intelligent transportation systems. With such predictions, taxi-hailing services, such as Didi Chuxing in China, can manage transportation resources to offer better service. There are several different transportation scenes, such as the commuting scene and the entertainment scene. The origin and destination in the entertainment scene are less certain than those in the commuting scene, so both origin and destination should be predicted. Moreover, users' trips on the Didi platform are only a part of their real lives, so these transportation data amount to only a few weak samples. To address these challenges, we propose the Taxi Demand Prediction (TDP) model for the challenging entertainment scene, based on heterogeneous graph embedding and a deep neural prediction network. TDP aims to predict, for each user in the entertainment scene, the next possible trip edges that have not appeared in historical data. Experimental results on a real-world dataset show that TDP achieves significant improvements over state-of-the-art methods.
DOI: https://doi.org/10.1145/3331184.3331368
Published: 2019-07-18
Citations: 6
Adversarial Collaborative Neural Network for Robust Recommendation
Feng Yuan, Lina Yao, B. Benatallah
Most recent neural network (NN)-based recommendation techniques focus mainly on improving overall performance, such as hit ratio for top-N recommendation, where users' feedback is treated as the ground truth. In real-world applications, that feedback may be contaminated by imperfect user behaviours, posing challenges for the design of robust recommendation methods. Some methods apply man-made noise to the input data to train the networks more effectively (e.g. the collaborative denoising auto-encoder). In this work, we propose a general adversarial training framework for NN-based recommendation models, improving both the model robustness and the overall performance. We apply our approach to the collaborative auto-encoder model, and show that the combination of adversarial training and NN-based models outperforms highly competitive state-of-the-art recommendation methods on three public datasets.
DOI: https://doi.org/10.1145/3331184.3331321
Published: 2019-07-18
Citations: 61
Automatic Curation of Content Tables for Educational Videos
Arpan Mukherjee, Shubhi Tiwari, Tanya Chowdhury, Tanmoy Chakraborty
Traditional forms of education are increasingly being replaced by online forms of learning. With many degrees being awarded without the requirement of co-location, it becomes necessary to build tools to enhance online learning interfaces. Online educational videos are often long and do not have enough metadata. Viewers trying to learn about a particular topic have to go through the entire video to find suitable content. We present a novel architecture to curate content tables for educational videos. We harvest text and acoustic properties of the videos to form a hierarchical content table (similar to a table of contents available in a textbook). We allow users to browse the video smartly by skipping to a particular portion rather than going through the entire video. We consider other text-based approaches as our baselines. We find that our approach beats the macro F1-score and micro F1-score of baseline by 39.45% and 35.76% respectively. We present our demo as an independent web page where the user can paste the URL of the video to obtain a generated hierarchical table of contents and navigate to the required content. In the spirit of reproducibility, we make our code public at https://goo.gl/Qzku9d and provide a screen cast to be viewed at https://goo.gl/4HSV1v.
DOI: https://doi.org/10.1145/3331184.3331400
Published: 2019-07-18
Citations: 5
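The abstract above reports gains in both macro and micro F1-score. As a reminder of how the two averages differ, here is a minimal sketch; the class names and counts are hypothetical illustrations, not data from the paper. Macro F1 averages the per-class F1 values, while micro F1 pools the true-positive, false-positive, and false-negative counts before computing a single F1.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts):
    """counts maps class name -> (tp, fp, fn). Returns (macro_f1, micro_f1)."""
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts over all classes, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)


# Hypothetical per-class counts (tp, fp, fn), for illustration only.
counts = {
    "section_start": (8, 2, 2),
    "subsection_start": (1, 1, 3),
}
macro, micro = macro_micro_f1(counts)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

With these made-up counts the frequent class dominates the micro average, so micro F1 comes out higher than macro F1; a system that does poorly on rare classes shows this gap.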
Information Cascades Modeling via Deep Multi-Task Learning
Xueqin Chen, Kunpeng Zhang, Fan Zhou, Goce Trajcevski, Ting Zhong, Fengli Zhang
Effectively modeling and predicting the information cascades is at the core of understanding the information diffusion, which is essential for many related downstream applications, such as fake news detection and viral marketing identification. Conventional methods for cascade prediction heavily depend on the hypothesis of diffusion models and hand-crafted features. Owing to the significant recent successes of deep learning in multiple domains, attempts have been made to predict cascades by developing neural networks based approaches. However, the existing models are not capable of capturing both the underlying structure of a cascade graph and the node sequence in the diffusion process which, in turn, results in unsatisfactory prediction performance. In this paper, we propose a deep multi-task learning framework with a novel design of shared-representation layer to aid in explicitly understanding and predicting the cascades. As it turns out, the learned latent representation from the shared-representation layer can encode the structure and the node sequence of the cascade very well. Our experiments conducted on real-world datasets demonstrate that our method can significantly improve the prediction accuracy and reduce the computational cost compared to state-of-the-art baselines.
DOI: https://doi.org/10.1145/3331184.3331288
Published: 2019-07-18
Citations: 34
USEing Transfer Learning in Retrieval of Statistical Data
A. Firsov, Vladimir Bugay, A. Karpenko
DSSM-like models have shown good results in retrieving short documents that semantically match the query. However, these models require large collections of click-through data that are not available in some domains. On the other hand, recent advances in NLP have demonstrated that language models, and models trained on one set of tasks, can be fine-tuned to achieve state-of-the-art results on a multitude of other tasks, or to get competitive results using much smaller training sets. Following this trend, we combined a DSSM-like architecture with USE (Universal Sentence Encoder) and BERT (Bidirectional Encoder Representations from Transformers) models in order to fine-tune them on a small amount of click-through data and use them for information retrieval. This approach allowed us to significantly improve our search engine for statistical data.
DOI: https://doi.org/10.1145/3331184.3331427
Published: 2019-07-18
Citations: 1
Why do Users Issue Good Queries?: Neural Correlates of Term Specificity
Lauri Kangassalo, Michiel M. A. Spapé, Giulio Jacucci, Tuukka Ruotsalo
Despite advances in the past few decades in studying what kind of queries users input to search engines and how to suggest queries for the users, the fundamental question of what makes human cognition able to estimate goodness of query terms is largely unanswered. For example, a person searching for information about "cats" is able to choose query terms, such as "housecat", "feline", or "animal" and avoid terms like "similar", "variety", and "distinguish". We investigated the association between the specificity of terms occurring in documents and human brain activity measured via electroencephalography (EEG). We analyzed the brain activity data of fifteen participants, recorded in response to reading terms from Wikipedia documents. Term specificity was shown to be associated with the amplitude of evoked brain responses. The results indicate that by being able to determine which terms carry maximal information about, and can best discriminate between, documents, people have the capability to enter good query terms. Moreover, our results suggest that the effective query term selection process, often observed in practical search behavior studies, has a neural basis. We believe our findings constitute an important step in revealing the cognitive processing behind query formulation and evaluating informativeness of language in general.
DOI: https://doi.org/10.1145/3331184.3331243
Published: 2019-07-18
Citations: 16
Multiple Query Processing via Logic Function Factoring
Matteo Catena, N. Tonellotto
Some extensions to search systems require support for multiple query processing. This is the case with query variations, i.e., different query formulations of the same information need. The results of their processing can be fused together to improve effectiveness, but this requires traversing the query terms' posting lists more than once, thus prolonging the multiple query processing time. In this work, we propose an approach to optimize the processing of query variations to reduce their overall response time. Similarly to the standard Boolean model, we first represent a group of query variations as a logic function where Boolean variables represent query terms. We then apply factoring to this function, in order to produce a more compact but logically equivalent representation. The factored form is used to process the query variations in a single pass over the inverted index. We experimentally show that our approach can improve the mean processing time of a multiple query by up to 1.95× with no statistically significant degradation in terms of NDCG@10.
DOI: https://doi.org/10.1145/3331184.3331297
Published: 2019-07-18
Citations: 4
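To make the idea of factoring a logic function over query terms concrete, the following is a minimal sketch and not the paper's algorithm. It assumes, purely for illustration, that each query variation is a conjunction (AND) of terms and that the group is their disjunction (OR); the example queries and every function name are hypothetical. It pulls out the terms shared by all variations, so that (a AND b) OR (a AND c) becomes a AND (b OR c), and checks equivalence by brute force over all truth assignments.

```python
from itertools import product


def factor_common(variations):
    """Split an OR-of-ANDs into (terms common to all variations, residues)."""
    common = frozenset.intersection(*variations)
    residues = [v - common for v in variations]
    return common, residues


def eval_sum_of_products(variations, assignment):
    """Original form: true if every term of at least one variation is true."""
    return any(all(assignment[t] for t in v) for v in variations)


def eval_factored(common, residues, assignment):
    """Factored form: common terms AND (OR over the residual conjunctions)."""
    return all(assignment[t] for t in common) and any(
        all(assignment[t] for t in r) for r in residues
    )


def literals_sum_of_products(variations):
    return sum(len(v) for v in variations)


def literals_factored(common, residues):
    return len(common) + sum(len(r) for r in residues)


# Two hypothetical formulations of the same information need.
variations = [
    frozenset({"sigir", "2019", "papers"}),
    frozenset({"sigir", "2019", "proceedings"}),
]
common, residues = factor_common(variations)

# Brute-force check that both forms agree on every truth assignment.
terms = sorted(set().union(*variations))
equivalent = all(
    eval_sum_of_products(variations, dict(zip(terms, bits)))
    == eval_factored(common, residues, dict(zip(terms, bits)))
    for bits in product([False, True], repeat=len(terms))
)
print(sorted(common), equivalent)
print(literals_sum_of_products(variations), literals_factored(common, residues))
```

In this toy case the factored form mentions each shared term once instead of once per variation (4 literals instead of 6), which is the intuition behind visiting each term's posting list only once.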
Expert-Guided Entity Extraction using Expressive Rules
M. Kejriwal, Runqi Shao, Pedro A. Szekely
Knowledge Graph Construction (KGC) is an important problem that has many domain-specific applications, including semantic search and predictive analytics. As sophisticated KGC algorithms continue to be proposed, an important, neglected use case is to empower domain experts who do not have much technical background to construct high-fidelity, interpretable knowledge graphs. Such domain experts are a valuable source of input because of their (both formal and learned) knowledge of the domain. In this demonstration paper, we present a system that allows domain experts to construct knowledge graphs by writing sophisticated rule-based entity extractors with minimal training, using a GUI-based editor that offers a range of complex facilities.
DOI: https://doi.org/10.1145/3331184.3331392
Published: 2019-07-18
Citations: 12
ENT Rank: Retrieving Entities for Topical Information Needs through Entity-Neighbor-Text Relations
Laura Dietz
Related work has demonstrated the helpfulness of utilizing information about entities in text retrieval; here we explore the converse: Utilizing information about text in entity retrieval. We model the relevance of Entity-Neighbor-Text (ENT) relations to derive a learning-to-rank-entities model. We focus on the task of retrieving (multiple) relevant entities in response to a topical information need such as "Zika fever". The ENT Rank model is designed to exploit semi-structured knowledge resources such as Wikipedia for entity retrieval. The ENT Rank model combines (1) established features of entity-relevance, with (2) information from neighboring entities (co-mentioned or mentioned-on-page) through (3) relevance scores of textual contexts through traditional retrieval models such as BM25 and RM3.
DOI: https://doi.org/10.1145/3331184.3331257
Published: 2019-07-18
Citations: 26
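The ENT Rank abstract mentions relevance scores from traditional retrieval models such as BM25. For readers unfamiliar with it, here is a minimal sketch of Okapi BM25 scoring over whitespace-tokenized text. It illustrates the scoring function only, not the ENT Rank model; the passages are made up, the parameter values k1 = 1.2 and b = 0.75 are conventional defaults chosen here as an assumption, and the smoothed IDF variant used is one of several in common use.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in docs against query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Smoothed, always-positive IDF (an assumed variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical passages, for illustration only.
passages = [
    "zika fever is caused by the zika virus",
    "the zika virus is spread by aedes mosquitoes",
    "bm25 is a ranking function used by search engines",
]
scores = bm25_scores("zika fever", passages)
ranking = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The first passage matches both query terms and ranks first, the second matches only "zika", and the third matches nothing and scores zero.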