Efficient parsing-based search over structured data
Aditya G. Parameswaran, R. Kaushik, A. Arasu
Proceedings of the 22nd ACM international conference on Information & Knowledge Management
Published: 2013-10-27
DOI: 10.1145/2505515.2505764 (https://doi.org/10.1145/2505515.2505764)
Citation count: 1
Abstract
Parsing-based search, i.e., parsing keyword search queries using grammars, is often used to override the traditional "bag-of-words" semantics in web search and enterprise search scenarios. Compared to the "bag-of-words" semantics, parsing-based semantics is richer and more customizable. While a formalism for parsing-based semantics for keyword search has been proposed in prior work and ad-hoc implementations exist, the problem of designing efficient algorithms to support the semantics is largely unstudied. In this paper, we present a suite of efficient algorithms and auxiliary indexes for this problem. Our algorithms work for a broad class of grammars used in practice, and cover a variety of database matching functions (set- and substring-containment, approximate and exact equality) and scoring functions (to filter and rank different parses). We formally analyze the time complexity of our algorithms and provide an empirical evaluation over real-world data to show that our algorithms scale well with the size of the database and grammar.
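To make the contrast with "bag-of-words" semantics concrete, the following is a toy sketch of what parsing a keyword query against structured data can look like. It is not the paper's algorithm: the `DATABASE` contents, the attribute names, the `text` fallback label, and the token-coverage score are all assumptions made for illustration, and the enumeration is deliberately naive (exponential in the query length), which is the kind of cost that efficient algorithms and auxiliary indexes are meant to avoid.

```python
# Hypothetical attribute dictionaries standing in for database columns.
DATABASE = {
    "brand": {"nike", "new balance"},
    "color": {"red", "blue"},
    "category": {"shoes", "running shoes"},
}


def match(span):
    """Exact-equality matching: return attributes whose column contains span."""
    return [attr for attr, values in DATABASE.items() if span in values]


def parses(tokens):
    """Enumerate every segmentation of tokens into labeled spans.

    Each span is labeled with a matching attribute; a single token may
    also be left as unmatched free text (label "text").
    """
    if not tokens:
        yield []
        return
    for end in range(1, len(tokens) + 1):
        span = " ".join(tokens[:end])
        labels = match(span)
        if end == 1:
            labels = labels + ["text"]
        for label in labels:
            for rest in parses(tokens[end:]):
                yield [(label, span)] + rest


def score(parse):
    """Toy scoring function: number of tokens covered by database matches."""
    return sum(len(span.split()) for label, span in parse if label != "text")


def best_parses(query, k=3):
    """Return the k highest-scoring parses of a keyword query."""
    candidates = list(parses(query.lower().split()))
    return sorted(candidates, key=score, reverse=True)[:k]


if __name__ == "__main__":
    for parse in best_parses("red new balance running shoes"):
        print(score(parse), parse)
```

Under these assumptions, the top parse of "red new balance running shoes" labels "new balance" as a single brand and "running shoes" as a single category, whereas a bag-of-words reading would treat the five tokens as unrelated keywords. Swapping `match` for a substring-containment or approximate-equality test, or `score` for a different ranking function, changes which parses survive without changing the overall structure.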