Web-scale stream reasoning is based on continuous queries and on reasoning over a snapshot of dynamic knowledge combined with background knowledge. Existing stream reasoners usually use either time-based or count-based windows, following data stream principles; however, these do not fit all scenarios in stream reasoning. In this paper, different types of windowing mechanisms are described, together with exemplary scenarios in which each is most suitable for reasoning over streams of facts. A new windowing technique, the Adaptive Window, is also proposed. Finally, some important open questions about windowing techniques for web-scale stream reasoning are raised.
Snehasish Banerjee, D. Mukherjee. "Windowing mechanisms for web scale stream reasoning." Web-KR '13, 2013-11-01. DOI: https://doi.org/10.1145/2512405.2512409
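The abstract contrasts the two conventional window types, count-based and time-based. The minimal Python sketch below shows how each one works. A count-based window keeps the last N facts. A time-based window keeps the facts whose timestamp lies within a fixed width of the newest one. The sketch assumes that facts arrive in timestamp order as (timestamp, fact) pairs, and the class and function names are illustrative only. It does not reproduce the paper's Adaptive Window, because the abstract does not describe how that mechanism works.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Assumed stream element: a (timestamp, fact) pair, arriving in timestamp order.
Event = Tuple[float, str]


class CountWindow:
    """Count-based sliding window: keeps only the last `size` facts."""

    def __init__(self, size: int) -> None:
        # A bounded deque discards its oldest element automatically when full.
        self.items: Deque[Event] = deque(maxlen=size)

    def push(self, event: Event) -> List[str]:
        self.items.append(event)
        return [fact for _, fact in self.items]


class TimeWindow:
    """Time-based sliding window: keeps facts with timestamp in (now - width, now]."""

    def __init__(self, width: float) -> None:
        self.width = width
        self.items: Deque[Event] = deque()

    def push(self, event: Event) -> List[str]:
        self.items.append(event)
        now = event[0]  # the newest timestamp acts as the current time
        # Evict facts that have fallen out of the time range.
        while self.items and self.items[0][0] <= now - self.width:
            self.items.popleft()
        return [fact for _, fact in self.items]


def run(window, stream: Iterable[Event]) -> List[str]:
    """Feed every event to the window and return the final snapshot."""
    snapshot: List[str] = []
    for event in stream:
        snapshot = window.push(event)
    return snapshot


if __name__ == "__main__":
    stream = [(0.0, "a"), (1.0, "b"), (5.0, "c"), (5.5, "d")]
    print(run(CountWindow(2), stream))   # ['c', 'd']
    print(run(TimeWindow(5.0), stream))  # ['b', 'c', 'd']
```

Both windows receive the same four-event stream. The count window keeps only the two most recent facts. The time window of width 5.0 keeps every fact timestamped in (0.5, 5.5]. The two snapshots therefore differ even though the input is identical.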
Knowledge bases such as Wikipedia have been shown to improve performance in many information tasks. This effectiveness depends on the quality of the knowledge base: a high-quality knowledge base should contain complete, up-to-date information. However, constructing such a knowledge base is not easy, because it requires significant manual effort to collect relevant documents, extract valuable information, and update the knowledge base accordingly. In this paper, we aim to automate this labor-intensive process. Specifically, we focus on how to automatically collect documents relevant to an entity from the sheer volume of Web data. To solve this problem, we propose to construct a profile of the entity from a set of its related entities, and we then discuss how to use training data to weight those related entities. Experiments on the TREC 2012 KBA collection show that the proposed method can outperform state-of-the-art methods.
Xitong Liu, Hui Fang. "Leveraging related entities for knowledge base acceleration." Web-KR '13, 2013-11-01. DOI: https://doi.org/10.1145/2512405.2512407
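A small sketch helps illustrate the entity-profile idea. It rests on three hypothetical assumptions. First, a profile is a mapping from related-entity names to weights. Second, a document's score is the sum of the weights of the related entities it mentions, found by case-insensitive substring match. Third, documents scoring at or above a threshold are kept. The weights below are hand-set placeholders, not weights learned from training data as in the paper, and all entity names and documents are invented.

```python
from typing import Dict, List


def score_document(text: str, profile: Dict[str, float]) -> float:
    """Sum the weights of profile entities mentioned in `text` (case-insensitive substring match)."""
    lowered = text.lower()
    return sum(weight for name, weight in profile.items() if name.lower() in lowered)


def filter_documents(docs: List[str], profile: Dict[str, float], threshold: float) -> List[int]:
    """Return indices of documents scoring at least `threshold`, highest score first."""
    scored = [(score_document(doc, profile), i) for i, doc in enumerate(docs)]
    kept = [(score, i) for score, i in scored if score >= threshold]
    # Sort by descending score; ties keep the original document order.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in kept]


if __name__ == "__main__":
    # Hypothetical profile of a target entity: related entities with hand-set weights.
    profile = {"Acme Corp": 1.0, "Jane Doe": 0.6, "Springfield": 0.3}
    docs = [
        "Jane Doe, CEO of Acme Corp, spoke in Springfield.",
        "The weather in Springfield is mild.",
        "Acme Corp shares rose today.",
    ]
    print(filter_documents(docs, profile, threshold=0.5))  # [0, 2]
```

The second document mentions only a weakly weighted related entity, so it falls below the threshold. In a real system, the threshold and the weights would have to be tuned or learned.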
Deepak Pai, Balaraman Ravindran, S. Rajagopalan, Ramesh Srinivasaraghavan
Traditionally, web analytics has focused on analyzing and reporting business metrics of interest to marketers, such as page views and revenue, broken down by various dimensions of session characteristics that can be obtained from user requests. We introduce the notion of faceted reporting in the context of web analytics, where aggregated business metrics are reported grouped by a facet, i.e., a dimension along which a document can be represented. For example, on e-commerce sites, facets are typically product attributes such as price, color, and manufacturer. A typical website may have thousands of facets, but not all of them are equally important to the marketer in every reporting scenario. In this work, we propose a business-metric-driven scheme for automatically selecting facets for various reporting scenarios. Facets are selected by optimizing an objective function involving business metrics, and we present evaluation results for multiple objective functions. We observe that marketers' intuitive selection of useful facets is inaccurate, whereas the automated methods proposed in this paper can highlight insights from the data.
Deepak Pai, Balaraman Ravindran, S. Rajagopalan, Ramesh Srinivasaraghavan. "Automated faceted reporting for web analytics." Web-KR '13, 2013-11-01. DOI: https://doi.org/10.1145/2512405.2512406
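As defined in the abstract, faceted reporting aggregates a business metric grouped by a facet. The sketch below shows that aggregation over hypothetical order records. It also adds a facet-selection step. The selection objective used here is the largest spread between the best- and worst-performing facet value. This objective is an illustrative stand-in chosen for this example, because the abstract does not specify the paper's objective functions.

```python
from collections import defaultdict
from typing import Any, Dict, List

# A hypothetical order record: facet values plus numeric business metrics.
Row = Dict[str, Any]


def faceted_report(rows: List[Row], facet: str, metric: str) -> Dict[Any, float]:
    """Sum `metric` over rows, grouped by each distinct value of `facet`."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in rows:
        totals[row[facet]] += float(row[metric])
    return dict(totals)


def spread(report: Dict[Any, float]) -> float:
    """Illustrative objective: gap between the highest and lowest aggregated value."""
    return max(report.values()) - min(report.values())


def select_facet(rows: List[Row], facets: List[str], metric: str) -> str:
    """Pick the candidate facet whose report maximizes the illustrative objective."""
    return max(facets, key=lambda f: spread(faceted_report(rows, f, metric)))


if __name__ == "__main__":
    orders = [
        {"color": "red", "manufacturer": "A", "revenue": 100},
        {"color": "blue", "manufacturer": "A", "revenue": 20},
        {"color": "red", "manufacturer": "B", "revenue": 30},
        {"color": "blue", "manufacturer": "B", "revenue": 90},
    ]
    print(faceted_report(orders, "color", "revenue"))  # {'red': 130.0, 'blue': 110.0}
    print(select_facet(orders, ["color", "manufacturer"], "revenue"))  # color
```

In this example, `color` is selected because its revenue totals differ (130 vs. 110), while the two manufacturers tie at 120. Choosing a different objective could change which facet is reported.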
The massive and diverse data sources on the Deep Web present a serious data integration challenge. Existing virtual integration approaches suffer from slow query response, while surfacing approaches demand hefty storage space and incur huge costs in maintaining data freshness. We propose a novel hybrid integration approach that strikes a balance between the virtual and surfacing approaches. The key idea is to capture user needs in query templates and focus the integration efforts on the templates. However, realizing this approach requires innovations in template-driven query planning, query parsing, and template discovery. We elaborate on these challenges and propose our solutions.
Wensheng Wu. "The deep web: woven to catch the middle ground." Web-KR '13, 2013-11-01. DOI: https://doi.org/10.1145/2512405.2512408
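The sketch below makes the notion of a query template concrete. It models a template as the set of attributes a user fills in and routes each query by template. The routing policy is an assumption made for illustration only. Queries that match a template are answered through integration work prepared for that template, and all other queries fall back to virtual (live) integration. The template names and attributes are hypothetical. The paper's template-driven query planning, query parsing, and template discovery are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class QueryTemplate:
    """A hypothetical query template: a name plus the attributes a user fills in."""
    name: str
    attributes: FrozenSet[str]


def match_template(query: Dict[str, str],
                   templates: List[QueryTemplate]) -> Optional[QueryTemplate]:
    """Return the first template whose attributes exactly equal the query's bound attributes."""
    bound = frozenset(query)  # the attribute names the query supplies values for
    for template in templates:
        if template.attributes == bound:
            return template
    return None


def route(query: Dict[str, str], templates: List[QueryTemplate]) -> str:
    """Assumed hybrid policy: template matches use prepared integration, the rest go virtual."""
    template = match_template(query, templates)
    if template is not None:
        return f"template:{template.name}"
    return "virtual"


if __name__ == "__main__":
    templates = [
        QueryTemplate("used_cars", frozenset({"make", "model"})),
        QueryTemplate("flights", frozenset({"origin", "destination"})),
    ]
    print(route({"make": "Honda", "model": "Civic"}, templates))  # template:used_cars
    print(route({"isbn": "0000000000"}, templates))               # virtual
```

Exact attribute-set matching keeps the example short. A practical system would more likely need partial matching and a parser that maps free-form queries onto template attributes, which is the query-parsing challenge the abstract names.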