With the rapid growth of the web, web page classification is becoming more prominent. The way a web page is represented and the contextual features used in that representation both affect classification performance. Finding an adequate representation of web pages is therefore essential for better web page classification. In this paper, we propose a web page representation based on the structure of the implicit graph built from implicit links extracted from the query log. In this representation, we describe web pages by their textual content together with their neighbors as features, instead of using the features of their neighbors. When two or more web pages in the implicit graph share the same direct neighbors and belong to the same class ci, it is likely that any other web page with the same immediate neighbors will also belong to class ci. We propose two kinds of web page representations: the Boolean Neighbor Vector (BNV) and the Weighted Neighbor Vector (WNV). In BNV, we supplement the feature vector representing the textual content of a web page with a Boolean vector that encodes the target page's neighbors, indicating whether each web page is a direct neighbor of the target page. In WNV, we supplement the textual feature vector with a weighted vector that encodes the target page's neighbors and the strengths of the relations between the target page and its neighbors. We conduct experiments with four classifiers: SVM (Support Vector Machine), NB (Naive Bayes), RF (Random Forest), and KNN (K-Nearest Neighbors) on two subsets of the ODP (Open Directory Project). Results show that (1) the proposed representation yields better classification results with SVM, NB, RF, and KNN for both Bag of Words (BW) and 5-gram representations, and (2) performance based on BNV is better than performance based on WNV.
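The BNV/WNV construction described in the abstract can be illustrated with a small sketch. The page identifiers, the toy implicit graph, and the co-visit counts used as link strengths below are invented; the abstract does not prescribe a weighting scheme, so this is only one plausible reading, not the authors' implementation.

    # Minimal sketch of the BNV and WNV representations described in the abstract.
    # All data (pages, neighbors, link strengths, text vectors) is made up for illustration.
    import numpy as np

    pages = ["p1", "p2", "p3", "p4"]                  # hypothetical page ids
    text_vectors = {                                  # e.g. bag-of-words counts (toy values)
        "p1": np.array([2.0, 0.0, 1.0]),
        "p2": np.array([0.0, 1.0, 1.0]),
        "p3": np.array([1.0, 1.0, 0.0]),
        "p4": np.array([0.0, 0.0, 2.0]),
    }
    # Implicit links mined from the query log, with an assumed co-visit count as strength.
    implicit_links = {("p1", "p2"): 5.0, ("p1", "p3"): 2.0, ("p2", "p4"): 1.0}

    def neighbors(page):
        return {b if a == page else a: w
                for (a, b), w in implicit_links.items() if page in (a, b)}

    def bnv(page):
        """Text vector extended with a Boolean indicator per candidate neighbor page."""
        nb = neighbors(page)
        indicator = np.array([1.0 if p in nb else 0.0 for p in pages])
        return np.concatenate([text_vectors[page], indicator])

    def wnv(page):
        """Text vector extended with the normalized strength of each implicit link."""
        nb = neighbors(page)
        total = sum(nb.values()) or 1.0
        weights = np.array([nb.get(p, 0.0) / total for p in pages])
        return np.concatenate([text_vectors[page], weights])

    print("BNV(p1):", bnv("p1"))   # [2. 0. 1. 0. 1. 1. 0.]
    print("WNV(p1):", wnv("p1"))   # [2. 0. 1. 0. 0.714... 0.285... 0.]

Either vector can then be fed to any of the four classifiers mentioned above in place of the plain text-only feature vector.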
{"title":"Implicit Links based Web Page Representation for Web Page Classification","authors":"Abdelbadie Belmouhcine, M. Benkhalifa","doi":"10.1145/2797115.2797125","DOIUrl":"https://doi.org/10.1145/2797115.2797125","url":null,"abstract":"With the rapid growth of the web's size, web page classification becomes more prominent. The representation way of a web page and contextual features used for this representation have both an impact on the classification's performance. Thus, finding an adequate representation of web pages is essential for a better web page classification. In this paper, we propose a web page representation based on the structure of the implicit graph built using implicit links extracted from the query-log. In this representation, we represent web pages using their textual contents along with their neighbors as features instead of using features of their neighbors. When two or more web pages in the implicit graph share the same direct neighbors and belong to the same class ci, it is most likely that every other web page, having the same immediate neighbors, will belong to the same class ci. We propose two kinds of web page representations: Boolean Neighbor Vector (BNV) and Weighted Neighbor Vector (WNV). In BNV, we supplement the feature vector, which represents the textual content of a web page, by a Boolean vector. This vector represents the target web page's neighbors and shows whether a web page is a direct neighbor of the target web page or not. In WNV, we supplement the feature vector, which represents the textual content of a web page, by a weighted vector. This latter represents the target web page's neighbors and shows strengths of relations between the target web page and its neighbors. We conduct experiments using four classifiers: SVM (Support Vector Machine), NB (Naive Bayes), RF (Random Forest) and KNN (K-Nearest Neighbors) on two subsets of ODP (Open Directory Project). Results show that: (1) the proposed representation helps obtain better classification results when using SVM, NB, RF and KNN for both Bag of Words (BW) and 5-gram representations. (2) The performances based on BNV are better than those based on WNV.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126922826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Papadakis, Ioannis Apostolatos, Dimitris Apostolou
Nowadays, search engines are the obvious way of finding information on the web. However, there are times when users are forced into long and tedious search sessions, during which they have to reformulate their initial query a number of times until they obtain results that satisfy their information needs. This paper proposes a query construction and refinement service that aids users during their engagement with a large-scale web search engine. As a proof of concept, GContext is presented and evaluated as an implementation of the proposed service. GContext integrates various sources of the LOD cloud within the environment of a large-scale web search engine.
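The abstract does not describe how GContext builds its suggestions; the sketch below is only one plausible reading, in which candidate refinement terms for a query are pulled from DBpedia, one LOD-cloud source, via its public SPARQL endpoint. The query shape and the use of dct:subject as a relatedness heuristic are assumptions, not GContext's actual logic.

    # Hedged sketch: fetch candidate refinement terms for a query from DBpedia (LOD cloud).
    # This is NOT GContext's implementation; it only illustrates LOD-backed query refinement.
    import requests

    def refinement_candidates(term, limit=5):
        # Assumed heuristic: resources sharing a Wikipedia category with the queried resource.
        query = f"""
        SELECT DISTINCT ?label WHERE {{
          dbr:{term} dct:subject ?cat .
          ?other dct:subject ?cat ;
                 rdfs:label ?label .
          FILTER (lang(?label) = "en")
        }} LIMIT {limit}
        """
        resp = requests.get(
            "https://dbpedia.org/sparql",
            params={"query": query, "format": "application/sparql-results+json"},
            timeout=30,
        )
        resp.raise_for_status()
        return [b["label"]["value"] for b in resp.json()["results"]["bindings"]]

    print(refinement_candidates("Semantic_Web"))   # e.g. a handful of related topic labels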
{"title":"A LOD-based, query construction and refinement service for web search engines","authors":"I. Papadakis, Ioannis Apostolatos, Dimitris Apostolou","doi":"10.1145/2797115.2797122","DOIUrl":"https://doi.org/10.1145/2797115.2797122","url":null,"abstract":"Nowadays, search engines are the obvious way of finding information on the web. However, there are times when users are forced to engage themselves in long and tedious search sessions during which they have to process their initial query a number of times until they come up with results that satisfy their information needs. This paper proposes a query construction and refinement service that aids users during their engagement with a large scale web search engine. As a proof of concept, GContext is presented and accordingly evaluated as an implementation of the proposed service. GContext integrates various sources of the lod-cloud within the environment of a large scale web search engine.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123866714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Krieger, J. Schneider, Christian Nywelt, D. Rösner
With Semantic Web technologies and Linked Data datasets, we are able not only to retrieve the textual content of a document but also to automatically create formal semantic descriptions of its content. In this paper we present a Linked Data-based approach for automatically generating semantic fingerprints for Web documents. Our approach exploits the structured information in Linked Data datasets to derive an explicit semantic description of a Web resource. A two-stage evaluation of the implementation of the presented approach shows its feasibility and robustness.
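The abstract does not detail how a fingerprint is built; a minimal sketch follows, assuming that a fingerprint is simply the set of Linked Data classes and categories attached to the entities mentioned in a document. The entity annotations and their types below are toy data; in practice they would be retrieved from a Linked Data dataset such as DBpedia.

    # Minimal sketch of a Linked Data based "semantic fingerprint": the set of classes and
    # categories of the entities mentioned in a document. Toy data; not the paper's method.
    entity_knowledge = {   # hypothetical entity -> Linked Data facts
        "dbr:Berlin":  {"dbo:City", "dbo:PopulatedPlace", "dbc:Capitals_in_Europe"},
        "dbr:Germany": {"dbo:Country", "dbo:PopulatedPlace", "dbc:Central_European_countries"},
        "dbr:Python_(programming_language)": {"dbo:ProgrammingLanguage", "dbc:Scripting_languages"},
    }

    def fingerprint(entities):
        """Union of all types/categories of the entities found in a document."""
        fp = set()
        for e in entities:
            fp |= entity_knowledge.get(e, set())
        return fp

    def similarity(fp_a, fp_b):
        """Jaccard overlap between two fingerprints, usable for comparing documents."""
        if not fp_a and not fp_b:
            return 1.0
        return len(fp_a & fp_b) / len(fp_a | fp_b)

    doc1 = fingerprint(["dbr:Berlin", "dbr:Germany"])
    doc2 = fingerprint(["dbr:Germany"])
    print(similarity(doc1, doc2))   # 0.6: the two documents share most of their semantics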
{"title":"Creating Semantic Fingerprints for Web Documents","authors":"K. Krieger, J. Schneider, Christian Nywelt, D. Rösner","doi":"10.1145/2797115.2797132","DOIUrl":"https://doi.org/10.1145/2797115.2797132","url":null,"abstract":"With Semantic Web technologies and Linked Data datasets we are able to not only retrieve the textual content of a document but also to automatically create formal semantic descriptions of its content. In this paper we present a Linked Data-based approach to automatically generate semantic fingerprints for Web documents. Our approach exploits the structured information in Linked Data datasets to derive an explicit semantic description of a Web resource. A two-stage evaluation of the implementation of the presented approach shows its feasibility and robustness.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127411015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.
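The paper's exact metrics are not given in the abstract; the sketch below works under the assumption that top-down adoption can be approximated by the share of newly added vocabulary terms that later appear in deployed data, and bottom-up evolution by the share of deployed non-vocabulary terms that are later added to the schema. Term sets are invented.

    # Minimal sketch of the two directions of change discussed above, using made-up term sets.
    # Top-down adoption: how many terms newly added to the vocabulary show up in deployed data.
    # Bottom-up evolution: how many terms deployed outside the vocabulary get added later.
    schema_2013 = {"schema:Person", "schema:Product", "schema:Offer"}
    schema_2014 = {"schema:Person", "schema:Product", "schema:Offer", "schema:Order"}
    deployed_2013 = {"schema:Person", "schema:Product", "schema:Order"}  # Order used before it existed
    deployed_2014 = {"schema:Person", "schema:Product", "schema:Order"}

    newly_added = schema_2014 - schema_2013                    # terms added between releases
    top_down = len(newly_added & deployed_2014) / len(newly_added)

    deployed_extensions = deployed_2013 - schema_2013          # deployed terms not (yet) in the schema
    bottom_up = len(deployed_extensions & schema_2014) / len(deployed_extensions)

    print(f"top-down adoption: {top_down:.0%}, bottom-up evolution: {bottom_up:.0%}")  # 100%, 100%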
{"title":"A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time","authors":"R. Meusel, Christian Bizer, Heiko Paulheim","doi":"10.1145/2797115.2797124","DOIUrl":"https://doi.org/10.1145/2797115.2797124","url":null,"abstract":"Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133350318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Millions of HTML tables containing structured data can be found on the Web. With their wide coverage, these tables are potentially very useful for filling missing values in and extending cross-domain knowledge bases such as DBpedia, YAGO, or the Google Knowledge Graph. As a prerequisite for using table data for knowledge base extension, the HTML tables need to be matched with the knowledge base, meaning that correspondences between table rows/columns and entities/schema elements of the knowledge base need to be found. This paper presents the T2D gold standard for measuring and comparing the performance of HTML table to knowledge base matching systems. T2D consists of 8,700 schema-level and 26,100 entity-level correspondences between the WebDataCommons Web Tables Corpus and the DBpedia knowledge base. In contrast to related work on HTML table to knowledge base matching, the Web Tables Corpus (147 million tables), the knowledge base, and the gold standard are all publicly available. The gold standard is then used to evaluate the performance of T2K Match, an iterative matching method that combines schema and instance matching. T2K Match is designed for the use case of matching large quantities of mostly small and narrow HTML tables against large cross-domain knowledge bases. The evaluation using the T2D gold standard shows that T2K Match discovers table-to-class correspondences with a precision of 94%, row-to-entity correspondences with a precision of 90%, and column-to-property correspondences with a precision of 77%.
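The precision figures above are computed over correspondences; a minimal sketch of such an evaluation follows, with a made-up gold standard and system output. The T2D files and T2K Match internals are not reproduced here.

    # Minimal sketch of evaluating table-to-KB correspondences against a gold standard.
    # The correspondences below are invented; in the paper they come from T2D and T2K Match.
    gold = {                                    # (table element, KB element) pairs
        ("table1.row3", "dbr:Berlin"),
        ("table1.row4", "dbr:Hamburg"),
        ("table1.col2", "dbo:populationTotal"),
    }
    system = {
        ("table1.row3", "dbr:Berlin"),           # correct
        ("table1.row4", "dbr:Munich"),           # wrong entity
        ("table1.col2", "dbo:populationTotal"),  # correct
    }

    true_positives = len(system & gold)
    precision = true_positives / len(system)
    recall = true_positives / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")  # 0.67 0.67 0.67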
{"title":"Matching HTML Tables to DBpedia","authors":"Dominique Ritze, O. Lehmberg, Christian Bizer","doi":"10.1145/2797115.2797118","DOIUrl":"https://doi.org/10.1145/2797115.2797118","url":null,"abstract":"Millions of HTML tables containing structured data can be found on the Web. With their wide coverage, these tables are potentially very useful for filling missing values and extending cross-domain knowledge bases such as DBpedia, YAGO, or the Google Knowledge Graph. As a prerequisite for being able to use table data for knowledge base extension, the HTML tables need to be matched with the knowledge base, meaning that correspondences between table rows/columns and entities/schema elements of the knowledge base need to be found. This paper presents the T2D gold standard for measuring and comparing the performance of HTML table to knowledge base matching systems. T2D consists of 8 700 schema-level and 26 100 entity-level correspondences between the WebDataCommons Web Tables Corpus and the DBpedia knowledge base. In contrast related work on HTML table to knowledge base matching, the Web Tables Corpus (147 million tables), the knowledge base, as well as the gold standard are publicly available. The gold standard is used afterward to evaluate the performance of T2K Match, an iterative matching method which combines schema and instance matching. T2K Match is designed for the use case of matching large quantities of mostly small and narrow HTML tables against large cross-domain knowledge bases. The evaluation using the T2D gold standard shows that T2K Match discovers table-to-class correspondences with a precision of 94%, row-to-entity correspondences with a precision of 90%, and column-to-property correspondences with a precision of 77%.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133393499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Sivaramakrishnan, Madhusudhan Krishnamachari, Vidhya Balasubramanian
Recommender systems have become very prominent over the past decade. Methods such as collaborative filtering and knowledge-based recommender systems have been developed extensively for non-customizable products. However, as manufacturers today move towards customizable products to satisfy customers, recommender systems for customizable products are urgently needed. Such systems must be able to capture customer preferences and provide recommendations that are both diverse and novel. This paper proposes an approach to building a recommender system that can be adapted to customizable products such as desktop computers and home theater systems. The Customizable Product Recommendation problem is modeled as a special case of the Multiple Choice Knapsack Problem, and an algorithm is proposed to generate desirable product recommendations in real time. The performance of the proposed system is then evaluated.
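The Multiple Choice Knapsack formulation mentioned above can be made concrete with a small dynamic-programming sketch: each component class (CPU, RAM, disk) contributes exactly one option, the budget is the knapsack capacity, and the option scores stand in for the paper's (unspecified) desirability measure. All prices and scores are invented, and this is generic MCKP, not the paper's algorithm.

    # Minimal MCKP sketch: choose exactly one option per component class so that total price
    # stays within the budget and the total "desirability" score is maximal. Toy data only.
    component_classes = {
        "cpu":  [("cpu_basic", 100, 3.0), ("cpu_fast", 250, 7.0)],   # (name, price, score)
        "ram":  [("ram_8gb", 40, 2.0),    ("ram_32gb", 120, 6.0)],
        "disk": [("hdd_1tb", 50, 2.5),    ("ssd_1tb", 110, 5.5)],
    }

    def recommend(classes, budget):
        """Exact multiple-choice knapsack by dynamic programming over the total price."""
        states = {0: (0.0, [])}                  # total price -> (best score, chosen options)
        for options in classes.values():
            next_states = {}
            for price, (score, picks) in states.items():
                for name, p, s in options:       # exactly one option from this class
                    cost, value = price + p, score + s
                    if cost <= budget and (cost not in next_states or value > next_states[cost][0]):
                        next_states[cost] = (value, picks + [name])
            states = next_states
        return max(states.values(), key=lambda t: t[0]) if states else None

    print(recommend(component_classes, 400))   # (14.5, ['cpu_basic', 'ram_32gb', 'ssd_1tb'])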
{"title":"Recommending Customizable Products: A Multiple Choice Knapsack Solution","authors":"A. Sivaramakrishnan, Madhusudhan Krishnamachari, Vidhya Balasubramanian","doi":"10.1145/2797115.2797116","DOIUrl":"https://doi.org/10.1145/2797115.2797116","url":null,"abstract":"Recommender systems have become very prominent over the past decade. Methods such as collaborative filtering and knowledge based recommender systems have been developed extensively for non-customizable products. However, as manufacturers today are moving towards customizable products to satisfy customers, the need of the hour is customizable product recommender systems. Such systems must be able to capture customer preferences and provide recommendations that are both diverse and novel. This paper proposes an approach to building a recommender system that can be adapted to customizable products such as desktop computers and home theater systems. The Customizable Product Recommendation problem is modeled as a special case of the Multiple Choice Knapsack Problem, and an algorithm is proposed to generate desirable product recommendations in real-time. The performance of the proposed system is then evaluated.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126746771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reasoning with ontologies is one of the core fields of research in Description Logics. A variety of efficient reasoners with highly optimized algorithms have been developed to support inference tasks on expressive ontology languages such as OWL (DL). However, reported reasoner computing times have sometimes exceeded and sometimes fallen below the expected theoretical values. From an empirical perspective, it is not yet well understood which particular aspects of an ontology degrade reasoner performance. In this paper, we survey state-of-the-art works that attempt to relate the empirical behaviour of reasoners to particular ontological features. These works are analysed and broken down into categories. Further, we propose a set of ontology features covering a broad range of structural and syntactic ontology characteristics. We claim that these features are good indicators of how hard an ontology is for reasoning tasks. To assess the worth of our proposals, we adopt a supervised machine learning approach: the features serve as the basis for learning predictive models of reasoner robustness. These models were trained for six well-known reasoners using their evaluation results from the ORE 2014 competition. Our prediction models show a high level of accuracy, which attests to the effectiveness of our feature set.
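As a rough illustration of the supervised-learning step described above, the sketch below trains a classifier on synthetic ontology feature vectors to predict whether a reasoner finishes within a time budget. The feature names, labels, and data are invented; the paper's actual feature set, reasoners, and ORE 2014 results are not reproduced here.

    # Minimal sketch: learn a predictive model of reasoner "robustness" from ontology features.
    # Everything below (features, labels) is synthetic; it only illustrates the pipeline shape.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    n = 500
    # Hypothetical structural/syntactic features of an ontology.
    X = np.column_stack([
        rng.integers(10, 5000, n),      # number of classes
        rng.integers(10, 20000, n),     # number of axioms
        rng.integers(1, 30, n),         # maximum depth of the class hierarchy
        rng.integers(0, 500, n),        # number of object properties
    ])
    # Synthetic label: 1 = reasoner completes within the time budget, 0 = it does not.
    y = ((X[:, 1] < 12000) & (X[:, 2] < 20)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))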
{"title":"What Makes Ontology Reasoning so Arduous?: Unveiling the key ontological features","authors":"N. Alaya, S. Yahia, M. Lamolle","doi":"10.1145/2797115.2797117","DOIUrl":"https://doi.org/10.1145/2797115.2797117","url":null,"abstract":"Reasoning with ontologies is one of the core fields of research in Description Logics. A variety of efficient reasoner with highly optimized algorithms have been developed to allow inference tasks on expressive ontology languages such as OWL(DL). However, reasoner reported computing times have exceeded and sometimes fall behind the expected theoretical values. From an empirical perspective, it is not yet well understood, which particular aspects in the ontology are reasoner performance degrading factors. In this paper, we conducted an investigation about state of art works that attempted to portray potential correlation between reasoner empirical behaviour and particular ontological features. These works were analysed and then broken down into categories. Further, we proposed a set of ontology features covering a broad range of structural and syntactic ontology characteristics. We claim that these features are good indicators of the ontology hardness level against reasoning tasks. In order to assess the worthiness of our proposals, we adopted a supervised machine learning approach. Features served as the bases to learn predictive models of reasoners robustness. These models was trained for 6 well known reasoners and using their evaluation results during the ORE'2014 competition. Our prediction models showed a high accuracy level which witness the effectiveness of our set of features.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133773695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a user-modeling method for folksonomic data. Since data mining of folksonomic data is difficult due to its complexity, significant amounts of preprocessing are usually required. To capture the rough characteristics of such complex data, our method employs two steps: (1) using the infinite relational model (IRM) to perform relational clustering of a folksonomic data set, and (2) using tag weighting to extract the characteristics of each user cluster. As an experimental evaluation, we applied our method to real-world data from one of the most popular social bookmarking services in Japan. Our user-modeling method successfully extracted semantically clustered user models, demonstrating that relational data analysis is promising for mining folksonomic data. In addition, we developed the user-model-based filtering algorithm (UMF), which evaluates the user models by their resource recommendations. Its F-measure was higher than that of random recommendation, and its running time was much shorter than that of collaborative-filtering-based top-n recommendation.
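The tag-weighting step above is not specified in the abstract; a minimal sketch follows under the assumption of a TF-IDF-style weight: a tag's frequency within a user cluster, discounted by how many clusters use the tag. Clusters and counts are toy data; in the paper, the clusters come from the IRM.

    # Minimal sketch of extracting characteristic tags per user cluster with a TF-IDF-like weight.
    import math

    cluster_tag_counts = {                          # cluster id -> {tag: count} (toy data)
        "c0": {"python": 40, "web": 25, "news": 5},
        "c1": {"recipes": 30, "news": 20, "web": 3},
        "c2": {"news": 35, "politics": 22},
    }

    def tag_weights(cluster):
        counts = cluster_tag_counts[cluster]
        total = sum(counts.values())
        n_clusters = len(cluster_tag_counts)
        weights = {}
        for tag, c in counts.items():
            tf = c / total
            df = sum(1 for tags in cluster_tag_counts.values() if tag in tags)
            idf = math.log(n_clusters / df) + 1.0    # +1 keeps shared tags from vanishing
            weights[tag] = tf * idf
        return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

    print(tag_weights("c0"))   # "python" and "web" dominate; the ubiquitous "news" is discounted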
{"title":"User Modeling in Folksonomies: Relational Clustering and Tag Weighting","authors":"Takuya Kitazawa, M. Sugiyama","doi":"10.1145/2797115.2797129","DOIUrl":"https://doi.org/10.1145/2797115.2797129","url":null,"abstract":"This paper proposes a user-modeling method for folksonomic data. Since data mining of folksonomic data is difficult due to their complexity, significant amounts of preprocessing are usually required. To catch sketchy characteristics of such complex data, our method employs two steps: (1) using the infinite relational model (IRM) to perform relational clustering of a folksonomic data set, and (2) using tag-weighting to extract the characteristics of each user cluster. As an experimental evaluation, we applied our method to real-world data from one of the most popular social bookmarking services in Japan. Our user-modeling method successfully extracted semantically clustered user models, thus demonstrating that relational data analysis has promise for mining folksonomic data. In addition, we developed the user-model-based filtering algorithm (UMF), which evaluates the user models by their resource recommendations. The F-measure was higher than that of random recommendation, and the running time was much shorter than that of collaborative-filtering-based top-n recommendation.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134456684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discovering and integrating relevant real-life datasets are essential tasks when it comes to handling Linked Data. Similar to Data Warehousing approaches, Linked Data can be prepared to enable sophisticated data analysis. The developed open-source framework bacon enables interactive and crowd-sourced Data Integration on Linked Data (Linked Data Integration), utilizing the RDF Data Cube Vocabulary and the semantic properties of Linked Open Data. Discovering suitable datasets on the fly in local or remote repositories sets up the ensuing integration process. Based on well-known Data Warehousing processes, the semantic nature of the data is taken into account to handle and merge RDF Data Cubes. To do so, the structure and content of the cubes must be analyzed and processed. A similarity measure has been developed to find similarly structured cubes. The user is offered a graphical interface in which they can search for suitable cubes and modify their structure based on semantic properties. This process is supported by a set of automated suggestions to assist inexperienced users as well as domain experts.
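The abstract does not define bacon's similarity measure; a minimal sketch follows, assuming a simple Jaccard overlap between the dimension and measure sets of two RDF Data Cubes, with an arbitrary weighting between the two. The cube descriptions are invented.

    # Minimal sketch of a structural similarity between two RDF Data Cubes, based only on the
    # overlap of their dimensions and measures. Cube definitions are toy data.
    cube_a = {
        "dimensions": {"sdmx-dim:refPeriod", "sdmx-dim:refArea", "ex:sector"},
        "measures": {"ex:gdp"},
    }
    cube_b = {
        "dimensions": {"sdmx-dim:refPeriod", "sdmx-dim:refArea"},
        "measures": {"ex:gdp", "ex:population"},
    }

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    def cube_similarity(x, y, dim_weight=0.7):
        """Weighted mix of dimension overlap and measure overlap (weights are arbitrary)."""
        return (dim_weight * jaccard(x["dimensions"], y["dimensions"])
                + (1 - dim_weight) * jaccard(x["measures"], y["measures"]))

    print(round(cube_similarity(cube_a, cube_b), 3))   # 0.7*2/3 + 0.3*1/2 = 0.617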
{"title":"bacon: Linked Data Integration based on the RDF Data Cube Vocabulary","authors":"Sebastian P. Bayerl, M. Granitzer","doi":"10.1145/2797115.2797126","DOIUrl":"https://doi.org/10.1145/2797115.2797126","url":null,"abstract":"Discovering and integrating relevant real-live datasets are essential tasks, when it comes to handling Linked Data. Similar to Data Warehousing approaches, Linked Data can be prepared to enable sophisticated data analysis. The developed open source framework bacon enables interactive and crowed-sourced Data Integration on Linked Data (Linked Data Integration), utilizing the RDF Data Cube Vocabulary and the semantic properties of Linked Open Data. Discovering suitable datasets on-the-fly in local or remote repositories sets up the ensuing integration process. Based on well-known Data Warehousing processes, the semantic nature of the data is taken into account to handle and merge RDF Data Cubes. To do so, structure and content of the cubes must be analyzed and processed. A similarity measure has been developed to find similarly structured cubes. The user is offered a graphical interface, where he can search for suitable cubes and modify their structure based on semantic properties. This process is fostered by a set of automated suggestions to support inexperienced users and also domain experts.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134609855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saraschandra Karanam, H. Oostendorp, M. Sanchiz, A. Chevalier, Jessie Chin, W. Fu
This paper looks at two limitations of cognitive models of web navigation: first, they do not account for the entire process of information search, and second, they do not account for the differences in search behavior caused by aging. To address these limitations, we used data from an experiment in which two types of information search tasks (simple and difficult) were presented to both young and older participants. We found that, in general, difficult tasks demand significantly more time, significantly more clicks, and significantly more reformulations, and are answered significantly less accurately than simple tasks. Older persons inspect the search engine result pages significantly longer, produce significantly fewer reformulations on difficult tasks than younger persons, and are significantly more accurate than younger persons on simple tasks. We next used a cognitive model of web navigation called CoLiDeS to predict which search engine result a user would choose to click. Older participants were found to click more often only on search engine results with high semantic similarity to the query. Search engine results generated by older participants had a higher semantic similarity value (computed with respect to the query) than those generated by younger participants only in the second cycle. The match between model-predicted clicks and actual user clicks was found to be significantly higher for difficult tasks than for simple tasks. Potential improvements for enhancing the modeling and its applications are discussed.
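CoLiDeS' semantic-similarity computation is not given in the abstract; the sketch below assumes a plain cosine similarity between bag-of-words vectors of the query and each result snippet, with the model "clicking" the most similar result. The query and snippets are invented, and CoLiDeS itself uses richer semantic-similarity estimates (such as LSA) than raw word overlap.

    # Minimal sketch of predicting which search result a user clicks, by choosing the result
    # whose snippet is most similar to the query (cosine similarity over word counts).
    from collections import Counter
    import math

    def cosine(a, b):
        common = set(a) & set(b)
        dot = sum(a[w] * b[w] for w in common)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def predict_click(query, snippets):
        q = Counter(query.lower().split())
        scores = [(cosine(q, Counter(s.lower().split())), i) for i, s in enumerate(snippets)]
        return max(scores)[1]          # index of the most query-similar result

    results = [
        "symptoms and treatment of seasonal flu in adults",
        "book cheap flights and hotels online",
        "flu vaccine side effects explained by doctors",
    ]
    print(predict_click("flu treatment for adults", results))   # 0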
{"title":"Modeling and predicting information search behavior","authors":"Saraschandra Karanam, H. Oostendorp, M. Sanchiz, A. Chevalier, Jessie Chin, W. Fu","doi":"10.1145/2797115.2797123","DOIUrl":"https://doi.org/10.1145/2797115.2797123","url":null,"abstract":"This paper looks at two limitations of cognitive models of web-navigation: first, they do not account for the entire process of information search and second, they do not account for the differences in search behavior caused by aging. To address these limitations, data from an experiment in which two types of information search tasks (simple and difficult), presented to both young and old participants was used. We found that in general difficult tasks demand significantly more time, significantly more clicks, significantly more reformulations and are answered significantly less accurately than simple tasks. Older persons inspect the search engine result pages significantly longer, produce significantly fewer reformulations with difficult tasks than younger persons, and are significantly more accurate than younger persons with simple tasks. We next used a cognitive model of web-navigation called CoLiDeS to predict which search engine result a user would choose to click. Old participants were found to click more often only on search engine results with high semantic similarity with the query. Search engine results generated by old participants were of higher semantic similarity value (computed w.r.t the query) than those generated by young participants only in the second cycle. Match between model-predicted clicks and actual user clicks was found to be significantly higher for difficult tasks compared to simple tasks. Potential improvements in enhancing the modeling and its applications are discussed.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114173044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}