With the rapid growth of the web, web page classification is becoming more prominent. The way a web page is represented and the contextual features used in that representation both affect classification performance. Finding an adequate representation of web pages is therefore essential for better web page classification. In this paper, we propose a web page representation based on the structure of the implicit graph built from implicit links extracted from the query log. In this representation, we describe web pages by their textual content together with their neighbors as features, instead of using the features of their neighbors. When two or more web pages in the implicit graph share the same direct neighbors and belong to the same class ci, it is likely that any other web page with the same immediate neighbors will also belong to class ci. We propose two kinds of web page representations: the Boolean Neighbor Vector (BNV) and the Weighted Neighbor Vector (WNV). In BNV, we supplement the feature vector representing the textual content of a web page with a Boolean vector that encodes the target page's neighbors, indicating whether each web page is a direct neighbor of the target page. In WNV, we supplement the textual feature vector with a weighted vector that encodes the target page's neighbors and the strengths of the relations between the target page and its neighbors. We conduct experiments with four classifiers: SVM (Support Vector Machine), NB (Naive Bayes), RF (Random Forest), and KNN (K-Nearest Neighbors) on two subsets of the ODP (Open Directory Project). Results show that (1) the proposed representation yields better classification results with SVM, NB, RF, and KNN for both Bag of Words (BW) and 5-gram representations, and (2) performance based on BNV is better than performance based on WNV.
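The BNV/WNV construction described in the abstract can be illustrated with a small sketch. The page identifiers, the toy implicit graph, and the co-visit counts used as link strengths below are invented; the abstract does not prescribe a weighting scheme, so this is only one plausible reading, not the authors' implementation.

    # Minimal sketch of the BNV and WNV representations described in the abstract.
    # All data (pages, neighbors, link strengths, text vectors) is made up for illustration.
    import numpy as np

    pages = ["p1", "p2", "p3", "p4"]                  # hypothetical page ids
    text_vectors = {                                  # e.g. bag-of-words counts (toy values)
        "p1": np.array([2.0, 0.0, 1.0]),
        "p2": np.array([0.0, 1.0, 1.0]),
        "p3": np.array([1.0, 1.0, 0.0]),
        "p4": np.array([0.0, 0.0, 2.0]),
    }
    # Implicit links mined from the query log, with an assumed co-visit count as strength.
    implicit_links = {("p1", "p2"): 5.0, ("p1", "p3"): 2.0, ("p2", "p4"): 1.0}

    def neighbors(page):
        return {b if a == page else a: w
                for (a, b), w in implicit_links.items() if page in (a, b)}

    def bnv(page):
        """Text vector extended with a Boolean indicator per candidate neighbor page."""
        nb = neighbors(page)
        indicator = np.array([1.0 if p in nb else 0.0 for p in pages])
        return np.concatenate([text_vectors[page], indicator])

    def wnv(page):
        """Text vector extended with the normalized strength of each implicit link."""
        nb = neighbors(page)
        total = sum(nb.values()) or 1.0
        weights = np.array([nb.get(p, 0.0) / total for p in pages])
        return np.concatenate([text_vectors[page], weights])

    print("BNV(p1):", bnv("p1"))   # [2. 0. 1. 0. 1. 1. 0.]
    print("WNV(p1):", wnv("p1"))   # [2. 0. 1. 0. 0.714... 0.285... 0.]

Either vector can then be fed to any of the four classifiers mentioned above in place of the plain text-only feature vector.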
{"title":"Implicit Links based Web Page Representation for Web Page Classification","authors":"Abdelbadie Belmouhcine, M. Benkhalifa","doi":"10.1145/2797115.2797125","DOIUrl":"https://doi.org/10.1145/2797115.2797125","url":null,"abstract":"With the rapid growth of the web's size, web page classification becomes more prominent. The representation way of a web page and contextual features used for this representation have both an impact on the classification's performance. Thus, finding an adequate representation of web pages is essential for a better web page classification. In this paper, we propose a web page representation based on the structure of the implicit graph built using implicit links extracted from the query-log. In this representation, we represent web pages using their textual contents along with their neighbors as features instead of using features of their neighbors. When two or more web pages in the implicit graph share the same direct neighbors and belong to the same class ci, it is most likely that every other web page, having the same immediate neighbors, will belong to the same class ci. We propose two kinds of web page representations: Boolean Neighbor Vector (BNV) and Weighted Neighbor Vector (WNV). In BNV, we supplement the feature vector, which represents the textual content of a web page, by a Boolean vector. This vector represents the target web page's neighbors and shows whether a web page is a direct neighbor of the target web page or not. In WNV, we supplement the feature vector, which represents the textual content of a web page, by a weighted vector. This latter represents the target web page's neighbors and shows strengths of relations between the target web page and its neighbors. We conduct experiments using four classifiers: SVM (Support Vector Machine), NB (Naive Bayes), RF (Random Forest) and KNN (K-Nearest Neighbors) on two subsets of ODP (Open Directory Project). Results show that: (1) the proposed representation helps obtain better classification results when using SVM, NB, RF and KNN for both Bag of Words (BW) and 5-gram representations. (2) The performances based on BNV are better than those based on WNV.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126922826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Papadakis, Ioannis Apostolatos, Dimitris Apostolou
Nowadays, search engines are the obvious way of finding information on the web. However, there are times when users are forced into long and tedious search sessions, during which they have to reformulate their initial query a number of times until they obtain results that satisfy their information needs. This paper proposes a query construction and refinement service that aids users during their engagement with a large-scale web search engine. As a proof of concept, GContext is presented and evaluated as an implementation of the proposed service. GContext integrates various sources of the LOD cloud within the environment of a large-scale web search engine.
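The abstract does not describe how GContext builds its suggestions; the sketch below is only one plausible reading, in which candidate refinement terms for a query are pulled from DBpedia, one LOD-cloud source, via its public SPARQL endpoint. The query shape and the use of dct:subject as a relatedness heuristic are assumptions, not GContext's actual logic.

    # Hedged sketch: fetch candidate refinement terms for a query from DBpedia (LOD cloud).
    # This is NOT GContext's implementation; it only illustrates LOD-backed query refinement.
    import requests

    def refinement_candidates(term, limit=5):
        # Assumed heuristic: resources sharing a Wikipedia category with the queried resource.
        query = f"""
        SELECT DISTINCT ?label WHERE {{
          dbr:{term} dct:subject ?cat .
          ?other dct:subject ?cat ;
                 rdfs:label ?label .
          FILTER (lang(?label) = "en")
        }} LIMIT {limit}
        """
        resp = requests.get(
            "https://dbpedia.org/sparql",
            params={"query": query, "format": "application/sparql-results+json"},
            timeout=30,
        )
        resp.raise_for_status()
        return [b["label"]["value"] for b in resp.json()["results"]["bindings"]]

    print(refinement_candidates("Semantic_Web"))   # e.g. a handful of related topic labels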
{"title":"A LOD-based, query construction and refinement service for web search engines","authors":"I. Papadakis, Ioannis Apostolatos, Dimitris Apostolou","doi":"10.1145/2797115.2797122","DOIUrl":"https://doi.org/10.1145/2797115.2797122","url":null,"abstract":"Nowadays, search engines are the obvious way of finding information on the web. However, there are times when users are forced to engage themselves in long and tedious search sessions during which they have to process their initial query a number of times until they come up with results that satisfy their information needs. This paper proposes a query construction and refinement service that aids users during their engagement with a large scale web search engine. As a proof of concept, GContext is presented and accordingly evaluated as an implementation of the proposed service. GContext integrates various sources of the lod-cloud within the environment of a large scale web search engine.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123866714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Krieger, J. Schneider, Christian Nywelt, D. Rösner
With Semantic Web technologies and Linked Data datasets, we are able not only to retrieve the textual content of a document but also to automatically create formal semantic descriptions of its content. In this paper we present a Linked Data-based approach for automatically generating semantic fingerprints for Web documents. Our approach exploits the structured information in Linked Data datasets to derive an explicit semantic description of a Web resource. A two-stage evaluation of the implementation of the presented approach shows its feasibility and robustness.
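The abstract does not detail how a fingerprint is built; a minimal sketch follows, assuming that a fingerprint is simply the set of Linked Data classes and categories attached to the entities mentioned in a document. The entity annotations and their types below are toy data; in practice they would be retrieved from a Linked Data dataset such as DBpedia.

    # Minimal sketch of a Linked Data based "semantic fingerprint": the set of classes and
    # categories of the entities mentioned in a document. Toy data; not the paper's method.
    entity_knowledge = {   # hypothetical entity -> Linked Data facts
        "dbr:Berlin":  {"dbo:City", "dbo:PopulatedPlace", "dbc:Capitals_in_Europe"},
        "dbr:Germany": {"dbo:Country", "dbo:PopulatedPlace", "dbc:Central_European_countries"},
        "dbr:Python_(programming_language)": {"dbo:ProgrammingLanguage", "dbc:Scripting_languages"},
    }

    def fingerprint(entities):
        """Union of all types/categories of the entities found in a document."""
        fp = set()
        for e in entities:
            fp |= entity_knowledge.get(e, set())
        return fp

    def similarity(fp_a, fp_b):
        """Jaccard overlap between two fingerprints, usable for comparing documents."""
        if not fp_a and not fp_b:
            return 1.0
        return len(fp_a & fp_b) / len(fp_a | fp_b)

    doc1 = fingerprint(["dbr:Berlin", "dbr:Germany"])
    doc2 = fingerprint(["dbr:Germany"])
    print(similarity(doc1, doc2))   # 0.6: the two documents share most of their semantics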
{"title":"Creating Semantic Fingerprints for Web Documents","authors":"K. Krieger, J. Schneider, Christian Nywelt, D. Rösner","doi":"10.1145/2797115.2797132","DOIUrl":"https://doi.org/10.1145/2797115.2797132","url":null,"abstract":"With Semantic Web technologies and Linked Data datasets we are able to not only retrieve the textual content of a document but also to automatically create formal semantic descriptions of its content. In this paper we present a Linked Data-based approach to automatically generate semantic fingerprints for Web documents. Our approach exploits the structured information in Linked Data datasets to derive an explicit semantic description of a Web resource. A two-stage evaluation of the implementation of the presented approach shows its feasibility and robustness.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127411015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.
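The paper's exact metrics are not given in the abstract; the sketch below works under the assumption that top-down adoption can be approximated by the share of newly added vocabulary terms that later appear in deployed data, and bottom-up evolution by the share of deployed non-vocabulary terms that are later added to the schema. Term sets are invented.

    # Minimal sketch of the two directions of change discussed above, using made-up term sets.
    # Top-down adoption: how many terms newly added to the vocabulary show up in deployed data.
    # Bottom-up evolution: how many terms deployed outside the vocabulary get added later.
    schema_2013 = {"schema:Person", "schema:Product", "schema:Offer"}
    schema_2014 = {"schema:Person", "schema:Product", "schema:Offer", "schema:Order"}
    deployed_2013 = {"schema:Person", "schema:Product", "schema:Order"}  # Order used before it existed
    deployed_2014 = {"schema:Person", "schema:Product", "schema:Order"}

    newly_added = schema_2014 - schema_2013                    # terms added between releases
    top_down = len(newly_added & deployed_2014) / len(newly_added)

    deployed_extensions = deployed_2013 - schema_2013          # deployed terms not (yet) in the schema
    bottom_up = len(deployed_extensions & schema_2014) / len(deployed_extensions)

    print(f"top-down adoption: {top_down:.0%}, bottom-up evolution: {bottom_up:.0%}")  # 100%, 100%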
{"title":"A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time","authors":"R. Meusel, Christian Bizer, Heiko Paulheim","doi":"10.1145/2797115.2797124","DOIUrl":"https://doi.org/10.1145/2797115.2797124","url":null,"abstract":"Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133350318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Millions of HTML tables containing structured data can be found on the Web. With their wide coverage, these tables are potentially very useful for filling missing values in and extending cross-domain knowledge bases such as DBpedia, YAGO, or the Google Knowledge Graph. As a prerequisite for using table data for knowledge base extension, the HTML tables need to be matched with the knowledge base, meaning that correspondences between table rows/columns and entities/schema elements of the knowledge base need to be found. This paper presents the T2D gold standard for measuring and comparing the performance of HTML table to knowledge base matching systems. T2D consists of 8,700 schema-level and 26,100 entity-level correspondences between the WebDataCommons Web Tables Corpus and the DBpedia knowledge base. In contrast to related work on HTML table to knowledge base matching, the Web Tables Corpus (147 million tables), the knowledge base, and the gold standard are all publicly available. The gold standard is then used to evaluate the performance of T2K Match, an iterative matching method that combines schema and instance matching. T2K Match is designed for the use case of matching large quantities of mostly small and narrow HTML tables against large cross-domain knowledge bases. The evaluation using the T2D gold standard shows that T2K Match discovers table-to-class correspondences with a precision of 94%, row-to-entity correspondences with a precision of 90%, and column-to-property correspondences with a precision of 77%.
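The precision figures above are computed over correspondences; a minimal sketch of such an evaluation follows, with a made-up gold standard and system output. The T2D files and T2K Match internals are not reproduced here.

    # Minimal sketch of evaluating table-to-KB correspondences against a gold standard.
    # The correspondences below are invented; in the paper they come from T2D and T2K Match.
    gold = {                                    # (table element, KB element) pairs
        ("table1.row3", "dbr:Berlin"),
        ("table1.row4", "dbr:Hamburg"),
        ("table1.col2", "dbo:populationTotal"),
    }
    system = {
        ("table1.row3", "dbr:Berlin"),           # correct
        ("table1.row4", "dbr:Munich"),           # wrong entity
        ("table1.col2", "dbo:populationTotal"),  # correct
    }

    true_positives = len(system & gold)
    precision = true_positives / len(system)
    recall = true_positives / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")  # 0.67 0.67 0.67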
{"title":"Matching HTML Tables to DBpedia","authors":"Dominique Ritze, O. Lehmberg, Christian Bizer","doi":"10.1145/2797115.2797118","DOIUrl":"https://doi.org/10.1145/2797115.2797118","url":null,"abstract":"Millions of HTML tables containing structured data can be found on the Web. With their wide coverage, these tables are potentially very useful for filling missing values and extending cross-domain knowledge bases such as DBpedia, YAGO, or the Google Knowledge Graph. As a prerequisite for being able to use table data for knowledge base extension, the HTML tables need to be matched with the knowledge base, meaning that correspondences between table rows/columns and entities/schema elements of the knowledge base need to be found. This paper presents the T2D gold standard for measuring and comparing the performance of HTML table to knowledge base matching systems. T2D consists of 8 700 schema-level and 26 100 entity-level correspondences between the WebDataCommons Web Tables Corpus and the DBpedia knowledge base. In contrast related work on HTML table to knowledge base matching, the Web Tables Corpus (147 million tables), the knowledge base, as well as the gold standard are publicly available. The gold standard is used afterward to evaluate the performance of T2K Match, an iterative matching method which combines schema and instance matching. T2K Match is designed for the use case of matching large quantities of mostly small and narrow HTML tables against large cross-domain knowledge bases. The evaluation using the T2D gold standard shows that T2K Match discovers table-to-class correspondences with a precision of 94%, row-to-entity correspondences with a precision of 90%, and column-to-property correspondences with a precision of 77%.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133393499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Sivaramakrishnan, Madhusudhan Krishnamachari, Vidhya Balasubramanian
Recommender systems have become very prominent over the past decade. Methods such as collaborative filtering and knowledge-based recommender systems have been developed extensively for non-customizable products. However, as manufacturers today move towards customizable products to satisfy customers, recommender systems for customizable products are urgently needed. Such systems must be able to capture customer preferences and provide recommendations that are both diverse and novel. This paper proposes an approach to building a recommender system that can be adapted to customizable products such as desktop computers and home theater systems. The Customizable Product Recommendation problem is modeled as a special case of the Multiple Choice Knapsack Problem, and an algorithm is proposed to generate desirable product recommendations in real time. The performance of the proposed system is then evaluated.
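The Multiple Choice Knapsack formulation mentioned above can be made concrete with a small dynamic-programming sketch: each component class (CPU, RAM, disk) contributes exactly one option, the budget is the knapsack capacity, and the option scores stand in for the paper's (unspecified) desirability measure. All prices and scores are invented, and this is generic MCKP, not the paper's algorithm.

    # Minimal MCKP sketch: choose exactly one option per component class so that total price
    # stays within the budget and the total "desirability" score is maximal. Toy data only.
    component_classes = {
        "cpu":  [("cpu_basic", 100, 3.0), ("cpu_fast", 250, 7.0)],   # (name, price, score)
        "ram":  [("ram_8gb", 40, 2.0),    ("ram_32gb", 120, 6.0)],
        "disk": [("hdd_1tb", 50, 2.5),    ("ssd_1tb", 110, 5.5)],
    }

    def recommend(classes, budget):
        """Exact multiple-choice knapsack by dynamic programming over the total price."""
        states = {0: (0.0, [])}                  # total price -> (best score, chosen options)
        for options in classes.values():
            next_states = {}
            for price, (score, picks) in states.items():
                for name, p, s in options:       # exactly one option from this class
                    cost, value = price + p, score + s
                    if cost <= budget and (cost not in next_states or value > next_states[cost][0]):
                        next_states[cost] = (value, picks + [name])
            states = next_states
        return max(states.values(), key=lambda t: t[0]) if states else None

    print(recommend(component_classes, 400))   # (14.5, ['cpu_basic', 'ram_32gb', 'ssd_1tb'])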
{"title":"Recommending Customizable Products: A Multiple Choice Knapsack Solution","authors":"A. Sivaramakrishnan, Madhusudhan Krishnamachari, Vidhya Balasubramanian","doi":"10.1145/2797115.2797116","DOIUrl":"https://doi.org/10.1145/2797115.2797116","url":null,"abstract":"Recommender systems have become very prominent over the past decade. Methods such as collaborative filtering and knowledge based recommender systems have been developed extensively for non-customizable products. However, as manufacturers today are moving towards customizable products to satisfy customers, the need of the hour is customizable product recommender systems. Such systems must be able to capture customer preferences and provide recommendations that are both diverse and novel. This paper proposes an approach to building a recommender system that can be adapted to customizable products such as desktop computers and home theater systems. The Customizable Product Recommendation problem is modeled as a special case of the Multiple Choice Knapsack Problem, and an algorithm is proposed to generate desirable product recommendations in real-time. The performance of the proposed system is then evaluated.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126746771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reasoning with ontologies is one of the core fields of research in Description Logics. A variety of efficient reasoners with highly optimized algorithms have been developed to support inference tasks on expressive ontology languages such as OWL (DL). However, reported reasoner computing times have sometimes exceeded and sometimes fallen below the expected theoretical values. From an empirical perspective, it is not yet well understood which particular aspects of an ontology degrade reasoner performance. In this paper, we survey state-of-the-art works that attempt to relate the empirical behaviour of reasoners to particular ontological features. These works are analysed and broken down into categories. Further, we propose a set of ontology features covering a broad range of structural and syntactic ontology characteristics. We claim that these features are good indicators of how hard an ontology is for reasoning tasks. To assess the worth of our proposals, we adopt a supervised machine learning approach: the features serve as the basis for learning predictive models of reasoner robustness. These models were trained for six well-known reasoners using their evaluation results from the ORE 2014 competition. Our prediction models show a high level of accuracy, which attests to the effectiveness of our feature set.
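As a rough illustration of the supervised-learning step described above, the sketch below trains a classifier on synthetic ontology feature vectors to predict whether a reasoner finishes within a time budget. The feature names, labels, and data are invented; the paper's actual feature set, reasoners, and ORE 2014 results are not reproduced here.

    # Minimal sketch: learn a predictive model of reasoner "robustness" from ontology features.
    # Everything below (features, labels) is synthetic; it only illustrates the pipeline shape.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    n = 500
    # Hypothetical structural/syntactic features of an ontology.
    X = np.column_stack([
        rng.integers(10, 5000, n),      # number of classes
        rng.integers(10, 20000, n),     # number of axioms
        rng.integers(1, 30, n),         # maximum depth of the class hierarchy
        rng.integers(0, 500, n),        # number of object properties
    ])
    # Synthetic label: 1 = reasoner completes within the time budget, 0 = it does not.
    y = ((X[:, 1] < 12000) & (X[:, 2] < 20)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))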
{"title":"What Makes Ontology Reasoning so Arduous?: Unveiling the key ontological features","authors":"N. Alaya, S. Yahia, M. Lamolle","doi":"10.1145/2797115.2797117","DOIUrl":"https://doi.org/10.1145/2797115.2797117","url":null,"abstract":"Reasoning with ontologies is one of the core fields of research in Description Logics. A variety of efficient reasoner with highly optimized algorithms have been developed to allow inference tasks on expressive ontology languages such as OWL(DL). However, reasoner reported computing times have exceeded and sometimes fall behind the expected theoretical values. From an empirical perspective, it is not yet well understood, which particular aspects in the ontology are reasoner performance degrading factors. In this paper, we conducted an investigation about state of art works that attempted to portray potential correlation between reasoner empirical behaviour and particular ontological features. These works were analysed and then broken down into categories. Further, we proposed a set of ontology features covering a broad range of structural and syntactic ontology characteristics. We claim that these features are good indicators of the ontology hardness level against reasoning tasks. In order to assess the worthiness of our proposals, we adopted a supervised machine learning approach. Features served as the bases to learn predictive models of reasoners robustness. These models was trained for 6 well known reasoners and using their evaluation results during the ORE'2014 competition. Our prediction models showed a high accuracy level which witness the effectiveness of our set of features.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133773695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a user-modeling method for folksonomic data. Since data mining of folksonomic data is difficult due to its complexity, significant amounts of preprocessing are usually required. To capture the rough characteristics of such complex data, our method employs two steps: (1) using the infinite relational model (IRM) to perform relational clustering of a folksonomic data set, and (2) using tag weighting to extract the characteristics of each user cluster. As an experimental evaluation, we applied our method to real-world data from one of the most popular social bookmarking services in Japan. Our user-modeling method successfully extracted semantically clustered user models, demonstrating that relational data analysis is promising for mining folksonomic data. In addition, we developed the user-model-based filtering algorithm (UMF), which evaluates the user models by their resource recommendations. Its F-measure was higher than that of random recommendation, and its running time was much shorter than that of collaborative-filtering-based top-n recommendation.
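The tag-weighting step above is not specified in the abstract; a minimal sketch follows under the assumption of a TF-IDF-style weight: a tag's frequency within a user cluster, discounted by how many clusters use the tag. Clusters and counts are toy data; in the paper, the clusters come from the IRM.

    # Minimal sketch of extracting characteristic tags per user cluster with a TF-IDF-like weight.
    import math

    cluster_tag_counts = {                          # cluster id -> {tag: count} (toy data)
        "c0": {"python": 40, "web": 25, "news": 5},
        "c1": {"recipes": 30, "news": 20, "web": 3},
        "c2": {"news": 35, "politics": 22},
    }

    def tag_weights(cluster):
        counts = cluster_tag_counts[cluster]
        total = sum(counts.values())
        n_clusters = len(cluster_tag_counts)
        weights = {}
        for tag, c in counts.items():
            tf = c / total
            df = sum(1 for tags in cluster_tag_counts.values() if tag in tags)
            idf = math.log(n_clusters / df) + 1.0    # +1 keeps shared tags from vanishing
            weights[tag] = tf * idf
        return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

    print(tag_weights("c0"))   # "python" and "web" dominate; the ubiquitous "news" is discounted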
{"title":"User Modeling in Folksonomies: Relational Clustering and Tag Weighting","authors":"Takuya Kitazawa, M. Sugiyama","doi":"10.1145/2797115.2797129","DOIUrl":"https://doi.org/10.1145/2797115.2797129","url":null,"abstract":"This paper proposes a user-modeling method for folksonomic data. Since data mining of folksonomic data is difficult due to their complexity, significant amounts of preprocessing are usually required. To catch sketchy characteristics of such complex data, our method employs two steps: (1) using the infinite relational model (IRM) to perform relational clustering of a folksonomic data set, and (2) using tag-weighting to extract the characteristics of each user cluster. As an experimental evaluation, we applied our method to real-world data from one of the most popular social bookmarking services in Japan. Our user-modeling method successfully extracted semantically clustered user models, thus demonstrating that relational data analysis has promise for mining folksonomic data. In addition, we developed the user-model-based filtering algorithm (UMF), which evaluates the user models by their resource recommendations. The F-measure was higher than that of random recommendation, and the running time was much shorter than that of collaborative-filtering-based top-n recommendation.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134456684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discovering and integrating relevant real-life datasets are essential tasks when it comes to handling Linked Data. Similar to Data Warehousing approaches, Linked Data can be prepared to enable sophisticated data analysis. The developed open-source framework bacon enables interactive and crowd-sourced Data Integration on Linked Data (Linked Data Integration), utilizing the RDF Data Cube Vocabulary and the semantic properties of Linked Open Data. Discovering suitable datasets on the fly in local or remote repositories sets up the ensuing integration process. Based on well-known Data Warehousing processes, the semantic nature of the data is taken into account to handle and merge RDF Data Cubes. To do so, the structure and content of the cubes must be analyzed and processed. A similarity measure has been developed to find similarly structured cubes. The user is offered a graphical interface in which they can search for suitable cubes and modify their structure based on semantic properties. This process is supported by a set of automated suggestions to assist inexperienced users as well as domain experts.
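The abstract does not define bacon's similarity measure; a minimal sketch follows, assuming a simple Jaccard overlap between the dimension and measure sets of two RDF Data Cubes, with an arbitrary weighting between the two. The cube descriptions are invented.

    # Minimal sketch of a structural similarity between two RDF Data Cubes, based only on the
    # overlap of their dimensions and measures. Cube definitions are toy data.
    cube_a = {
        "dimensions": {"sdmx-dim:refPeriod", "sdmx-dim:refArea", "ex:sector"},
        "measures": {"ex:gdp"},
    }
    cube_b = {
        "dimensions": {"sdmx-dim:refPeriod", "sdmx-dim:refArea"},
        "measures": {"ex:gdp", "ex:population"},
    }

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    def cube_similarity(x, y, dim_weight=0.7):
        """Weighted mix of dimension overlap and measure overlap (weights are arbitrary)."""
        return (dim_weight * jaccard(x["dimensions"], y["dimensions"])
                + (1 - dim_weight) * jaccard(x["measures"], y["measures"]))

    print(round(cube_similarity(cube_a, cube_b), 3))   # 0.7*2/3 + 0.3*1/2 = 0.617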
{"title":"bacon: Linked Data Integration based on the RDF Data Cube Vocabulary","authors":"Sebastian P. Bayerl, M. Granitzer","doi":"10.1145/2797115.2797126","DOIUrl":"https://doi.org/10.1145/2797115.2797126","url":null,"abstract":"Discovering and integrating relevant real-live datasets are essential tasks, when it comes to handling Linked Data. Similar to Data Warehousing approaches, Linked Data can be prepared to enable sophisticated data analysis. The developed open source framework bacon enables interactive and crowed-sourced Data Integration on Linked Data (Linked Data Integration), utilizing the RDF Data Cube Vocabulary and the semantic properties of Linked Open Data. Discovering suitable datasets on-the-fly in local or remote repositories sets up the ensuing integration process. Based on well-known Data Warehousing processes, the semantic nature of the data is taken into account to handle and merge RDF Data Cubes. To do so, structure and content of the cubes must be analyzed and processed. A similarity measure has been developed to find similarly structured cubes. The user is offered a graphical interface, where he can search for suitable cubes and modify their structure based on semantic properties. This process is fostered by a set of automated suggestions to support inexperienced users and also domain experts.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134609855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saraschandra Karanam, H. Oostendorp, M. Sanchiz, A. Chevalier, Jessie Chin, W. Fu
This paper looks at two limitations of cognitive models of web navigation: first, they do not account for the entire process of information search, and second, they do not account for the differences in search behavior caused by aging. To address these limitations, we used data from an experiment in which two types of information search tasks (simple and difficult) were presented to both young and older participants. We found that, in general, difficult tasks demand significantly more time, significantly more clicks, and significantly more reformulations, and are answered significantly less accurately than simple tasks. Older persons inspect the search engine result pages significantly longer, produce significantly fewer reformulations on difficult tasks than younger persons, and are significantly more accurate than younger persons on simple tasks. We next used a cognitive model of web navigation called CoLiDeS to predict which search engine result a user would choose to click. Older participants were found to click more often only on search engine results with high semantic similarity to the query. Search engine results generated by older participants had a higher semantic similarity value (computed with respect to the query) than those generated by younger participants only in the second cycle. The match between model-predicted clicks and actual user clicks was found to be significantly higher for difficult tasks than for simple tasks. Potential improvements for enhancing the modeling and its applications are discussed.
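CoLiDeS' semantic-similarity computation is not given in the abstract; the sketch below assumes a plain cosine similarity between bag-of-words vectors of the query and each result snippet, with the model "clicking" the most similar result. The query and snippets are invented, and CoLiDeS itself uses richer semantic-similarity estimates (such as LSA) than raw word overlap.

    # Minimal sketch of predicting which search result a user clicks, by choosing the result
    # whose snippet is most similar to the query (cosine similarity over word counts).
    from collections import Counter
    import math

    def cosine(a, b):
        common = set(a) & set(b)
        dot = sum(a[w] * b[w] for w in common)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def predict_click(query, snippets):
        q = Counter(query.lower().split())
        scores = [(cosine(q, Counter(s.lower().split())), i) for i, s in enumerate(snippets)]
        return max(scores)[1]          # index of the most query-similar result

    results = [
        "symptoms and treatment of seasonal flu in adults",
        "book cheap flights and hotels online",
        "flu vaccine side effects explained by doctors",
    ]
    print(predict_click("flu treatment for adults", results))   # 0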
{"title":"Modeling and predicting information search behavior","authors":"Saraschandra Karanam, H. Oostendorp, M. Sanchiz, A. Chevalier, Jessie Chin, W. Fu","doi":"10.1145/2797115.2797123","DOIUrl":"https://doi.org/10.1145/2797115.2797123","url":null,"abstract":"This paper looks at two limitations of cognitive models of web-navigation: first, they do not account for the entire process of information search and second, they do not account for the differences in search behavior caused by aging. To address these limitations, data from an experiment in which two types of information search tasks (simple and difficult), presented to both young and old participants was used. We found that in general difficult tasks demand significantly more time, significantly more clicks, significantly more reformulations and are answered significantly less accurately than simple tasks. Older persons inspect the search engine result pages significantly longer, produce significantly fewer reformulations with difficult tasks than younger persons, and are significantly more accurate than younger persons with simple tasks. We next used a cognitive model of web-navigation called CoLiDeS to predict which search engine result a user would choose to click. Old participants were found to click more often only on search engine results with high semantic similarity with the query. Search engine results generated by old participants were of higher semantic similarity value (computed w.r.t the query) than those generated by young participants only in the second cycle. Match between model-predicted clicks and actual user clicks was found to be significantly higher for difficult tasks compared to simple tasks. Potential improvements in enhancing the modeling and its applications are discussed.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114173044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}