Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics: Latest Articles


Assessing the suitability of network community detection to available meta-data using rank stability

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3106493

Ryan Hartman, Josemar Faustino, Diego Pinheiro, R. Menezes

In the last two decades, we have witnessed the widespread use of structural analysis of data. The area, generally called Network Science, concentrates on understanding complex phenomena by looking for properties that emerge from the relationships between the pieces of data instead of the traditional mining of the data itself. A commonly used structural analysis in networks, called Community Detection, consists of finding subgraphs whose density of internal connections surpasses that of connections to the rest of the network. Many techniques have been proposed to find communities, as well as benchmarks to evaluate the algorithms' ability to find these substructures. Until recently, the literature has mostly neglected the fact that these communities often represent common characteristics of the elements in the community. For instance, in a social network, communities could represent people who follow the same sport, people from the same classroom, or authors working in the same field of study, to name a few. The problem here is one of selecting a community detection technique as a function of the ground truth provided by available meta-data. In this work, we propose the use of rank stability (entropy of ranks) to assess communities identified using different techniques from the perspective of meta-data. We validate our approach using a large-scale data set of on-line social interactions across multiple community detection techniques.
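The abstract does not spell out how rank stability is computed, so the following is only a minimal sketch of one plausible reading: treat the ranks that a meta-data value receives across detected communities as a sample, and measure the Shannon entropy of that sample. The function name `rank_entropy` and the example rank lists are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def rank_entropy(ranks):
    """Shannon entropy (in bits) of the empirical distribution of observed ranks.

    Low entropy means the rank barely changes (stable); high entropy means
    the rank is spread over many different positions (unstable).
    """
    counts = Counter(ranks)
    total = len(ranks)
    # Sum p * log2(1/p) over every distinct rank that was observed.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical data: the rank held by one meta-data value in four communities.
stable_ranks = [1, 1, 1, 1]      # always ranked first
unstable_ranks = [1, 2, 3, 4]    # a different rank every time

stable_score = rank_entropy(stable_ranks)
unstable_score = rank_entropy(unstable_ranks)
```

Under this reading, a community detection technique whose communities give meta-data values low rank entropy would be considered better aligned with that meta-data.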


Citations: 9

A collaborative approach to web information foraging based on multi-agent systems

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3106533

Y. Drias, G. Pasi

In this paper, Information Foraging (IF) is considered a useful paradigm for addressing Exploratory Search. In the context of IF, a Web navigation strategy is introduced and formalized, and a multi-agent-based model is proposed to exploit a collaborative approach to Information Foraging. A system based on this model has been developed, and its evaluation on the ACM and DBLP repositories is reported. Two datasets of different sizes were considered to show the effectiveness and the efficiency of the developed system. Furthermore, our approach was compared with classical information access approaches. The results are promising and show the ability of the proposed Web Information Foraging system to find relevant Web pages in a very short time.


Citations: 1

Sweet-spotting security and usability for intelligent graphical authentication mechanisms

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3106488

Marios Belk, Andreas Pamboris, C. Fidas, C. Katsini, N. Avouris, G. Samaras

This paper investigates the trade-off between security and usability in recognition-based graphical authentication mechanisms. Through a user study (N=103) based on a real usage scenario, it draws insights about the security strength and memorability of a chosen password with respect to the number of images presented to users during sign-up. In particular, it reveals the users' predisposition to follow predictable patterns when selecting graphical passwords, and its effect on practical security strength. It also demonstrates that a "sweet spot" exists between security and usability in graphical authentication approaches, which can be reached by adjusting the size of the image grid presented to users when creating passwords. The results of the study can be leveraged by researchers and practitioners engaged in designing intelligent graphical authentication user interfaces to strike an appropriate balance between security and usability.
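To illustrate why the image grid size matters for security strength, here is a rough sketch of the theoretical password space for a recognition-based scheme in which a user picks a fixed number of images from a grid. The model, the function name `password_space_bits`, and the grid sizes are assumptions made for illustration; the study's own metrics are not given in the abstract.

```python
import math


def password_space_bits(grid_size, password_length, ordered=True):
    """Theoretical password space, in bits, when choosing images from a grid.

    ordered=True  counts sequences of distinct images (permutations);
    ordered=False counts unordered sets of distinct images (combinations).
    """
    if ordered:
        choices = math.perm(grid_size, password_length)
    else:
        choices = math.comb(grid_size, password_length)
    return math.log2(choices)


# Hypothetical grid sizes: a larger grid gives a larger theoretical space,
# at the cost of more images for the user to scan and remember.
small_grid_bits = password_space_bits(30, 5)
large_grid_bits = password_space_bits(120, 5)
```

These figures are an upper bound only. As the abstract notes, users tend to follow predictable patterns, so the practical strength is lower than the theoretical space suggests.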


Citations: 21

Improving content based recommender systems using linked data cloud and FOAF vocabulary

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3120963

Hanane Zitouni, S. Meshoul, Kamel Taouche

With the deluge of data published on the web, it becomes even more difficult for users to get access to relevant information based on their preferences. In order to accurately predict the preference a user would give to an item, recommender systems should use an effective information filtering engine. This task can be achieved using content-based filtering (CBF), collaborative filtering, or a hybrid approach. This work describes an approach to CBF that aims to deal with the issues of unstructured data and new users, on which existing approaches perform poorly. The basic feature of the proposed approach is to incorporate the linked data cloud into the information filtering process using a semantic space vector model. FOAF vocabulary is used to define a new distance measure between users based on their FOAF profiles. Unstructured item representations are enhanced by additional attributes extracted from the linked data cloud, which alleviates the burden of analyzing the content of these items and therefore reduces the computational cost. We report on some promising experiments of the proposed approach performed on MovieLens data sets.
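As a rough illustration of content-based filtering in a vector space, the sketch below scores items against a user profile with cosine similarity over sparse attribute vectors. It is a plain vector-space example, not the authors' semantic model; the attribute names, weights, and item names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {attribute: weight}."""
    dot = sum(weight * v.get(attr, 0.0) for attr, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical user profile and item attributes (for example, attributes that
# could be pulled from a linked data source rather than parsed from raw text).
user_profile = {"genre:sci-fi": 2.0, "director:X": 1.0}
items = {
    "movie_a": {"genre:sci-fi": 1.0, "director:X": 1.0},
    "movie_b": {"genre:romance": 1.0},
}

# Rank items from most to least similar to the user profile.
ranked = sorted(items, key=lambda name: cosine(user_profile, items[name]), reverse=True)
```

The same similarity function could in principle compare two user profiles, which is the role the abstract assigns to the FOAF-based distance measure.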


Citations: 5

Deep deformable Q-Network: an extension of deep Q-Network

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3109426

Beibei Jin, Jianing Yang, Xiangsheng Huang, D. Khan

The performance of Deep Reinforcement Learning (DRL) algorithms is usually constrained by instability and variability. In this work, we present an extension of Deep Q-Network (DQN) called Deep Deformable Q-Network which is based on deformable convolution mechanisms. The new algorithm can readily be built on existing models and can be easily trained end-to-end by standard back-propagation. Extensive experiments on the Atari games validate the feasibility and effectiveness of the proposed Deep Deformable Q-Network.
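For context, the standard DQN objective that such an extension builds on, and the usual form of a deformable convolution, can be written as below. Both come from the general literature on DQN and deformable convolutions rather than from this abstract, and the notation (online parameters θ, target-network parameters θ⁻, discount γ, sampling grid R, learned offsets Δp_n) is assumed here for illustration.

```latex
% Standard DQN loss: online parameters \theta, target-network parameters \theta^{-}
L(\theta) = \mathbb{E}_{(s,a,r,s')}\left[\left(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\right)^{2}\right]

% Deformable convolution: each sampling location p_n of the regular grid \mathcal{R}
% is shifted by a learned offset \Delta p_n
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)
```

Because the offsets are produced by ordinary convolutional layers, the whole network remains differentiable, which is consistent with the abstract's claim that training works end-to-end with standard back-propagation.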


Citations: 5

Investigation on dynamics of group decision making with collaborative web search

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3106505

Tatsuya Nakamura, T. Tominaga, Miki Watanabe, Nattapong Thammasan, K. Urai, Yutaka Nakamura, Kazufumi Hosoda, T. Hara, Y. Hijikata

In this paper, we present the results of an investigation into the dynamics of group decision making (how people discuss and reach a decision) with collaborative web search. Prior works proposed systems that support group decision making with web search but did not examine the influence of discussion behaviors, especially on the level of satisfaction with the final conclusion. In this study, we conducted a set of experiments to observe discussion behaviors and the consequent satisfaction with the conclusion using our experimental system and a set of questionnaires. The task for each participant was to make a decision on a restaurant. Our primary results revealed (1) similar activities across all groups at the beginning and the end of the group discussion, (2) a lack of correspondence between the satisfaction with the conclusion and the time spent to reach the conclusion, and (3) the presumption that a member who actively engaged in the activities that were visible to the other members was likely to be voted as a leader in the group discussion beyond the discussion. Finally, we discussed how to implement intelligent systems that aid group decision making.


Citations: 2

Research on design and implementation of data exchange system

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3109048

Wenwen Zhu, Zhijun Guo, Yonghong Cheng

With the rapid development of computer technology, the barriers to communication among different systems caused by system heterogeneity or data structure have been broken down. However, the demand for personalized content and for accuracy in resource exchange and delivery is becoming increasingly high. Existing literature resources such as papers, patents and books have different formats and structures, which leads to many problems in content delivery and inheritance. Thus, based on XML technology, we design and develop a data exchange system. This system supports the mapping and integration of literature resources with different structures, and parses resources at the same time, so that users can upload and verify XML schema files according to their individual demands for data exchange.
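The mapping step described above can be sketched with Python's standard library as follows. This is a minimal illustration under assumed element names: the source formats (`paper`, `patent`), the field mapping, and the common `resource` format are all hypothetical, and the sketch does not validate input against an XML Schema, which the described system supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source-specific field names to a common vocabulary.
FIELD_MAP = {
    "paper": {"heading": "title", "writer": "author"},
    "patent": {"name": "title", "inventor": "author"},
}


def to_common(xml_text):
    """Parse one literature record and re-express it in the common format."""
    source = ET.fromstring(xml_text)
    mapping = FIELD_MAP[source.tag]  # raises KeyError for an unknown record type
    target = ET.Element("resource", {"type": source.tag})
    for child in source:
        common_name = mapping.get(child.tag)
        if common_name is not None:  # fields without a mapping are skipped
            ET.SubElement(target, common_name).text = child.text
    return target


record = to_common("<patent><name>Widget</name><inventor>Ann</inventor></patent>")
serialized = ET.tostring(record, encoding="unicode")
```

Keeping the mapping in a plain dictionary means that supporting a new resource structure only requires adding an entry, not changing the parsing code.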


Citations: 1

Twitter for marijuana infodemiology

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3106541

Víctor D. Cortés, J. D. Velásquez, Carlos F. Ibáñez

Today online social networks seem to be good tools to quickly monitor what is going on with the population, since they provide environments where users can freely share large amounts of information related to their own lives. Due to well-known limitations of surveys, this novel kind of data can be used to get additional real-time insights from people to understand their actual behavior related to drug use. The aim of this work is to make use of text messages (tweets) and relationships between Chilean Twitter users to predict marijuana use among them. To do this, we collected Twitter accounts using location-based criteria, and built a set of features based on the tweets they made and egocentric network metrics. To get tweet-based features, tweets were filtered using marijuana-related keywords and a set of 1000 tweets was manually labeled to train algorithms capable of predicting marijuana use in tweets. In addition, a sentiment classifier of tweets was developed using the TASS corpus. Then, we conducted a survey to get real marijuana use labels related to accounts, and these labels were used to train supervised machine learning algorithms. The per-user marijuana use classifier had precision, recall and F-measure results close to 0.7, implying significant predictive power of the selected variables. We obtained a model capable of predicting marijuana use of Twitter users and estimating their opinion about marijuana. This information can be used as an efficient (fast and low-cost) tool for marijuana surveillance, and to support decision making about drug policies.
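For readers unfamiliar with the reported metrics, the sketch below shows how precision, recall and F-measure are computed for a binary classifier. The label lists are made up for illustration and have no connection to the study's data.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 1 = reports marijuana use, 0 = does not.
survey_labels = [1, 1, 0, 0]
predicted_labels = [1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(survey_labels, predicted_labels)
```

F-measure is the harmonic mean of precision and recall, so values close to 0.7 on all three, as reported above, indicate that the classifier is balanced rather than trading one metric for the other.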


Citations: 12

An interoperable service for the provenance of machine learning experiments

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3106496

J. C. Duarte, M. C. Cavalcanti, Igor de Souza Costa, Diego Esteves

Although Machine Learning (ML) experiments can be easily built using several ML frameworks, and the demand for practical solutions to many kinds of scientific problems keeps increasing, organizing their results and the algorithm setups used so that they can be reproduced is a long-known problem without an easy solution. Motivated by the need for a high level of interoperability and data provenance in ML experiments, this work presents a generic solution using a web-service application that interacts with the MEX vocabulary, a lightweight solution for archiving and querying ML experiments. By using this solution, researchers can share their setups and results in an interoperable format that describes all the steps needed to reproduce their research. Although the solution presented in this work could be implemented in any programming language, we chose Java to build the web service, and we present experiments with Python's Scikit-learn ML framework, using decorators and code reflection, that demonstrate the simplicity of incorporating data provenance at such a high level, simplifying the experiment logging process.
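The decorator-and-reflection idea can be sketched as follows. This is a hypothetical illustration only: it records runs in an in-memory list instead of calling the authors' web service, it does not use the MEX vocabulary, and the names `log_experiment`, `EXPERIMENT_LOG` and `train_and_score` are invented here.

```python
import functools
import inspect
import time

# Hypothetical in-memory stand-in for a remote provenance service.
EXPERIMENT_LOG = []


def log_experiment(func):
    """Record the parameters, result and duration of every call to func."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Reflection: resolve positional, keyword and default arguments by name.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        start = time.perf_counter()
        result = func(*args, **kwargs)
        EXPERIMENT_LOG.append({
            "experiment": func.__name__,
            "parameters": dict(bound.arguments),
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result

    return wrapper


@log_experiment
def train_and_score(learning_rate=0.1, epochs=3):
    # Placeholder "experiment"; a real one would fit and evaluate a model.
    return round(1.0 - learning_rate / epochs, 4)


score = train_and_score(learning_rate=0.3)
```

The experiment code itself stays unchanged apart from the one-line decorator, which is the kind of low-effort logging the abstract describes.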


Citations: 2

Keeping linked open data caches up-to-date by predicting the life-time of RDF triples

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics

Pub Date : 2017-08-23 DOI: 10.1145/3106426.3106463

Chifumi Nishioka, A. Scherp

Many Linked Open Data applications require fresh copies of RDF data at their local repositories. Since RDF documents constantly change and those changes are not automatically propagated to the LOD applications, it is important to regularly visit the RDF documents to refresh the local copies and keep them up-to-date. For this purpose, crawling strategies determine which RDF documents should be preferentially fetched. Traditional crawling strategies rely only on how an RDF document has been modified in the past. In contrast, we predict on the triple level whether a change will occur in the future. We use the weekly snapshots of the DyLDO dataset as well as the monthly snapshots of the Wikidata dataset. First, we conduct an in-depth analysis of the life span of triples in RDF documents. Through the analysis, we identify which triples are stable and which are ephemeral. We introduce different features based on the triples and apply a simple but effective linear regression model. Second, we propose a novel crawling strategy based on the linear regression model. We conduct two experimental setups where we vary the amount of available bandwidth as well as iteratively observe the quality of the local copies over time. The results demonstrate that the novel crawling strategy outperforms the state of the art in both setups.
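The general idea of driving a crawl order from a regression model can be sketched as below. This is a simplified, hypothetical example: it fits a one-feature least-squares line, whereas the abstract mentions several triple-based features that it does not list, and the feature (past change count), the training values, and the document URLs are all invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crawl_order(doc_features, slope, intercept):
    """Order documents so those with the shortest predicted life are fetched first."""
    return sorted(doc_features, key=lambda doc: slope * doc_features[doc] + intercept)


# Hypothetical training data: past change count -> observed life span in weeks.
past_changes = [0, 2, 4, 6]
life_span_weeks = [20.0, 14.0, 8.0, 2.0]
slope, intercept = fit_line(past_changes, life_span_weeks)

# Hypothetical documents with their past change counts.
documents = {"http://example.org/a": 1, "http://example.org/b": 5}
order = crawl_order(documents, slope, intercept)
```

With limited bandwidth, only the first few documents in the returned order would be fetched in each crawl iteration.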


Citations: 11
