Ontology-based approach for unsupervised and adaptive focused crawling

Proceedings of the International Workshop on Semantic Big Data. Pub Date: 2017-05-19. DOI: 10.1145/3066911.3066912

Thomas Hassan, C. Cruz, Aurélie Bertaux

Citations: 14

Abstract

Information from the web is a key resource in competitive intelligence. Web sources represent large volumes of information that must be processed every day. As the amount of available information grows rapidly, this processing becomes overwhelming for experts. To address this challenge, this paper presents a novel approach that processes such sources and extracts only the most valuable pieces of information. The approach is based on an unsupervised and adaptive ontology-learning process. The resulting ontology is used to enhance the performance of a focused crawler. The combination of Big Data and Semantic Web technologies makes it possible to classify information precisely according to domain knowledge while maintaining optimal performance. The approach and its implementation are described, and its feasibility and performance are presented.
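To make the idea of an ontology-guided focused crawler concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy "ontology" that maps each concept to a set of terms, a hypothetical `fetch` callable that returns a page's text and outgoing links, and a simple relevance score (the fraction of concepts mentioned on a page). The unsupervised, adaptive ontology-learning step described in the abstract is not modeled here; the ontology is fixed.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical toy ontology: concept name -> lowercase terms that signal it.
Ontology = Dict[str, Set[str]]
# Hypothetical fetcher: url -> (page text, outgoing links).
Fetch = Callable[[str], Tuple[str, List[str]]]


def relevance(text: str, ontology: Ontology) -> float:
    """Return the fraction of ontology concepts with at least one term in the text."""
    if not ontology:
        return 0.0
    words = set(text.lower().split())
    hits = sum(1 for terms in ontology.values() if words & terms)
    return hits / len(ontology)


def focused_crawl(
    seeds: Iterable[str],
    fetch: Fetch,
    ontology: Ontology,
    threshold: float = 0.5,
    max_pages: int = 100,
) -> List[Tuple[str, float]]:
    """Best-first crawl that only follows links found on relevant pages."""
    seeds = list(seeds)
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    kept: List[Tuple[str, float]] = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        visited += 1
        score = relevance(text, ontology)
        if score < threshold:
            continue  # irrelevant page: discard it and do not expand its links
        kept.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return kept
```

Passing `fetch` as a parameter keeps the sketch independent of any HTTP library, so it can be exercised against an in-memory dictionary of pages. In a real system the term-matching score would be replaced by classification against the learned ontology.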
