Web Robot Detection: A Semantic Approach

2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI) Pub Date : 2018-11-01 DOI:10.1109/ICTAI.2018.00150

Athanasios Lagopoulos, Grigorios Tsoumakas, Georgios Papadopoulos

Citations: 10

Abstract

Web robots nowadays account for more than half of all web traffic. Malicious robots threaten the security, privacy and performance of the web, while non-malicious ones skew web analytics. The latter is an important problem for large websites with unique content, as it can create false impressions about the popularity and impact of a piece of information. To address this problem, we present a novel web robot detection approach for content-rich websites, based on the assumption that human web users are interested in specific topics, while web robots crawl the web randomly. Our approach extends the typical representation of user sessions with a novel set of features that capture the semantics of the content of the requested resources. Empirical results on real-world data from the web portal of an academic publisher show that the proposed semantic features improve web robot detection accuracy.
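The abstract does not list the semantic features themselves. The sketch below illustrates only the underlying idea, and it rests on several assumptions. It assumes each requested resource has already been mapped to a topic distribution, for instance by a topic model such as LDA. This is an assumption, not a detail taken from the paper. Two hypothetical session-level features are then computed: the entropy of the session's mean topic distribution and the share of its dominant topic. These features separate a topic-focused session from one that wanders across topics. All function names and toy numbers are illustrative stand-ins, not the authors' feature set.

```python
import math
from typing import List, Sequence

# Hypothetical input: one topic distribution per requested resource in a
# session (assumed to come from a topic model fitted to the site's content).
TopicDist = Sequence[float]


def mean_topic_distribution(session: Sequence[TopicDist]) -> List[float]:
    """Average the per-resource topic distributions of one session."""
    if not session:
        raise ValueError("session contains no content requests")
    k = len(session[0])
    if any(len(d) != k for d in session):
        raise ValueError("all topic distributions must have the same length")
    return [sum(d[i] for d in session) / len(session) for i in range(k)]


def session_topic_entropy(session: Sequence[TopicDist]) -> float:
    """Shannon entropy (in bits) of the session's mean topic distribution.

    Low values mean the session stays on few topics; high values mean the
    requests are spread across many topics.
    """
    mean = mean_topic_distribution(session)
    return -sum(p * math.log2(p) for p in mean if p > 0)


def dominant_topic_share(session: Sequence[TopicDist]) -> float:
    """Probability mass of the single most prominent topic in the session."""
    return max(mean_topic_distribution(session))


# Toy sessions over 4 topics (made-up numbers for illustration only).
focused = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0], [0.85, 0.15, 0.0, 0.0]]
scattered = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

for name, s in [("focused", focused), ("scattered", scattered)]:
    print(name, f"{session_topic_entropy(s):.2f}", f"{dominant_topic_share(s):.2f}")
```

Run as a script, this prints `focused 0.61 0.85` and `scattered 2.00 0.25`. The human-like session stays on one main topic and has low entropy. The robot-like session spreads its requests evenly over four topics and reaches the maximum entropy of 2 bits. In a detector, values like these would be appended to the usual session features before training a classifier.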

