Web Robot Detection: A Semantic Approach

Athanasios Lagopoulos, Grigorios Tsoumakas, Georgios Papadopoulos
{"title":"Web Robot Detection: A Semantic Approach","authors":"Athanasios Lagopoulos, Grigorios Tsoumakas, Georgios Papadopoulos","doi":"10.1109/ICTAI.2018.00150","DOIUrl":null,"url":null,"abstract":"Web robots constitute nowadays more than half of the total web traffic. Malicious robots threaten the security, privacy and performance of the web, while non-malicious ones are involved in analytics skewing. The latter constitutes an important problem for large websites with unique content, as it can lead to false impressions about the popularity and impact of a piece of information. To deal with this problem, we present a novel web robot detection approach for content-rich websites, based on the assumption that human web users are interested in specific topics, while web robots crawl the web randomly. Our approach extends the typical representation of user sessions with a novel set of features that capture the semantics of the content of the requested resources. Empirical results on real-world data from the web portal of an academic publisher, show that the proposed semantic features lead to improved web robot detection accuracy.","PeriodicalId":254686,"journal":{"name":"2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTAI.2018.00150","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Web robots constitute nowadays more than half of the total web traffic. Malicious robots threaten the security, privacy and performance of the web, while non-malicious ones are involved in analytics skewing. The latter constitutes an important problem for large websites with unique content, as it can lead to false impressions about the popularity and impact of a piece of information. To deal with this problem, we present a novel web robot detection approach for content-rich websites, based on the assumption that human web users are interested in specific topics, while web robots crawl the web randomly. Our approach extends the typical representation of user sessions with a novel set of features that capture the semantics of the content of the requested resources. Empirical results on real-world data from the web portal of an academic publisher, show that the proposed semantic features lead to improved web robot detection accuracy.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
网络机器人检测:一种语义方法
如今,网络机器人构成了总网络流量的一半以上。恶意机器人威胁网络的安全、隐私和性能,而非恶意机器人则涉及分析偏差。对于拥有独特内容的大型网站来说,后者构成了一个重要问题,因为它可能导致对一条信息的受欢迎程度和影响的错误印象。为了解决这一问题,我们提出了一种针对内容丰富的网站的新型网络机器人检测方法,该方法基于人类网络用户对特定主题感兴趣的假设,而网络机器人则随机抓取网络。我们的方法扩展了用户会话的典型表示,使用一组新颖的特性来捕获所请求资源内容的语义。在某学术出版社门户网站的真实数据上的实证结果表明,所提出的语义特征提高了网络机器人的检测精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
[Title page i] Enhanced Unsatisfiable Cores for QBF: Weakening Universal to Existential Quantifiers Effective Ant Colony Optimization Solution for the Brazilian Family Health Team Scheduling Problem Exploiting Global Semantic Similarity Biterms for Short-Text Topic Discovery Assigning and Scheduling Service Visits in a Mixed Urban/Rural Setting
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1