Objectionable content filtering by click-through data

Proceedings of the 22nd ACM international conference on Information & Knowledge Management
Pub Date: 2013-10-27
DOI: 10.1145/2505515.2507849

Lung-Hao Lee, Yen-Cheng Juan, Hsin-Hsi Chen, Yuen-Hsien Tseng

Citations: 3

Abstract

This paper explores users' browsing intents to predict the category of a user's next access during web surfing, and applies the results to objectionable content filtering. A user's access trail, represented as a sequence of URLs, reveals contextual information about web browsing behavior. We extract behavioral features from each clicked URL, namely its hostname, bag-of-words, gTLD, IP address, and port, and use them to build a linear-chain conditional random field (CRF) model for context-aware category prediction. Large-scale experiments show that our method achieves an accuracy of 0.9396 in identifying objectionable accesses without requesting the corresponding page content. Error analysis indicates that the proposed model has a low false positive rate of 0.0571. In real-life filtering simulations, the proposed model achieves a macro-averaged blocking rate of 0.9271 while maintaining a low macro-averaged over-blocking rate of 0.0575, collaboratively filtering objectionable content as it changes over time on the dynamic web.
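
The abstract describes two steps: extracting features from each clicked URL in an access trail, and labelling the whole trail with a linear-chain CRF so that each prediction can draw on its neighbours. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The tokenizer, the default-port table, the hypothetical `ip_lookup` mapping (the abstract does not say how IP addresses are obtained), the `prev:` context features, and the toy weights are all illustrative choices. Decoding uses the standard Viterbi algorithm, which finds the highest-scoring label sequence given per-position (emission) and label-to-label (transition) weights; training those weights is omitted.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Ports assumed when a URL omits one (assumption, not from the paper).
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_features(url, ip_lookup=None):
    """Binary features for one clicked URL: hostname, bag-of-words, gTLD, IP, port.

    ip_lookup is a hypothetical {hostname: ip} table (e.g. built from DNS logs).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(parts.scheme, 0)
    try:
        ip = str(ipaddress.ip_address(host))  # host is already a literal IP
        gtld = "none"
    except ValueError:
        ip = (ip_lookup or {}).get(host, "unknown")
        gtld = host.rsplit(".", 1)[-1] if "." in host else "none"
    feats = {"host=" + host: 1.0, "gtld=" + gtld: 1.0,
             "ip=" + ip: 1.0, "port=" + str(port): 1.0}
    # Bag-of-words: alphabetic tokens from hostname and path (tokenizer assumed).
    for word in re.findall(r"[a-z]+", (host + " " + parts.path).lower()):
        feats["bow=" + word] = 1.0
    return feats


def trail_features(trail, ip_lookup=None):
    """One feature dict per click; the previous click's features add context."""
    per_url = [url_features(u, ip_lookup) for u in trail]
    seq = []
    for i, feats in enumerate(per_url):
        f = dict(feats)
        if i == 0:
            f["BOS"] = 1.0  # marks the start of the trail
        else:
            f.update({"prev:" + k: v for k, v in per_url[i - 1].items()})
        seq.append(f)
    return seq


def viterbi(seq, labels, emit_w, trans_w):
    """Highest-scoring label sequence under a linear-chain CRF with given weights.

    emit_w[(feature, label)] and trans_w[(prev_label, label)] are hypothetical
    learned weights; missing entries count as 0.
    """
    if not seq:
        return []

    def emit(i, y):
        return sum(v * emit_w.get((k, y), 0.0) for k, v in seq[i].items())

    best = [{y: emit(0, y) for y in labels}]
    back = [{}]
    for i in range(1, len(seq)):
        best.append({})
        back.append({})
        for y in labels:
            # Best previous label for y, including the transition score.
            prev = max(labels, key=lambda p: best[i - 1][p] + trans_w.get((p, y), 0.0))
            best[i][y] = best[i - 1][prev] + trans_w.get((prev, y), 0.0) + emit(i, y)
            back[i][y] = prev
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy trail and weights (illustration only, not learned from real data).
trail = ["http://www.example.com/news/today.html",
         "http://adult.example.org:8080/videos",
         "http://203.0.113.7/gallery"]
labels = ["benign", "objectionable"]
emit_w = {("bow=adult", "objectionable"): 2.0, ("bow=news", "benign"): 1.5}
trans_w = {("objectionable", "objectionable"): 1.0, ("benign", "benign"): 0.5}
print(viterbi(trail_features(trail), labels, emit_w, trans_w))
# -> ['benign', 'objectionable', 'objectionable']
```

With these toy weights the third URL carries no category evidence of its own, yet it is labelled `objectionable` because the transition weight favours staying in the previous category; this is the kind of context effect a linear-chain model captures. In practice the feature dicts could instead be passed to a CRF toolkit such as sklearn-crfsuite, which accepts one list of feature dicts per sequence and learns the weights.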

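The abstract reports accuracy, false positive rate, and macro-averaged blocking and over-blocking rates but does not define them. The sketch below assumes the usual confusion-matrix readings with "objectionable" as the positive class: blocking rate is the share of objectionable accesses that are blocked, over-blocking rate is the share of benign accesses that are blocked, and macro-averaging takes the unweighted mean over evaluation groups. Grouping by test period is a hypothetical choice suggested by the abstract's mention of time change; the counts are invented for illustration and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts with "objectionable" as the positive class."""
    tp: int  # objectionable accesses that were blocked
    fn: int  # objectionable accesses that were allowed
    fp: int  # benign accesses that were blocked (over-blocking)
    tn: int  # benign accesses that were allowed


def counts_from(gold, pred, positive="objectionable"):
    """Tally confusion counts from parallel lists of gold and predicted labels."""
    c = Counts(0, 0, 0, 0)
    for g, p in zip(gold, pred):
        if g == positive:
            if p == positive:
                c.tp += 1
            else:
                c.fn += 1
        elif p == positive:
            c.fp += 1
        else:
            c.tn += 1
    return c


def _ratio(num, den):
    return num / den if den else 0.0


def accuracy(c):
    return _ratio(c.tp + c.tn, c.tp + c.fn + c.fp + c.tn)


def false_positive_rate(c):
    return _ratio(c.fp, c.fp + c.tn)


def blocking_rate(c):
    # Assumed definition: recall on the objectionable class.
    return _ratio(c.tp, c.tp + c.fn)


def over_blocking_rate(c):
    # Assumed definition: share of benign accesses that were blocked.
    return _ratio(c.fp, c.fp + c.tn)


def macro_average(metric, groups):
    """Unweighted mean of a metric over groups (e.g. hypothetical test periods)."""
    return _ratio(sum(metric(c) for c in groups), len(groups))


# Invented counts for two test periods (illustration only).
periods = [Counts(tp=90, fn=10, fp=5, tn=95), Counts(tp=45, fn=5, fp=2, tn=48)]
print(round(macro_average(blocking_rate, periods), 4))       # 0.9
print(round(macro_average(over_blocking_rate, periods), 4))  # 0.045
```

Under these assumed definitions a single group's over-blocking rate coincides with its false positive rate. Macro-averaging differs from pooling all counts because each group gets equal weight regardless of how many accesses it contains.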