Measuring and Analyzing Search Engine Poisoning of Linguistic Collisions

Matthew Joslin, Neng Li, S. Hao, Minhui Xue, Haojin Zhu
{"title":"Measuring and Analyzing Search Engine Poisoning of Linguistic Collisions","authors":"Matthew Joslin, Neng Li, S. Hao, Minhui Xue, Haojin Zhu","doi":"10.1109/SP.2019.00025","DOIUrl":null,"url":null,"abstract":"Misspelled keywords have become an appealing target in search poisoning, since they are less competitive to promote than the correct queries and account for a considerable amount of search traffic. Search engines have adopted several countermeasure strategies, e.g., Google applies automated corrections on queried keywords and returns search results of the corrected versions directly. However, a sophisticated class of attack, which we term as linguistic-collision misspelling, can evade auto-correction and poison search results. Cybercriminals target special queries where the misspelled terms are existent words, even in other languages (e.g., \"idobe\", a misspelling of the English word \"adobe\", is a legitimate word in the Nigerian language). In this paper, we perform the first large-scale analysis on linguistic-collision search poisoning attacks. In particular, we check 1.77 million misspelled search terms on Google and Baidu and analyze both English and Chinese languages, which are the top two languages used by Internet users. We leverage edit distance operations and linguistic properties to generate misspelling candidates. To more efficiently identify linguistic-collision search terms, we design a deep learning model that can improve collection rate by 2.84x compared to random sampling. Our results show that the abuse is prevalent: around 1.19% of linguistic-collision search terms on Google and Baidu have results on the first page directing to malicious websites. We also find that cybercriminals mainly target categories of gambling, drugs, and adult content. Mobile-device users disproportionately search for misspelled keywords, presumably due to small screen for input. Our work highlights this new class of search engine poisoning and provides insights to help mitigate the threat.","PeriodicalId":272713,"journal":{"name":"2019 IEEE Symposium on Security and Privacy (SP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP.2019.00025","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Misspelled keywords have become an appealing target in search poisoning, since they are less competitive to promote than the correct queries and account for a considerable amount of search traffic. Search engines have adopted several countermeasure strategies, e.g., Google applies automated corrections on queried keywords and returns search results of the corrected versions directly. However, a sophisticated class of attack, which we term as linguistic-collision misspelling, can evade auto-correction and poison search results. Cybercriminals target special queries where the misspelled terms are existent words, even in other languages (e.g., "idobe", a misspelling of the English word "adobe", is a legitimate word in the Nigerian language). In this paper, we perform the first large-scale analysis on linguistic-collision search poisoning attacks. In particular, we check 1.77 million misspelled search terms on Google and Baidu and analyze both English and Chinese languages, which are the top two languages used by Internet users. We leverage edit distance operations and linguistic properties to generate misspelling candidates. To more efficiently identify linguistic-collision search terms, we design a deep learning model that can improve collection rate by 2.84x compared to random sampling. Our results show that the abuse is prevalent: around 1.19% of linguistic-collision search terms on Google and Baidu have results on the first page directing to malicious websites. We also find that cybercriminals mainly target categories of gambling, drugs, and adult content. Mobile-device users disproportionately search for misspelled keywords, presumably due to small screen for input. Our work highlights this new class of search engine poisoning and provides insights to help mitigate the threat.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
测量和分析搜索引擎中毒的语言冲突
拼写错误的关键字已经成为搜索中毒的一个吸引人的目标,因为它们比正确的查询更具竞争力,并且占了相当大的搜索流量。搜索引擎采用了几种对策策略,例如谷歌对查询的关键词进行自动更正,并直接返回更正后的搜索结果。然而,一类复杂的攻击,我们称之为语言冲突拼写错误,可以逃避自动更正和毒害搜索结果。网络罪犯的目标是那些拼写错误的词是存在的特殊查询,即使是在其他语言中(例如,“idobe”是英语单词“adobe”的拼写错误,在尼日利亚语中是一个合法的单词)。在本文中,我们首次对语言冲突搜索中毒攻击进行了大规模分析。特别是,我们在谷歌和百度上检查了177万个拼写错误的搜索词,并分析了网民使用最多的两种语言——英语和汉语。我们利用编辑距离操作和语言属性来生成拼写错误候选项。为了更有效地识别语言冲突搜索词,我们设计了一个深度学习模型,与随机抽样相比,该模型可以将收集率提高2.84倍。我们的结果显示,滥用是普遍存在的:在谷歌和百度上,大约1.19%的语言冲突搜索词在第一页的结果指向恶意网站。我们还发现,网络罪犯主要针对赌博、毒品和成人内容。移动设备用户不成比例地搜索拼错的关键字,可能是由于输入屏幕太小。我们的工作突出了这类新的搜索引擎中毒,并提供了有助于减轻威胁的见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
The 9 Lives of Bleichenbacher's CAT: New Cache ATtacks on TLS Implementations CaSym: Cache Aware Symbolic Execution for Side Channel Detection and Mitigation PrivKV: Key-Value Data Collection with Local Differential Privacy Postcards from the Post-HTTP World: Amplification of HTTPS Vulnerabilities in the Web Ecosystem New Primitives for Actively-Secure MPC over Rings with Applications to Private Machine Learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1