Attacking HTTPS Secure Search Service through Correlation Analysis of HTTP Webpages Accessed

International Journal of Security and Its Applications, Pub Date: 2017-07-31, DOI: 10.14257/IJSIA.2017.11.7.03

Qian Liping, Wang Lidong

Volume 11, pages 25-42

Citations: 10

Abstract

It is common for Internet users to query a search engine when retrieving information from the web. Sensitive data about a user's intentions or behavior can be inferred from the query phrases and from the webpages the user visits afterward. To protect communications from eavesdropping, a search engine can adopt HTTPS-by-default, which provides bidirectional encryption and protects its users' privacy. However, the majority of webpages indexed in a search engine's results pages are still hosted on HTTP websites, so an attacker can observe the contents of these pages once the user clicks an indexed link. We propose a novel approach to attacking secure search by correlating the encrypted search with the unencrypted webpages the user visits afterward. We show that a simple weighted TF-DF mechanism is sufficient for selecting candidate phrases to guess. By imitating the search engine user, querying these candidates, and enumerating the webpages indexed in the results pages, we can recover the exact query phrase and at the same time reconstruct the user's web-surfing trail through DNS-based URL comparison and network traffic analysis based on flow feature statistics. In an experiment with 180 Chinese and English search phrases, we achieved a 67.78% hit rate on the first guess and a 96.11% hit rate within three guesses. Our empirical research shows that HTTPS traffic can be correlated with, and de-anonymized through, HTTP traffic, and that a search engine's secure search is not always secure unless HTTPS-by-default is enabled everywhere.
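The abstract does not define the weighted TF-DF mechanism, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that TF is a term's frequency across the observed HTTP pages, weighted by the field it appears in, and that DF is the number of observed pages containing the term. Under that reading, and unlike TF-IDF, a term scores higher when it recurs across many of the pages visited after a single search. The field weights, the ASCII-only tokenizer, and the names `tfdf_scores`, `top_candidates`, and `sample_pages` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical field weights (an assumption, not taken from the paper):
# a term in a page title counts more than the same term in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def tokenize(text):
    """Lowercase ASCII word tokenizer. Chinese text would need a real segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfdf_scores(pages, field_weights=FIELD_WEIGHTS):
    """Score each term as (field-weighted term frequency) * (document frequency).

    `pages` is a list of dicts mapping a field name to its text, standing in
    for the unencrypted pages observed after an encrypted search.
    """
    weighted_tf = Counter()  # term -> sum of field weights over all occurrences
    df = Counter()           # term -> number of pages containing the term
    for page in pages:
        seen_in_page = set()
        for field, text in page.items():
            weight = field_weights.get(field, 1.0)
            for term in tokenize(text):
                weighted_tf[term] += weight
                seen_in_page.add(term)
        df.update(seen_in_page)
    return {term: weighted_tf[term] * df[term] for term in weighted_tf}


def top_candidates(pages, k=3):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = tfdf_scores(pages)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Made-up pages for illustration only.
sample_pages = [
    {"title": "Python tutorial", "body": "learn python basics"},
    {"title": "Python docs", "body": "python language reference"},
    {"title": "Cooking tips", "body": "python is not food"},
]

if __name__ == "__main__":
    print(top_candidates(sample_pages, k=3))
```

In this toy input, "python" appears in all three pages and in two titles, so it dominates the ranking. A real attack as described in the abstract would then submit the top candidates to the search engine and compare the returned links with the observed traffic. For scale, if the reported rates are taken over all 180 phrases, 67.78% and 96.11% correspond to 122 and 173 phrases respectively.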

Source journal

International Journal of Security and Its Applications (Computer Science, Information Systems)

Self-citation rate

0.00%

Number of publications

About the journal: IJSIA aims to facilitate and support research related to security technology and its applications. The journal gives academic and industry professionals a venue to discuss recent progress in security technology and its applications.

Journal Topics:
- Access Control
- Ad Hoc & Sensor Network Security
- Applied Cryptography
- Authentication and Non-repudiation
- Cryptographic Protocols
- Denial of Service
- E-Commerce Security
- Identity and Trust Management
- Information Hiding
- Insider Threats and Countermeasures
- Intrusion Detection & Prevention
- Network & Wireless Security
- Peer-to-Peer Security
- Privacy and Anonymity
- Secure installation, generation and operation
- Security Analysis Methodologies
- Security assurance
- Security in Software Outsourcing
- Security products or systems
- Security technology
- Systems and Data Security