HVLearn: SSL/TLS实现中主机名验证的自动黑盒分析

2017 IEEE Symposium on Security and Privacy (SP) Pub Date : 2017-05-01 DOI:10.1109/SP.2017.46

Suphannee Sivakorn, G. Argyros, Kexin Pei, A. Keromytis, S. Jana

{"title":"HVLearn: SSL/TLS实现中主机名验证的自动黑盒分析","authors":"Suphannee Sivakorn, G. Argyros, Kexin Pei, A. Keromytis, S. Jana","doi":"10.1109/SP.2017.46","DOIUrl":null,"url":null,"abstract":"SSL/TLS is the most commonly deployed family of protocols for securing network communications. The security guarantees of SSL/TLS are critically dependent on the correct validation of the X.509 server certificates presented during the handshake stage of the SSL/TLS protocol. Hostname verification is a critical component of the certificate validation process that verifies the remote server's identity by checking if the hostname of the server matches any of the names present in the X.509 certificate. Hostname verification is a highly complex process due to the presence of numerous features and corner cases such as wildcards, IP addresses, international domain names, and so forth. Therefore, testing hostname verification implementations present a challenging task. In this paper, we present HVLearn, a novel black-box testing framework for analyzing SSL/TLS hostname verification implementations, which is based on automata learning algorithms. HVLearn utilizes a number of certificate templates, i.e., certificates with a common name (CN) set to a specific pattern, in order to test different rules from the corresponding specification. For each certificate template, HVLearn uses automata learning algorithms to infer a Deterministic Finite Automaton (DFA) that describes the set of all hostnames that match the CN of a given certificate. Once a model is inferred for a certificate template, HVLearn checks the model for bugs by finding discrepancies with the inferred models from other implementations or by checking against regular-expression-based rules derived from the specification. The key insight behind our approach is that the acceptable hostnames for a given certificate template form a regular language. Therefore, we can leverage automata learning techniques to efficiently infer DFA models that accept the corresponding regular language. We use HVLearn to analyze the hostname verification implementations in a number of popular SSL/TLS libraries and applications written in a diverse set of languages like C, Python, and Java. We demonstrate that HVLearn can achieve on average 11.21% higher code coverage than existing black/gray-box fuzzing techniques. By comparing the DFA models inferred by HVLearn, we found 8 unique violations of the RFC specifications in the tested hostname verification implementations. Several of these violations are critical and can render the affected implementations vulnerable to active man-in-the-middle attacks.","PeriodicalId":6502,"journal":{"name":"2017 IEEE Symposium on Security and Privacy (SP)","volume":"58 1","pages":"521-538"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"71","resultStr":"{\"title\":\"HVLearn: Automated Black-Box Analysis of Hostname Verification in SSL/TLS Implementations\",\"authors\":\"Suphannee Sivakorn, G. Argyros, Kexin Pei, A. Keromytis, S. Jana\",\"doi\":\"10.1109/SP.2017.46\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"SSL/TLS is the most commonly deployed family of protocols for securing network communications. The security guarantees of SSL/TLS are critically dependent on the correct validation of the X.509 server certificates presented during the handshake stage of the SSL/TLS protocol. Hostname verification is a critical component of the certificate validation process that verifies the remote server's identity by checking if the hostname of the server matches any of the names present in the X.509 certificate. Hostname verification is a highly complex process due to the presence of numerous features and corner cases such as wildcards, IP addresses, international domain names, and so forth. Therefore, testing hostname verification implementations present a challenging task. In this paper, we present HVLearn, a novel black-box testing framework for analyzing SSL/TLS hostname verification implementations, which is based on automata learning algorithms. HVLearn utilizes a number of certificate templates, i.e., certificates with a common name (CN) set to a specific pattern, in order to test different rules from the corresponding specification. For each certificate template, HVLearn uses automata learning algorithms to infer a Deterministic Finite Automaton (DFA) that describes the set of all hostnames that match the CN of a given certificate. Once a model is inferred for a certificate template, HVLearn checks the model for bugs by finding discrepancies with the inferred models from other implementations or by checking against regular-expression-based rules derived from the specification. The key insight behind our approach is that the acceptable hostnames for a given certificate template form a regular language. Therefore, we can leverage automata learning techniques to efficiently infer DFA models that accept the corresponding regular language. We use HVLearn to analyze the hostname verification implementations in a number of popular SSL/TLS libraries and applications written in a diverse set of languages like C, Python, and Java. We demonstrate that HVLearn can achieve on average 11.21% higher code coverage than existing black/gray-box fuzzing techniques. By comparing the DFA models inferred by HVLearn, we found 8 unique violations of the RFC specifications in the tested hostname verification implementations. Several of these violations are critical and can render the affected implementations vulnerable to active man-in-the-middle attacks.\",\"PeriodicalId\":6502,\"journal\":{\"name\":\"2017 IEEE Symposium on Security and Privacy (SP)\",\"volume\":\"58 1\",\"pages\":\"521-538\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"71\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Symposium on Security and Privacy (SP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SP.2017.46\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Symposium on Security and Privacy (SP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SP.2017.46","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 71

摘要

SSL/TLS是用于保护网络通信的最常用的协议系列。SSL/TLS的安全保证严重依赖于在SSL/TLS协议握手阶段提供的X.509服务器证书的正确验证。主机名验证是证书验证过程的关键组成部分，它通过检查服务器的主机名是否与X.509证书中出现的任何名称匹配来验证远程服务器的身份。主机名验证是一个非常复杂的过程，因为存在许多特性和极端情况，例如通配符、IP地址、国际域名等等。因此，测试主机名验证实现是一项具有挑战性的任务。在本文中，我们提出了HVLearn，一个新的黑盒测试框架，用于分析SSL/TLS主机名验证实现，它基于自动机学习算法。HVLearn使用了许多证书模板，即将通用名称(CN)设置为特定模式的证书，以便测试来自相应规范的不同规则。对于每个证书模板，HVLearn使用自动机学习算法来推断一个确定性有限自动机(DFA)，该DFA描述了与给定证书的CN匹配的所有主机名的集合。一旦为证书模板推断出一个模型，HVLearn就会通过查找与其他实现推断出的模型之间的差异，或通过检查源自规范的基于正则表达式的规则，来检查模型中的错误。我们的方法背后的关键见解是，给定证书模板的可接受主机名形成一种常规语言。因此，我们可以利用自动机学习技术来有效地推断接受相应规则语言的DFA模型。我们使用HVLearn来分析许多流行的SSL/TLS库和用C、Python和Java等不同语言编写的应用程序中的主机名验证实现。我们证明，HVLearn可以实现比现有的黑/灰盒模糊技术平均高11.21%的代码覆盖率。通过比较HVLearn推断的DFA模型，我们在测试的主机名验证实现中发现了8个独特的RFC规范违规。其中一些违规是严重的，可能会使受影响的实现容易受到主动中间人攻击。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

HVLearn: Automated Black-Box Analysis of Hostname Verification in SSL/TLS Implementations

SSL/TLS is the most commonly deployed family of protocols for securing network communications. The security guarantees of SSL/TLS are critically dependent on the correct validation of the X.509 server certificates presented during the handshake stage of the SSL/TLS protocol. Hostname verification is a critical component of the certificate validation process that verifies the remote server's identity by checking if the hostname of the server matches any of the names present in the X.509 certificate. Hostname verification is a highly complex process due to the presence of numerous features and corner cases such as wildcards, IP addresses, international domain names, and so forth. Therefore, testing hostname verification implementations present a challenging task. In this paper, we present HVLearn, a novel black-box testing framework for analyzing SSL/TLS hostname verification implementations, which is based on automata learning algorithms. HVLearn utilizes a number of certificate templates, i.e., certificates with a common name (CN) set to a specific pattern, in order to test different rules from the corresponding specification. For each certificate template, HVLearn uses automata learning algorithms to infer a Deterministic Finite Automaton (DFA) that describes the set of all hostnames that match the CN of a given certificate. Once a model is inferred for a certificate template, HVLearn checks the model for bugs by finding discrepancies with the inferred models from other implementations or by checking against regular-expression-based rules derived from the specification. The key insight behind our approach is that the acceptable hostnames for a given certificate template form a regular language. Therefore, we can leverage automata learning techniques to efficiently infer DFA models that accept the corresponding regular language. We use HVLearn to analyze the hostname verification implementations in a number of popular SSL/TLS libraries and applications written in a diverse set of languages like C, Python, and Java. We demonstrate that HVLearn can achieve on average 11.21% higher code coverage than existing black/gray-box fuzzing techniques. By comparing the DFA models inferred by HVLearn, we found 8 unique violations of the RFC specifications in the tested hostname verification implementations. Several of these violations are critical and can render the affected implementations vulnerable to active man-in-the-middle attacks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE Symposium on Security and Privacy (SP)

自引率

0.00%

发文量