Seed selection for successful fuzzing

Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
Pub Date: 2021-07-11
DOI: 10.1145/3460319.3464795

Adrián Herrera, Hendra Gunadi, S. Magrath, Michael Norrish, Mathias Payer, Antony Lloyd Hosking


Citations: 48

Abstract

Mutation-based greybox fuzzing, unquestionably the most widely used fuzzing technique, relies on a set of non-crashing seed inputs (a corpus) to bootstrap the bug-finding process. When evaluating a fuzzer, common approaches for constructing this corpus include: (i) using an empty file; (ii) using a single seed representative of the target's input format; or (iii) collecting a large number of seeds (e.g., by crawling the Internet). Little thought is given to how this seed choice affects the fuzzing process, and there is no consensus on which approach is best (or even whether a best approach exists). To address this gap in knowledge, we systematically investigate and evaluate how seed selection affects a fuzzer's ability to find bugs in real-world software. This includes a systematic review of seed selection practices used in both evaluation and deployment contexts, and a large-scale empirical evaluation (over 33 CPU-years) of six seed selection approaches. These six approaches include three corpus minimization techniques (which select the smallest subset of seeds that trigger the same range of instrumentation data points as a full corpus). Our results demonstrate that fuzzing outcomes vary significantly depending on the initial seeds used to bootstrap the fuzzer, with minimized corpora outperforming singleton, empty, and large (on the order of thousands of files) seed sets. Consequently, we encourage seed selection to be foremost in mind when evaluating or deploying fuzzers, and recommend that (a) seed choice be carefully considered and explicitly documented, and (b) fuzzers never be evaluated with only a single seed.
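The idea of corpus minimization described in the abstract (keeping a small subset of seeds that still triggers the same instrumentation data points as the full corpus) can be illustrated with a minimal sketch. This is not the authors' implementation, nor that of any particular tool: it is a greedy set-cover approximation, which yields a small but not necessarily the smallest subset. The function name `minimize_corpus`, the seed file names, and the integer coverage IDs are all hypothetical; in practice the coverage sets would come from running each seed through an instrumented target.

```python
from typing import Dict, List, Set


def minimize_corpus(coverage: Dict[str, Set[int]]) -> List[str]:
    """Greedy set-cover approximation of corpus minimization.

    `coverage` maps each seed name to the set of instrumentation
    data points (e.g. edge IDs) that the seed triggers. The returned
    subset triggers every data point the full corpus triggers.
    """
    # Every data point reached by at least one seed in the full corpus.
    remaining: Set[int] = set().union(*coverage.values())
    selected: List[str] = []

    while remaining:
        # Pick the seed covering the most still-uncovered points.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & remaining))
        selected.append(best)
        remaining -= coverage[best]

    return selected


# Hypothetical coverage data for four seeds.
coverage = {
    "a.png": {1, 2, 3},
    "b.png": {2, 3},
    "c.png": {4},
    "d.png": {1, 4, 5},
}

minimized = minimize_corpus(coverage)
print(minimized)  # ['a.png', 'd.png']
```

Here two of the four seeds suffice to reach all five data points, so `b.png` and `c.png` are redundant with respect to coverage. A greedy pass like this trades optimality for speed; finding a truly minimum subset is an instance of the NP-hard set-cover problem.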


