{"title":"Boosting Fuzzer Efficiency: An Information Theoretic Perspective","authors":"Marcel Böhme, Valentin J. M. Manès, Sang Kil Cha","doi":"10.1145/3611019","DOIUrl":null,"url":null,"abstract":"In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose before fuzzing, we know nothing about the behaviors of a program P : What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same or discover a new behavior. As such, each execution reveals \"some amount\" of information about P 's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information. From this information theoretic perspective, we develop ENTROPIC, an entropy-based power schedule for greybox fuzzing that assigns more energy to seeds that maximize information. We implemented ENTROPIC into the popular greybox fuzzer LIBFUZZER. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate a substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. ENTROPIC has been independently evaluated and integrated into the main-line LIBFUZZER as the default power schedule. 
ENTROPIC now runs on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.","PeriodicalId":10594,"journal":{"name":"Communications of the ACM","volume":"37 7","pages":"0"},"PeriodicalIF":11.1000,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communications of the ACM","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3611019","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 3
Abstract
In this paper, we take the fundamental perspective of fuzzing as a learning process. Suppose before fuzzing, we know nothing about the behaviors of a program P: What does it do? Executing the first test input, we learn how P behaves for this input. Executing the next input, we either observe the same or discover a new behavior. As such, each execution reveals "some amount" of information about P's behaviors. A classic measure of information is Shannon's entropy. Measuring entropy allows us to quantify how much is learned from each generated test input about the behaviors of the program. Within a probabilistic model of fuzzing, we show how entropy also measures fuzzer efficiency. Specifically, it measures the general rate at which the fuzzer discovers new behaviors. Intuitively, efficient fuzzers maximize information. From this information theoretic perspective, we develop ENTROPIC, an entropy-based power schedule for greybox fuzzing that assigns more energy to seeds that maximize information. We implemented ENTROPIC in the popular greybox fuzzer LIBFUZZER. Our experiments with more than 250 open-source programs (60 million LoC) demonstrate substantially improved efficiency and confirm our hypothesis that an efficient fuzzer maximizes information. ENTROPIC has been independently evaluated and integrated into the main-line LIBFUZZER as the default power schedule. ENTROPIC now runs on more than 25,000 machines fuzzing hundreds of security-critical software systems simultaneously and continuously.
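The idea of an entropy-based power schedule can be illustrated with a minimal sketch. It is not the ENTROPIC implementation in LIBFUZZER: it assumes a plain maximum-likelihood estimate of Shannon's entropy, H = sum over i of p_i * log2(1 / p_i), computed from how often each behavior was observed among the inputs generated from a seed, and it assumes energy is simply the seed's share of the total entropy. The function names, the behavior labels, and the example seeds are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Maximum-likelihood estimate of Shannon entropy, in bits.

    `counts` holds how often each distinct behavior was observed.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    # p * log2(1 / p) is the information contributed by one behavior.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def assign_energy(seed_observations):
    """Turn per-seed entropy into normalized selection weights.

    `seed_observations` maps a seed name to the list of behaviors
    observed when fuzzing that seed (an assumed, simplified model).
    """
    entropies = {
        seed: shannon_entropy(list(Counter(observed).values()))
        for seed, observed in seed_observations.items()
    }
    total = sum(entropies.values())
    if total == 0:
        # No seed is informative yet: fall back to a uniform schedule.
        return {seed: 1.0 / len(entropies) for seed in entropies}
    return {seed: h / total for seed, h in entropies.items()}


# Hypothetical observations: behaviors seen for inputs derived from each seed.
observations = {
    "seed_a": ["b1", "b1", "b1", "b1"],  # always the same behavior
    "seed_b": ["b1", "b2", "b3", "b4"],  # every input reveals something new
    "seed_c": ["b1", "b1", "b2", "b2"],  # two behaviors, equally likely
}
energy = assign_energy(observations)
```

In this sketch, the seed whose mutations keep revealing different behaviors receives the largest share of energy, and a seed that always triggers the same behavior receives none. A practical schedule would also have to handle behaviors that have not yet been observed, which this simple estimate ignores.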
About the journal:
Communications of the ACM is the leading print and online publication for the computing and information technology fields. Read by computing's leading professionals worldwide, Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Following the traditions of the Communications print magazine, which each month brings its readership of over 100,000 ACM members in-depth coverage of emerging areas of computer science, new trends in information technology, and practical applications, the Communications website brings topical and informative news and material to computing professionals each business day. ACM's membership includes the IT industry's most respected leaders and decision makers. Industry leaders have for more than 50 years used the monthly Communications of the ACM magazine as a platform to present and debate various technology implications, public policies, engineering challenges, and market trends. The Communications website continues that practice.