{"title":"Concretely Mapped Symbolic Memory Locations for Memory Error Detection","authors":"Haoxin Tu;Lingxiao Jiang;Jiaqi Hong;Xuhua Ding;He Jiang","doi":"10.1109/TSE.2024.3395412","DOIUrl":null,"url":null,"abstract":"Memory allocation is a fundamental operation for managing memory objects in many programming languages. Misusing allocated memory objects (e.g., \n<italic>buffer overflow</i>\n and \n<italic>use-after-free</i>\n) can have catastrophic consequences. Symbolic execution-based approaches have been used to detect such memory errors, benefiting from their capabilities in automatic path exploration and test case generation. However, existing symbolic execution engines still suffer from fundamental limitations in modeling dynamic memory layouts; they either represent the locations of memory objects as concrete addresses and thus limit their analyses only to specific address layouts and miss errors that may only occur when the objects are located at special addresses, or represent the locations as simple symbolic variables without sufficient constraints and thus suffer from memory state explosion when they execute read/write operations involving symbolic addresses. Such limitations hinder the existing symbolic execution engines from effectively detecting certain memory errors. In this study, we propose \n<sc>SymLoc</small>\n, a symbolic execution-based approach that uses concretely mapped symbolic memory locations to alleviate the limitations mentioned above. Specifically, a new integration of three techniques is designed in \n<sc>SymLoc</small>\n: (1) the symbolization of addresses and encoding of symbolic addresses into path constraints, (2) the symbolic memory read/write operations using a symbolic-concrete memory map, and (3) the automatic tracking of the uses of symbolic memory locations. We build \n<sc>SymLoc</small>\n on top of the well-known symbolic execution engine KLEE and demonstrate its benefits in terms of memory error detection and code coverage capabilities. Our evaluation results show that: for address-specific spatial memory errors, \n<sc>SymLoc</small>\n can detect 23 more errors in \n<monospace>GNU Coreutils</monospace>\n, \n<monospace>Make</monospace>\n, and \n<monospace>m4</monospace>\n programs that are difficult for other approaches to detect, and cover 15% and 48% more unique lines of code in the programs than two baseline approaches; for temporal memory errors, \n<sc>SymLoc</small>\n can detect 8%-64% more errors in the Juliet Test Suite than various existing state-of-the-art memory error detectors. We also present two case studies to show sample memory errors detected by \n<sc>SymLoc</small>\n along with their root causes and implications.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 7","pages":"1747-1767"},"PeriodicalIF":6.5000,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10510589/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Memory allocation is a fundamental operation for managing memory objects in many programming languages. Misusing allocated memory objects (e.g.,
buffer overflow
and
use-after-free
) can have catastrophic consequences. Symbolic execution-based approaches have been used to detect such memory errors, benefiting from their capabilities in automatic path exploration and test case generation. However, existing symbolic execution engines still suffer from fundamental limitations in modeling dynamic memory layouts; they either represent the locations of memory objects as concrete addresses and thus limit their analyses only to specific address layouts and miss errors that may only occur when the objects are located at special addresses, or represent the locations as simple symbolic variables without sufficient constraints and thus suffer from memory state explosion when they execute read/write operations involving symbolic addresses. Such limitations hinder the existing symbolic execution engines from effectively detecting certain memory errors. In this study, we propose
SymLoc
, a symbolic execution-based approach that uses concretely mapped symbolic memory locations to alleviate the limitations mentioned above. Specifically, a new integration of three techniques is designed in
SymLoc
: (1) the symbolization of addresses and encoding of symbolic addresses into path constraints, (2) the symbolic memory read/write operations using a symbolic-concrete memory map, and (3) the automatic tracking of the uses of symbolic memory locations. We build
SymLoc
on top of the well-known symbolic execution engine KLEE and demonstrate its benefits in terms of memory error detection and code coverage capabilities. Our evaluation results show that: for address-specific spatial memory errors,
SymLoc
can detect 23 more errors in
GNU Coreutils
,
Make
, and
m4
programs that are difficult for other approaches to detect, and cover 15% and 48% more unique lines of code in the programs than two baseline approaches; for temporal memory errors,
SymLoc
can detect 8%-64% more errors in the Juliet Test Suite than various existing state-of-the-art memory error detectors. We also present two case studies to show sample memory errors detected by
SymLoc
along with their root causes and implications.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.