Zibin Zheng;Jianzhong Su;Jiachi Chen;David Lo;Zhijie Zhong;Mingxi Ye
{"title":"DAppSCAN: Building Large-Scale Datasets for Smart Contract Weaknesses in DApp Projects","authors":"Zibin Zheng;Jianzhong Su;Jiachi Chen;David Lo;Zhijie Zhong;Mingxi Ye","doi":"10.1109/TSE.2024.3383422","DOIUrl":null,"url":null,"abstract":"The Smart Contract Weakness Classification Registry (SWC Registry) is a widely recognized list of smart contract weaknesses specific to the Ethereum platform. Despite the SWC Registry not being updated with new entries since 2020, the sustained development of smart contract analysis tools for detecting SWC-listed weaknesses highlights their ongoing significance in the field. However, evaluating these tools has proven challenging due to the absence of a large, unbiased, real-world dataset. To address this problem, we aim to build a large-scale SWC weakness dataset from real-world DApp projects. We recruited 22 participants and spent 44 person-months analyzing 1,199 open-source audit reports from 29 security teams. In total, we identified 9,154 weaknesses and developed two distinct datasets, i.e., \n<sc>DAppSCAN-Source</small>\n and \n<sc>DAppSCAN-Bytecode</small>\n. The \n<sc>DAppSCAN-Source</small>\n dataset comprises 39,904 Solidity files, featuring 1,618 SWC weaknesses sourced from 682 real-world DApp projects. However, the Solidity files in this dataset may not be directly compilable for further analysis. To facilitate automated analysis, we developed a tool capable of automatically identifying dependency relationships within DApp projects and completing missing public libraries. Using this tool, we created \n<sc>DAppSCAN-Bytecode</small>\n dataset, which consists of 6,665 compiled smart contract with 888 SWC weaknesses. Based on \n<sc>DAppSCAN-Bytecode</small>\n, we conducted an empirical study to evaluate the performance of state-of-the-art smart contract weakness detection tools. The evaluation results revealed sub-par performance for these tools in terms of both effectiveness and success detection rate, indicating that future development should prioritize real-world datasets over simplistic toy contracts.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":null,"pages":null},"PeriodicalIF":6.5000,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10486822/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
The Smart Contract Weakness Classification Registry (SWC Registry) is a widely recognized list of smart contract weaknesses specific to the Ethereum platform. Despite the SWC Registry not being updated with new entries since 2020, the sustained development of smart contract analysis tools for detecting SWC-listed weaknesses highlights their ongoing significance in the field. However, evaluating these tools has proven challenging due to the absence of a large, unbiased, real-world dataset. To address this problem, we aim to build a large-scale SWC weakness dataset from real-world DApp projects. We recruited 22 participants and spent 44 person-months analyzing 1,199 open-source audit reports from 29 security teams. In total, we identified 9,154 weaknesses and developed two distinct datasets, i.e.,
DAppSCAN-Source
and
DAppSCAN-Bytecode
. The
DAppSCAN-Source
dataset comprises 39,904 Solidity files, featuring 1,618 SWC weaknesses sourced from 682 real-world DApp projects. However, the Solidity files in this dataset may not be directly compilable for further analysis. To facilitate automated analysis, we developed a tool capable of automatically identifying dependency relationships within DApp projects and completing missing public libraries. Using this tool, we created
DAppSCAN-Bytecode
dataset, which consists of 6,665 compiled smart contract with 888 SWC weaknesses. Based on
DAppSCAN-Bytecode
, we conducted an empirical study to evaluate the performance of state-of-the-art smart contract weakness detection tools. The evaluation results revealed sub-par performance for these tools in terms of both effectiveness and success detection rate, indicating that future development should prioritize real-world datasets over simplistic toy contracts.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.