A smart contract vulnerability detection method based on deep learning with opcode sequences

Peer-To-Peer Networking and Applications (IF 2.6; Zone 4, Computer Science; JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS) Pub Date: 2024-06-27 DOI: 10.1007/s12083-024-01750-7

Peiqiang Li, Guojun Wang, Xiaofei Xing, Jinyao Zhu, Wanyi Gu, Guangxin Zhai

Citations: 0

Abstract

Ethereum is a blockchain network that allows developers to create smart contracts and programs that run on the blockchain. Smart contracts contain logic to transfer assets based on pre-defined conditions. With over 100,000 new smart contracts being deployed every day, the potential for coding errors is high, making the contracts vulnerable to exploits. A key limitation is that once deployed, smart contracts are immutable and cannot be updated, even if flaws are found. This inflexibility puts funds at risk of theft and loss. The rapid pace of deployment outpaces security audits, increasing vulnerabilities that put users’ cryptocurrency at risk. To reduce the risk caused by smart contract vulnerabilities, we applied deep learning techniques. To develop a deep learning model capable of detecting vulnerabilities, we first created a dataset by replaying real transactions on the Ethereum Mainnet, collecting opcode sequences from real Ethereum contracts, and labeling them using the SODA plugin. We pre-processed this opcode data by removing duplicates, normalizing sequence lengths, simplifying opcodes into representative groups, and converting sequences into numerical vectors to ultimately obtain an optimal representation of the data. We then trained and evaluated three different neural network architectures on this dataset. Our best-performing model achieved an average accuracy of 88% in detecting seven types of vulnerabilities. Further analysis showed that the model was effective at identifying potential problems in smart contracts, which was an important capability for securing funds and executing logic in live contracts.
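The four pre-processing steps named in the abstract (removing duplicates, normalizing sequence lengths, grouping opcodes, and converting sequences to numerical vectors) can be illustrated with a minimal sketch. The abstract does not give the grouping rule, the target length, or the encoding scheme, so everything below is an assumption made for illustration: numbered opcode families (PUSH, DUP, SWAP, LOG) are collapsed to their family name, sequences are truncated or padded to a caller-chosen `max_len`, and each opcode is mapped to an integer index. All function names and the `<PAD>`/`<UNK>` tokens are hypothetical, not the authors' implementation.

```python
import re
from typing import Dict, List, Sequence, Tuple

PAD = "<PAD>"  # hypothetical padding token
UNK = "<UNK>"  # hypothetical token for opcodes unseen when the vocabulary was built

# Assumed grouping rule: PUSH1..PUSH32, DUP1..DUP16, SWAP1..SWAP16, LOG0..LOG4
# collapse to PUSH, DUP, SWAP, LOG. The paper's actual groups may differ.
_NUMBERED = re.compile(r"^(PUSH|DUP|SWAP|LOG)\d+$")


def simplify(opcode: str) -> str:
    """Map a numbered opcode to its family name; leave other opcodes unchanged."""
    match = _NUMBERED.match(opcode)
    return match.group(1) if match else opcode


def dedupe(seqs: Sequence[Sequence[str]]) -> List[List[str]]:
    """Drop exact duplicate sequences, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for seq in seqs:
        key = tuple(seq)
        if key not in seen:
            seen.add(key)
            unique.append(list(seq))
    return unique


def fix_length(seq: List[str], max_len: int) -> List[str]:
    """Truncate to max_len, or right-pad with PAD up to max_len."""
    return seq[:max_len] + [PAD] * (max_len - len(seq))


def build_vocab(seqs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign integer ids in order of first appearance; PAD is 0 and UNK is 1."""
    vocab = {PAD: 0, UNK: 1}
    for seq in seqs:
        for op in seq:
            vocab.setdefault(op, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Convert one opcode sequence to a list of integer ids."""
    return [vocab.get(op, vocab[UNK]) for op in seq]


def preprocess(
    raw: Sequence[Sequence[str]], max_len: int
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Apply the four steps in the order the abstract lists them."""
    unique = dedupe(raw)
    fixed = [fix_length(seq, max_len) for seq in unique]
    grouped = [[simplify(op) for op in seq] for seq in fixed]
    vocab = build_vocab(grouped)
    return [encode(seq, vocab) for seq in grouped], vocab
```

Calling `preprocess(sequences, max_len=512)` on a list of opcode lists returns equal-length integer vectors plus the vocabulary used to build them; the value 512 is only a placeholder. Deduplicating before grouping follows the order in the abstract, though deduplicating after grouping would remove more near-identical sequences.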

Source journal

Peer-To-Peer Networking and Applications COMPUTER SCIENCE, INFORMATION SYSTEMS-TELECOMMUNICATIONS

CiteScore: 8.00

Self-citation rate: 7.10%

Articles published: 145

Review time: 12 months

Journal description: The aim of the Peer-to-Peer Networking and Applications journal is to disseminate state-of-the-art research and development results in this rapidly growing research area, to facilitate the deployment of P2P networking and applications, and to bring together the academic and industry communities, with the goal of fostering interaction to promote further research interests and activities, thus enabling new P2P applications and services. The journal not only addresses research topics related to networking and communications theory, but also considers the standardization, economic, and engineering aspects of P2P technologies, and their impacts on software engineering, computer engineering, networked communication, and security. The journal serves as a forum for tackling the technical problems arising from both file sharing and media streaming applications. It also includes state-of-the-art technologies in the P2P security domain. Peer-to-Peer Networking and Applications publishes regular papers, tutorials and review papers, case studies, and correspondence from the research, development, and standardization communities. Papers addressing system, application, and service issues are encouraged.