Sóley: Automated detection of logic vulnerabilities in Ethereum smart contracts using large language models

Journal of Systems and Software (IF 4.1; Zone 2, Computer Science; JCR Q1, Computer Science, Software Engineering). Pub Date: 2025-03-01. DOI: 10.1016/j.jss.2025.112406

Majd Soud, Waltteri Nuutinen, Grischa Liebel

Journal of Systems and Software, Volume 226, Article 112406. Journal Article. https://www.sciencedirect.com/science/article/pii/S0164121225000743


Abstract

Context:

Modern blockchains, such as Ethereum, support the deployment and execution of so-called smart contracts: autonomous digital programs that hold significant value in cryptocurrency. Executing a smart contract costs gas, paid by users, which bounds the contract's execution. Logic vulnerabilities in smart contracts can lead to excessive gas consumption and financial losses, and are often the root cause of high-impact cyberattacks.

Objective:

Our objective is threefold: (i) empirically investigate logic vulnerabilities in real-world smart contracts extracted from code changes on GitHub, (ii) introduce Sóley, an automated method for detecting logic vulnerabilities in smart contracts, leveraging Large Language Models (LLMs), and (iii) examine mitigation strategies employed by smart contract developers to address these vulnerabilities in real-world scenarios.

Method:

We obtained smart contracts and related code changes from GitHub. To address the first and third objectives, we qualitatively investigated the available logic vulnerabilities using an open coding method, identifying both the vulnerabilities and their mitigation strategies. For the second objective, we extracted various logic vulnerabilities, focusing on those containing inline assembly fragments. We then applied preprocessing techniques and trained the proposed Sóley model. We evaluated Sóley alongside various LLMs and compared the results with the state-of-the-art baseline on the task of logic vulnerability detection.
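The abstract does not describe how inline assembly fragments were located, so the following is only a minimal sketch of one plausible preprocessing step, not the authors' pipeline. The helper name `extract_inline_assembly`, the regular expression, and the `SAMPLE` contract are all assumptions made for illustration. The sketch matches braces naively and does not parse comments or string literals.

```python
import re

# Hypothetical helper, not part of Sóley: locate Solidity `assembly { ... }`
# blocks, including the `assembly "evmasm" { ... }` and
# `assembly ("memory-safe") { ... }` forms, by counting braces.
ASSEMBLY_START = re.compile(r'\bassembly\s*(?:"evmasm"\s*)?(?:\([^)]*\)\s*)?\{')


def extract_inline_assembly(source: str) -> list[str]:
    """Return every inline assembly block found in the Solidity source."""
    fragments = []
    pos = 0
    while True:
        match = ASSEMBLY_START.search(source, pos)
        if match is None:
            break
        depth = 1          # the regex consumed the opening brace
        i = match.end()
        while i < len(source) and depth > 0:
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            break          # unbalanced braces: stop rather than guess
        fragments.append(source[match.start():i])
        pos = i
    return fragments


# Illustrative contract (assumed, not taken from the paper's dataset).
SAMPLE = """
contract Demo {
    function size(address a) public view returns (uint256 s) {
        assembly {
            s := extcodesize(a)
        }
    }
    function sum() public pure returns (uint256 r) {
        assembly ("memory-safe") {
            for { let i := 0 } lt(i, 3) { i := add(i, 1) } { r := add(r, i) }
        }
    }
}
"""

fragments = extract_inline_assembly(SAMPLE)
```

A production pipeline would more likely rely on a Solidity parser or compiler AST, since brace counting breaks on braces inside comments or strings.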

Results:

Our results include the curation of a large-scale dataset comprising 50,000 Ethereum smart contracts, with a total of 428,569 labeled instances of smart contract vulnerabilities, including 171,180 logic-related vulnerabilities. Our analysis uncovered nine novel logic vulnerabilities, which we used to extend existing taxonomies. Furthermore, we introduced several mitigation strategies extracted from observed developer modifications in real-world scenarios. Experimental results show that Sóley outperforms existing approaches in automatically identifying logic vulnerabilities, achieving a 9% improvement in accuracy and a maximum improvement of 24% in F1-measure over the baseline. Interestingly, the efficacy of LLMs in this task was evident even with minimal feature engineering. Despite the positive results, Sóley struggles to identify certain classes of logic vulnerabilities, which remain for future work.
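For readers unfamiliar with the reported metrics, here is a small sketch of how accuracy and F1-measure are computed from confusion counts, together with the share of logic-related instances implied by the dataset figures above. The confusion counts in `example` are made up for illustration and are not results from the paper. The abstract also does not say whether the 9% and 24% improvements are absolute percentage points or relative gains, so the sketch does not try to reproduce them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts for one vulnerability class."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Confusion) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Dataset figures quoted in the abstract.
TOTAL_LABELED = 428_569
LOGIC_RELATED = 171_180
logic_share = LOGIC_RELATED / TOTAL_LABELED  # roughly 0.40

# Assumed counts, for illustration only (not from the paper).
example = Confusion(tp=80, fp=20, fn=10, tn=90)
print(f"accuracy={accuracy(example):.3f} f1={f1(example):.3f}")
print(f"logic-related share={logic_share:.1%}")
```

Roughly two in five labeled instances in the dataset are logic-related, which is why F1 is reported alongside accuracy: it stays informative when classes are imbalanced.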

Conclusion:

Early identification of logic vulnerabilities from code changes can provide valuable insights into their detection and mitigation. Recent advancements, such as LLMs, show promise in detecting logic vulnerabilities and contributing to smart contract security and sustainability.




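Source journal

Journal of Systems and Software (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 8.60

Self-citation rate: 5.70%

Articles published: 193

Review time: 16 weeks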

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.
