Identifying runtime libraries in statically linked linux binaries

Future Generation Computer Systems: The International Journal of eScience. Published 2024-11-13. DOI: 10.1016/j.future.2024.107602
Impact factor 6.2; CAS Zone 2 (Computer Science); JCR Q1 (Computer Science, Theory & Methods)
Javier Carrillo-Mondéjar, Ricardo J. Rodríguez
Volume 164, Article 107602. https://www.sciencedirect.com/science/article/pii/S0167739X24005661
Citations: 0

Abstract

Vulnerabilities in unpatched, statically linked applications can originate from their third-party dependencies: because the library code is embedded in the executable, the application must be relinked every time a library is updated to fix a vulnerability. Despite this, malware binaries are often statically linked to ensure they run on target platforms and to complicate malware analysis. Identifying libraries is therefore crucial in malware analysis, as it helps filter out library functions so that the analyst can focus on the malware's own functions. In this paper, we introduce MANTILLA, a system for identifying runtime libraries in statically linked Linux-based binaries. Our system uses radare2 to identify functions and extract their features through static binary analysis (independently of the binary's underlying architecture), and then predicts final values with a K-nearest neighbors (KNN) supervised machine learning model combined with a majority rule. MANTILLA is evaluated on a dataset of binaries built for different architectures (MIPSeb, ARMel, Intel x86, and Intel x86-64) and with different runtime libraries (uClibc, glibc, and musl), achieving very high accuracy. We also evaluate it in two case studies: first, on a dataset of binaries from the binutils collection, and second, on an IoT malware dataset. In both cases, good accuracy is obtained for both runtime library detection (94.4% and 95.5%, respectively) and architecture identification (100% and 98.6%, respectively).
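The abstract says final values are predicted with a KNN model plus a majority rule. The Python sketch below illustrates one plausible reading of that two-stage scheme, under stated assumptions: each function is labelled by a k-nearest-neighbour vote, and the whole binary receives the majority label across its functions. The three-element feature vectors (basic blocks, instructions, calls), the toy training data, `k=3`, and the helper names `knn_predict` and `predict_binary` are all hypothetical; MANTILLA's actual radare2-derived features and parameters are not given in this abstract.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical per-function features: (basic blocks, instructions, outgoing calls).
# These are illustrative assumptions, not MANTILLA's actual feature set.
Features = Tuple[float, ...]
Sample = Tuple[Features, str]  # (feature vector, runtime library label)


def euclidean(a: Features, b: Features) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: Sequence[Sample], query: Features, k: int = 3) -> str:
    """Label one function by majority vote among its k nearest training samples."""
    neighbours = sorted(train, key=lambda sample: euclidean(sample[0], query))[:k]
    # Counter keeps first-encountered order, so a tied vote goes to the
    # label of the closer neighbour.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def predict_binary(train: Sequence[Sample], functions: Sequence[Features], k: int = 3) -> str:
    """Label a whole binary: classify every function, then apply a majority rule."""
    per_function: List[str] = [knn_predict(train, f, k) for f in functions]
    return Counter(per_function).most_common(1)[0][0]


# Toy training set: functions whose runtime library is already known.
TRAIN: List[Sample] = [
    ((10, 120, 4), "glibc"),
    ((11, 130, 5), "glibc"),
    ((12, 125, 4), "glibc"),
    ((3, 30, 1), "musl"),
    ((4, 35, 1), "musl"),
    ((2, 28, 0), "musl"),
    ((7, 80, 9), "uClibc"),
    ((8, 85, 10), "uClibc"),
    ((7, 90, 8), "uClibc"),
]

# Toy features of functions recovered from an unknown statically linked binary.
BINARY_FUNCTIONS: List[Features] = [(11, 128, 4), (3, 31, 1), (10, 118, 5), (12, 122, 4)]

if __name__ == "__main__":
    # Per-function labels are glibc, musl, glibc, glibc; the majority is glibc.
    print(predict_binary(TRAIN, BINARY_FUNCTIONS))  # glibc
```

The per-binary majority rule tolerates a few misclassified functions (here, one function resembling musl). With real features of very different magnitudes, the vectors would normally be scaled before computing distances; the sketch omits this for brevity. The same vote could be applied to architecture labels.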

Source journal metrics
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.

Latest articles in this journal
Identifying runtime libraries in statically linked linux binaries
High throughput edit distance computation on FPGA-based accelerators using HLS
In silico framework for genome analysis
Adaptive ensemble optimization for memory-related hyperparameters in retraining DNN at edge
Convergence-aware optimal checkpointing for exploratory deep learning training jobs