从二进制数据样本中自动发现和合成校验和算法

Lauren Labell, Jared Chandler, Kathleen Fisher
{"title":"从二进制数据样本中自动发现和合成校验和算法","authors":"Lauren Labell, Jared Chandler, Kathleen Fisher","doi":"10.1145/3411506.3417599","DOIUrl":null,"url":null,"abstract":"Reverse engineering unknown binary message formats is an important part of security research. Error detecting codes such as checksums and Cyclic Redundancy Check codes (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Before an analyst can manufacture input for software which uses checksums they must discover the algorithm to calculate a valid checksum. To address this need, we have developed a program synthesis based approach for detecting and reverse-engineering checksum algorithms automatically. Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm if one can be found. Our approach first performs a search over the message space to identify the location of the checksum and then uses program synthesis to identify the operations performed on the message to compute the checksum. We return to the user runnable code to both calculate a checksum from a message and to validate a message according to the checksum algorithm. We generate unit tests, allowing the user to validate the synthesized checksum algorithm is correct with regard to the input messages. We created the Tufts Checksum Corpus comprised of 12 checksum inference questions collected from posts on reverse engineering question and answer sites and 2 instances of common internet protocol checksums. Our approach successfully synthesized the underlying checksum algorithms for 12 out of 14 cases in our test suite.","PeriodicalId":110751,"journal":{"name":"Proceedings of the 15th Workshop on Programming Languages and Analysis for Security","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automatic Discovery and Synthesis of Checksum Algorithms from Binary Data Samples\",\"authors\":\"Lauren Labell, Jared Chandler, Kathleen Fisher\",\"doi\":\"10.1145/3411506.3417599\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reverse engineering unknown binary message formats is an important part of security research. Error detecting codes such as checksums and Cyclic Redundancy Check codes (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Before an analyst can manufacture input for software which uses checksums they must discover the algorithm to calculate a valid checksum. To address this need, we have developed a program synthesis based approach for detecting and reverse-engineering checksum algorithms automatically. Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm if one can be found. Our approach first performs a search over the message space to identify the location of the checksum and then uses program synthesis to identify the operations performed on the message to compute the checksum. We return to the user runnable code to both calculate a checksum from a message and to validate a message according to the checksum algorithm. We generate unit tests, allowing the user to validate the synthesized checksum algorithm is correct with regard to the input messages. We created the Tufts Checksum Corpus comprised of 12 checksum inference questions collected from posts on reverse engineering question and answer sites and 2 instances of common internet protocol checksums. Our approach successfully synthesized the underlying checksum algorithms for 12 out of 14 cases in our test suite.\",\"PeriodicalId\":110751,\"journal\":{\"name\":\"Proceedings of the 15th Workshop on Programming Languages and Analysis for Security\",\"volume\":\"66 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th Workshop on Programming Languages and Analysis for Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3411506.3417599\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th Workshop on Programming Languages and Analysis for Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3411506.3417599","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

未知二进制消息格式的逆向工程是安全研究的重要组成部分。错误检测码(如校验和和循环冗余检查码)通常被添加到消息中,以防止损坏或不可信的输入。在分析人员为使用校验和的软件制造输入之前,他们必须发现计算有效校验和的算法。为了满足这一需求,我们开发了一种基于程序综合的方法来自动检测和逆向工程校验和算法。我们的方法将一小组二进制消息作为输入,并自动返回校验和算法的Python实现(如果可以找到的话)。我们的方法首先在消息空间中执行搜索以确定校验和的位置,然后使用程序合成来确定对消息执行的操作以计算校验和。我们返回到用户可运行代码来计算消息的校验和并根据校验和算法验证消息。我们生成单元测试,允许用户根据输入消息验证合成校验和算法是否正确。我们创建了Tufts校验和语料库,包括从逆向工程问答网站上收集的12个校验和推理问题和2个常见的互联网协议校验和实例。我们的方法成功地为测试套件中的14个案例中的12个案例合成了底层校验和算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Automatic Discovery and Synthesis of Checksum Algorithms from Binary Data Samples
Reverse engineering unknown binary message formats is an important part of security research. Error detecting codes such as checksums and Cyclic Redundancy Check codes (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Before an analyst can manufacture input for software which uses checksums they must discover the algorithm to calculate a valid checksum. To address this need, we have developed a program synthesis based approach for detecting and reverse-engineering checksum algorithms automatically. Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm if one can be found. Our approach first performs a search over the message space to identify the location of the checksum and then uses program synthesis to identify the operations performed on the message to compute the checksum. We return to the user runnable code to both calculate a checksum from a message and to validate a message according to the checksum algorithm. We generate unit tests, allowing the user to validate the synthesized checksum algorithm is correct with regard to the input messages. We created the Tufts Checksum Corpus comprised of 12 checksum inference questions collected from posts on reverse engineering question and answer sites and 2 instances of common internet protocol checksums. Our approach successfully synthesized the underlying checksum algorithms for 12 out of 14 cases in our test suite.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Session details: Invited Talk II Session details: Program Synthesis and Blockchain Session details: Types for Gradual Security and Verification of Security Protocols Session details: Invited Talk I Automatic Discovery and Synthesis of Checksum Algorithms from Binary Data Samples
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1