PRES:多处理器上带有执行草图的概率重播

Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini T. Kaushik, Kyuhyung Lee, Shan Lu
{"title":"PRES:多处理器上带有执行草图的概率重播","authors":"Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini T. Kaushik, Kyuhyung Lee, Shan Lu","doi":"10.1145/1629575.1629593","DOIUrl":null,"url":null,"abstract":"Bug reproduction is critically important for diagnosing a production-run failure. Unfortunately, reproducing a concurrency bug on multi-processors (e.g., multi-core) is challenging. Previous techniques either incur large overhead or require new non-trivial hardware extensions.\n This paper proposes a novel technique called PRES (probabilistic replay via execution sketching) to help reproduce concurrency bugs on multi-processors. It relaxes the past (perhaps idealistic) objective of \"reproducing the bug on the first replay attempt\" to significantly lower production-run recording overhead. This is achieved by (1) recording only partial execution information (referred to as \"sketches\") during the production run, and (2) relying on an intelligent replayer during diagnosis time (when performance is less critical) to systematically explore the unrecorded non-deterministic space and reproduce the bug. With only partial information, our replayer may require more than one coordinated replay run to reproduce a bug. However, after a bug is reproduced once, PRES can reproduce it every time.\n We implemented PRES along with five different execution sketching mechanisms. We evaluated them with 11 representative applications, including 4 servers, 3 desktop/client applications, and 4 scientific/graphics applications, with 13 real-world concurrency bugs of different types, including atomicity violations, order violations and deadlocks. PRES (with synchronization or system call sketching) significantly lowered the production-run recording overhead of previous approaches (by up to 4416 times), while still reproducing most tested bugs in fewer than 10 replay attempts. Moreover, PRES scaled well with the number of processors; PRES's feedback generation from unsuccessful replays is critical in bug reproduction.","PeriodicalId":20672,"journal":{"name":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","volume":"97 1","pages":"177-192"},"PeriodicalIF":0.0000,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"283","resultStr":"{\"title\":\"PRES: probabilistic replay with execution sketching on multiprocessors\",\"authors\":\"Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini T. Kaushik, Kyuhyung Lee, Shan Lu\",\"doi\":\"10.1145/1629575.1629593\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bug reproduction is critically important for diagnosing a production-run failure. Unfortunately, reproducing a concurrency bug on multi-processors (e.g., multi-core) is challenging. Previous techniques either incur large overhead or require new non-trivial hardware extensions.\\n This paper proposes a novel technique called PRES (probabilistic replay via execution sketching) to help reproduce concurrency bugs on multi-processors. It relaxes the past (perhaps idealistic) objective of \\\"reproducing the bug on the first replay attempt\\\" to significantly lower production-run recording overhead. This is achieved by (1) recording only partial execution information (referred to as \\\"sketches\\\") during the production run, and (2) relying on an intelligent replayer during diagnosis time (when performance is less critical) to systematically explore the unrecorded non-deterministic space and reproduce the bug. With only partial information, our replayer may require more than one coordinated replay run to reproduce a bug. However, after a bug is reproduced once, PRES can reproduce it every time.\\n We implemented PRES along with five different execution sketching mechanisms. We evaluated them with 11 representative applications, including 4 servers, 3 desktop/client applications, and 4 scientific/graphics applications, with 13 real-world concurrency bugs of different types, including atomicity violations, order violations and deadlocks. PRES (with synchronization or system call sketching) significantly lowered the production-run recording overhead of previous approaches (by up to 4416 times), while still reproducing most tested bugs in fewer than 10 replay attempts. Moreover, PRES scaled well with the number of processors; PRES's feedback generation from unsuccessful replays is critical in bug reproduction.\",\"PeriodicalId\":20672,\"journal\":{\"name\":\"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles\",\"volume\":\"97 1\",\"pages\":\"177-192\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"283\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1629575.1629593\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1629575.1629593","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 283

摘要

Bug重现对于诊断生产运行故障至关重要。不幸的是,在多处理器(例如,多核)上重现并发错误是具有挑战性的。以前的技术要么产生很大的开销,要么需要新的重要的硬件扩展。本文提出了一种称为PRES(通过执行草图的概率重播)的新技术来帮助再现多处理器上的并发错误。它放松了过去(也许是理想主义的)“在第一次重播尝试时再现错误”的目标,从而显著降低了生产运行的记录开销。这是通过以下方式实现的:(1)在生产运行期间仅记录部分执行信息(称为“草图”),以及(2)在诊断期间(当性能不太关键时)依赖智能重播器系统地探索未记录的非确定性空间并重现错误。由于只有部分信息,我们的重播器可能需要多次协调重播运行来重现bug。但是,在错误被复制一次之后,PRES可以每次都复制它。我们将PRES与五种不同的执行草图机制一起实现。我们用11个代表性应用程序对它们进行了评估,其中包括4个服务器应用程序、3个桌面/客户端应用程序和4个科学/图形应用程序,其中有13个不同类型的真实并发错误,包括原子性违反、顺序违反和死锁。PRES(使用同步或系统调用草图)显著降低了以前方法的生产运行记录开销(最多减少了4416倍),同时在不到10次重放尝试中仍然再现了大多数测试过的错误。此外,PRES可以很好地随处理器数量的增加而扩展;PRES从不成功的重放中产生的反馈对bug繁殖至关重要。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
PRES: probabilistic replay with execution sketching on multiprocessors
Bug reproduction is critically important for diagnosing a production-run failure. Unfortunately, reproducing a concurrency bug on multi-processors (e.g., multi-core) is challenging. Previous techniques either incur large overhead or require new non-trivial hardware extensions. This paper proposes a novel technique called PRES (probabilistic replay via execution sketching) to help reproduce concurrency bugs on multi-processors. It relaxes the past (perhaps idealistic) objective of "reproducing the bug on the first replay attempt" to significantly lower production-run recording overhead. This is achieved by (1) recording only partial execution information (referred to as "sketches") during the production run, and (2) relying on an intelligent replayer during diagnosis time (when performance is less critical) to systematically explore the unrecorded non-deterministic space and reproduce the bug. With only partial information, our replayer may require more than one coordinated replay run to reproduce a bug. However, after a bug is reproduced once, PRES can reproduce it every time. We implemented PRES along with five different execution sketching mechanisms. We evaluated them with 11 representative applications, including 4 servers, 3 desktop/client applications, and 4 scientific/graphics applications, with 13 real-world concurrency bugs of different types, including atomicity violations, order violations and deadlocks. PRES (with synchronization or system call sketching) significantly lowered the production-run recording overhead of previous approaches (by up to 4416 times), while still reproducing most tested bugs in fewer than 10 replay attempts. Moreover, PRES scaled well with the number of processors; PRES's feedback generation from unsuccessful replays is critical in bug reproduction.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
ResilientFL '21: Proceedings of the First Workshop on Systems Challenges in Reliable and Secure Federated Learning, Virtual Event / Koblenz, Germany, 25 October 2021 SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, Virtual Event / Koblenz, Germany, October 26-29, 2021 Application Performance Monitoring: Trade-Off between Overhead Reduction and Maintainability Efficient deterministic multithreading through schedule relaxation SILT: a memory-efficient, high-performance key-value store
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1