Early Experience with Transformer-Based Similarity Analysis for DataRaceBench

Winson X. Chen, T. Vanderbruggen, Pei-Hung Lin, C. Liao, M. Emani
Published in: 2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness)
Publication date: 2022-11-01
DOI: 10.1109/Correctness56720.2022.00011 (https://doi.org/10.1109/Correctness56720.2022.00011)
Citations: 1

Abstract

DataRaceBench (DRB) is a dedicated benchmark suite for evaluating tools that aim to find data race bugs in OpenMP programs. Using microbenchmarks with or without data races, DRB generates standard quality metrics and provides systematic, quantitative assessments of data race detection tools. However, as the number of microbenchmarks grows, it becomes challenging to manually identify similar code patterns in DRB, whether to identify duplicated kernels or to guide the addition of new kernels. In this paper, we experiment with a transformer-based deep learning approach to similarity analysis. We adapt a state-of-the-art transformer model, CodeBERT, to find similar OpenMP code regions. We explore the challenges and solutions that arise when applying transformer-based similarity analysis to new source code unseen by pre-trained transformers. Using comparative experiments on different variants of similarity analysis, we comment on the strengths and limitations of the transformer-based approach and point out future research directions.
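
The general shape of embedding-based similarity analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `embed` function here is a hypothetical stand-in that counts tokens, whereas the paper obtains representations from an adapted CodeBERT model. The kernel names and snippets are invented for illustration and are not actual DRB microbenchmarks.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(code: str) -> Counter:
    """Placeholder embedding (assumption): a bag of token counts.

    This is NOT CodeBERT. In a transformer-based setup, this function would
    return the model's vector for the code region instead.
    """
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_pairs(kernels: dict) -> list:
    """Return (similarity, name_a, name_b) tuples, most similar pair first."""
    vecs = {name: embed(src) for name, src in kernels.items()}
    pairs = [
        (cosine(vecs[x], vecs[y]), x, y)
        for x, y in combinations(sorted(vecs), 2)
    ]
    return sorted(pairs, reverse=True)


# Hypothetical OpenMP regions, made up for this sketch.
kernels = {
    "k1": "#pragma omp parallel for\nfor (i = 0; i < n - 1; i++) a[i] = a[i + 1] + 1;",
    "k2": "#pragma omp parallel for\nfor (j = 0; j < m - 1; j++) b[j] = b[j + 1] + 1;",
    "k3": "#pragma omp parallel\n{\n#pragma omp single\nx = compute();\n}",
}

for score, x, y in rank_pairs(kernels):
    print(f"{x} vs {y}: {score:.3f}")
```

In this sketch the two loop kernels that differ only in variable names rank as the most similar pair, which is the kind of near-duplicate a maintainer would want flagged. A learned embedding would replace the token-count placeholder while the pairwise ranking step stays the same.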