Exploring regular expression comprehension

Carl Chapman, Peipei Wang, Kathryn T. Stolee
{"title":"Exploring regular expression comprehension","authors":"Carl Chapman, Peipei Wang, Kathryn T. Stolee","doi":"10.1109/ASE.2017.8115653","DOIUrl":null,"url":null,"abstract":"The regular expression (regex) is a powerful tool employed in a large variety of software engineering tasks. However, prior work has shown that regexes can be very complex and that it could be difficult for developers to compose and understand them. This work seeks to identify code smells that impact comprehension. We conduct an empirical study on 42 pairs of behaviorally equivalent but syntactically different regexes using 180 participants and evaluate the understandability of various regex language features. We further analyze regexes in GitHub to find the community standards or the common usages of various features. We found that some regex expression representations are more understandable than others. For example, using a range (e.g., [0–9]) is often more understandable than a default character class (e.g., [\\d]). We also found that the DFA size of a regex significantly affects comprehension for the regexes studied. The larger the DFA of a regex (up to size eight), the more understandable it was. Finally, we identify smelly and non-smelly regex representations based on a combination of community standards and understandability metrics.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"37","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASE.2017.8115653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 37

Abstract

The regular expression (regex) is a powerful tool employed in a large variety of software engineering tasks. However, prior work has shown that regexes can be very complex and that it could be difficult for developers to compose and understand them. This work seeks to identify code smells that impact comprehension. We conduct an empirical study on 42 pairs of behaviorally equivalent but syntactically different regexes using 180 participants and evaluate the understandability of various regex language features. We further analyze regexes in GitHub to find the community standards or the common usages of various features. We found that some regex expression representations are more understandable than others. For example, using a range (e.g., [0–9]) is often more understandable than a default character class (e.g., [\d]). We also found that the DFA size of a regex significantly affects comprehension for the regexes studied. The larger the DFA of a regex (up to size eight), the more understandable it was. Finally, we identify smelly and non-smelly regex representations based on a combination of community standards and understandability metrics.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
探索正则表达式理解
正则表达式(regex)是用于各种软件工程任务的强大工具。然而,先前的工作表明,正则表达式可能非常复杂,开发人员很难组合和理解它们。这项工作旨在识别影响理解的代码气味。我们对180名参与者的42对行为等效但语法不同的正则表达式进行了实证研究,并评估了各种正则表达式语言特征的可理解性。我们进一步分析GitHub中的正则表达式,以找到社区标准或各种功能的通用用法。我们发现一些正则表达式表示比其他的更容易理解。例如,使用一个范围(例如,[0-9])通常比使用默认字符类(例如,[\d])更容易理解。我们还发现,一个正则表达式的DFA大小显著影响对所研究的正则表达式的理解。regex的DFA越大(最大为8),它就越容易理解。最后,我们基于社区标准和可理解性度量的组合来识别有问题和无问题的正则表达式表示。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
TiQi: A natural language interface for querying software project data A comprehensive study on real world concurrency bugs in Node.js Managing software evolution through semantic history slicing Software performance self-adaptation through efficient model predictive control Privacy-aware data-intensive applications
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1