An efficient string solver for string constraints with regex-counting and string-length

IF 4.1; Zone 2 (Computer Science); JCR Q1 (Computer Science, Hardware & Architecture); Journal of Systems Architecture; Pub Date: 2025-03-01; Epub Date: 2025-01-17; DOI: 10.1016/j.sysarc.2025.103340
Denghang Hu, Zhilin Wu
Journal of Systems Architecture, volume 160, Article 103340. Publisher page: https://www.sciencedirect.com/science/article/pii/S1383762125000128

Abstract

Regular expressions (regex for short) and the string-length function are widely used in string-manipulating programs. Counting is a frequently used feature in regexes that counts the number of matches of sub-patterns. State-of-the-art string solvers cannot efficiently solve string constraints with regex-counting and string-length, especially when the counting and length bounds are large. In this work, we propose an automata-theoretic approach for solving this class of string constraints. The main idea is to model the counting operators symbolically by registers in automata instead of unfolding them explicitly, thus alleviating the state explosion problem. Moreover, the string-length function is modeled by a register as well. As a result, the satisfiability of string constraints with regex-counting and string-length is reduced to the satisfiability of linear integer arithmetic, which off-the-shelf SMT solvers can then solve. To improve performance further, we also propose various optimization techniques. We implement the algorithms and validate our approach on 49,843 benchmark instances. The experimental results show that our approach solves more instances than the state-of-the-art solvers, at a comparable or faster speed, especially when the counting and length bounds are large or when the counting operators are nested with other counting operators or complement operators.
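
To make the idea of replacing unfolding with registers concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm or tool. It handles only the hypothetical constraint "x matches (word){lo,hi} and len(x) == target_len", and it solves the resulting integer condition in closed form rather than calling an SMT solver. All names (CountedWord, unfolded_state_count, register_state_count, solve) are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CountedWord:
    """Toy regex (word){lo,hi}: `word` repeated between lo and hi times."""
    word: str
    lo: int
    hi: int


def unfolded_state_count(c: CountedWord) -> int:
    # Explicit unfolding chains `hi` copies of the automaton for `word`,
    # so the state count grows linearly with the upper counting bound.
    return c.hi * len(c.word) + 1


def register_state_count(c: CountedWord) -> int:
    # With a counter register k, a single loop over `word` suffices.
    # The bounds lo <= k <= hi become an arithmetic side condition.
    return len(c.word)


def solve(c: CountedWord, target_len: int) -> Optional[str]:
    """Decide: exists x in (word){lo,hi} with len(x) == target_len?

    Registers: k counts loop iterations, and the length register equals
    k * len(word). Satisfiability reduces to the integer constraint
        lo <= k <= hi  and  k * len(word) == target_len,
    which this toy solves directly (a real solver would use SMT).
    """
    if not c.word:
        raise ValueError("word must be non-empty in this toy example")
    k, remainder = divmod(target_len, len(c.word))
    if remainder == 0 and c.lo <= k <= c.hi:
        return c.word * k  # witness string
    return None  # unsatisfiable


if __name__ == "__main__":
    c = CountedWord("ab", lo=1000, hi=5000)
    print(unfolded_state_count(c), register_state_count(c))
    witness = solve(c, 4000)
    print(None if witness is None else len(witness))
```

In this toy, the number of automaton states in the register-based view does not depend on the bounds lo and hi. Only the arithmetic constraint does, which is the part the abstract says is handed to an off-the-shelf SMT solver.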
Source journal: Journal of Systems Architecture (Engineering & Technology: Computer Science, Hardware)
CiteScore: 8.70
Self-citation rate: 15.60%
Articles published: 226
Review time: 46 days
Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.