Faster general parsing through context-free memoization

G. Herman
{"title":"Faster general parsing through context-free memoization","authors":"G. Herman","doi":"10.1145/3385412.3386032","DOIUrl":null,"url":null,"abstract":"We present a novel parsing algorithm for all context-free languages. The algorithm features a clean mathematical formulation: parsing is expressed as a series of standard operations on regular languages and relations. Parsing complexity w.r.t. input length matches the state of the art: it is worst-case cubic, quadratic for unambiguous grammars, and linear for LR-regular grammars. What distinguishes our approach is that parsing can be implemented using only immutable, acyclic data structures. We also propose a parsing optimization technique called context-free memoization. It allows handling an overwhelming majority of input symbols using a simple stack and a lookup table, similarly to the operation of a deterministic LR(1) parser. This allows our proof-of-concept implementation to outperform the best current implementations of common generalized parsing algorithms (Earley, GLR, and GLL). Tested on a large Java source corpus, parsing is 3–5 times faster, while recognition—35 times faster.","PeriodicalId":20580,"journal":{"name":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3385412.3386032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

We present a novel parsing algorithm for all context-free languages. The algorithm features a clean mathematical formulation: parsing is expressed as a series of standard operations on regular languages and relations. Parsing complexity w.r.t. input length matches the state of the art: it is worst-case cubic, quadratic for unambiguous grammars, and linear for LR-regular grammars. What distinguishes our approach is that parsing can be implemented using only immutable, acyclic data structures. We also propose a parsing optimization technique called context-free memoization. It allows handling an overwhelming majority of input symbols using a simple stack and a lookup table, similarly to the operation of a deterministic LR(1) parser. This allows our proof-of-concept implementation to outperform the best current implementations of common generalized parsing algorithms (Earley, GLR, and GLL). Tested on a large Java source corpus, parsing is 3–5 times faster, while recognition—35 times faster.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过上下文无关的记忆实现更快的一般解析
我们提出了一种适用于所有上下文无关语言的新的解析算法。该算法具有简洁的数学公式:解析被表示为对规则语言和关系的一系列标准操作。解析复杂度w.r.t.输入长度匹配当前的技术水平:最坏情况是三次,对于明确的语法是二次的,对于lr -正则语法是线性的。我们的方法的不同之处在于,解析只能使用不可变的、无循环的数据结构来实现。我们还提出了一种称为上下文无关记忆的解析优化技术。它允许使用一个简单的堆栈和一个查找表处理绝大多数输入符号,类似于确定性LR(1)解析器的操作。这使得我们的概念验证实现的性能优于当前通用解析算法(Earley、GLR和GLL)的最佳实现。在大型Java源语料库上测试,解析速度快3-5倍,而识别速度快35倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Type error feedback via analytic program repair Inductive sequentialization of asynchronous programs Decidable verification under a causally consistent shared memory SympleGraph: distributed graph processing with precise loop-carried dependency guarantee Debug information validation for optimized code
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1