Grammatical Inference and Machine Learning Approaches to Post-Hoc LangSec

Sheridan S. Curley, Richard E. Harang
Published in: 2016 IEEE Security and Privacy Workshops (SPW), 2016-05-22
DOI: 10.1109/SPW.2016.26 (https://doi.org/10.1109/SPW.2016.26)
Citations: 2

Abstract

Formal Language Theory for Security (LangSec) applies the tools of theoretical computer science to the problem of protocol design and analysis. In practice, most results have focused on protocol design, showing that by restricting the complexity of protocols it is possible to design parsers with desirable and formally verifiable properties, such as correctness and equivalence. When we consider existing protocols, however, many of these were not subjected to formal analysis during their design, and many are not implemented in a manner consistent with their formal documentation. Determining a grammar for such protocols is the first step in analyzing them, which places this problem in the domain of grammatical inference, for which a deep theoretical literature exists. In particular, although it has been shown that the higher-level categories of the Chomsky hierarchy cannot be generically learned, it is also known that certain subcategories of that hierarchy can be effectively learned. In this paper, we summarize some theoretical results for inferring well-known Chomsky grammars, with special attention to context-free grammars (CFGs) and their generated languages (CFLs). We then demonstrate that, despite negative learnability results in the theoretical regime, we can use long short-term memory (LSTM) networks, a type of recurrent neural network (RNN) architecture, to learn with high accuracy a grammar for the URIs that appear in the Apache HTTP access logs of a particular server. We discuss these results in the context of grammatical inference, and suggest avenues for further research into the learnability of a subgroup of the context-free grammars.
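
The post-hoc setting described in the abstract (learn the language of URIs from positive examples observed in a server's logs, then judge new URIs against it) can be illustrated in miniature. The sketch below is not the LSTM approach of the paper: as a loudly labeled simplification, it substitutes a character-level bigram model with add-one smoothing, which needs only the Python standard library. The training URIs, the test URIs, and the function names `train_bigram` and `avg_log_prob` are all hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict

# Sentinel symbols marking the start and end of a string (assumed not to
# occur inside real URIs).
START, END = "\x02", "\x03"


def train_bigram(uris):
    """Count character bigrams over a list of example URIs."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = {END}
    for uri in uris:
        symbols = [START] + list(uri) + [END]
        for prev, cur in zip(symbols, symbols[1:]):
            counts[prev][cur] += 1
            alphabet.add(cur)
    return counts, alphabet


def avg_log_prob(model, uri):
    """Average per-symbol log-probability of a URI under the bigram model.

    Add-one smoothing is used, with one extra slot of probability mass
    reserved for characters never seen in training.
    """
    counts, alphabet = model
    vocab = len(alphabet) + 1
    symbols = [START] + list(uri) + [END]
    total = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        row = counts.get(prev, {})
        total += math.log((row.get(cur, 0) + 1) / (sum(row.values()) + vocab))
    return total / (len(symbols) - 1)


# Hypothetical "observed" URIs standing in for entries from an access log.
train = [
    "/index.html",
    "/images/logo.png",
    "/css/site.css",
    "/index.html?page=2",
    "/images/banner.png",
]
model = train_bigram(train)

typical = "/images/site.png"          # resembles the training URIs
odd = "/..%2f..%2fetc%2fpasswd"       # structurally unlike anything observed

print("typical:", avg_log_prob(model, typical))
print("odd:    ", avg_log_prob(model, odd))
```

A URI that resembles the observed traffic receives a higher average log-probability than one that does not, which is the basic signal a learned model of the input language provides. A bigram model can only capture local regularities; the paper's interest in LSTMs stems from the need to capture longer-range structure, which this stand-in does not attempt.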