基于语言的XML异常检测的增量学习器

2016 IEEE Security and Privacy Workshops (SPW) Pub Date : 2016-03-25 DOI:10.1109/SPW.2016.35

Harald Lampesberger

{"title":"基于语言的XML异常检测的增量学习器","authors":"Harald Lampesberger","doi":"10.1109/SPW.2016.35","DOIUrl":null,"url":null,"abstract":"The Extensible Markup Language (XML) is a complex language, and consequently, XML-based protocols are susceptible to entire classes of implicit and explicit security problems. Message formats in XML-based protocols are usually specified in XML Schema, and as a first-line defense, schema validation should reject malformed input. However, extension points in most protocol specifications break validation. Extension points are wildcards and considered best practice for loose composition, but they also enable an attacker to add unchecked content in a document, e.g., for a signature wrapping attack. This paper introduces datatyped XML visibly pushdown automata (dXVPAs) as language representation for mixed-content XML and presents an incremental learner that infers a dXVPA from example documents. The learner generalizes XML types and datatypes in terms of automaton states and transitions, and an inferred dXVPA converges to a good-enough approximation of the true language. The automaton is free from extension points and capable of stream validation, e.g., as an anomaly detector for XML-based protocols. For dealing with adversarial training data, two scenarios of poisoning are considered: a poisoning attack is either uncovered at a later time or remains hidden. Unlearning can therefore remove an identified poisoning attack from a dXVPA, and sanitization trims low-frequent states and transitions to get rid of hidden attacks. All algorithms have been evaluated in four scenarios, including a web service implemented in Apache Axis2 and Apache Rampart, where attacks have been simulated. In all scenarios, the learned automaton had zero false positives and outperformed traditional schema validation.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"An Incremental Learner for Language-Based Anomaly Detection in XML\",\"authors\":\"Harald Lampesberger\",\"doi\":\"10.1109/SPW.2016.35\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Extensible Markup Language (XML) is a complex language, and consequently, XML-based protocols are susceptible to entire classes of implicit and explicit security problems. Message formats in XML-based protocols are usually specified in XML Schema, and as a first-line defense, schema validation should reject malformed input. However, extension points in most protocol specifications break validation. Extension points are wildcards and considered best practice for loose composition, but they also enable an attacker to add unchecked content in a document, e.g., for a signature wrapping attack. This paper introduces datatyped XML visibly pushdown automata (dXVPAs) as language representation for mixed-content XML and presents an incremental learner that infers a dXVPA from example documents. The learner generalizes XML types and datatypes in terms of automaton states and transitions, and an inferred dXVPA converges to a good-enough approximation of the true language. The automaton is free from extension points and capable of stream validation, e.g., as an anomaly detector for XML-based protocols. For dealing with adversarial training data, two scenarios of poisoning are considered: a poisoning attack is either uncovered at a later time or remains hidden. Unlearning can therefore remove an identified poisoning attack from a dXVPA, and sanitization trims low-frequent states and transitions to get rid of hidden attacks. All algorithms have been evaluated in four scenarios, including a web service implemented in Apache Axis2 and Apache Rampart, where attacks have been simulated. In all scenarios, the learned automaton had zero false positives and outperformed traditional schema validation.\",\"PeriodicalId\":341207,\"journal\":{\"name\":\"2016 IEEE Security and Privacy Workshops (SPW)\",\"volume\":\"143 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Security and Privacy Workshops (SPW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPW.2016.35\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Security and Privacy Workshops (SPW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPW.2016.35","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

可扩展标记语言(XML)是一种复杂的语言，因此，基于XML的协议容易受到各种隐式和显式安全问题的影响。基于XML的协议中的消息格式通常在XML Schema中指定，作为第一道防线，模式验证应该拒绝格式错误的输入。然而，大多数协议规范中的扩展点会破坏验证。扩展点是通配符，被认为是松散组合的最佳实践，但它们也使攻击者能够在文档中添加未检查的内容，例如用于签名包装攻击。本文引入了数据类型XML可见下推自动机(dXVPA)作为混合内容XML的语言表示，并提出了一种从示例文档中推断出dXVPA的增量学习器。学习器根据自动机状态和转换对XML类型和数据类型进行一般化，推断的dxpa收敛到与真实语言足够接近的程度。该自动机没有扩展点，并且能够进行流验证，例如，作为基于xml的协议的异常检测器。为了处理对抗性训练数据，考虑了两种中毒情况:中毒攻击要么在稍后的时间被发现，要么仍然隐藏。因此，遗忘可以从dxpa中删除已识别的中毒攻击，而消毒可以修剪低频率的状态和转换，以消除隐藏的攻击。所有算法都在四个场景中进行了评估，其中包括在Apache Axis2和Apache Rampart中实现的web服务，并在其中模拟了攻击。在所有场景中，学习的自动机都没有误报，并且优于传统的模式验证。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

An Incremental Learner for Language-Based Anomaly Detection in XML

The Extensible Markup Language (XML) is a complex language, and consequently, XML-based protocols are susceptible to entire classes of implicit and explicit security problems. Message formats in XML-based protocols are usually specified in XML Schema, and as a first-line defense, schema validation should reject malformed input. However, extension points in most protocol specifications break validation. Extension points are wildcards and considered best practice for loose composition, but they also enable an attacker to add unchecked content in a document, e.g., for a signature wrapping attack. This paper introduces datatyped XML visibly pushdown automata (dXVPAs) as language representation for mixed-content XML and presents an incremental learner that infers a dXVPA from example documents. The learner generalizes XML types and datatypes in terms of automaton states and transitions, and an inferred dXVPA converges to a good-enough approximation of the true language. The automaton is free from extension points and capable of stream validation, e.g., as an anomaly detector for XML-based protocols. For dealing with adversarial training data, two scenarios of poisoning are considered: a poisoning attack is either uncovered at a later time or remains hidden. Unlearning can therefore remove an identified poisoning attack from a dXVPA, and sanitization trims low-frequent states and transitions to get rid of hidden attacks. All algorithms have been evaluated in four scenarios, including a web service implemented in Apache Axis2 and Apache Rampart, where attacks have been simulated. In all scenarios, the learned automaton had zero false positives and outperformed traditional schema validation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE Security and Privacy Workshops (SPW)

自引率

0.00%

发文量