C. Oehmen, Elena S. Peterson, Aaron R. Phillips, Darren S. Curtis
For many applications, it is desirable to have a process for recognizing when software binaries are closely related without relying on them to be identical or have identical segments. But doing so in a dynamic environment is a nontrivial task because most approaches to software similarity require extensive and time-consuming analysis of a binary, or they fail to recognize executables that are similar but not identical. Presented herein is a novel biosequence-based method for quantifying similarity of executable binaries. Using this method, we show in an example application on large-scale multi-author codes that 1) the biosequence-based method has a statistical performance in recognizing and distinguishing between a collection of real-world high performance computing applications better than 90% of ideal, and 2) an example of using family-tree analysis to tune identification for a code subfamily can achieve better than 99% of ideal performance.
{"title":"A Biosequence-Based Approach to Software Characterization","authors":"C. Oehmen, Elena S. Peterson, Aaron R. Phillips, Darren S. Curtis","doi":"10.1109/SPW.2016.43","DOIUrl":"https://doi.org/10.1109/SPW.2016.43","url":null,"abstract":"For many applications, it is desirable to have a process for recognizing when software binaries are closely related without relying on them to be identical or have identical segments. But doing so in a dynamic environment is a nontrivial task because most approaches to software similarity require extensive and time-consuming analysis of a binary, or they fail to recognize executables that are similar but not identical. Presented herein is a novel biosequence-based method for quantifying similarity of executable binaries. Using this method, we show in an example application on large-scale multi-author codes that 1) the biosequence-based method has a statistical performance in recognizing and distinguishing between a collection of real-world high performance computing applications better than 90% of ideal, and 2) an example of using family-tree analysis to tune identification for a code subfamily can achieve better than 99% of ideal performance.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126420617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patrick Mutchler, Y. Safaei, Adam Doupé, John C. Mitchell
Android apps declare a target version of the Android run-time platform. When run on devices with more recent Android versions, apps are executed in a compatibility mode that attempts to mimic the behavior of the older target version. This design has serious security consequences. Apps that target outdated Android versions disable important security changes to the Android platform. We call the problem of apps targeting outdated Android versions the target fragmentation problem. We analyze a dataset of 1,232,696 free Android apps collected between May 2012 and December 2015 and show that the target fragmentation problem is a serious concern across the entire app ecosystem and has not changed considerably in several years. In total, 93% of current apps target out-of-date platform versions and have a mean outdatedness of 686 days; 79% of apps are already out-of-date on the day they are uploaded to the app store. Finally, we examine seven security-related changes to the Android platform that are disabled in apps that target outdated platform versions and show that target fragmentation hamstrings attempts to improve the security of Android apps.
{"title":"Target Fragmentation in Android Apps","authors":"Patrick Mutchler, Y. Safaei, Adam Doupé, John C. Mitchell","doi":"10.1109/SPW.2016.31","DOIUrl":"https://doi.org/10.1109/SPW.2016.31","url":null,"abstract":"Android apps declare a target version of the Android run-time platform. When run on devices with more recent Android versions, apps are executed in a compatibility mode that attempts to mimic the behavior of the older target version. This design has serious security consequences. Apps that target outdated Android versions disable important security changes to the Android platform. We call the problem of apps targeting outdated Android versions the target fragmentation problem. We analyze a dataset of 1,232,696 free Android apps collected between May, 2012 and December, 2015 and show that the target fragmentation problem is a serious concern across the entire app ecosystem and has not changed considerably in several years. In total, 93% of current apps target out-of-date platform versions and have a mean outdatedness of 686 days, 79% of apps are already out-of-date on the day they are uploaded to the app store. Finally, we examine seven security related changes to the Android platform that are disabled in apps that target outdated platform versions and show that target fragmentation hamstrings attempts to improve the security of Android apps.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134316008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guillaume Endignoux, O. Levillain, Jean-Yves Migeon
PDF has become a de facto standard for exchanging electronic documents, for visualization as well as for printing. However, it has also become a common delivery channel for malware, and previous work has highlighted features that lead to security issues. In our work, we focus on the structure of the format, independently from specific features. By methodically testing PDF readers against hand-crafted files, we show that the interpretation of PDF files at the structural level may cause some form of denial of service, or be ambiguous and lead to rendering inconsistencies among readers. We then propose a pragmatic solution by restricting the syntax to avoid common errors, and propose a formal grammar for it. We explain how data consistency can be validated at a finer-grained level using a dedicated type checker. Finally, we assess this approach on a set of real-world files and show that our proposals are realistic.
{"title":"Caradoc: A Pragmatic Approach to PDF Parsing and Validation","authors":"Guillaume Endignoux, O. Levillain, Jean-Yves Migeon","doi":"10.1109/SPW.2016.39","DOIUrl":"https://doi.org/10.1109/SPW.2016.39","url":null,"abstract":"PDF has become a de facto standard for exchanging electronic documents, for visualization as well as for printing. However, it has also become a common delivery channel for malware, and previous work has highlighted features that lead to security issues. In our work, we focus on the structure of the format, independently from specific features. By methodically testing PDF readers against hand-crafted files, we show that the interpretation of PDF files at the structural level may cause some form of denial of service, or be ambiguous and lead to rendering inconsistencies among readers. We then propose a pragmatic solution by restricting the syntax to avoid common errors, and propose a formal grammar for it. We explain how data consistency can be validated at a finer-grained level using a dedicated type checker. Finally, we assess this approach on a set of real-world files and show that our proposals are realistic.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130793882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During repackaging, malware writers statically inject malcode and modify the control flow to ensure its execution. Repackaged malware is difficult to detect by existing classification techniques, partly because of its behavioral similarities to benign apps. By exploring the different internal behaviors of an app, we propose a new Android repackaged malware detection technique based on code heterogeneity analysis. Our solution strategically partitions the code structure of an app into multiple dependence-based regions (subsets of the code). Each region is independently classified on its behavioral features. We point out the security challenges and design choices for partitioning code structures at the class and method level graphs, and present a solution based on multiple dependence relations. We have performed experimental evaluation with over 7,542 Android apps. For repackaged malware, our partition-based detection reduces false negatives (i.e., missed detection) by 30-fold, when compared to the non-partition-based approach. Overall, our approach achieves a false negative rate of 0.35% and a false positive rate of 2.97%.
{"title":"Analysis of Code Heterogeneity for High-Precision Classification of Repackaged Malware","authors":"K. Tian, D. Yao, B. Ryder, Gang Tan","doi":"10.1109/SPW.2016.33","DOIUrl":"https://doi.org/10.1109/SPW.2016.33","url":null,"abstract":"During repackaging, malware writers statically inject malcode and modify the control flow to ensure its execution. Repackaged malware is difficult to detect by existing classification techniques, partly because of their behavioral similarities to benign apps. By exploring the app's internal different behaviors, we propose a new Android repackaged malware detection technique based on code heterogeneity analysis. Our solution strategically partitions the code structure of an app into multiple dependence-based regions (subsets of the code). Each region is independently classified on its behavioral features. We point out the security challenges and design choices for partitioning code structures at the class and method level graphs, and present a solution based on multiple dependence relations. We have performed experimental evaluation with over 7,542 Android apps. For repackaged malware, our partition-based detection reduces false negatives (i.e., missed detection) by 30-fold, when compared to the non-partition-based approach. Overall, our approach achieves a false negative rate of 0.35% and a false positive rate of 2.97%.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130901405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To date, top-down efforts to evolve and structure privacy engineering knowledge have tended to reflect common systems engineering/development life cycle activities. A different approach suggests a particular need for technical analytical methods. To help address this need, this paper proposes to adapt for privacy engineering an existing technique, System-Theoretic Process Analysis (STPA), developed for safety engineering. The foundations of STPA are discussed, its security extension, STPA-Sec, is described, and modifications to STPA-Sec are proposed to produce STPA-Priv. STPA-Priv is then applied to a simple illustrative example.
{"title":"Privacy Risk Analysis Based on System Control Structures: Adapting System-Theoretic Process Analysis for Privacy Engineering","authors":"S. Shapiro","doi":"10.1109/SPW.2016.15","DOIUrl":"https://doi.org/10.1109/SPW.2016.15","url":null,"abstract":"To date, top-down efforts to evolve and structure privacy engineering knowledge have tended to reflect common systems engineering/development life cycle activities. A different approach suggests a particular need for technical analytical methods. To help address this need, this paper proposes to adapt for privacy engineering an existing technique, System-Theoretic Process Analysis (STPA), developed for safety engineering. The foundations of STPA are discussed, its security extension, STPA-Sec, is described, and modifications to STPA-Sec are proposed to produce STPA-Priv. STPA-Priv is then applied to a simple illustrative example.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":" 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132075704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the last few years, the number of mobile devices in circulation, such as smartphones and tablets, has increased dramatically. The primary and often only protection mechanism in these devices is authentication using a password or a Personal Identification Number (PIN). Passwords are known to be a notoriously weak authentication mechanism, no matter how complex the underlying format is. A more secure alternative that has gained interest recently is extracting keystroke dynamic biometrics from supplied passwords for mobile authentication. In this paper, we show that using a random forest classifier, improved accuracy can be achieved for mobile keystroke dynamic biometric authentication. We also propose a new algorithm for handling typos, which is an essential step in improving usability. We study both timing features and pressure-based features. Experimental evaluation is based on two public datasets and a third dataset collected in our lab. The best performance, obtained by combining timing and pressure features, is an Equal Error Rate (EER) of 2.3% for a population of 42 users.
{"title":"Improving Performance and Usability in Mobile Keystroke Dynamic Biometric Authentication","authors":"Faisal Alshanketi, I. Traoré, Ahmed Awad E. Ahmed","doi":"10.1109/SPW.2016.12","DOIUrl":"https://doi.org/10.1109/SPW.2016.12","url":null,"abstract":"In the last few years, the number of mobile devices such as smartphones and tablets, in circulation, has increased dramatically. The primary and often only protection mechanism in these devices is authentication using a password or a Personal Identification Number (PIN). Passwords are notoriously known to be a weak authentication mechanism, no matter how complex the underlying format is. A more secure alternative option which has gained interest recently is extracting keystroke dynamic biometrics from supplied passwords for mobile authentication. In this paper, we show that using random forests classifier, improved accuracy performance can be achieved for mobile keystroke dynamic biometric authentication. We also propose a new algorithm for handling typos, which is an essential step in improving usability. We study both timing features and pressure-based features. Experimental evaluation is based on two public datasets and a third dataset collected in our lab. The best performance, obtained by combining timing and pressure features, is an Equal Error Rate (EER) of 2.3% for a population of 42 users.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127242192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a new method for random testing of binary executables inspired by biology. In our approach we introduce the first fuzzer based on a mathematical model for optimal foraging. To minimize search time for possible vulnerabilities we generate test cases with Lévy flights in the input space. In order to dynamically adapt test generation behavior to actual path exploration performance we define a suitable measure for quality evaluation of test cases. This measure takes into account previously discovered code regions and allows us to construct a feedback mechanism. By controlling diffusivity of the test case generating Lévy processes with evaluation feedback from dynamic instrumentation we are able to define a fully self-adaptive fuzzing algorithm.
{"title":"Hunting Bugs with Lévy Flight Foraging","authors":"Konstantin Böttinger","doi":"10.1109/SPW.2016.9","DOIUrl":"https://doi.org/10.1109/SPW.2016.9","url":null,"abstract":"We present a new method for random testing of binary executables inspired by biology. In our approach we introduce the first fuzzer based on a mathematical model for optimal foraging. To minimize search time for possible vulnerabilities we generate test cases with Lévy flights in the input space. In order to dynamically adapt test generation behavior to actual path exploration performance we define a suitable measure for quality evaluation of test cases. This measure takes into account previously discovered code regions and allows us to construct a feedback mechanism. By controlling diffusivity of the test case generating Lévy processes with evaluation feedback from dynamic instrumentation we are able to define a fully self-adaptive fuzzing algorithm.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133244498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As smart home devices are introduced into our homes, security and privacy concerns are being raised. Smart home devices collect, exchange, and transmit various data about the environment of our homes. This data can not only be used to characterize a physical property but also to infer personal information about the inhabitants. One potential attack vector for smart home devices is the use of traffic classification as a source for covert channel attacks. Specifically, we are concerned with the use of traffic classification techniques for inferring events taking place within a building. In this work, we study two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect (i.e., smoke and carbon monoxide detector), and show that traffic analysis can be used to learn potentially sensitive information about the state of a smart home. Among other observations, we show that we can determine, with 88% and 67% accuracy respectively, when the thermostat transitions between the Home and Auto Away mode and vice versa, based only on network traffic originating from the device. This information may be used, for example, by an attacker to infer whether the home is occupied.
{"title":"Is Anybody Home? Inferring Activity From Smart Home Network Traffic","authors":"Bogdan Copos, K. Levitt, M. Bishop, J. Rowe","doi":"10.1109/SPW.2016.48","DOIUrl":"https://doi.org/10.1109/SPW.2016.48","url":null,"abstract":"As smart home devices are introduced into our homes, security and privacy concerns are being raised. Smart home devices collect, exchange, and transmit various data about the environment of our homes. This data can not only be used to characterize a physical property but also to infer personal information about the inhabitants. One potential attack vector for smart home devices is the use of traffic classification as a source for covert channel attacks. Specifically, we are concerned with the use of traffic classification techniques for inferring events taking place within a building. In this work, we study two of the most popular smart home devices, the Nest Thermostat and the wired Nest Protect (i.e. smoke and carbon dioxide detector) and show that traffic analysis can be used to learn potentially sensitive information about the state of a smart home. Among other observations, we show that we can determine, with 88% and 67% accuracy respectively, when the thermostat transitions between the Home and Auto Away mode and vice versa, based only on network traffic originating from the device. This information may be used, for example, by an attacker to infer whether the home is occupied.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133577212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traffic inspection is a fundamental building block of many security solutions today. For example, to prevent the leakage or exfiltration of confidential insider information, as well as to block malicious traffic from entering the network, most enterprises today operate intrusion detection and prevention systems that inspect traffic. However, state-of-the-art inspection systems do not reflect well the interests of the different autonomous roles involved. For example, employees in an enterprise, or a company outsourcing its network management to a specialized third party, may require that their traffic remains confidential, even from the system administrator. Moreover, the rules used by the intrusion detection system, or more generally the configuration of an online or offline anomaly detection engine, may be provided by a third party, e.g., a security research firm, and can hence constitute a critical business asset which should be kept confidential. Today, it is often believed that accounting for these additional requirements is impossible, as they contradict efficiency and effectiveness. In this paper, we explore a novel approach, called Privacy Preserving Inspection (PRI), which provides a solution to this problem by preserving privacy of traffic inspection and confidentiality of inspection rules and configurations, and which also supports, for example, the flexible installation of additional Data Leak Prevention (DLP) rules specific to the company.
{"title":"PRI: Privacy Preserving Inspection of Encrypted Network Traffic","authors":"Liron Schiff, S. Schmid","doi":"10.1109/SPW.2016.34","DOIUrl":"https://doi.org/10.1109/SPW.2016.34","url":null,"abstract":"Traffic inspection is a fundamental building block of many security solutions today. For example, to prevent the leakage or exfiltration of confidential insider information, as well as to block malicious traffic from entering the network, most enterprises today operate intrusion detection and prevention systems that inspect traffic. However, the state-of-the-art inspection systems do not reflect well the interests of the different involved autonomous roles. For example, employees in an enterprise, or a company outsourcing its network management to a specialized third party, may require that their traffic remains confidential, even from the system administrator. Moreover, the rules used by the intrusion detection system, or more generally the configuration of an online or offline anomaly detection engine, may be provided by a third party, e.g., a security research firm, and can hence constitute a critical business asset which should be kept confidential. Today, it is often believed that accounting for these additional requirements is impossible, as they contradict efficiency and effectiveness. We in this paper explore a novel approach, called Privacy Preserving Inspection (PRI), which provides a solution to this problem, by preserving privacy of traffic inspection and confidentiality of inspection rules and configurations, and e.g., also supports the flexible installation of additional Data Leak Prevention (DLP) rules specific to the company.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121100020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Extensible Markup Language (XML) is a complex language, and consequently, XML-based protocols are susceptible to entire classes of implicit and explicit security problems. Message formats in XML-based protocols are usually specified in XML Schema, and as a first-line defense, schema validation should reject malformed input. However, extension points in most protocol specifications break validation. Extension points are wildcards and considered best practice for loose composition, but they also enable an attacker to add unchecked content in a document, e.g., for a signature wrapping attack. This paper introduces datatyped XML visibly pushdown automata (dXVPAs) as a language representation for mixed-content XML and presents an incremental learner that infers a dXVPA from example documents. The learner generalizes XML types and datatypes in terms of automaton states and transitions, and an inferred dXVPA converges to a good-enough approximation of the true language. The automaton is free from extension points and capable of stream validation, e.g., as an anomaly detector for XML-based protocols. For dealing with adversarial training data, two scenarios of poisoning are considered: a poisoning attack is either uncovered at a later time or remains hidden. Unlearning can therefore remove an identified poisoning attack from a dXVPA, and sanitization trims low-frequency states and transitions to get rid of hidden attacks. All algorithms have been evaluated in four scenarios, including a web service implemented in Apache Axis2 and Apache Rampart, where attacks have been simulated. In all scenarios, the learned automaton had zero false positives and outperformed traditional schema validation.
{"title":"An Incremental Learner for Language-Based Anomaly Detection in XML","authors":"Harald Lampesberger","doi":"10.1109/SPW.2016.35","DOIUrl":"https://doi.org/10.1109/SPW.2016.35","url":null,"abstract":"The Extensible Markup Language (XML) is a complex language, and consequently, XML-based protocols are susceptible to entire classes of implicit and explicit security problems. Message formats in XML-based protocols are usually specified in XML Schema, and as a first-line defense, schema validation should reject malformed input. However, extension points in most protocol specifications break validation. Extension points are wildcards and considered best practice for loose composition, but they also enable an attacker to add unchecked content in a document, e.g., for a signature wrapping attack. This paper introduces datatyped XML visibly pushdown automata (dXVPAs) as language representation for mixed-content XML and presents an incremental learner that infers a dXVPA from example documents. The learner generalizes XML types and datatypes in terms of automaton states and transitions, and an inferred dXVPA converges to a good-enough approximation of the true language. The automaton is free from extension points and capable of stream validation, e.g., as an anomaly detector for XML-based protocols. For dealing with adversarial training data, two scenarios of poisoning are considered: a poisoning attack is either uncovered at a later time or remains hidden. Unlearning can therefore remove an identified poisoning attack from a dXVPA, and sanitization trims low-frequent states and transitions to get rid of hidden attacks. All algorithms have been evaluated in four scenarios, including a web service implemented in Apache Axis2 and Apache Rampart, where attacks have been simulated. In all scenarios, the learned automaton had zero false positives and outperformed traditional schema validation.","PeriodicalId":341207,"journal":{"name":"2016 IEEE Security and Privacy Workshops (SPW)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115786128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}