{"title":"Session details: Authentication Keynote and Attacks Session","authors":"C. Ordonez","doi":"10.1145/3252733","DOIUrl":"https://doi.org/10.1145/3252733","url":null,"abstract":"","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134040674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning algorithms have been proven to be vulnerable to a special type of attack in which an active adversary manipulates the training data of the algorithm in order to reach some desired goal. Although this type of attack has been proven in previous work, it has not been examined in the context of a data stream, and no work has been done to study a targeted version of the attack. Furthermore, current literature does not provide any metrics that allow a system to detect these attack while they are happening. In this work, we examine the targeted version of this attack on a Support Vector Machine(SVM) that is learning from a data stream, and examine the impact that this attack has on current metrics that are used to evaluate a models performance. We then propose a new metric for detecting these attacks, and compare its performance against current metrics.
{"title":"Analysis of Causative Attacks against SVMs Learning from Data Streams","authors":"Cody Burkard, Brent Lagesse","doi":"10.1145/3041008.3041012","DOIUrl":"https://doi.org/10.1145/3041008.3041012","url":null,"abstract":"Machine learning algorithms have been proven to be vulnerable to a special type of attack in which an active adversary manipulates the training data of the algorithm in order to reach some desired goal. Although this type of attack has been proven in previous work, it has not been examined in the context of a data stream, and no work has been done to study a targeted version of the attack. Furthermore, current literature does not provide any metrics that allow a system to detect these attack while they are happening. In this work, we examine the targeted version of this attack on a Support Vector Machine(SVM) that is learning from a data stream, and examine the impact that this attack has on current metrics that are used to evaluate a models performance. We then propose a new metric for detecting these attacks, and compare its performance against current metrics.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114070824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Android operating system has become the most popular operating system for smartphones and tablets leading to a rapid rise in malware. Sophisticated Android malware employ detection avoidance techniques in order to hide their malicious activities from analysis tools. These include a wide range of anti-emulator techniques, where the malware programs attempt to hide their malicious activities by detecting the emulator. For this reason, countermeasures against anti-emulation are becoming increasingly important in Android malware detection. Analysis and detection based on real devices can alleviate the problems of anti-emulation as well as improve the effectiveness of dynamic analysis. Hence, in this paper we present an investigation of machine learning based malware detection using dynamic analysis on real devices. A tool is implemented to automatically extract dynamic features from Android phones and through several experiments, a comparative analysis of emulator based vs. device based detection by means of several machine learning algorithms is undertaken. Our study shows that several features could be extracted more effectively from the on-device dynamic analysis compared to emulators. It was also found that approximately 24% more apps were successfully analysed on the phone. Furthermore, all of the studied machine learning based detection performed better when applied to features extracted from the on-device dynamic analysis.
{"title":"EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning","authors":"Mohammed K. Alzaylaee, S. Yerima, S. Sezer","doi":"10.1145/3041008.3041010","DOIUrl":"https://doi.org/10.1145/3041008.3041010","url":null,"abstract":"The Android operating system has become the most popular operating system for smartphones and tablets leading to a rapid rise in malware. Sophisticated Android malware employ detection avoidance techniques in order to hide their malicious activities from analysis tools. These include a wide range of anti-emulator techniques, where the malware programs attempt to hide their malicious activities by detecting the emulator. For this reason, countermeasures against anti-emulation are becoming increasingly important in Android malware detection. Analysis and detection based on real devices can alleviate the problems of anti-emulation as well as improve the effectiveness of dynamic analysis. Hence, in this paper we present an investigation of machine learning based malware detection using dynamic analysis on real devices. A tool is implemented to automatically extract dynamic features from Android phones and through several experiments, a comparative analysis of emulator based vs. device based detection by means of several machine learning algorithms is undertaken. Our study shows that several features could be extracted more effectively from the on-device dynamic analysis compared to emulators. It was also found that approximately 24% more apps were successfully analysed on the phone. Furthermore, all of the studied machine learning based detection performed better when applied to features extracted from the on-device dynamic analysis.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114320044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a novel non-interactive (t,n)-incidence count estimation for indicator vectors ensuring Differential Privacy. Given one or two differentially private indicator vectors, estimating the distinct count of elements in each and their intersection cardinality (equivalently, their inner product) have been studied in the literature, along with other extensions for estimating the cardinality set intersection in case the elements are hashed prior to insertion. The core contribution behind all these studies was to address the problem of estimating the Hamming weight (the number of bits set to one) of a bit vector from its differentially private version, and in the case of inner product and set intersection, estimating the number of positions which are jointly set to one in both bit vectors. We develop the most general case of estimating the number of positions which are set to one in exactly t out of n bit vectors (this quantity is denoted the (t,n)-incidence count), given access only to the differentially private version of those bit vectors. This means that if each bit vector belongs to a different owner, each can locally sanitize their bit vector prior to sharing it, hence the non-interactive nature of our algorithm. Our main contribution is a novel algorithm that simultaneously estimates the (t,n)-incidence counts for all t'{0,...,n}. We provide upper and lower bounds to the estimation error. Our lower bound is achieved by generalizing the limit of two-party differential privacy into $n$-party differential privacy, which is a contribution of independent interest. In particular we prove a lower bound on the additive error that must be incurred by any n-wise inner product of $n$ mutually differentially-private bit vectors. Our results are very general and are not limited to differentially private bit vectors. They should apply to a large class of sanitization mechanism of bit vectors which depend on flipping the bits with a constant probability. Some potential applications for our technique include physical mobility analytics, call-detail-record analysis, and similarity metrics computation.
{"title":"Non-interactive (t, n)-Incidence Counting from Differentially Private Indicator Vectors","authors":"Mohammad Alaggan, M. Cunche, M. Minier","doi":"10.1145/3041008.3041017","DOIUrl":"https://doi.org/10.1145/3041008.3041017","url":null,"abstract":"We present a novel non-interactive (t,n)-incidence count estimation for indicator vectors ensuring Differential Privacy. Given one or two differentially private indicator vectors, estimating the distinct count of elements in each and their intersection cardinality (equivalently, their inner product) have been studied in the literature, along with other extensions for estimating the cardinality set intersection in case the elements are hashed prior to insertion. The core contribution behind all these studies was to address the problem of estimating the Hamming weight (the number of bits set to one) of a bit vector from its differentially private version, and in the case of inner product and set intersection, estimating the number of positions which are jointly set to one in both bit vectors. We develop the most general case of estimating the number of positions which are set to one in exactly t out of n bit vectors (this quantity is denoted the (t,n)-incidence count), given access only to the differentially private version of those bit vectors. This means that if each bit vector belongs to a different owner, each can locally sanitize their bit vector prior to sharing it, hence the non-interactive nature of our algorithm. Our main contribution is a novel algorithm that simultaneously estimates the (t,n)-incidence counts for all t'{0,...,n}. We provide upper and lower bounds to the estimation error. Our lower bound is achieved by generalizing the limit of two-party differential privacy into $n$-party differential privacy, which is a contribution of independent interest. In particular we prove a lower bound on the additive error that must be incurred by any n-wise inner product of $n$ mutually differentially-private bit vectors. Our results are very general and are not limited to differentially private bit vectors. They should apply to a large class of sanitization mechanism of bit vectors which depend on flipping the bits with a constant probability. Some potential applications for our technique include physical mobility analytics, call-detail-record analysis, and similarity metrics computation.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116326653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Privacy and Threats Session","authors":"Lila Ghemri","doi":"10.1145/3252732","DOIUrl":"https://doi.org/10.1145/3252732","url":null,"abstract":"","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130586693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phishing is an online social engineering attack with the goal of digital identity theft carried out by pretending to be a legitimate entity. The attacker sends an attack vector commonly in the form of an email, chat session, blog post etc., which contains a link (URL) to a malicious website hosted to elicit private information from the victims. We focus on building a system for URL analysis and classification to primarily detect phishing attacks. URL analysis is attractive to maintain distance between the attacker and the victim, rather than visiting the website and getting features from it. It is also faster than Internet search, retrieving content from the destination website and network-level features used in previous research. We investigate several facets of URL analysis, e.g., performance analysis on both balanced and unbalanced datasets in a static as well as live experimental setup and online versus batch learning.
{"title":"What's in a URL: Fast Feature Extraction and Malicious URL Detection","authors":"Rakesh M. Verma, Avisha Das","doi":"10.1145/3041008.3041016","DOIUrl":"https://doi.org/10.1145/3041008.3041016","url":null,"abstract":"Phishing is an online social engineering attack with the goal of digital identity theft carried out by pretending to be a legitimate entity. The attacker sends an attack vector commonly in the form of an email, chat session, blog post etc., which contains a link (URL) to a malicious website hosted to elicit private information from the victims. We focus on building a system for URL analysis and classification to primarily detect phishing attacks. URL analysis is attractive to maintain distance between the attacker and the victim, rather than visiting the website and getting features from it. It is also faster than Internet search, retrieving content from the destination website and network-level features used in previous research. We investigate several facets of URL analysis, e.g., performance analysis on both balanced and unbalanced datasets in a static as well as live experimental setup and online versus batch learning.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129687406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Smartphone Security Keynote and Software Vulnerabilities Session","authors":"B. Thuraisingham","doi":"10.1145/3252734","DOIUrl":"https://doi.org/10.1145/3252734","url":null,"abstract":"","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"392 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132651188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hemank Lamba, Thomas J. Glazier, J. Cámara, B. Schmerl, D. Garlan, J. Pfeffer
Large software systems have to contend with a significant number of users who interact with different components of the system in various ways. The sequences of components that are used as part of an interaction define sets of behaviors that users have with the system. These can be large in number. Among these users, it is possible that there are some who exhibit anomalous behaviors -- for example, they may have found back doors into the system and are doing something malicious. These anomalous behaviors can be hard to distinguish from normal behavior because of the number of interactions a system may have, or because traces may deviate only slightly from normal behavior. In this paper we describe a model-based approach to cluster sequences of user behaviors within a system and to find suspicious, or anomalous, sequences. We exploit the underlying software architecture of a system to define these sequences. We further show that our approach is better at detecting suspicious activities than other approaches, specifically those that use unigrams and bigrams for anomaly detection. We show this on a simulation of a large scale system based on Amazon Web application style architecture.
{"title":"Model-based Cluster Analysis for Identifying Suspicious Activity Sequences in Software","authors":"Hemank Lamba, Thomas J. Glazier, J. Cámara, B. Schmerl, D. Garlan, J. Pfeffer","doi":"10.1145/3041008.3041014","DOIUrl":"https://doi.org/10.1145/3041008.3041014","url":null,"abstract":"Large software systems have to contend with a significant number of users who interact with different components of the system in various ways. The sequences of components that are used as part of an interaction define sets of behaviors that users have with the system. These can be large in number. Among these users, it is possible that there are some who exhibit anomalous behaviors -- for example, they may have found back doors into the system and are doing something malicious. These anomalous behaviors can be hard to distinguish from normal behavior because of the number of interactions a system may have, or because traces may deviate only slightly from normal behavior. In this paper we describe a model-based approach to cluster sequences of user behaviors within a system and to find suspicious, or anomalous, sequences. We exploit the underlying software architecture of a system to define these sequences. We further show that our approach is better at detecting suspicious activities than other approaches, specifically those that use unigrams and bigrams for anomaly detection. We show this on a simulation of a large scale system based on Amazon Web application style architecture.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131404340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Currently, the standard methods to authenticate a computer/network user typically occur once at the initial log-in. These authentication methods involve user proxies, especially passwords and smart cards such as common access cards (CACs) and service ID cards. Passwords suffer from a variety of vulnerabilities including brute-force and dictionary based attacks, while smart cards and other physical tokens used for authentication can be lost or stolen. As a result, the computer systems are extremely vulnerable to "masquerading attacks", which refers to illegitimate activity on a computer system when an unauthorized human or software impersonates a user on a computer system or network. These attacks can be challenging to detect as they are mostly carried out by insiders or people or software familiar with the authorized user. By actively and continually authenticating a user, intruders can be identified before they hijack the user session of an authorized individual who may have momentarily stepped away from his/her console. In this talk, we will present our results on continuous authentication using keystroke dynamics as the behavioral biometric. The methods we developed can also be readily extended to protecting wired and wireless networks, mobile devices, etc.
{"title":"Continuous Authentication Using Behavioral Biometrics","authors":"S. Upadhyaya","doi":"10.1145/3041008.3041019","DOIUrl":"https://doi.org/10.1145/3041008.3041019","url":null,"abstract":"Currently, the standard methods to authenticate a computer/network user typically occur once at the initial log-in. These authentication methods involve user proxies, especially passwords and smart cards such as common access cards (CACs) and service ID cards. Passwords suffer from a variety of vulnerabilities including brute-force and dictionary based attacks, while smart cards and other physical tokens used for authentication can be lost or stolen. As a result, the computer systems are extremely vulnerable to \"masquerading attacks\", which refers to illegitimate activity on a computer system when an unauthorized human or software impersonates a user on a computer system or network. These attacks can be challenging to detect as they are mostly carried out by insiders or people or software familiar with the authorized user. By actively and continually authenticating a user, intruders can be identified before they hijack the user session of an authorized individual who may have momentarily stepped away from his/her console. In this talk, we will present our results on continuous authentication using keystroke dynamics as the behavioral biometric. The methods we developed can also be readily extended to protecting wired and wireless networks, mobile devices, etc.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125355495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin L. Bullough, Anna K. Yanchenko, Christopher L. Smith, Joseph R. Zipkin
Each year, thousands of software vulnerabilities are discovered and reported to the public. Unpatched known vulnerabilities are a significant security risk. It is imperative that software vendors quickly provide patches once vulnerabilities are known and users quickly install those patches as soon as they are available. However, most vulnerabilities are never actually exploited. Since writing, testing, and installing software patches can involve considerable resources, it would be desirable to prioritize the remediation of vulnerabilities that are likely to be exploited. Several published research studies have reported moderate success in applying machine learning techniques to the task of predicting whether a vulnerability will be exploited. These approaches typically use features derived from vulnerability databases (such as the summary text describing the vulnerability) or social media posts that mention the vulnerability by name. However, these prior studies share multiple methodological shortcomings that inflate predictive power of these approaches. We replicate key portions of the prior work, compare their approaches, and show how selection of training and test data critically affect the estimated performance of predictive models. The results of this study point to important methodological considerations that should be taken into account so that results reflect real-world utility.
{"title":"Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data","authors":"Benjamin L. Bullough, Anna K. Yanchenko, Christopher L. Smith, Joseph R. Zipkin","doi":"10.1145/3041008.3041009","DOIUrl":"https://doi.org/10.1145/3041008.3041009","url":null,"abstract":"Each year, thousands of software vulnerabilities are discovered and reported to the public. Unpatched known vulnerabilities are a significant security risk. It is imperative that software vendors quickly provide patches once vulnerabilities are known and users quickly install those patches as soon as they are available. However, most vulnerabilities are never actually exploited. Since writing, testing, and installing software patches can involve considerable resources, it would be desirable to prioritize the remediation of vulnerabilities that are likely to be exploited. Several published research studies have reported moderate success in applying machine learning techniques to the task of predicting whether a vulnerability will be exploited. These approaches typically use features derived from vulnerability databases (such as the summary text describing the vulnerability) or social media posts that mention the vulnerability by name. However, these prior studies share multiple methodological shortcomings that inflate predictive power of these approaches. We replicate key portions of the prior work, compare their approaches, and show how selection of training and test data critically affect the estimated performance of predictive models. The results of this study point to important methodological considerations that should be taken into account so that results reflect real-world utility.","PeriodicalId":137012,"journal":{"name":"Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117074594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}