There are many applications available for phishing detection. However, unlike predicting spam, there are only few studies that compare machine learning techniques in predicting phishing. The present study compares the predictive accuracy of several machine learning methods including Logistic Regression (LR), Classification and Regression Trees (CART), Bayesian Additive Regression Trees (BART), Support Vector Machines (SVM), Random Forests (RF), and Neural Networks (NNet) for predicting phishing emails. A data set of 2889 phishing and legitimate emails is used in the comparative study. In addition, 43 features are used to train and test the classifiers.
{"title":"A comparison of machine learning techniques for phishing detection","authors":"Saeed Abu-Nimeh, D. Nappa, Xinlei Wang, S. Nair","doi":"10.1145/1299015.1299021","DOIUrl":"https://doi.org/10.1145/1299015.1299021","url":null,"abstract":"There are many applications available for phishing detection. However, unlike predicting spam, there are only few studies that compare machine learning techniques in predicting phishing. The present study compares the predictive accuracy of several machine learning methods including Logistic Regression (LR), Classification and Regression Trees (CART), Bayesian Additive Regression Trees (BART), Support Vector Machines (SVM), Random Forests (RF), and Neural Networks (NNet) for predicting phishing emails. A data set of 2889 phishing and legitimate emails is used in the comparative study. In addition, 43 features are used to train and test the classifiers.","PeriodicalId":130252,"journal":{"name":"APWG Symposium on Electronic Crime Research","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129558447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We estimate of the extent of phishing activity on the Internet via capture-recapture analysis of two major phishing site reports. Capture-recapture analysis is a population estimation technique originally developed for wildlife conservation, but is applicable in any environment wherein multiple independent parties collect reports of an activity. Generating a meaningful population estimate for phishing activity requires addressing complex relationships between phishers and phishing reports. Phishers clandestinely occupy machines and adding evasive measures into phishing URLs to evade firewalls and other fraud-detection measures. Phishing reports, in the meantime, may be demonstrate a preference towards certain classes of phish. We address these problems by estimating population in terms of netblocks and by clustering phishing attempts together into scams, which are phishes that demonstrate similar behavior on multiple axes. We generate population estimates using data from two different phishing reports over an 80-day period, and show that these reports capture approximately 40% of scams and 80% of CIDR/24 (256 contiguous address) netblocks involved in phishing.
{"title":"Fishing for phishes: applying capture-recapture methods to estimate phishing populations","authors":"R. Weaver, Michael Patrick Collins","doi":"10.1145/1299015.1299017","DOIUrl":"https://doi.org/10.1145/1299015.1299017","url":null,"abstract":"We estimate of the extent of phishing activity on the Internet via capture-recapture analysis of two major phishing site reports. Capture-recapture analysis is a population estimation technique originally developed for wildlife conservation, but is applicable in any environment wherein multiple independent parties collect reports of an activity.\u0000 Generating a meaningful population estimate for phishing activity requires addressing complex relationships between phishers and phishing reports. Phishers clandestinely occupy machines and adding evasive measures into phishing URLs to evade firewalls and other fraud-detection measures. Phishing reports, in the meantime, may be demonstrate a preference towards certain classes of phish.\u0000 We address these problems by estimating population in terms of netblocks and by clustering phishing attempts together into scams, which are phishes that demonstrate similar behavior on multiple axes. We generate population estimates using data from two different phishing reports over an 80-day period, and show that these reports capture approximately 40% of scams and 80% of CIDR/24 (256 contiguous address) netblocks involved in phishing.","PeriodicalId":130252,"journal":{"name":"APWG Symposium on Electronic Crime Research","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129330868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tools that aim to combat phishing attacks must take into account how and why people fall for them in order to be effective. This study reports a pilot survey of 232 computer users to reveal predictors of falling for phishing emails, as well as trusting legitimate emails. Previous work suggests that people may be vulnerable to phishing schemes because their awareness of the risks is not linked to perceived vulnerability or to useful strategies in identifying phishing emails. In this survey, we explore what factors are associated with falling for phishing attacks in a role-play exercise. Our data suggest that deeper understanding of the web environment, such as being able to correctly interpret URLs and understanding what a lock signifies, is associated with less vulnerability to phishing attacks. Perceived severity of the consequences does not predict behavior. These results suggest that educational efforts should aim to increase users' intuitive understanding, rather than merely warning them about risks.
{"title":"Behavioral response to phishing risk","authors":"J. Downs, Mandy B. Holbrook, L. Cranor","doi":"10.1145/1299015.1299019","DOIUrl":"https://doi.org/10.1145/1299015.1299019","url":null,"abstract":"Tools that aim to combat phishing attacks must take into account how and why people fall for them in order to be effective. This study reports a pilot survey of 232 computer users to reveal predictors of falling for phishing emails, as well as trusting legitimate emails. Previous work suggests that people may be vulnerable to phishing schemes because their awareness of the risks is not linked to perceived vulnerability or to useful strategies in identifying phishing emails. In this survey, we explore what factors are associated with falling for phishing attacks in a role-play exercise. Our data suggest that deeper understanding of the web environment, such as being able to correctly interpret URLs and understanding what a lock signifies, is associated with less vulnerability to phishing attacks. Perceived severity of the consequences does not predict behavior. These results suggest that educational efforts should aim to increase users' intuitive understanding, rather than merely warning them about risks.","PeriodicalId":130252,"journal":{"name":"APWG Symposium on Electronic Crime Research","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127998041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the last few years, obfuscation has been used more and more by spammers to make spam emails bypass filters. The standard method is to use images that look like text, since typical spam filters are unable to parse such messages; this is what is used in so-called "rock phishing". To fight image-based spam, many spam filters use heuristic rules in which emails containing images are flagged, and since not many legit emails are composed mainly of a big image, this aids in detecting image-based spam. The spammers are thus interested in circumventing these methods. Unicode transliteration is a convenient tool for spammers, since it allows a spammer to create a large number of homomorphic clones of the same looking message; since Unicode contains many characters that are unique but appear very similar, spammers can translate a message's characters at random to hide black-listed words in an effort to bypass filters. In order to defend against these unicode-obfuscated spam emails, we developed a prototype tool that can be used with Spam Assassin to block spam obfuscated in this way by mapping polymorphic messages to a common, more homogeneous representation. This representation can then be filtered using traditional methods. We demonstrate the ease with which Unicode polymorphism can be used to circumvent spam filters such as SpamAssassin, and then describe a de-obfuscation technique that can be used to catch messages that have been obfuscated in this fashion.
{"title":"Fighting unicode-obfuscated spam","authors":"Changwei Liu, Sid Stamm","doi":"10.1145/1299015.1299020","DOIUrl":"https://doi.org/10.1145/1299015.1299020","url":null,"abstract":"In the last few years, obfuscation has been used more and more by spammers to make spam emails bypass filters. The standard method is to use images that look like text, since typical spam filters are unable to parse such messages; this is what is used in so-called \"rock phishing\". To fight image-based spam, many spam filters use heuristic rules in which emails containing images are flagged, and since not many legit emails are composed mainly of a big image, this aids in detecting image-based spam. The spammers are thus interested in circumventing these methods. Unicode transliteration is a convenient tool for spammers, since it allows a spammer to create a large number of homomorphic clones of the same looking message; since Unicode contains many characters that are unique but appear very similar, spammers can translate a message's characters at random to hide black-listed words in an effort to bypass filters. In order to defend against these unicode-obfuscated spam emails, we developed a prototype tool that can be used with Spam Assassin to block spam obfuscated in this way by mapping polymorphic messages to a common, more homogeneous representation. This representation can then be filtered using traditional methods. We demonstrate the ease with which Unicode polymorphism can be used to circumvent spam filters such as SpamAssassin, and then describe a de-obfuscation technique that can be used to catch messages that have been obfuscated in this fashion.","PeriodicalId":130252,"journal":{"name":"APWG Symposium on Electronic Crime Research","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116091395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Banks and other organisations deal with fraudulent phishing websites by pressing hosting service providers to remove the sites from the Internet. Until they are removed, the fraudsters learn the passwords, personal identification numbers (PINs) and other personal details of the users who are fooled into visiting them. We analyse empirical data on phishing website removal times and the number of visitors that the websites attract, and conclude that website removal is part of the answer to phishing, but it is not fast enough to completely mitigate the problem. The removal times have a good fit to a lognormal distribution, but within the general pattern there is ample evidence that some service providers are faster than others at removing sites, and that some brands can get fraudulent sites removed more quickly. We particularly examine a major subset of phishing websites (operated by the 'rock-phish' gang) which accounts for around half of all phishing activity and whose architectural innovations have extended their average lifetime. Finally, we provide a ballpark estimate of the total loss being suffered by the banking sector from the phishing websites we observed.
{"title":"Examining the impact of website take-down on phishing","authors":"T. Moore, R. Clayton","doi":"10.1145/1299015.1299016","DOIUrl":"https://doi.org/10.1145/1299015.1299016","url":null,"abstract":"Banks and other organisations deal with fraudulent phishing websites by pressing hosting service providers to remove the sites from the Internet. Until they are removed, the fraudsters learn the passwords, personal identification numbers (PINs) and other personal details of the users who are fooled into visiting them. We analyse empirical data on phishing website removal times and the number of visitors that the websites attract, and conclude that website removal is part of the answer to phishing, but it is not fast enough to completely mitigate the problem. The removal times have a good fit to a lognormal distribution, but within the general pattern there is ample evidence that some service providers are faster than others at removing sites, and that some brands can get fraudulent sites removed more quickly. We particularly examine a major subset of phishing websites (operated by the 'rock-phish' gang) which accounts for around half of all phishing activity and whose architectural innovations have extended their average lifetime. Finally, we provide a ballpark estimate of the total loss being suffered by the banking sector from the phishing websites we observed.","PeriodicalId":130252,"journal":{"name":"APWG Symposium on Electronic Crime Research","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134570325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a scheme that exploits scale to prevent phishing. We show that while stopping phishers from obtaining passwords is very hard, detecting the fact that a password has been entered at an unfamiliar site is simple. Our solution involves a client that reports Password Re-Use (PRU) events at unfamiliar sites, and a server that accumulates these reports and detects an attack. We show that it is simple to then mitigate the damage by communicating the identities of phished accounts to the institution under attack. Thus, we make no attempt to prevent information leakage, but we try to detect and then rescue users from the consequences of bad trust decisions. The scheme requires deployment on a large scale to realize the major benefits: reliable low latency detection of attacks, and mitigation of compromised accounts. We harness scale against the attacker instead of trying to solve the problem at each client. In [13] we sketched the idea, but questions relating to false positives and the scale required for efficacy remained unanswered. We present results from a trial deployment of half a million clients. We explain the scheme in detail, analyze its performance, and examine a number of anticipated attacks.
{"title":"Evaluating a trial deployment of password re-use for phishing prevention","authors":"D. Florêncio, Cormac Herley","doi":"10.1145/1299015.1299018","DOIUrl":"https://doi.org/10.1145/1299015.1299018","url":null,"abstract":"We propose a scheme that exploits scale to prevent phishing. We show that while stopping phishers from obtaining passwords is very hard, detecting the fact that a password has been entered at an unfamiliar site is simple. Our solution involves a client that reports Password Re-Use (PRU) events at unfamiliar sites, and a server that accumulates these reports and detects an attack. We show that it is simple to then mitigate the damage by communicating the identities of phished accounts to the institution under attack. Thus, we make no attempt to prevent information leakage, but we try to detect and then rescue users from the consequences of bad trust decisions.\u0000 The scheme requires deployment on a large scale to realize the major benefits: reliable low latency detection of attacks, and mitigation of compromised accounts. We harness scale against the attacker instead of trying to solve the problem at each client. In [13] we sketched the idea, but questions relating to false positives and the scale required for efficacy remained unanswered. We present results from a trial deployment of half a million clients. We explain the scheme in detail, analyze its performance, and examine a number of anticipated attacks.","PeriodicalId":130252,"journal":{"name":"APWG Symposium on Electronic Crime Research","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127154625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Kumaraguru, Yong Rhee, Steve Sheng, Sharique Hasan, A. Acquisti, L. Cranor, Jason I. Hong
Educational materials designed to teach users not to fall for phishing attacks are widely available but are often ignored by users. In this paper, we extend an embedded training methodology using learning science principles in which phishing education is made part of a primary task for users. The goal is to motivate users to pay attention to the training materials. In embedded training, users are sent simulated phishing attacks and trained after they fall for the attacks. Prior studies tested users immediately after training and demonstrated that embedded training improved users' ability to identify phishing emails and websites. In the present study, we tested users to determine how well they retained knowledge gained through embedded training and how well they transferred this knowledge to identify other types of phishing emails. We also compared the effectiveness of the same training materials delivered via embedded training and delivered as regular email messages. In our experiments, we found that: (a) users learn more effectively when the training materials are presented after users fall for the attack (embedded) than when the same training materials are sent by email (non-embedded); (b) users retain and transfer more knowledge after embedded training than after non-embedded training; and (c) users with higher Cognitive Reflection Test (CRT) scores are more likely than users with lower CRT scores to click on the links in the phishing emails from companies with which they have no account.
{"title":"Getting users to pay attention to anti-phishing education: evaluation of retention and transfer","authors":"P. Kumaraguru, Yong Rhee, Steve Sheng, Sharique Hasan, A. Acquisti, L. Cranor, Jason I. Hong","doi":"10.1145/1299015.1299022","DOIUrl":"https://doi.org/10.1145/1299015.1299022","url":null,"abstract":"Educational materials designed to teach users not to fall for phishing attacks are widely available but are often ignored by users. In this paper, we extend an embedded training methodology using learning science principles in which phishing education is made part of a primary task for users. The goal is to motivate users to pay attention to the training materials. In embedded training, users are sent simulated phishing attacks and trained after they fall for the attacks. Prior studies tested users immediately after training and demonstrated that embedded training improved users' ability to identify phishing emails and websites. In the present study, we tested users to determine how well they retained knowledge gained through embedded training and how well they transferred this knowledge to identify other types of phishing emails. We also compared the effectiveness of the same training materials delivered via embedded training and delivered as regular email messages. In our experiments, we found that: (a) users learn more effectively when the training materials are presented after users fall for the attack (embedded) than when the same training materials are sent by email (non-embedded); (b) users retain and transfer more knowledge after embedded training than after non-embedded training; and (c) users with higher Cognitive Reflection Test (CRT) scores are more likely than users with lower CRT scores to click on the links in the phishing emails from companies with which they have no account.","PeriodicalId":130252,"journal":{"name":"APWG Symposium on Electronic Crime Research","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134086641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}