Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284316
Weiyun Chen, Xin Li
The wisdom of crowds has been recognized as an effective decision-making mechanism that aggregates information from different individuals to derive an overall decision. However, in this information aggregation process, individuals may be influenced by various factors and provide biased claims (or individual-level decisions), especially when such claims are related to their economic benefits. In this research, we investigate crowds' claims in binary decisions under explicit constant influence and aim to understand their real but hidden belief (distribution) on the decision. In particular, we take fixed-odds betting on binary events as a representative scenario in this study. We model the relationship between event probability and crowds' belief distribution as a linear combination of Beta distributions. Taking a Maximum Likelihood Estimation (MLE) paradigm, we estimate the parameters of this distribution based on observed crowds' bets. In this process, we model individual betting decisions under the influence of odds using prospect theory. We apply the framework to a real-world dataset of Olympic Games outcome betting. After identifying betting participants' hidden belief distribution, we also found that crowds' beliefs tend to tilt toward the high-probability side of the event (if there is no outside influence), which partially explains why the wisdom of crowds can make decision making easier. We believe this paper contributes to the literature on crowd intelligence and can help generate more accurate digests of the wisdom of crowds.
Title: Deciphering wisdom of crowds from their influenced binary decisions
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
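The estimation idea in the abstract on crowd beliefs (a mixture of Beta distributions fitted by maximum likelihood from observed bets) can be illustrated with a minimal sketch. It is not the authors' model: the function names are hypothetical, and the belief threshold above which a participant bets on the event is simply passed in as a number, whereas the paper derives betting decisions from the odds using prospect theory.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def mixture_pdf(x, weights, params):
    """Linear combination of Beta densities; weights are assumed to sum to 1."""
    return sum(w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params))

def prob_belief_above(threshold, weights, params, steps=2000):
    """P(belief > threshold) under the mixture, by midpoint-rule integration."""
    h = (1.0 - threshold) / steps
    return h * sum(mixture_pdf(threshold + (i + 0.5) * h, weights, params)
                   for i in range(steps))

def bet_log_likelihood(n_for, n_total, threshold, weights, params):
    """Binomial log-likelihood (up to a constant) of n_for out of n_total bets
    on the event, assuming a participant bets on the event exactly when their
    belief exceeds the threshold (an illustrative assumption)."""
    q = prob_belief_above(threshold, weights, params)
    q = min(max(q, 1e-12), 1 - 1e-12)  # keep the logarithms finite
    return n_for * math.log(q) + (n_total - n_for) * math.log(1 - q)
```

Maximizing `bet_log_likelihood` over the mixture weights and the Beta parameters (for example with a grid search or a numerical optimizer) would give a maximum likelihood estimate under these simplifying assumptions.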
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284293
Kelly Hughes, Yanzhen Qu
Summary form only given. Managers must decide, within limited budgets, how best to protect their networks. Resources should be allocated across three different areas: Protect, Detect, and Response. Often what seems an obvious solution is not the best solution for resource allocation and networks. A model using logistic regression can help a manager determine the probability, based on the allocation of resources, organization objectives and certain attack characteristics, that the network will be within an acceptable level of risk.
Title: A generic cyber attack response resource risk assessment model
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
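As a rough illustration of the logistic regression idea in the risk assessment abstract, the sketch below maps a resource allocation to a probability that the network stays within acceptable risk. The feature layout, coefficients, and intercept are invented for illustration; the abstract does not give the fitted model.

```python
import math

def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def acceptable_risk_probability(features, coefficients, intercept):
    """Probability that the network is within acceptable risk, given a
    feature vector and (hypothetical) fitted regression coefficients."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return logistic(z)

# Hypothetical features: budget shares for Protect, Detect, Response,
# followed by an attack-severity indicator. All numbers are made up.
allocation = [0.5, 0.3, 0.2, 1.0]
coefficients = [1.2, 0.8, 0.5, -1.5]
p = acceptable_risk_probability(allocation, coefficients, intercept=0.4)
```

In practice the coefficients would be estimated from historical data; a manager could then compare `p` across candidate allocations.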
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284280
C. Leuprecht, T. Hataley, D. Skillicorn
The idea that Canadian-based terrorists pose a threat to the United States continues to resonate with Americans. We subject this hypothesis to empirical testing by analyzing terrorist-related activity across the Canada-US border. Drawing on 13 cases with 27 terrorist connections, the evidence substantiates the presence of cross-border interactions, but does not confirm common perceptions about America's northern border: there is no consistent threat emanating from Canada. Rather, differentials in the availability of ideas and resources drive threat vectors across the border in both directions. The bulk of violent extremists exploiting these cross-border markets of opportunity do so to propagate terrorism beyond North America.
Title: Vectors of extremism across the Canada-US border
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284307
T. Bourlai, B. Cukic
In this paper we study the problems of intra-spectral and cross-spectral face recognition (FR) in homogeneous and heterogeneous environments. Specifically, we investigate the advantages and limitations of matching (i) short-wave infrared (SWIR) face images to visible images under controlled or uncontrolled conditions, (ii) mid-wave infrared (MWIR) to MWIR or visible images under controlled conditions, and (iii) intra-distance near infrared (NIR) to NIR images and cross-distance, cross-spectral NIR to visible images. All NIR images were captured at night, outdoors, and at mid-ranges (from 30 up to 120 meters). We used both commercial and academic face matchers and performed a set of experiments indicating that our cross-photometric score-level fusion rule can improve SWIR cross-spectral matching performance across all FR scenarios investigated. We also show that intra-spectral matching results, using either MWIR or NIR images, are comparable to the baseline results, i.e., when comparing visible to visible face images. Our experiments also indicate that the level of improvement in recognition performance is scenario dependent. Experiments also show that cross-spectral matching (the heterogeneous problem, where gallery and probe sets have face images acquired in different spectral bands) is a very challenging problem that requires further investigation to address real-world law enforcement or military situations.
Title: Multi-spectral face recognition: Identification of people in difficult environments
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
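The face recognition abstract mentions a score-level fusion rule without specifying it. The sketch below shows a generic form of score-level fusion (min-max normalization followed by a weighted sum) purely as an illustration of the concept; it is not the authors' cross-photometric rule, and the function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a list of match scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(score_lists, weights=None):
    """Weighted-sum fusion of several matchers' scores for the same candidates.

    score_lists[m][i] is matcher m's score for candidate i. With no weights
    given, every matcher contributes equally.
    """
    normalized = [min_max_normalize(scores) for scores in score_lists]
    if weights is None:
        weights = [1.0 / len(normalized)] * len(normalized)
    n_candidates = len(normalized[0])
    return [sum(w * scores[i] for w, scores in zip(weights, normalized))
            for i in range(n_candidates)]
```

Normalizing first matters because different matchers (for example, a commercial and an academic one) typically report scores on different scales.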
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284272
Vance Chiang-Chi Liao, Ming-Syan Chen
In biological sequences, sequential pattern mining reveals implicit motifs/patterns that are functionally significant and have specific structures. Sequences with small alphabets and long lengths, such as DNA and protein sequences, are difficult for traditional sequential pattern mining algorithms to handle. Furthermore, intra- and inter-block gap constraints can deal with the substitutions, insertions, loops, and deletions that occur in the evolutionary process. Hence we propose an approach called Depth-first spelling algorithm for mining structural motifs with Intra- and inter-Block gap constraints in biological sequences (referred to as DIB). DIB has two execution steps. First, it constructs a three-dimensional table of sequences by scanning the given dataset once. Second, DIB-Exuberance generates intra- and inter-block gap sequential patterns. Candidate pattern spelling and pattern verification are carried out by DIB-Exuberance in a depth-first manner. Intra- and inter-block gap constraints are handled by the intra- and inter-block counting matrices. The block size matrix deals with intra- and inter-block size constraints. On biological sequences, DIB's runtime is much shorter than that of BASIC.
Title: Efficient mining structural motifs for biosequences with intra- and inter-block gap constraints
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
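To make the notion of a gap-constrained structural motif in the DIB abstract concrete, the sketch below checks whether a motif made of contiguous blocks occurs in a sequence when consecutive blocks may be separated by a bounded number of arbitrary symbols. This is a simplified verification step written for illustration, with hypothetical function names; it models only inter-block gaps and is not the DIB algorithm or its counting matrices.

```python
def occurs_with_gaps(sequence, blocks, min_gap, max_gap):
    """Return True if the blocks appear in order in the sequence, with each
    pair of consecutive blocks separated by min_gap..max_gap symbols.
    Assumes blocks is a non-empty list of non-empty strings."""
    def search(block_index, start):
        block = blocks[block_index]
        if not sequence.startswith(block, start):
            return False
        end = start + len(block)
        if block_index == len(blocks) - 1:
            return True
        # Depth-first: try every allowed gap before the next block.
        return any(search(block_index + 1, end + gap)
                   for gap in range(min_gap, max_gap + 1))
    return any(search(0, start) for start in range(len(sequence)))

def support(sequences, blocks, min_gap, max_gap):
    """Number of sequences in the dataset that contain the motif."""
    return sum(1 for s in sequences if occurs_with_gaps(s, blocks, min_gap, max_gap))
```

A mining algorithm would grow candidate motifs and keep those whose support meets a minimum threshold; this sketch covers only the support check.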
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6283425
Zhenmin Lin, J. Jaromczyk
We propose a new efficient cryptography-based secure comparison protocol for comparing secrets that are additively split between two parties. Our solution, based on homomorphic cryptosystems, needs 2N + 6 invocations of secure multiplication when the two secrets are numbers in the range [0, 2^N); previous solutions required 12N + O(1) secure multiplications. The protocol provides a substantial performance improvement in privacy-preserving data mining protocols that use comparison as a primitive operation. In particular, we experimentally evaluate the performance of our secure comparison protocol in the implementation of a secure k-means clustering protocol applied to several real datasets.
Title: An efficient secure comparison protocol
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
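The secure comparison abstract assumes secrets that are additively split between two parties. The sketch below illustrates only that setup and the stated multiplication counts; the comparison protocol itself is the paper's contribution and is not reproduced. The modulus and function names are assumptions made for illustration.

```python
import secrets

MODULUS = 2 ** 64  # hypothetical share modulus, chosen only for this sketch

def split(secret, modulus=MODULUS):
    """Split a secret into two additive shares: share_a + share_b = secret (mod modulus)."""
    share_a = secrets.randbelow(modulus)
    share_b = (secret - share_a) % modulus
    return share_a, share_b

def reconstruct(share_a, share_b, modulus=MODULUS):
    """Recombine two additive shares into the secret."""
    return (share_a + share_b) % modulus

def secure_multiplications_proposed(n_bits):
    """Secure multiplications stated in the abstract for N-bit secrets."""
    return 2 * n_bits + 6

def previous_leading_term(n_bits):
    """Leading term of the 12N + O(1) count stated for previous solutions."""
    return 12 * n_bits
```

Each share on its own is uniformly random, so neither party learns the secret. Sums of secrets can be computed locally by adding shares, whereas comparing two shared values requires an interactive protocol. For 32-bit secrets the stated counts are 70 secure multiplications versus at least 384.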
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284290
Ansheng Ge, W. Mao, D. Zeng, Qingchao Kong, Huachi Zhu
Actions are the primary way an entity interacts with other entities and acts on the external world. Action knowledge is of vital importance for behavior modeling, analysis and prediction in security informatics. In this paper, we present our approach to action knowledge extraction from Web textual data. Our approach is based on mutual bootstrapping with knowledge reasoning, which can acquire more types of action knowledge and requires less human participation than related work. We evaluate the performance of our method and demonstrate its effectiveness through experiments.
Title: Extracting action knowledge in security informatics
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284279
Wingyan Chung
In many emergency incidents, multiple reports and information sources are used to help intelligence and security personnel understand the situation within a short time period. Proper categorization and analysis of this information could enhance the efficiency of handling this large amount of potentially conflicting information, thus contributing to saving lives. The categorization of temporal events in cyber security applications has, however, not been widely studied. In this research, we developed an automated approach to categorizing temporal events described in textual documents. The approach consists of automatic indexing, term extraction, and automatic categorization. We conducted a case study of domestic terrorism in which we analyzed 96 online news articles about a shooting tragedy that resulted in 6 deaths and 1 serious injury. Analyses of different numbers of extracted textual features (from 20 to 100) used in the temporal categorization revealed a gradual improvement in classification accuracy across the different algorithms used. Naïve Bayes and SVM classification provided stable improvement (from 47% to 68%), whereas Neural Network had the highest accuracy when 70 features were used. The results provide new insights for researchers and intelligence personnel to understand the relationship between textual features and emergency event evolution.
Title: Categorizing temporal events: A case study of domestic terrorism
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
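The pipeline in the temporal event categorization abstract (term extraction, a limited number of textual features, then classification) can be sketched with a small multinomial Naïve Bayes classifier. This is an illustrative stand-in rather than the study's implementation: whitespace tokenization, frequency-based feature selection, and the category labels in the usage note are all assumptions.

```python
import math
from collections import Counter, defaultdict

def top_terms(documents, k):
    """Select the k most frequent terms as features (a simple stand-in for term extraction)."""
    counts = Counter(term for doc in documents for term in doc.lower().split())
    return [term for term, _ in counts.most_common(k)]

def train_naive_bayes(documents, labels, vocabulary):
    """Count per-class term frequencies, restricted to the chosen vocabulary."""
    vocab = set(vocabulary)
    term_counts = defaultdict(Counter)
    for doc, label in zip(documents, labels):
        term_counts[label].update(t for t in doc.lower().split() if t in vocab)
    return {"vocab": vocab, "term_counts": term_counts, "class_counts": Counter(labels)}

def classify(model, document):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    vocab = model["vocab"]
    total_docs = sum(model["class_counts"].values())
    terms = [t for t in document.lower().split() if t in vocab]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["term_counts"][label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        score += sum(math.log((counts[t] + 1) / denom) for t in terms)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Varying `k` in `top_terms` (for example from 20 to 100, as in the study) and measuring accuracy on held-out articles would reproduce the kind of feature-count comparison the abstract describes, with labels such as hypothetical "incident" and "aftermath" phases.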
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284294
Peng He, George Karabatis
Intrusion detection is one of the most challenging and highest-priority tasks in the cyber security field; however, traditional intrusion detection techniques often fail to handle complex and uncertain network attack correlation tasks. We propose using semantic networks that build relationships among network attacks and assist in automatically identifying and predicting related attacks. Our method can also increase the precision of detecting probable attacks. Experimental results show that our semantic network using the Anderberg similarity measure performs better in terms of precision and recall than existing correlation approaches in the cyber security domain. Specifically, our contributions are as follows: (1) We automatically construct a first-mode semantic network from characterizing features of network attacks using similarity. (2) The first-mode semantic network is calibrated by adding external semantic rules provided by domain experts, in order to generate a more adaptable second-mode semantic network. (3) We evaluated the prediction capability of the semantic networks by experimenting with various similarity measures including Anderberg, Jaccard, Simple Matching and traditional correlation coefficients; we discovered that the Anderberg similarity coefficients outperform all other tested similarity measures in terms of precision and recall.
Title: Using semantic networks to counter cyber threats
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
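The similarity measures named in the semantic network abstract can be computed from binary feature vectors describing two attacks. The sketch below uses the standard Jaccard and Simple Matching definitions; for Anderberg it uses a / (a + 2(b + c)), one formula commonly listed under that name, which is an assumption because the abstract does not state which variant the authors used.

```python
def contingency(x, y):
    """Counts for two equal-length binary vectors:
    a = both 1, b = only x is 1, c = only y is 1, d = both 0."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0

def simple_matching(x, y):
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)

def anderberg(x, y):
    """Assumed variant: mismatches are weighted twice as heavily as in Jaccard."""
    a, b, c, _ = contingency(x, y)
    return a / (a + 2 * (b + c)) if (a + 2 * (b + c)) else 0.0
```

For the vectors `[1, 1, 0, 0, 1]` and `[1, 0, 0, 1, 1]` the counts are a = 2, b = 1, c = 1, d = 1, giving Jaccard 0.5, Simple Matching 0.6, and about 0.333 for this Anderberg variant. Edges of a semantic network could then be weighted by the chosen coefficient.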
Pub Date : 2012-06-11 DOI: 10.1109/ISI.2012.6284100
Venkatesh Ramanathan, H. Wechsler
One of the ways criminals steal identities in cyberspace is phishing. Attackers host phishing websites that resemble a legitimate website and entice users to click on hyperlinks that direct them to these fake websites. Attackers use these fake sites to capture personal information such as logins, passwords and social security numbers from innocent victims, which they later use to commit crimes. We propose a robust methodology to detect phishing websites that employs a topic modeling technique, Latent Dirichlet Allocation, for semantic analysis and AdaBoost for classification. The methodology is a content-driven approach that is device independent and language neutral. The website content served to mobile and desktop clients is collected by an intelligent web crawler. Website content that is not in English is translated to English using Google's language translator. A topic model is built using the translated content for desktop and mobile clients. The phishing website classifier is built using (i) the topic distribution probabilities found by Latent Dirichlet Allocation as features and (ii) the AdaBoost voting technique. Experiments were conducted using a large public corpus of website data containing 47500 phishing websites and 52500 good websites. Results show that our method achieves an F-measure of 99%.
Title: Phishing website detection using Latent Dirichlet Allocation and AdaBoost
Venue: 2012 IEEE International Conference on Intelligence and Security Informatics
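The classification stage in the phishing detection abstract (AdaBoost over per-page topic probabilities) can be sketched with a minimal AdaBoost using decision stumps. This is a textbook form of the algorithm, not the authors' implementation; the topic model itself is not reproduced, and the feature vectors in the usage note stand in for LDA topic distributions.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """Predict +1/-1 by thresholding one feature."""
    return polarity if row[feature] >= threshold else -polarity

def train_stump(features, labels, weights):
    """Exhaustively find the stump with the lowest weighted error."""
    best = None
    for j in range(len(features[0])):
        for threshold in sorted({row[j] for row in features}):
            for polarity in (1, -1):
                error = sum(w for w, row, y in zip(weights, features, labels)
                            if stump_predict(row, j, threshold, polarity) != y)
                if best is None or error < best[0]:
                    best = (error, j, threshold, polarity)
    return best

def adaboost(features, labels, rounds=5):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(features)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, j, threshold, polarity = train_stump(features, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, j, threshold, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(row, j, threshold, polarity))
                   for w, row, y in zip(weights, features, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stumps: +1 (phishing) or -1 (legitimate)."""
    score = sum(alpha * stump_predict(row, j, threshold, polarity)
                for alpha, j, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1
```

With hypothetical two-topic vectors such as `[0.8, 0.2]` labeled +1 and `[0.2, 0.8]` labeled -1, the ensemble learns to separate pages dominated by the first topic from the rest.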