Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004908
Wingyan Chung, Elizabeth Mustaine, D. Zeng
Cyber-trafficking is the illegal transport of humans, drugs, weapons, or goods by means of Internet-enabled electronic devices. Currently, there is a lack of surveillance and understanding of the rapidly growing social concern about cyber-trafficking (CT). This paper describes the Cyber-Trafficking Surveillance System (CyTraSS) and provides preliminary findings from using the system to monitor CT discussions on social media. CyTraSS supports flexible collection, analysis, and visualization of social media content, user linkage, and temporal features. The CyTraSS database contains a focused collection of 2,318,691 social media messages posted by 740,070 users who discussed trafficking crimes and issues. CyTraSS supports keyword search, sentiment analysis, message statistics summarization, and influential leader identification. Emotion expressed in social media messages is extracted and aggregated quantitatively to indicate community mood. We examined three use cases: a sex trafficker identified by a flight attendant, federal use of private prisons, and trafficking cases related to Beijing. These time-sensitive incidents are highly relevant to CT and were identified using clues provided by CyTraSS. The results have strong implications for understanding CT concern on social media.
Title: Criminal intelligence surveillance and monitoring on social media: Cases of cyber-trafficking
Published in: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI)
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004888
Junjie Lin, W. Mao, D. Zeng
The competitive perspective implied in online texts reflects people's conflicts in their stances and viewpoints. Competitive perspective identification aims to determine people's inclinations toward one of multiple competing perspectives; it is an important research issue and can facilitate many security-related applications. Because word usage differs between perspectives from topic to topic, in this paper we first propose a supervised topic-refined method for competitive perspective identification. Our method refines perspective classifiers with the document-topic distributions mined from texts. To reduce human labor in data annotation, we further extend our work in a semi-supervised manner and propose a user-based bootstrapping framework. As the perspectives people hold are relatively stable, our bootstrapping process leverages user-level perspective consistency to select high-quality classified texts from an unlabeled corpus and boost the perspective classifier iteratively. Experimental studies show the effectiveness of our proposed approach in identifying the competitive perspectives of online texts.
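The user-level consistency idea can be illustrated with a minimal sketch. The abstract does not give the selection rule, so everything here is an assumption: the function name `select_consistent`, the majority-vote definition of a user's perspective, and the confidence threshold are hypothetical stand-ins, not the authors' method.

```python
from collections import Counter, defaultdict


def select_consistent(predictions, min_confidence=0.8):
    """Pick classified texts that agree with their author's majority label.

    predictions: list of (text_id, user, label, confidence) tuples produced
    by some perspective classifier (hypothetical input format).
    """
    # Collect every predicted label per user.
    labels_by_user = defaultdict(list)
    for _, user, label, _ in predictions:
        labels_by_user[user].append(label)

    # Assumption: a user's perspective is the label predicted most often.
    majority = {
        user: Counter(labels).most_common(1)[0][0]
        for user, labels in labels_by_user.items()
    }

    # Keep confident predictions that match the user's majority label.
    return [
        text_id
        for text_id, user, label, confidence in predictions
        if label == majority[user] and confidence >= min_confidence
    ]


# Hypothetical classifier output on an unlabeled corpus.
preds = [
    ("t1", "u1", "pro", 0.90),
    ("t2", "u1", "pro", 0.60),
    ("t3", "u1", "con", 0.95),
    ("t4", "u2", "con", 0.85),
]
selected = select_consistent(preds)
```

In a bootstrapping loop, the selected texts would be added to the training set with their predicted labels before the classifier is retrained; that loop is omitted here.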
Title: Topic and user based refinement for competitive perspective identification
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004913
Su Hou, Tianliang Lu, Yanhui Du, Jing Guo
In recent years, smartphones have become increasingly popular. At the same time, the security threats to smartphones are growing. According to the “Motive Security Labs Malware Report-H1 2015” [1], the amount of Android malware is growing year by year. Many researchers focus on permission-based security of Android applications. Felt et al. [2] designed the Stowaway tool to detect over-privilege in applications. The tool can also identify and quantify over-privilege caused by developer errors. Enck et al. [3] proposed a security mechanism called Kirin, which consists of nine permission rules. The more rules an application matches, the more dangerous it is. However, few studies use two-layer models for detection to improve accuracy.
Title: Static detection of Android malware based on improved random forest algorithm
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004899
Wingyan Chung
As cybercrimes and their data volumes proliferate, business professionals and public servants urgently need new knowledge and skills to address the growing threats. However, curricular materials, pedagogical research, and courses to address the data deluge in cybersecurity are not widely available. This research developed a contextual active learning approach to creating curricular modules for use in online informatics education. The approach emphasizes active and contextual learning in module design and deployment. Student participation, problem-based thinking, case studies, and interactive question-answering and discussion are used. We implemented the approach in a new online course titled “Cybersecurity Informatics,” a cross-disciplinary subject that connects advanced information technologies, systems, algorithms, and databases with cybersecurity-related applications. Results from an expert evaluation indicate strongly positive comments and significant innovation in active learning. The research demonstrates strong potential for using the approach to develop new cybersecurity informatics modules.
Title: Developing curricular modules for cybersecurity informatics: An active learning approach
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004865
Canchu Lin, Jenell L. S. Wittmer
This study explores the potential of employees to contribute positively to organizational information security management. Toward that end, the study developed the concept of proactive information security behavior and examined its connections to individual creativity and two organizational context factors: group culture and decentralized IT governance. The findings supported a positive relationship between proactive information security behavior and both individual creativity and group culture, as well as partial and full mediation effects of decentralized IT governance and individual creativity on the relationship between proactive information security behavior and group culture.
Title: Proactive information security behavior and individual creativity: Effects of group culture and decentralized IT governance
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004883
S. Bhattacharjee, A. Talukder, E. Al-Shaer, Pratik Doshi
Data analytics is increasingly used in cyber-security problems and has been found useful in cases where data volumes and heterogeneity make manual assessment by security experts cumbersome. In practical cyber-security scenarios involving data-driven analytics, obtaining data with annotations (i.e., ground-truth labels) is a challenging and well-known limiting factor for many supervised security analytics tasks. Significant portions of large datasets typically remain unlabelled, as annotation is extensively manual and requires a huge amount of expert intervention. In this paper, we propose an active learning approach that efficiently addresses this limitation in the practical cyber-security problem of phishing categorization, using a human-machine collaborative approach to design a semi-supervised solution. An initial classifier is learnt on a small amount of annotated data and is then updated iteratively by shortlisting, from the large pool of unlabelled data, only those samples that are most likely to influence classifier performance quickly. Prioritized Active Learning shows significant promise for achieving faster convergence of classification performance in a batch learning framework, thus requiring even less human annotation effort. A useful feature weight update technique combined with active learning shows promising classification performance for categorizing phishing/malicious URLs without requiring a large amount of annotated training samples to be available during training. In experiments with several collections of PhishMonger's Targeted Brand dataset, the proposed method improves on the baseline by as much as 12%.
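The shortlisting step can be sketched as follows. The abstract does not specify how samples are prioritized, so this sketch swaps in plain uncertainty sampling (choose the samples whose predicted probability is closest to 0.5); the function name `select_for_annotation` and the scores are hypothetical.

```python
def select_for_annotation(probabilities, batch_size):
    """Return indices of the samples the classifier is least sure about.

    probabilities: predicted P(malicious) for each unlabelled sample.
    Assumption: uncertainty is measured as distance from 0.5, which is a
    standard active learning heuristic, not necessarily the paper's rule.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical classifier scores for five unlabelled URLs.
pool_scores = [0.97, 0.52, 0.10, 0.45, 0.81]
to_label = select_for_annotation(pool_scores, batch_size=2)
```

In each round, a human annotator would label only the returned samples, and the classifier would be retrained on the enlarged labelled set before the pool is scored again.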
Title: Prioritized active learning for malicious URL detection using weighted text-based features
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004918
Keyi Li, Yang Li, Jianyi Liu, Ru Zhang, Xi Duan
This paper proposes an attack pattern mining algorithm to extract attack patterns from massive security logs. An improved fuzzy clustering algorithm is used to generate a sequence set. PrefixSpan is then used to mine frequent sequences from the sequence set. The experimental results show that the algorithm can effectively mine attack patterns, improve accuracy, and generate more valuable attack patterns.
Title: Attack pattern mining algorithm based on security log
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004884
Song Sun, Qiudan Li, Peng Yan, D. Zeng
With the development of social media technology, users often register accounts, post messages, and create friend links on several different platforms. Mapping user identities across multiple platforms based on users' behavior patterns is important for network supervision and personalization services. Existing methods focus on utilizing either text information or structure information alone. However, text information and structure information reflect different aspects of a user. Combining them is beneficial for mining user behavior patterns and thus helps identify users across platforms accurately. The challenging problems are the effective representation and similarity computation of the text and structure information. We propose a mapping method that integrates text and structure information. First, the model represents user name, description, and location information based on word2vec or string matching, while friend information, represented as a relation network, is regarded as structure information. This information is then used for similarity computation with the Jaccard index or cosine similarity. After similarity computation, a linear model is adopted to obtain the overall similarity of user pairs and perform user mapping. Based on the proposed method, we develop a prototype system that allows users to set and adjust the weights of different information, or to set an expected index. The experimental results on a real-world dataset demonstrate the efficiency of the proposed model.
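The similarity pipeline described above can be sketched in a few lines, under stated assumptions: friend lists are compared with the Jaccard index, description vectors (standing in for word2vec embeddings) with cosine similarity, and the two scores are combined with a weighted sum. The field names, vectors, and weights are hypothetical; the abstract does not give the actual model parameters.

```python
import math


def jaccard(a, b):
    """Jaccard index of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def overall_similarity(scores, weights):
    """Weighted linear combination of per-field similarity scores."""
    return sum(weights[field] * scores[field] for field in weights)


# Two hypothetical accounts on different platforms.
friends_a = {"alice", "bob", "carol"}
friends_b = {"bob", "carol", "dave"}
desc_a = [1.0, 0.0, 1.0]  # placeholder for an embedded description
desc_b = [1.0, 0.0, 1.0]

scores = {
    "friends": jaccard(friends_a, friends_b),
    "description": cosine(desc_a, desc_b),
}
weights = {"friends": 0.5, "description": 0.5}  # adjustable, as in the prototype
total = overall_similarity(scores, weights)
```

Here the friend sets share two of four distinct names (Jaccard 0.5) and the description vectors are identical (cosine about 1.0), so the combined score is about 0.75. A pair would be mapped to the same identity when this score exceeds some chosen threshold.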
Title: Mapping users across social media platforms by integrating text and structure information
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004879
Malaka El, Emma McMahon, S. Samtani, Mark W. Patton, Hsinchun Chen
Cybersecurity is a critical concern in society today. One common avenue of attack for malicious hackers is exploiting vulnerable websites. It is estimated that over one million websites are attacked daily. Two emerging targets of such attacks are Supervisory Control and Data Acquisition (SCADA) devices and scientific instruments. Vulnerability assessment tools can help owners of these devices learn how to protect their infrastructure. However, owners face difficulties in identifying which tools are best suited to their assessments. This research aims to benchmark two state-of-the-art vulnerability assessment tools, Nessus and Burp Suite, in the context of SCADA devices and scientific instruments. We specifically focus on the accuracy, scalability, and vulnerability results of the scans. Results of our study indicate that the two tools together can provide a comprehensive assessment of the vulnerabilities in SCADA devices and scientific instruments.
Title: Benchmarking vulnerability scanners: An experiment on SCADA devices and scientific instruments
Pub Date : 2017-07-01 DOI: 10.1109/ISI.2017.8004907
D. Devyatkin, I. Smirnov, Ananyeva Margarita, M. Kobozeva, Chepovskiy Andrey, Solovyev Fyodor
In this paper, we present the results of research on automatic extremist text detection. For this purpose, an experimental dataset in the Russian language was created. Under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic, and psycholinguistic) to classification quality. The experimental results show that psycholinguistic and semantic features are promising for extremist text detection.
Title: Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts)