{"title":"Efficient parameter selection for SVM: The case of business intelligence categorization","authors":"Hsin-Hsiung Huang, Zijing Wang, Wingyan Chung","doi":"10.1109/ISI.2017.8004897","DOIUrl":null,"url":null,"abstract":"Support Vector Machines (SVM) is a widely used technique for classifying high-dimensional data, especially in security and intelligence categorization. However, the performance of SVM can be adversely affected by poorly selected parameter values. Current approaches to SVM parameter selection mainly rely on extensive cross validation or anecdotal information, which can be inefficient and ineffective. In this research, we propose an efficient algorithm called Percentile-SVM (P-SVM) for selecting the parameter pair, (γ, C), of SVM with Gaussian kernels on metric data. P-SVM searches only a handful of percentiles of the squared Euclidean distances of data points to select the best pair of parameter values. To validate the algorithm, we applied P-SVM to categorizing business intelligence factors extracted from 6,859 sentences of 231 online news articles about four major companies in the information technology sector. The results show that P-SVM achieved a significant improvement in precision, recall, F-measure, and AUC over the LibSVM package (with default parameter values) used in WEKA, a widely used data mining software. These findings provide useful implication for relevant research and security informatics applications.","PeriodicalId":423696,"journal":{"name":"2017 IEEE International Conference on Intelligence and Security Informatics (ISI)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Intelligence and Security Informatics (ISI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISI.2017.8004897","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
Support Vector Machines (SVM) is a widely used technique for classifying high-dimensional data, especially in security and intelligence categorization. However, the performance of SVM can be adversely affected by poorly selected parameter values. Current approaches to SVM parameter selection mainly rely on extensive cross validation or anecdotal information, which can be inefficient and ineffective. In this research, we propose an efficient algorithm called Percentile-SVM (P-SVM) for selecting the parameter pair, (γ, C), of SVM with Gaussian kernels on metric data. P-SVM searches only a handful of percentiles of the squared Euclidean distances of data points to select the best pair of parameter values. To validate the algorithm, we applied P-SVM to categorizing business intelligence factors extracted from 6,859 sentences of 231 online news articles about four major companies in the information technology sector. The results show that P-SVM achieved a significant improvement in precision, recall, F-measure, and AUC over the LibSVM package (with default parameter values) used in WEKA, a widely used data mining software. These findings provide useful implication for relevant research and security informatics applications.