Efficient parameter selection for SVM: The case of business intelligence categorization

2017 IEEE International Conference on Intelligence and Security Informatics (ISI). Pub Date: 2017-07-22. DOI: 10.1109/ISI.2017.8004897

Hsin-Hsiung Huang, Zijing Wang, Wingyan Chung


Citations: 9

Abstract

Support Vector Machines (SVMs) are a widely used technique for classifying high-dimensional data, especially in security and intelligence categorization. However, the performance of an SVM can be adversely affected by poorly selected parameter values. Current approaches to SVM parameter selection mainly rely on extensive cross-validation or anecdotal information, which can be inefficient and ineffective. In this research, we propose an efficient algorithm called Percentile-SVM (P-SVM) for selecting the parameter pair (γ, C) of SVMs with Gaussian kernels on metric data. P-SVM searches only a handful of percentiles of the squared Euclidean distances between data points to select the best pair of parameter values. To validate the algorithm, we applied P-SVM to categorizing business intelligence factors extracted from 6,859 sentences of 231 online news articles about four major companies in the information technology sector. The results show that P-SVM achieved a significant improvement in precision, recall, F-measure, and AUC over the LibSVM package (with default parameter values) used in WEKA, a widely used data mining software package. These findings provide useful implications for relevant research and security informatics applications.
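The percentile idea described in the abstract can be illustrated with a rough sketch. The abstract does not state how a percentile of the squared distances is turned into γ, which percentiles are searched, or how C is chosen, so everything specific below is an assumption made for illustration: the mapping γ = 1 / (percentile value), the percentiles (10, 50, 90), the C values, and all function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def squared_distances(points):
    """Squared Euclidean distance for every unordered pair of points."""
    return [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(points, 2)
    ]


def percentile(sorted_values, pct):
    """Percentile by linear interpolation between the closest ranks."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def candidate_gammas(points, percentiles=(10, 50, 90)):
    """ASSUMPTION: gamma = 1 / (percentile of the squared distances)."""
    d2 = sorted(squared_distances(points))
    gammas = []
    for pct in percentiles:
        value = percentile(d2, pct)
        if value > 0:  # skip degenerate percentiles caused by duplicate points
            gammas.append(1.0 / value)
    return gammas


def candidate_pairs(points, percentiles=(10, 50, 90), c_values=(0.1, 1.0, 10.0)):
    """ASSUMPTION: pair each gamma with a few illustrative C values."""
    return [
        (gamma, c)
        for gamma in candidate_gammas(points, percentiles)
        for c in c_values
    ]


if __name__ == "__main__":
    unit_square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    for gamma, c in candidate_pairs(unit_square):
        print(f"gamma={gamma:.3f}  C={c}")
```

Each resulting (γ, C) pair would then be scored, for example by cross-validation with a Gaussian-kernel SVM, and the best-scoring pair kept. The point of the sketch is only that the candidate set stays small and is derived from the data rather than from a fixed exhaustive grid.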
