Efficient parameter selection for SVM: The case of business intelligence categorization

Hsin-Hsiung Huang, Zijing Wang, Wingyan Chung
Published in: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 2017-07-22
DOI: 10.1109/ISI.2017.8004897 (https://doi.org/10.1109/ISI.2017.8004897)
Citations: 9

Abstract
The Support Vector Machine (SVM) is a widely used technique for classifying high-dimensional data, especially in security and intelligence categorization. However, SVM performance can be adversely affected by poorly selected parameter values. Current approaches to SVM parameter selection mainly rely on extensive cross-validation or anecdotal information, which can be inefficient and ineffective. In this research, we propose an efficient algorithm called Percentile-SVM (P-SVM) for selecting the parameter pair (γ, C) of SVMs with Gaussian kernels on metric data. P-SVM searches only a handful of percentiles of the squared Euclidean distances between data points to select the best pair of parameter values. To validate the algorithm, we applied P-SVM to categorizing business intelligence factors extracted from 6,859 sentences in 231 online news articles about four major companies in the information technology sector. The results show that P-SVM achieved a significant improvement in precision, recall, F-measure, and AUC over the LibSVM package (with default parameter values) used in WEKA, a widely used data mining software package. These findings provide useful implications for relevant research and security informatics applications.
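The abstract says only that P-SVM draws its candidates from a few percentiles of the squared Euclidean distances; it does not say how a percentile is turned into γ or how C is chosen. The sketch below is therefore an illustration under stated assumptions, not the authors' algorithm: it assumes the Gaussian kernel form exp(-γ‖x − y‖²) and the hypothetical mapping γ = 1 / (percentile value). The function names and the percentile set (10, 50, 90) are likewise made up for the example.

```python
from itertools import combinations


def squared_distances(points):
    """Return the squared Euclidean distance for every unordered pair of points."""
    return [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(points, 2)
    ]


def percentile(values, pct):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def candidate_gammas(points, percentiles=(10, 50, 90)):
    """Map each percentile of the pairwise squared distances to a candidate gamma.

    Assumption (not from the abstract): gamma = 1 / percentile value,
    for the kernel exp(-gamma * ||x - y||^2).
    """
    # Drop zero distances (duplicate points) so the reciprocal is defined.
    dists = [d for d in squared_distances(points) if d > 0]
    return {pct: 1.0 / percentile(dists, pct) for pct in percentiles}


# Toy data: four points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
gammas = candidate_gammas(points)
for pct, gamma in sorted(gammas.items()):
    print(f"percentile {pct}: gamma = {gamma:.4f}")
```

Each candidate γ would then be paired with a small set of C values and the pairs compared on held-out data. That scoring step is not described in the abstract and is left out here. The point of the sketch is only that the search space shrinks from a dense (γ, C) grid to a handful of data-derived γ values.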