Hyper‐Parameter Optimization of Kernel Functions on Multi‐Class Text Categorization: A Comparative Evaluation

Michael Loki, Agnes Mindila, Wilson Cheruiyot
{"title":"Hyper‐Parameter Optimization of Kernel Functions on Multi‐Class Text Categorization: A Comparative Evaluation","authors":"Michael Loki, Agnes Mindila, Wilson Cheruiyot","doi":"10.1002/widm.1572","DOIUrl":null,"url":null,"abstract":"In recent years, machine learning (ML) has witnessed a paradigm shift in kernel function selection, which is pivotal in optimizing various ML models. Despite multiple studies about its significance, a comprehensive understanding of kernel function selection, particularly about model performance, still needs to be explored. Challenges remain in selecting and optimizing kernel functions to improve model performance and efficiency. The study investigates how gamma parameter and cost parameter influence performance metrics in multi‐class classification tasks using various kernel‐based algorithms. Through sensitivity analysis, the impact of these parameters on classification performance and computational efficiency is assessed. The experimental setup involves deploying ML models using four kernel‐based algorithms: Support Vector Machine, Radial Basis Function, Polynomial Kernel, and Sigmoid Kernel. Data preparation includes text processing, categorization, and feature extraction using TfidfVectorizer, followed by model training and validation. Results indicate that Support Vector Machine with default settings and Radial Basis Function kernel consistently outperforms polynomial and sigmoid kernels. Adjusting gamma improves model accuracy and precision, highlighting its role in capturing complex relationships. Regularization cost parameters, however, show minimal impact on performance. The study also reveals that configurations with moderate gamma values achieve better balance between performance and computational time compared to higher gamma values or no gamma adjustment. 
The findings underscore the delicate balance between model performance and computational efficiency by highlighting the trade‐offs between model complexity and efficiency.","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"84 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"WIREs Data Mining and Knowledge Discovery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/widm.1572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, machine learning (ML) has seen a paradigm shift in kernel function selection, which is pivotal to optimizing many ML models. Despite multiple studies of its significance, a comprehensive understanding of kernel function selection, particularly its effect on model performance, has yet to be established. Challenges remain in selecting and optimizing kernel functions to improve model performance and efficiency. This study investigates how the gamma and cost parameters influence performance metrics in multi-class classification tasks using various kernel-based algorithms. Through sensitivity analysis, the impact of these parameters on classification performance and computational efficiency is assessed. The experimental setup deploys ML models using Support Vector Machines under four kernel configurations: the default setting and the Radial Basis Function, polynomial, and sigmoid kernels. Data preparation includes text processing, categorization, and feature extraction with TfidfVectorizer, followed by model training and validation. Results indicate that the Support Vector Machine with default settings and the Radial Basis Function kernel consistently outperforms the polynomial and sigmoid kernels. Adjusting gamma improves model accuracy and precision, highlighting its role in capturing complex relationships. The regularization cost parameter, however, shows minimal impact on performance. The study also reveals that configurations with moderate gamma values strike a better balance between performance and computational time than higher gamma values or no gamma adjustment. The findings underscore the delicate balance between model performance and computational efficiency by highlighting the trade-offs between model complexity and efficiency.
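The pipeline the abstract describes — TF-IDF feature extraction followed by kernel SVM training, with a sensitivity sweep over gamma — can be sketched in scikit-learn. This is an illustrative reconstruction, not the authors' code: the toy corpus, category names, and parameter values below are assumptions for demonstration only.

```python
# Hedged sketch of the experimental setup: TfidfVectorizer features fed to
# SVC classifiers with the kernels compared in the study, evaluated by
# cross-validated accuracy. Corpus and parameter grid are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Tiny stand-in multi-class text corpus (three hypothetical categories),
# repeated so that 3-fold cross-validation has enough samples per class.
texts = [
    "the striker scored a late goal in the match",
    "midfield pressing decided the football final",
    "the senate passed the budget bill today",
    "parliament debated the new election law",
    "the processor benchmark shows faster inference",
    "the new gpu accelerates neural network training",
] * 5
labels = ["sport", "sport", "politics", "politics", "tech", "tech"] * 5

def evaluate(kernel, gamma="scale", C=1.0):
    """Mean cross-validated accuracy for one kernel/gamma/C configuration."""
    model = make_pipeline(
        TfidfVectorizer(),          # text processing + feature extraction
        SVC(kernel=kernel, gamma=gamma, C=C),
    )
    return cross_val_score(model, texts, labels, cv=3).mean()

# Compare the kernels discussed in the study under default settings.
for kernel in ("rbf", "poly", "sigmoid"):
    print(f"{kernel:8s} accuracy = {evaluate(kernel):.3f}")

# Sensitivity sweep over gamma (cost parameter C held fixed), mirroring
# the paper's gamma analysis.
for gamma in (0.01, 0.1, 1.0):
    print(f"gamma={gamma:<5} accuracy = {evaluate('rbf', gamma=gamma):.3f}")
```

Replacing the gamma loop with a `C` sweep reproduces the cost-parameter side of the sensitivity analysis; per the abstract's findings, accuracy would be expected to vary little across `C` but noticeably across gamma.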