Granular Ball Twin Support Vector Machine

IF 8.9 · CAS Tier 1 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · IEEE Transactions on Neural Networks and Learning Systems · Pub Date: 2024-11-21 · DOI: 10.1109/TNNLS.2024.3476391
A. Quadir;M. Sajid;M. Tanveer
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 7, pp. 12444-12453. Full text: https://ieeexplore.ieee.org/document/10759815/
Citations: 0

Abstract

Twin support vector machine (TSVM) is an emerging machine learning model with versatile applicability in classification and regression endeavors. Nevertheless, TSVM confronts noteworthy challenges: 1) the imperative demand for matrix inversions presents formidable obstacles to its efficiency and applicability on large-scale datasets; 2) the omission of the structural risk minimization (SRM) principle in its primal formulation heightens the vulnerability to overfitting risks; and 3) the TSVM exhibits a high susceptibility to noise and outliers and also demonstrates instability when subjected to resampling. In view of the aforementioned challenges, we propose the granular ball TSVM (GBTSVM). GBTSVM takes granular balls (GBs), rather than individual data points, as inputs to construct a classifier. These GBs, characterized by their coarser granularity, exhibit robustness to resampling and reduced susceptibility to the impact of noise and outliers. We further propose a novel large-scale GBTSVM (LS-GBTSVM). LS-GBTSVM's optimization formulation ensures two critical facets: 1) it eliminates the need for matrix inversions, streamlining the LS-GBTSVM's computational efficiency; and 2) it incorporates the SRM principle through regularization terms, effectively addressing the issue of overfitting. The proposed LS-GBTSVM exemplifies efficiency, scalability for large datasets, and robustness against noise and outliers. We conduct a comprehensive evaluation of the GBTSVM and LS-GBTSVM models on benchmark datasets from UCI and KEEL, both with and without the addition of label noise, and compare them with existing baseline models. Furthermore, we extend our assessment to the large-scale NDC datasets to establish the practicality of the proposed models in such contexts. Our experimental findings and rigorous statistical analyses affirm the superior generalization prowess of the proposed GBTSVM and LS-GBTSVM models compared to the baseline models.
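The abstract does not spell out how the granular balls are built. As a rough, hypothetical sketch following the general granular-ball computing recipe (summarize a ball by the mean of its points and their mean distance to that center; bisect an impure ball with 2-means until its majority-label purity passes a threshold), GB generation might look like the following. The function names, purity threshold, and 2-means splitter are illustrative assumptions, not the paper's exact algorithm:

```python
# Hypothetical granular-ball (GB) generation sketch, NOT the paper's exact
# method: each ball is summarized by its center (mean of its points) and
# radius (mean distance to the center); a ball whose majority-label purity
# is below the threshold is bisected with a minimal 2-means.
import numpy as np

def ball_stats(X, y):
    center = X.mean(axis=0)
    radius = np.linalg.norm(X - center, axis=1).mean()
    labels, counts = np.unique(y, return_counts=True)
    return center, radius, labels[counts.argmax()], counts.max() / len(y)

def bisect(X, n_iter=10, seed=0):
    # Minimal 2-means used only to split an impure ball in two.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        assign = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(2):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(axis=0)
    return assign

def generate_balls(X, y, purity_threshold=0.95, min_points=2):
    queue, balls = [(X, y)], []
    while queue:
        Xb, yb = queue.pop()
        center, radius, label, purity = ball_stats(Xb, yb)
        if purity >= purity_threshold or len(Xb) <= min_points:
            balls.append((center, radius, label))
            continue
        assign = bisect(Xb)
        if (assign == 0).all() or (assign == 1).all():
            balls.append((center, radius, label))  # split failed; accept ball
            continue
        for k in range(2):
            queue.append((Xb[assign == k], yb[assign == k]))
    return balls
```

The resulting (center, radius, label) triples, rather than the raw points, would then feed the downstream classifier, which is what gives the method its robustness to point-level noise and resampling.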
The source code of the proposed GBTSVM and LS-GBTSVM models is available at https://github.com/mtanveer1/GBTSVM.
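For context on the matrix-inversion issue the abstract raises: the TSVM family fits two nonparallel hyperplanes, each close to one class and far from the other, and labels a point by its nearer plane. The sketch below uses the classic least-squares TSVM formulation for illustration (closed form via two linear systems), not the paper's GBTSVM or LS-GBTSVM; the `(1/c)EᵀE + FᵀF`-style systems it solves are exactly the kind of matrix operation that LS-GBTSVM is designed to avoid at scale:

```python
# Illustrative least-squares twin-SVM sketch (assumed standard formulation,
# not the paper's GBTSVM). Each plane stays close to one class while the
# other class sits at (scaled) distance ~1 from it.
import numpy as np

def lstsvm_fit(A, B, c1=1.0, c2=1.0):
    """A: class +1 samples, B: class -1 samples. Returns ((w1,b1), (w2,b2))."""
    E = np.hstack([A, np.ones((len(A), 1))])   # augmented [A  e]
    F = np.hstack([B, np.ones((len(B), 1))])   # augmented [B  e]
    # Plane 1 hugs class +1:  z1 = -((1/c1) E^T E + F^T F)^{-1} F^T e
    z1 = -np.linalg.solve(E.T @ E / c1 + F.T @ F, F.T @ np.ones(len(B)))
    # Plane 2 hugs class -1:  z2 =  ((1/c2) F^T F + E^T E)^{-1} E^T e
    z2 = np.linalg.solve(F.T @ F / c2 + E.T @ E, E.T @ np.ones(len(A)))
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])

def lstsvm_predict(X, planes):
    # Assign each point to the class of its nearer hyperplane.
    (w1, b1), (w2, b2) = planes
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)
```

A GB-style variant would, per the abstract's idea, fit these planes to ball centers (possibly radius-weighted) instead of raw points, shrinking the linear systems from the number of samples to the number of balls.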
Source journal: IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.