{"title":"瓦瑟斯坦分布式鲁棒多类支持向量机","authors":"Michael Ibrahim, Heraldo Rozas, Nagi Gebraeel","doi":"arxiv-2409.08409","DOIUrl":null,"url":null,"abstract":"We study the problem of multiclass classification for settings where data\nfeatures $\\mathbf{x}$ and their labels $\\mathbf{y}$ are uncertain. We identify\nthat distributionally robust one-vs-all (OVA) classifiers often struggle in\nsettings with imbalanced data. To address this issue, we use Wasserstein\ndistributionally robust optimization to develop a robust version of the\nmulticlass support vector machine (SVM) characterized by the Crammer-Singer\n(CS) loss. First, we prove that the CS loss is bounded from above by a\nLipschitz continuous function for all $\\mathbf{x} \\in \\mathcal{X}$ and\n$\\mathbf{y} \\in \\mathcal{Y}$, then we exploit strong duality results to express\nthe dual of the worst-case risk problem, and we show that the worst-case risk\nminimization problem admits a tractable convex reformulation due to the\nregularity of the CS loss. Moreover, we develop a kernel version of our\nproposed model to account for nonlinear class separation, and we show that it\nadmits a tractable convex upper bound. We also propose a projected subgradient\nmethod algorithm for a special case of our proposed linear model to improve\nscalability. Our numerical experiments demonstrate that our model outperforms\nstate-of-the art OVA models in settings where the training data is highly\nimbalanced. We also show through experiments on popular real-world datasets\nthat our proposed model often outperforms its regularized counterpart as the\nfirst accounts for uncertain labels unlike the latter.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"47 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Wasserstein Distributionally Robust Multiclass Support Vector Machine\",\"authors\":\"Michael Ibrahim, Heraldo Rozas, Nagi Gebraeel\",\"doi\":\"arxiv-2409.08409\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the problem of multiclass classification for settings where data\\nfeatures $\\\\mathbf{x}$ and their labels $\\\\mathbf{y}$ are uncertain. We identify\\nthat distributionally robust one-vs-all (OVA) classifiers often struggle in\\nsettings with imbalanced data. To address this issue, we use Wasserstein\\ndistributionally robust optimization to develop a robust version of the\\nmulticlass support vector machine (SVM) characterized by the Crammer-Singer\\n(CS) loss. First, we prove that the CS loss is bounded from above by a\\nLipschitz continuous function for all $\\\\mathbf{x} \\\\in \\\\mathcal{X}$ and\\n$\\\\mathbf{y} \\\\in \\\\mathcal{Y}$, then we exploit strong duality results to express\\nthe dual of the worst-case risk problem, and we show that the worst-case risk\\nminimization problem admits a tractable convex reformulation due to the\\nregularity of the CS loss. Moreover, we develop a kernel version of our\\nproposed model to account for nonlinear class separation, and we show that it\\nadmits a tractable convex upper bound. We also propose a projected subgradient\\nmethod algorithm for a special case of our proposed linear model to improve\\nscalability. Our numerical experiments demonstrate that our model outperforms\\nstate-of-the art OVA models in settings where the training data is highly\\nimbalanced. 
We also show through experiments on popular real-world datasets\\nthat our proposed model often outperforms its regularized counterpart as the\\nfirst accounts for uncertain labels unlike the latter.\",\"PeriodicalId\":501340,\"journal\":{\"name\":\"arXiv - STAT - Machine Learning\",\"volume\":\"47 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - STAT - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.08409\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08409","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Wasserstein Distributionally Robust Multiclass Support Vector Machine
We study the problem of multiclass classification for settings where data
features $\mathbf{x}$ and their labels $\mathbf{y}$ are uncertain. We identify
that distributionally robust one-vs-all (OVA) classifiers often struggle in
settings with imbalanced data. To address this issue, we use Wasserstein distributionally robust optimization to develop a robust version of the multiclass support vector machine (SVM) characterized by the Crammer-Singer (CS) loss.
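To make the loss concrete: for a weight matrix $W$ with one row $\mathbf{w}_k$ per class, the CS loss at a labeled sample $(\mathbf{x}, y)$ is $\max_{k}\{\mathbb{1}[k \neq y] + \mathbf{w}_k^\top \mathbf{x}\} - \mathbf{w}_y^\top \mathbf{x}$. Below is a minimal NumPy sketch of this quantity; the function name and array conventions are ours, not the paper's.

```python
import numpy as np

def crammer_singer_loss(W, x, y):
    """Crammer-Singer multiclass hinge loss for a single sample.

    W -- (K, d) weight matrix, one row w_k per class
    x -- (d,) feature vector
    y -- true class index in {0, ..., K-1}
    """
    scores = W @ x                      # per-class scores w_k^T x
    margins = scores - scores[y] + 1.0  # 1 + (w_k - w_y)^T x
    margins[y] = 0.0                    # the true class carries no margin penalty
    return margins.max()                # nonnegative by construction
```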
First, we prove that the CS loss is bounded from above by a Lipschitz continuous function for all $\mathbf{x} \in \mathcal{X}$ and $\mathbf{y} \in \mathcal{Y}$. We then exploit strong duality results to express the dual of the worst-case risk problem, and we show that the worst-case risk minimization problem admits a tractable convex reformulation due to the regularity of the CS loss.
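The abstract does not spell the reformulation out. For a loss that is $L$-Lipschitz in the metric defining a type-1 Wasserstein ball of radius $\varepsilon$ around the empirical distribution $\widehat{\mathbb{P}}_n$, Kantorovich duality yields the standard bound below, which we assume is the mechanism behind the tractable reformulation; the paper's exact dual, which also accounts for label uncertainty, may differ.

```latex
\sup_{\mathbb{Q} :\, W_1(\mathbb{Q}, \widehat{\mathbb{P}}_n) \le \varepsilon}
  \mathbb{E}_{\mathbb{Q}}\big[\ell(\mathbf{x}, \mathbf{y})\big]
\;\le\;
\frac{1}{n} \sum_{i=1}^{n} \ell(\mathbf{x}_i, \mathbf{y}_i)
\;+\; \varepsilon L .
```

Minimizing the right-hand side reduces distributional robustness to a Lipschitz (norm) penalty on the empirical risk, which is why the regularity of the loss drives tractability.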
Moreover, we develop a kernel version of our proposed model to account for nonlinear class separation, and we show that it admits a tractable convex upper bound.
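One plausible way to kernelize the per-class scores is to expand each $\mathbf{w}_k$ over the training points. The sketch below uses an RBF kernel and a $(K \times n)$ coefficient matrix as illustrative assumptions, not the paper's construction.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||x1_i - x2_j||^2)."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_scores(alpha, X_train, x, gamma=1.0):
    """Per-class scores s_k = sum_i alpha[k, i] * K(x_i, x).

    alpha   -- (K, n) coefficients, one row per class
    X_train -- (n, d) training inputs anchoring the expansion
    x       -- (d,) test point
    """
    k_vec = rbf_kernel(X_train, x[None, :], gamma).ravel()  # (n,)
    return alpha @ k_vec                                     # (K,)
```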
We also propose a projected subgradient algorithm for a special case of our proposed linear model to improve scalability.
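The abstract leaves the special case unspecified. As a sketch of the general recipe, the loop below minimizes the average CS loss over a Frobenius-norm ball by projected subgradient descent; the constraint set, radius, and $1/\sqrt{t}$ step-size rule are our illustrative assumptions.

```python
import numpy as np

def cs_subgradient(W, x, y):
    """A subgradient of the Crammer-Singer loss at one sample (x, y)."""
    scores = W @ x
    margins = scores - scores[y] + 1.0
    margins[y] = 0.0
    k = int(margins.argmax())
    G = np.zeros_like(W)
    if margins[k] > 0.0:        # loss is active at this sample
        G[k] += x
        G[y] -= x
    return G

def projected_subgradient(X, Y, n_classes, radius=10.0, steps=1000, lr=0.1):
    """Minimize the average CS loss over {W : ||W||_F <= radius}
    by projected subgradient descent (illustrative constraint choice)."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for t in range(steps):
        G = sum(cs_subgradient(W, X[i], Y[i]) for i in range(n)) / n
        W -= lr / np.sqrt(t + 1) * G   # diminishing step size
        norm = np.linalg.norm(W)       # project back onto the Frobenius ball
        if norm > radius:
            W *= radius / norm
    return W
```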
Our numerical experiments demonstrate that our model outperforms state-of-the-art OVA models in settings where the training data is highly imbalanced. We also show, through experiments on popular real-world datasets, that our proposed model often outperforms its regularized counterpart, as the former accounts for uncertain labels while the latter does not.