Embedded Hardware-Efficient FPGA Architecture for SVM Learning and Inference
Authors: B. B. Shabarinath; Muralidhar Pullakandam
Journal: IEEE Access, vol. 13, pp. 68930-68947 (Journal Article), published 2025-04-18
DOI: 10.1109/ACCESS.2025.3562453
URL: https://ieeexplore.ieee.org/document/10969767/
Impact factor: 3.6; JCR: Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Citations: 0
Abstract
Edge computing enables AI processing on resource-constrained devices, but high computational cost and tight energy budgets make on-device machine learning inefficient, especially for Support Vector Machine (SVM) classifiers. Although SVM classifiers are generally very accurate, training them requires solving a quadratic optimization problem, which makes real-time implementation on embedded devices challenging. While Sequential Minimal Optimization (SMO) has improved the efficiency of SVM training, traditional implementations still suffer from high computational cost. In this paper, we propose Parallel SMO, a new algorithm that selects multiple violating pairs in each iteration, allowing batch-wise updates that speed up convergence and expose parallel computation. By buffering kernel values, it minimizes redundant computations, leading to improved memory efficiency and faster SVM training on FPGA architectures. In addition, we present an embedded hardware-efficient FPGA architecture that integrates SVM learning based on Parallel SMO with SVM inference. It consists of an SVM controller that schedules the operations of each clock cycle so that computation and memory access happen concurrently. The dynamic pipeline scheduling employs parameterized modules to schedule linear or nonlinear kernels and produce dimension-based reconfigurable blocks. A configuration signal turns on the corresponding sub-blocks and clock-gates unused ones, thus improving resource utilization, energy efficiency, and overall performance. On several benchmark data sets, the scheme consistently reduces clock cycles per iteration and improves throughput (up to 2427 iterations per second). It achieves up to 98% classification accuracy with low power consumption, as reflected by a training power of 47 mW and high energy efficiency (up to $51.64 \times 10^{3}$ iterations per joule). With an adaptive kernel datapath, parallel error-update execution, and best-pair selection, the scheme achieves faster convergence, higher throughput, and on-chip inference while maintaining resource efficiency.
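The two software-level ideas named in the abstract, selecting several violating pairs per iteration and buffering kernel values, can be illustrated with a minimal Python sketch. This is not the authors' Parallel SMO: the abstract does not state the selection rule, so the sketch assumes the common first-order violating-pair criterion used in SMO-style solvers, an RBF kernel, and a plain dictionary as the kernel buffer. The names `rbf_row`, `select_violating_pairs`, `batch`, and `tol` are hypothetical.

```python
import numpy as np


def rbf_row(X, i, gamma, cache):
    """Return the kernel row K(x_i, x_j) for all j, buffering it in `cache`."""
    if i not in cache:
        sq_dist = np.sum((X - X[i]) ** 2, axis=1)
        cache[i] = np.exp(-gamma * sq_dist)  # computed once, reused afterwards
    return cache[i]


def select_violating_pairs(alpha, y, err, C, batch, tol=1e-3):
    """Pick up to `batch` disjoint violating pairs (i, j).

    Assumption: err[i] = f(x_i) - y_i evaluated without the bias term.
    """
    # Multipliers that may still move "up" or "down" without leaving [0, C].
    up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
    low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
    up_idx = np.flatnonzero(up)
    low_idx = np.flatnonzero(low)
    # Smallest error first in the "up" set, largest error first in the "low" set.
    up_sorted = up_idx[np.argsort(err[up_idx], kind="stable")]
    low_sorted = low_idx[np.argsort(-err[low_idx], kind="stable")]

    pairs, used = [], set()
    for i, j in zip(up_sorted.tolist(), low_sorted.tolist()):
        # The gap shrinks along the sorted lists, so stop at the first non-violator.
        if len(pairs) == batch or err[j] - err[i] <= 2 * tol:
            break
        if i == j or i in used or j in used:
            continue
        pairs.append((i, j))
        used.update((i, j))
    return pairs


# Toy usage: with alpha = 0 the decision value is 0, so err = -y.
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.zeros(4)
err = -y
pairs = select_violating_pairs(alpha, y, err, C=1.0, batch=2)
```

Keeping the selected pairs disjoint means each pair's two-variable update touches different multipliers, which is one simple way to make a batch of updates independent; how the paper resolves interactions between pairs is not described in the abstract.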
IEEE Access: COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
About the journal:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.