{"title":"Adaptive Stochastic Gradient Descent (SGD) for erratic datasets","authors":"Idriss Dagal , Kürşat Tanriöven , Ahmet Nayir , Burak Akın","doi":"10.1016/j.future.2024.107682","DOIUrl":null,"url":null,"abstract":"<div><div>Stochastic Gradient Descent (SGD) is a highly efficient optimization algorithm, particularly well suited for large datasets due to its incremental parameter updates. In this study, we apply SGD to a simple linear classifier using logistic regression, a widely used method for binary classification tasks. Unlike traditional batch Gradient Descent (GD), which processes the entire dataset simultaneously, SGD offers enhanced scalability and performance for streaming and large-scale data. Our experiments reveal that SGD outperforms GD across multiple performance metrics, achieving 45.83% accuracy compared to GD’s 41.67 %, and excelling in precision (60 % vs. 45.45 %), recall (100 % vs. 60 %), and F1-score (100 % vs. 62 %). Additionally, SGD achieves 99.99 % of Principal Component Analysis (PCA) accuracy, slightly surpassing GD’s 99.92 %.</div><div>These results highlight SGD’s superior efficiency and flexibility for large-scale data environments, driven by its ability to balance precision and recall effectively. To further enhance SGD’s robustness, the proposed method incorporates adaptive learning rates, momentum, and logistic regression, addressing traditional GD drawbacks. These modifications improve the algorithm’s stability, convergence behavior, and applicability to complex, large-scale optimization tasks where standard GD often struggles, making SGD a highly effective solution for challenging data-driven scenarios.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"166 ","pages":"Article 107682"},"PeriodicalIF":6.2000,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X24006460","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Stochastic Gradient Descent (SGD) is a highly efficient optimization algorithm, particularly well suited for large datasets due to its incremental parameter updates. In this study, we apply SGD to a simple linear classifier using logistic regression, a widely used method for binary classification tasks. Unlike traditional batch Gradient Descent (GD), which processes the entire dataset simultaneously, SGD offers enhanced scalability and performance for streaming and large-scale data. Our experiments reveal that SGD outperforms GD across multiple performance metrics, achieving 45.83% accuracy compared to GD’s 41.67%, and excelling in precision (60% vs. 45.45%), recall (100% vs. 60%), and F1-score (100% vs. 62%). Additionally, SGD achieves 99.99% of Principal Component Analysis (PCA) accuracy, slightly surpassing GD’s 99.92%.
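To make the batch-vs.-incremental contrast concrete, here is a minimal Python sketch of the two update rules for logistic regression. This is an illustrative reconstruction from the standard definitions, not the authors' code; the function names, learning rate, and random seed are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gd_step(w, X, y, lr=0.1):
    # Batch GD: one update from the gradient averaged over the full dataset.
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    return w - lr * grad

def sgd_epoch(w, X, y, lr=0.1, seed=0):
    # SGD: one update per randomly drawn sample (incremental updates),
    # which is what makes it scale to streaming and large datasets.
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(y)):
        grad = (sigmoid(X[i] @ w) - y[i]) * X[i]
        w = w - lr * grad
    return w
```

Each SGD step touches a single sample, so an epoch costs roughly the same as one batch GD step while making many more parameter updates.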
These results highlight SGD’s superior efficiency and flexibility in large-scale data environments, driven by its ability to balance precision and recall effectively. To further enhance SGD’s robustness, the proposed method incorporates adaptive learning rates, momentum, and logistic regression, addressing the drawbacks of traditional GD. These modifications improve the algorithm’s stability, convergence behavior, and applicability to complex, large-scale optimization tasks where standard GD often struggles, making SGD a highly effective solution for challenging data-driven scenarios.
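The sketch below shows one common way adaptive learning rates and momentum can be layered onto the SGD loop above (reusing `sigmoid` from the previous sketch). The 1/t decay schedule and all hyperparameter values are illustrative assumptions, not the paper's exact configuration.

```python
def adaptive_momentum_sgd(w, X, y, lr0=0.1, decay=0.01, beta=0.9,
                          epochs=10, seed=0):
    # SGD with a time-decayed (adaptive) step size and classical momentum.
    # Hyperparameters here are placeholders, not the paper's settings.
    rng = np.random.default_rng(seed)
    v = np.zeros_like(w)                  # momentum (velocity) buffer
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            lr = lr0 / (1.0 + decay * t)  # adaptive: step size shrinks over time
            grad = (sigmoid(X[i] @ w) - y[i]) * X[i]
            v = beta * v - lr * grad      # momentum smooths noisy per-sample gradients
            w = w + v
    return w
```

The momentum term averages recent gradients, damping the variance of single-sample updates, while the decaying step size helps the iterates settle near a minimum instead of oscillating.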
Journal Introduction:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.