Title: Subsample, Generate, and Stack Using the Spiral Discovery Method: A Framework for Autoregressive Data Compression and Augmentation
Author: Ádám B. Csapó
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems (JCR Q1, Automation & Control Systems; impact factor 8.6)
Publication type: Journal Article
Published: 2024-09-05
DOI: 10.1109/TSMC.2024.3448206
URL: https://ieeexplore.ieee.org/document/10666739/
Citations: 0
Abstract
This article addresses the challenge of efficiently managing datasets of various sizes through two key strategies: 1) dataset compression and 2) synthetic augmentation. It introduces a novel framework, referred to as subsample, generate, and stack (SGS), which can be used to implement both of these strategies while maintaining the statistical characteristics of the original data. While SGS can be paired with a variety of generative methods, this article specifically demonstrates its application using the spiral discovery method (SDM), an autoregressive data generation model that allows for the exploratory manipulation of numerical data. The uniqueness and widespread applicability of this approach stem from its support for the fine-grained optimization of exploration versus exploitation goals through an interpretable set of hyperparameters. The effectiveness of the SGS framework combined with SDM is validated on two benchmark examples, one focusing on compression and the other on augmentation, showcasing its potential as a tool for dataset management in engineering contexts.
About the journal:
The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.