Accuracy Can Lie: On the Impact of Surrogate Model in Configuration Tuning

IEEE Transactions on Software Engineering (IF 5.6; Zone 1, Computer Science; JCR Q1, Computer Science, Software Engineering) · Pub Date: 2025-01-07 · DOI: 10.1109/TSE.2025.3525955

Pengzhou Chen;Jingzhi Gong;Tao Chen

IEEE Transactions on Software Engineering, vol. 51, no. 2, pp. 548-580. Full text: https://ieeexplore.ieee.org/document/10832565/


Abstract

To avoid expensive measurements during configuration tuning, it is natural to build a surrogate model that stands in for the system, so that configuration performance can be evaluated cheaply. Yet a prevailing stereotype is that the higher the model accuracy, the better the tuning result, or vice versa. This “accuracy is all” belief drives our research community to build ever more accurate models and to criticize a tuner for the inaccuracy of the model it uses. However, this practice raises some previously unaddressed questions: are the model and its accuracy really that important for the tuning result? Do the relatively small accuracy improvements reported in existing work (e.g., an error reduction of a few percent) really matter much to the tuners? What role does model accuracy play in tuning quality? To answer these questions, we conduct one of the largest-scale empirical studies to date, running $24\times 7$ over a period of 13 months, that covers 10 models, 17 tuners, and 29 systems from existing work under four commonly used metrics, leading to 13,612 cases of investigation. Surprisingly, our key findings reveal that accuracy can lie: in a considerable number of cases, higher accuracy leads to no improvement in the tuning outcome (up to 58% of cases under certain settings) or, even worse, degrades the tuning quality (up to 24% of cases under certain settings). We also find that the models chosen in most proposed tuners are sub-optimal, and that the percentage of accuracy change required to significantly improve tuning quality varies with the range of model accuracy. Drawing on fitness landscape analysis, we provide in-depth discussions of the underlying rationale, offering several lessons learned as well as insights into future opportunities. Most importantly, this work sends a clear message to the community: we should take one step back from the natural “accuracy is all” belief for model-based configuration tuning.
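The claim that a more accurate surrogate need not give a better tuning result can be illustrated with a toy sketch. This is not the paper's experimental setup: the latency values, the two surrogates, the use of MAPE as the accuracy metric, and the pick-the-predicted-best "tuner" are all assumptions made for illustration only.

```python
def mape(truth, pred):
    """Mean absolute percentage error in percent (lower means more accurate)."""
    return 100.0 * sum(abs(p - t) / t for t, p in zip(truth, pred)) / len(truth)


def tune(pred):
    """Toy tuner: index of the configuration the surrogate predicts to be fastest."""
    return min(range(len(pred)), key=pred.__getitem__)


# Hypothetical measured latencies (ms) of five configurations; lower is better.
true_latency = [100.0, 90.0, 80.0, 70.0, 60.0]

# Surrogate A (hypothetical): nearly exact, but it swaps the order of the two best configs.
surrogate_a = [100.0, 90.0, 80.0, 64.0, 66.0]

# Surrogate B (hypothetical): overestimates everything by 30%, but keeps the ranking intact.
surrogate_b = [t * 1.3 for t in true_latency]

error_a = mape(true_latency, surrogate_a)
error_b = mape(true_latency, surrogate_b)
pick_a = tune(surrogate_a)
pick_b = tune(surrogate_b)

print(f"A: MAPE={error_a:.1f}%, tuned latency={true_latency[pick_a]} ms")
print(f"B: MAPE={error_b:.1f}%, tuned latency={true_latency[pick_b]} ms")
```

In this sketch the surrogate with the lower error steers the tuner to a slower configuration, because the tuner depends only on how the surrogate ranks configurations near the optimum, not on its average error.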


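Source journal: IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.70
Self-citation rate: 10.80%
Articles published: 724
Review time: 6 months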

About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.