Source code-level auto-tuning enables applications to adapt their implementation to maintain peak performance under varying execution environments (i.e., hardware, input, or application settings). However, the performance of the auto-tuned code is inherently tied to the design of the tuning space (the space of possible changes to the code). An ideal tuning space must include configurations diverse enough to ensure high performance across all targeted environments while eliminating redundant or inefficient regions that slow down the tuning space search. Traditional research has focused primarily on identifying optimization opportunities in the code and on efficient tuning space search. However, there is no rigorous methodology or tool supporting the analysis and refinement of tuning spaces, i.e., the addition of configurations that perform well in an unseen environment or the removal of configurations that perform poorly in any realistic environment.
In this short communication, we argue that hardware performance counters should be used to analyze tuning spaces, and that such an analysis would allow programmers to refine tuning spaces by adding configurations that unlock additional performance in unseen environments and removing those unlikely to produce efficient code in any realistic environment. While our primary goal is to introduce this research question and foster discussion, we also present a preliminary methodology for tuning-space analysis. We validate our approach through a case study using a GPU implementation of an N-body simulation. Our results demonstrate that the proposed analysis can detect the weaknesses of a tuning space: based on its outcomes, we refined the tuning space, improving the average configuration performance, and the best-performing configuration by 2–18.