A Rigorous Benchmarking and Performance Analysis Methodology for Python Workloads

2020 IEEE International Symposium on Workload Characterization (IISWC). Pub Date: 2020-10-01. DOI: 10.1109/IISWC50251.2020.00017

Arthur Crapé, L. Eeckhout


Citation count: 2

Abstract

Computer architecture and computer systems research and development are heavily driven by benchmarking and performance analysis. It is therefore of paramount importance that rigorous methodologies are used to draw correct conclusions and steer research and development in the right direction. While rigorous methodologies are widely used for native and managed programming language workloads, scripting language workloads are subject to ad hoc methodologies, which lead to incorrect and misleading conclusions. In particular, we find incorrect public statements regarding different virtual machines for Python, the most popular scripting language. The incorrect conclusion results from using the geometric mean speedup and from not distinguishing between start-up and steady-state performance. In this paper, we propose a statistically rigorous benchmarking and performance analysis methodology for Python workloads, which distinguishes between start-up and steady-state performance and which summarizes average performance across a set of benchmarks using the harmonic mean speedup. We find that a rigorous methodology makes a difference in practice. In particular, we find that the PyPy JIT compiler outperforms the CPython interpreter by 1.76× for steady-state while being 2% slower for start-up. This refutes the statement on the PyPy website that "PyPy outperforms CPython by 4.4× on average", which is based on the geometric mean speedup and does not distinguish between start-up and steady-state. We use the proposed methodology to analyze Python workloads, which yields several interesting findings regarding PyPy versus CPython performance, start-up versus steady-state performance, the impact of a workload's input size, and Python workload execution characteristics at the microarchitecture level.
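To illustrate why the choice of mean matters when summarizing speedups, the following minimal Python sketch compares the geometric and harmonic mean over a set of per-benchmark speedups. The speedup values and the function names are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not reproduce the paper's full methodology (for example, it does not separate start-up from steady-state measurements).

```python
import math


def geometric_mean(speedups):
    """Geometric mean: the n-th root of the product of the speedups."""
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


def harmonic_mean(speedups):
    """Harmonic mean: n divided by the sum of the reciprocal speedups."""
    return len(speedups) / sum(1.0 / s for s in speedups)


# Hypothetical per-benchmark speedups (baseline time / new time).
# A value below 1.0 means the new system is slower on that benchmark.
speedups = [0.5, 1.0, 2.0, 8.0]

gm = geometric_mean(speedups)
hm = harmonic_mean(speedups)

print(f"geometric mean speedup: {gm:.2f}")  # prints 1.68
print(f"harmonic mean speedup:  {hm:.2f}")  # prints 1.10
```

For positive inputs the harmonic mean never exceeds the geometric mean, so a single benchmark with a very large speedup inflates the geometric summary more than the harmonic one. When every benchmark takes the same time on the baseline, the harmonic mean speedup equals the ratio of total baseline time to total new time.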


